CN116644336A - Method, device, equipment and storage medium for recognizing Chinese medicine terms - Google Patents

Info

Publication number
CN116644336A
Authority
CN
China
Prior art keywords
chinese medicine
term
traditional chinese
character
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310622149.2A
Other languages
Chinese (zh)
Inventor
胡意仪
阮晓雯
吴振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310622149.2A
Publication of CN116644336A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application relates to artificial intelligence technology and discloses a traditional Chinese medicine term recognition method, which comprises the following steps: performing entity recognition on a Chinese medicine diagnosis text to obtain a Chinese medicine term entity; performing feature extraction on the Chinese medicine term entity by using a feature extraction model to obtain a character feature vector corresponding to each character in the Chinese medicine term entity; performing cross entropy loss classification on the character feature vectors based on a pre-constructed standard Chinese medicine term word dictionary to obtain an analysis feature value of each standard Chinese medicine term word in the dictionary; and screening all the standard Chinese medicine term words based on the analysis feature values to obtain a target recognition result of the Chinese medicine diagnosis text. The application also provides a device, equipment and a storage medium for recognizing Chinese medicine terms. The application can improve the accuracy of Chinese medicine term recognition.

Description

Method, device, equipment and storage medium for recognizing Chinese medicine terms
Technical Field
The application relates to an artificial intelligence technology, in particular to a traditional Chinese medicine term identification method, a device, electronic equipment and a storage medium.
Background
With the development of traditional Chinese medicine and artificial intelligence, applications based on semantic understanding of traditional Chinese medicine (such as traditional Chinese medicine search engines, knowledge query systems and inquiry systems) are receiving increasing attention. Because semantic understanding rests on the identification of symptom entities, the traditional Chinese medicine term entities in a traditional Chinese medicine diagnosis text need to be identified.
Traditional Chinese medicine diagnosis contains a large number of terms that are combined and abbreviated. For example, "red tongue with yellow coating" can be decomposed into "red tongue body" and "yellow tongue coating", and "yellow and greasy tongue coating" can be decomposed into "yellow tongue coating" and "greasy tongue coating". Existing traditional Chinese medicine term entity recognition methods cannot identify all the combined terms within a traditional Chinese medicine term entity, so the accuracy of traditional Chinese medicine term recognition is poor.
Disclosure of Invention
The application provides a traditional Chinese medicine term recognition method, a device, electronic equipment and a storage medium, and mainly aims to improve the accuracy of traditional Chinese medicine term recognition. The method comprises the following steps:
obtaining a traditional Chinese medicine diagnosis text to be identified, and carrying out entity identification on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
obtaining a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary;
extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
performing cross entropy loss classification on the character feature vectors based on the standard traditional Chinese medicine word dictionary to obtain an analysis feature value of each standard traditional Chinese medicine word in the standard traditional Chinese medicine word dictionary;
and screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
Optionally, the performing entity recognition on the Chinese medicine diagnosis text to obtain a Chinese medicine term entity includes:
converting each character in the Chinese medicine diagnosis text into a character vector;
combining all the character vectors according to the sequence of the corresponding characters in the Chinese medicine diagnosis text to obtain a text matrix;
performing biaffine transformation on the text matrix to obtain a text initial feature matrix;
carrying out multi-channel convolution on the text initial feature matrix to obtain a first text feature matrix corresponding to each channel;
stacking all the first text feature matrixes as layers to obtain a second text feature matrix;
carrying out channel feature compression on the second text feature matrix to obtain a third text feature matrix;
performing cross entropy loss conversion on each element of the third text feature matrix to obtain a target text feature matrix, wherein the number of rows and the number of columns of the target text feature matrix are each equal to the number of characters in the Chinese medicine diagnosis text;
constructing a character sequence interval according to the row and column indices of the elements in the target text feature matrix that are greater than a preset screening threshold;
and extracting the characters of the Chinese medicine diagnosis text whose positions fall within the character sequence interval, so as to obtain the Chinese medicine term entity.
Optionally, the constructing a character sequence interval according to the row and column indices of the elements in the target text feature matrix that are greater than the preset screening threshold includes:
selecting elements larger than a preset screening threshold value in the target text feature matrix to obtain target elements;
acquiring the row index and the column index of the target element in the target text feature matrix;
taking the column index and the row index of the target element as the left endpoint and the right endpoint of an interval, respectively, to obtain an initial character sequence interval of the target element;
and carrying out interval alignment screening on all the initial character sequence intervals according to the endpoints of the initial character sequence intervals to obtain the character sequence intervals.
Optionally, the performing interval alignment screening on all the initial character sequence intervals according to the endpoints of the initial character sequence intervals to obtain the character sequence intervals includes:
summarizing all the initial character sequence intervals to obtain an initial character sequence interval set;
extracting, for each left endpoint in the initial character sequence interval set, the longest of all the initial character sequence intervals that share that left endpoint, to obtain a target character sequence interval set;
and taking each initial character sequence interval in the target character sequence interval set as the character sequence interval.
Optionally, the extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain a character characteristic vector corresponding to each character in the traditional Chinese medicine term entity includes:
extracting layer elements of the second text feature matrix according to the position, in the Chinese medicine diagnosis text, of each character of the Chinese medicine term entity to obtain an initial character vector of each character;
combining all the initial character vectors according to the sequence of the corresponding characters in the traditional Chinese medicine term entity to obtain an entity feature matrix;
inputting the entity feature matrix into the feature extraction model to obtain a feature extraction matrix;
and extracting columns in the feature extraction matrix based on the sequence of each character in the Chinese medicine term entity to obtain the character feature vector.
Optionally, the screening all the standard Chinese medicine term words based on the analysis feature value to obtain the target recognition result of the Chinese medicine diagnosis text includes:
sorting all the standard Chinese medicine term words in a descending order according to the size of the analysis characteristic value to obtain a Chinese medicine term sequence;
selecting all standard Chinese medicinal term words within a preset ordering range in the Chinese medicinal term sequence as standardized results of the Chinese medicinal term entities;
and summarizing all the standardized results to obtain the target identification result.
In order to solve the above problems, the present application also provides a Chinese medicine term recognition device, the device comprising:
the entity recognition module is used for acquiring a traditional Chinese medicine diagnosis text to be recognized, and carrying out entity recognition on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
the entity feature extraction module is used for obtaining a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary; extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
the entity standardization module is used for carrying out cross entropy loss classification on the character feature vectors based on the standard traditional Chinese medicine word dictionary to obtain an analysis feature value of each standard traditional Chinese medicine word in the standard traditional Chinese medicine word dictionary; and screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
Optionally, the screening all the standard Chinese medicine term words based on the analysis feature value to obtain the target recognition result of the Chinese medicine diagnosis text includes:
sorting all the standard Chinese medicine term words in a descending order according to the size of the analysis characteristic value to obtain a Chinese medicine term sequence;
selecting all standard Chinese medicinal term words within a preset ordering range in the Chinese medicinal term sequence as standardized results of the Chinese medicinal term entities;
and summarizing all the standardized results to obtain the target identification result.
In order to solve the above-mentioned problems, the present application also provides an electronic apparatus including:
a memory storing at least one computer program; and
a processor executing the computer program stored in the memory to realize the above-mentioned Chinese medicine term recognition method.
In order to solve the above-mentioned problems, the present application also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned Chinese medicine term recognition method.
According to the embodiment of the application, entity recognition is performed on the Chinese medicine diagnosis text to obtain a Chinese medicine term entity; a standard Chinese medicine term word dictionary and a feature extraction model are obtained, wherein the feature extraction model is a graph neural network model trained on a dictionary tree constructed from the standard Chinese medicine term word dictionary; feature extraction is performed on the Chinese medicine term entity by using the feature extraction model to obtain a character feature vector corresponding to each character in the entity; cross entropy loss classification is performed on the character feature vectors based on the dictionary to obtain an analysis feature value of each standard Chinese medicine term word; and all the standard Chinese medicine term words are screened based on the analysis feature values to obtain a target recognition result of the Chinese medicine diagnosis text. In this way, in addition to recognizing the Chinese medicine term entities in the Chinese medicine diagnosis text, the method further identifies all the standard Chinese medicine term words corresponding to each entity, which solves the problem that combined, abbreviated terms cannot be recognized and improves the accuracy of Chinese medicine term recognition.
Drawings
FIG. 1 is a flow chart of a method for recognizing terms of Chinese traditional medicine according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a device for recognizing terms of traditional Chinese medicine according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an internal structure of an electronic device for implementing a method for recognizing terms of traditional Chinese medicine according to an embodiment of the present application;
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a traditional Chinese medicine term recognition method. The execution subject of the method includes, but is not limited to, at least one of a server, a terminal and other equipment that can be configured to execute the method provided by the embodiment of the application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms.
Referring to a schematic flow chart of a method for identifying Chinese medicine terms according to an embodiment of the present application shown in fig. 1, in an embodiment of the present application, the method for identifying Chinese medicine terms includes the following steps:
S1, acquiring a traditional Chinese medicine diagnosis text to be identified, and carrying out entity identification on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
The Chinese medicine diagnosis text in the embodiment of the application is a Chinese medicine diagnosis text on which standardized recognition needs to be performed.
Further, in the embodiment of the application, Chinese medicine diagnosis contains a large number of abbreviated Chinese medicine terms. For example, standardizing the Chinese medicine term "red tongue with yellow coating" gives the standardized results "red tongue body" and "yellow tongue coating". Therefore, to facilitate semantic understanding of the Chinese medicine diagnosis text, the Chinese medicine term entities in the text need to be identified and standardized, and entity recognition is accordingly performed on the Chinese medicine diagnosis text to obtain the Chinese medicine term entity.
In detail, in the embodiment of the present application, performing entity recognition on the diagnosis text to obtain a term entity of traditional Chinese medicine includes:
converting each character in the Chinese medicine diagnosis text into a character vector;
combining all the character vectors according to the sequence of the corresponding characters in the Chinese medicine diagnosis text to obtain a text matrix;
performing biaffine transformation on the text matrix to obtain a text initial feature matrix;
carrying out multi-channel convolution on the text initial feature matrix to obtain a first text feature matrix corresponding to each channel;
stacking all the first text feature matrixes as layers to obtain a second text feature matrix;
carrying out channel feature compression on the second text feature matrix to obtain a third text feature matrix;
performing cross entropy loss conversion on each element of the third text feature matrix to obtain a target text feature matrix, wherein the number of rows and the number of columns of the target text feature matrix are each equal to the number of characters in the Chinese medicine diagnosis text;
constructing a character sequence interval according to the row and column indices of the elements in the target text feature matrix that are greater than a preset screening threshold;
and extracting the characters of the Chinese medicine diagnosis text whose positions fall within the character sequence interval, so as to obtain the Chinese medicine term entity.
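The biaffine step in the list above can be sketched as follows. This is a minimal illustration, assuming the transformation is the common biaffine scoring form in which every pair of character vectors (i, j) receives a score x_i^T U x_j. The names `biaffine_scores`, `matmul`, `transpose` and the weight matrix `u` are hypothetical and do not come from the application; a practical model would normally learn `u` and add separate start/end projections and bias terms.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def biaffine_scores(text_matrix, u):
    """Score every character pair (i, j) as x_i^T U x_j.

    text_matrix: n rows, one d-dimensional character vector per row.
    u: d x d weight matrix (hypothetical values here, learned in practice).
    Returns an n x n matrix, so rows and columns both index characters.
    """
    return matmul(matmul(text_matrix, u), transpose(text_matrix))


# Three characters with 2-dimensional character vectors (illustrative values).
text_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u = [[0.5, -1.0], [2.0, 0.0]]
scores = biaffine_scores(text_matrix, u)
```

The result has one row and one column per character, which is why the later target text feature matrix can be indexed by character positions.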
Optionally, in the embodiment of the present application, a one-hot encoding algorithm and/or the embedding layer of a deep learning model (such as a BERT model) is used to convert characters into vectors, and a multi-layer perceptron is used to perform channel feature compression on the second text feature matrix. In the embodiment of the present application, each element in the third text feature matrix is input into a binary cross entropy loss function as a predicted value, the label value in the binary cross entropy loss function is set to 1, and the loss classification value corresponding to each element in the third text feature matrix is calculated; each element of the third text feature matrix is then replaced with its loss classification value to obtain the target text feature matrix.
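The cross entropy loss conversion described above can be illustrated with a short sketch. It assumes that the elements of the third text feature matrix are already probabilities in (0, 1); the application does not state how they are scaled, so this is an assumption, as are the function names `binary_cross_entropy` and `to_target_matrix` and the clipping constant `eps`.

```python
import math


def binary_cross_entropy(pred, label, eps=1e-12):
    """Binary cross entropy for a single predicted value.

    The prediction is clipped to [eps, 1 - eps] so the logarithms stay finite.
    """
    p = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def to_target_matrix(third_matrix):
    """Replace every element with its loss value for label 1, i.e. -log(p)."""
    return [[binary_cross_entropy(p, 1.0) for p in row] for row in third_matrix]


# Illustrative 2 x 2 third text feature matrix (assumed to hold probabilities).
third = [[0.5, 0.9], [0.1, 0.5]]
target = to_target_matrix(third)
```

With the label fixed at 1, the second term of the loss vanishes and each element p becomes -log(p); the matrix keeps its shape, so its rows and columns still correspond to character positions.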
Further, in the embodiment of the present application, constructing a character sequence interval according to the row and column indices of the elements in the target text feature matrix that are greater than the preset screening threshold includes:
selecting elements larger than a preset screening threshold value in the target text feature matrix to obtain target elements;
acquiring the row index and the column index of the target element in the target text feature matrix;
taking the column index and the row index of the target element as the left endpoint and the right endpoint of an interval, respectively, to obtain an initial character sequence interval of the target element;
and carrying out interval alignment screening on all the initial character sequence intervals according to the endpoints of the initial character sequence intervals to obtain the character sequence intervals.
Because entity nesting exists in the traditional Chinese medicine diagnosis text, in order to select the final entity from among nested entities, interval alignment screening is performed on all the initial character sequence intervals according to their endpoints to obtain the character sequence intervals.
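The construction of initial character sequence intervals can be sketched as below. The function name `initial_intervals`, the sample matrix and the threshold value are illustrative assumptions; the mapping of column index to left endpoint and row index to right endpoint follows the text above, with 1-based positions.

```python
def initial_intervals(target_matrix, threshold):
    """Return 1-based (left, right) intervals for elements above the threshold."""
    intervals = []
    for row_idx, row in enumerate(target_matrix, start=1):
        for col_idx, value in enumerate(row, start=1):
            if value > threshold:
                # Column index is the left endpoint, row index the right endpoint.
                intervals.append((col_idx, row_idx))
    return intervals


# Illustrative 3 x 3 target text feature matrix for a three-character text.
target_matrix = [
    [0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.9, 0.1, 0.3],
]
intervals = initial_intervals(target_matrix, threshold=0.5)
```

Here the elements at (row 2, column 1) and (row 3, column 1) exceed the threshold, giving the nested candidate intervals (1, 2) and (1, 3).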
Specifically, in the embodiment of the present application, performing interval alignment screening on all the initial character sequence intervals according to the endpoints of the initial character sequence intervals to obtain the character sequence intervals includes:
summarizing all the initial character sequence intervals to obtain an initial character sequence interval set;
extracting, for each left endpoint in the initial character sequence interval set, the longest of all the initial character sequence intervals that share that left endpoint, to obtain a target character sequence interval set;
and taking each initial character sequence interval in the target character sequence interval set as the character sequence interval.
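A minimal sketch of the interval alignment screening follows: for every left endpoint, only the longest interval is kept. The function name `align_intervals` and the sample intervals are hypothetical.

```python
def align_intervals(intervals):
    """Keep, for each left endpoint, the interval with the largest right endpoint."""
    longest = {}
    for left, right in intervals:
        if left not in longest or right > longest[left]:
            longest[left] = right
    # Return the surviving intervals ordered by left endpoint.
    return sorted(longest.items())


candidates = [(1, 2), (1, 3), (4, 6), (4, 5)]
screened = align_intervals(candidates)
```

Because all intervals that share a left endpoint start at the same character, the one with the largest right endpoint is also the longest, so comparing right endpoints is sufficient.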
Further, in the embodiment of the present application, the characters of the Chinese medicine diagnosis text whose positions fall within the character sequence interval are extracted to obtain the Chinese medicine term entity. For example, if the Chinese medicine diagnosis text is "yellow and greasy tongue of a patient" and the character sequence interval is [4,6], the fourth to sixth characters of the Chinese medicine diagnosis text are extracted, and the resulting "yellow and greasy tongue" is the identified Chinese medicine term entity.
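The extraction of an entity from a character sequence interval amounts to a slice with 1-based, inclusive endpoints. In the sketch below the function name `cut_entities` is hypothetical, and the six-character string "ABCDEF" is only a placeholder standing in for a six-character diagnosis text.

```python
def cut_entities(text, intervals):
    """Extract the substring for each 1-based, inclusive (left, right) interval."""
    return [text[left - 1:right] for left, right in intervals]


# Placeholder text: each letter stands in for one character of the diagnosis text.
entities = cut_entities("ABCDEF", [(4, 6)])
```

With the interval [4,6], the fourth to sixth characters are returned, mirroring the example in the text.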
In another embodiment of the present application, the entity of the traditional Chinese medicine term may be stored in a blockchain node, so as to improve the data access efficiency based on the high throughput characteristic of the blockchain node.
S2, acquiring a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary;
In the embodiment of the present application, the standard Chinese medicine term word dictionary is a dictionary containing different standard Chinese medicine term words, and the feature extraction model is a graph neural network model trained on a dictionary tree constructed from the standard Chinese medicine term word dictionary. The specific training process is the same as that of a general graph neural network model and is not described herein.
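The application does not detail how the dictionary tree is built, so the following is only a sketch of one conventional way to build a trie from a word list and flatten it into an edge list that a graph neural network could take as input. The three dictionary entries, the end marker `END` and the functions `build_trie` and `trie_edges` are illustrative assumptions.

```python
END = "$"  # hypothetical end-of-word marker


def build_trie(words):
    """Build a character-level dictionary tree as nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def trie_edges(node, prefix=""):
    """Flatten the trie into (parent_prefix, child_prefix) edges."""
    edges = []
    for ch, child in node.items():
        if ch == END:
            continue
        child_prefix = prefix + ch
        edges.append((prefix, child_prefix))
        edges.extend(trie_edges(child, child_prefix))
    return edges


# Illustrative dictionary entries, not taken from the application.
dictionary = ["苔黄", "苔腻", "舌红"]
trie = build_trie(dictionary)
edges = trie_edges(trie)
```

Words that share a leading character share a branch of the tree, which is the kind of structure that lets a combined term be related to several standard term words.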
S3, extracting features of the traditional Chinese medicine term entity by using the feature extraction model to obtain character feature vectors corresponding to each character in the traditional Chinese medicine term entity;
Specifically, in the embodiment of the present application, performing feature extraction on the Chinese medicine term entity by using the feature extraction model to obtain a character feature vector corresponding to each character in the Chinese medicine term entity includes:
extracting layer elements of the second text feature matrix according to the position, in the Chinese medicine diagnosis text, of each character of the Chinese medicine term entity to obtain an initial character vector of each character;
combining all the initial character vectors according to the sequence of the corresponding characters in the traditional Chinese medicine term entity to obtain an entity feature matrix;
inputting the entity feature matrix into the feature extraction model to obtain a feature extraction matrix;
and extracting columns in the feature extraction matrix based on the sequence of each character in the Chinese medicine term entity to obtain the character feature vector.
Specifically, in the embodiment of the present application, the number of columns in the feature extraction matrix is consistent with the number of characters in the term entity of traditional Chinese medicine.
For example, if the first character of the Chinese medicine term entity is the second character of the Chinese medicine diagnosis text, all the elements at the second row and second column of the second text feature matrix are combined in layer order to obtain the initial character vector of that character, and the first column of the feature extraction matrix is taken as the character feature vector of that character.
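The extraction of initial character vectors can be sketched as follows, assuming the second text feature matrix is stored as a list of layers, each an n x n matrix. The function names `initial_char_vector` and `entity_feature_matrix` and the numeric values are illustrative only.

```python
def initial_char_vector(second_matrix, position):
    """Collect the diagonal element (position, position) from every layer.

    second_matrix: list of layers, each an n x n list of rows.
    position: 1-based position of the character in the diagnosis text.
    """
    i = position - 1
    return [layer[i][i] for layer in second_matrix]


def entity_feature_matrix(second_matrix, interval):
    """Stack the initial character vectors of an entity in character order."""
    left, right = interval
    return [initial_char_vector(second_matrix, p) for p in range(left, right + 1)]


# Two layers for a three-character text (illustrative values).
second = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
]
vector = initial_char_vector(second, 2)
entity_matrix = entity_feature_matrix(second, (2, 3))
```

For the second character, the elements at row 2, column 2 of each layer are gathered in layer order, matching the example in the text.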
S4, performing cross entropy loss classification on the character feature vectors based on the standard Chinese medicine term word dictionary to obtain an analysis feature value of each standard Chinese medicine term word in the standard Chinese medicine term word dictionary;
In order to standardize the Chinese medicine term entity, cross entropy loss classification is performed on the character feature vectors based on the standard Chinese medicine term word dictionary to obtain the analysis feature value of each standard Chinese medicine term word in the dictionary, and the standard Chinese medicine term words in the dictionary are screened by using the analysis feature values to serve as the standardized result of the Chinese medicine term entity. Specifically, in the embodiment of the present application, S4 includes:
performing feature mapping classification on the character feature vectors based on a multi-layer perceptron to obtain a word feature value of each standard Chinese medicine term word in the standard Chinese medicine term word dictionary corresponding to each character feature vector;
performing loss calculation according to the cross entropy loss function and the word feature value of each standard Chinese medicine term word corresponding to the character feature vector to obtain a word initial analysis value of each standard Chinese medicine term word corresponding to the character feature vector;
and calculating according to all the word initial analysis values of each standard Chinese medicine term word to obtain the analysis feature value of the standard Chinese medicine term word.
Further, in the embodiment of the application, feature mapping is performed on the character feature vector based on a pre-built multi-layer perceptron to obtain the word feature value of each standard Chinese medicine term word in the standard Chinese medicine term word dictionary corresponding to the character feature vector. This is equivalent to classifying with the character feature vector, where each standard Chinese medicine term word in the dictionary is one class: the character feature vector is input into the multi-layer perceptron to obtain a class prediction value, namely the word feature value, for each standard Chinese medicine term word. The number of output nodes of the multi-layer perceptron is equal to the total number of standard Chinese medicine term words, each output node corresponds to one standard Chinese medicine term word, and the cross entropy loss function is a multi-class cross entropy loss function.
For example, in the embodiment of the present application, the word feature value of each standard Chinese medicine term word corresponding to character feature vector A is used as the word prediction value of that word; the standard Chinese medicine term word whose word initial analysis value is to be calculated is taken as the target word; the label value of the target word in the cross entropy loss function is set to 1 and the label values of the other standard Chinese medicine term words are set to 0; and the word initial analysis value of the target word corresponding to character feature vector A is calculated.
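The per-character loss calculation can be sketched as follows. The sketch assumes the usual one-hot labelling for multi-class cross entropy (target word 1, every other word 0) and assumes that the word feature values are turned into probabilities with a softmax, which the application does not state. The function names `softmax` and `word_initial_analysis_values` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_initial_analysis_values(word_feature_values):
    """Cross entropy loss for each word taken in turn as the target word.

    With a one-hot label the multi-class cross entropy reduces to
    -log(probability of the target word).
    """
    probs = softmax(word_feature_values)
    return [-math.log(p) for p in probs]


# Word feature values of one character feature vector for a two-word dictionary.
values = word_initial_analysis_values([0.0, 0.0])
```

Each entry of the returned list is the word initial analysis value of one standard term word for the given character feature vector.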
Specifically, in the embodiment of the present application, the calculating according to the word initial analysis values of each standard Chinese medical term word to obtain the analysis feature value of the standard Chinese medical term word includes:
averaging all the word initial analysis values corresponding to the standard Chinese medical term word to obtain the analysis feature value of the standard Chinese medical term word.
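The calculation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the multi-layer perceptron outputs one raw score per standard term word for each character, and that a softmax (an assumption, not stated in the source) turns those scores into probabilities before the multi-class cross entropy is taken. The function names and the placeholder vocabulary are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one character's MLP outputs (assumed step)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_initial_values(scores: Sequence[float]) -> List[float]:
    """Cross entropy of one character against each candidate target word.

    With a one-hot label (1 for the target word, 0 for every other word),
    the multi-class cross entropy reduces to -log p(target word).
    """
    return [-math.log(p) for p in softmax(scores)]


def analysis_feature_values(
    char_scores: Sequence[Sequence[float]], vocabulary: Sequence[str]
) -> Dict[str, float]:
    """Average each word's initial analysis values over all characters."""
    per_char = [word_initial_values(row) for row in char_scores]
    n_chars = len(per_char)
    return {
        word: sum(values[k] for values in per_char) / n_chars
        for k, word in enumerate(vocabulary)
    }


# Hypothetical two-word dictionary and a two-character entity.
values = analysis_feature_values([[2.0, 0.0], [1.0, 0.0]], ["term_a", "term_b"])
```

Each row of `char_scores` plays the role of the word feature values for one character feature vector, and the returned dictionary holds one analysis feature value per standard term word.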
And S5, screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
In the embodiment of the present application, S5 includes:
sorting all the standard Chinese medicine term words in a descending order according to the size of the analysis characteristic value to obtain a Chinese medicine term sequence;
selecting all standard Chinese medicinal term words within a preset ordering range in the Chinese medicinal term sequence as standardized results of the Chinese medicinal term entities;
and summarizing all the standardized results to obtain the target identification result.
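Step S5 can be illustrated with a short sketch. It assumes the "preset ordering range" is given as a pair of 0-based rank positions (start inclusive, stop exclusive); the source does not fix the form of this range, so `rank_range` and both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def select_terms(
    analysis_values: Dict[str, float], rank_range: Tuple[int, int]
) -> List[str]:
    """Sort term words by analysis feature value (descending) and keep a rank slice.

    rank_range is a hypothetical (start, stop) pair standing in for the
    "preset ordering range" of the description.
    """
    ranked = sorted(analysis_values, key=analysis_values.get, reverse=True)
    start, stop = rank_range
    return ranked[start:stop]


def target_recognition_result(
    per_entity_values: Dict[str, Dict[str, float]], rank_range: Tuple[int, int]
) -> Dict[str, List[str]]:
    """Summarize the standardized results of every recognized term entity."""
    return {
        entity: select_terms(values, rank_range)
        for entity, values in per_entity_values.items()
    }


# Hypothetical analysis feature values for one recognized entity.
result = target_recognition_result(
    {"entity_1": {"term_a": 0.2, "term_b": 0.9, "term_c": 0.5}}, (0, 2)
)
```

Which end of the sorted sequence the preset range should cover depends on how the analysis feature value is interpreted, so the range is left as a parameter rather than hard-coded.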
Further, in the embodiment of the application, after the screening result is used as the standardized result of the traditional Chinese medicine term entity, the method further comprises: sending the target recognition result to a preset terminal device. The terminal device in the embodiment of the application includes intelligent terminals such as mobile phones, computers and tablets.
FIG. 2 is a functional block diagram of the traditional Chinese medicine term recognition device according to the present application.
The traditional Chinese medicine term recognition device 100 according to the present application may be installed in an electronic device. Depending on the functions implemented, the device may include an entity recognition module 101, an entity feature extraction module 102, and an entity standardization module 103. A module, which may also be referred to as a unit, refers to a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the entity recognition module 101 is configured to obtain a diagnosis text of a traditional Chinese medicine to be recognized, and perform entity recognition on the diagnosis text of the traditional Chinese medicine to obtain a term entity of the traditional Chinese medicine;
the entity feature extraction module 102 is configured to obtain a standard Chinese medicine term word dictionary and a feature extraction model, where the feature extraction model is a graph neural network model trained by using a dictionary tree constructed by the standard Chinese medicine term word dictionary; extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
the entity standardization module 103 is configured to perform cross entropy loss classification on the character feature vector based on the standard Chinese medicine term word dictionary, so as to obtain an analysis feature value of each standard Chinese medicine term word in the standard Chinese medicine term word dictionary; and screening all the standard Chinese medicine term words based on the analysis feature values to obtain a target recognition result of the Chinese medicine diagnosis text.
In detail, each module of the traditional Chinese medicine term recognition device 100 in the embodiment of the present application adopts the same technical means as the traditional Chinese medicine term recognition method described in FIG. 1 and can produce the same technical effects, which are not described again herein.
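The way the three modules hand data to one another can be pictured with the sketch below. It is only an illustration under assumptions: the callable signatures, the class name `TermRecognitionDevice` and its `run` method are hypothetical and are not prescribed by the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type aliases; the source does not define these signatures.
EntityRecognizer = Callable[[str], List[str]]
FeatureExtractor = Callable[[str], List[List[float]]]
EntityStandardizer = Callable[[List[List[float]]], List[str]]


@dataclass
class TermRecognitionDevice:
    """Chains modules 101, 102 and 103 in the order given in the description."""

    recognize_entities: EntityRecognizer  # entity recognition module 101
    extract_features: FeatureExtractor    # entity feature extraction module 102
    standardize: EntityStandardizer       # entity standardization module 103

    def run(self, diagnosis_text: str) -> Dict[str, List[str]]:
        """Return the standardized term words found for each recognized entity."""
        result: Dict[str, List[str]] = {}
        for entity in self.recognize_entities(diagnosis_text):
            char_vectors = self.extract_features(entity)
            result[entity] = self.standardize(char_vectors)
        return result


# Stub callables stand in for the trained models.
device = TermRecognitionDevice(
    recognize_entities=lambda text: text.split(),
    extract_features=lambda entity: [[float(len(entity))]],
    standardize=lambda vectors: ["standard_term"],
)
output = device.run("a bb")
```

Keeping each module behind a plain callable reflects the statement that the division into modules is a logical one; the stubs can be replaced by real models without changing the chaining.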
Fig. 3 is a schematic structural diagram of an electronic device for implementing the method for recognizing the terms of traditional Chinese medicine according to the present application.
The electronic device may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a Chinese medicine term recognition program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. The memory 11 may in other embodiments also be an external storage device of the electronic device, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only for storing application software installed in an electronic device and various types of data, such as codes of a term recognition program of chinese medicine, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device and processes data by running or executing programs or modules (e.g., a chinese medicine term recognition program, etc.) stored in the memory 11, and calling data stored in the memory 11.
The communication bus 12 may be a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection and communication between the memory 11 and the at least one processor 10, etc. For ease of illustration, the bus is shown in the figure as only one bold line, which does not mean that there is only one bus or one type of bus.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the electronic device and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management and power consumption management are implemented through the power management device. The power supply may also include any one or more of a direct current or alternating current power supply, a recharging device, a power failure classification circuit, a power converter or inverter, a power status indicator, etc. The electronic device may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described herein.
Optionally, the communication interface 13 may comprise a wired interface and/or a wireless interface (e.g., Wi-Fi interface, Bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the communication interface 13 may further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard), and may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the electronic device and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.
The Chinese medicine term recognition program stored in the memory 11 of the electronic device is a combination of a plurality of computer programs which, when run in the processor 10, can implement:
obtaining a traditional Chinese medicine diagnosis text to be identified, and carrying out entity identification on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
obtaining a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary;
extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
performing cross entropy loss classification on the character feature vectors based on the standard traditional Chinese medicine word dictionary to obtain an analysis feature value of each standard traditional Chinese medicine word in the standard traditional Chinese medicine word dictionary;
and screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
Specifically, for the way in which the processor 10 implements the computer program, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated herein.
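The dictionary tree mentioned in the steps above can be illustrated with a minimal character-level trie. This is a sketch under assumptions: the application does not specify how the tree is stored or how it is fed to the graph neural network, so the node/edge representation and the function name `build_dictionary_tree` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_dictionary_tree(terms: List[str]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Build a character-level trie and return it as (node labels, edges).

    Node 0 is the root. Each edge (parent, child) links a prefix to the same
    prefix extended by one character, a form that a graph model could consume.
    """
    labels: List[str] = ["<root>"]
    children: List[Dict[str, int]] = [{}]
    edges: List[Tuple[int, int]] = []
    for term in terms:
        node = 0
        for char in term:
            if char not in children[node]:
                # First time this prefix is seen: create a new node for it.
                labels.append(char)
                children.append({})
                child = len(labels) - 1
                children[node][char] = child
                edges.append((node, child))
            node = children[node][char]
    return labels, edges


# Placeholder terms sharing the prefix "a".
labels, edges = build_dictionary_tree(["ab", "ac"])
```

Term words that share a prefix share the corresponding path in the tree, so related standard term words end up as neighbouring nodes in the resulting graph.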
Further, the electronic device integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. The computer readable medium may be non-volatile or volatile. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
Embodiments of the present application may also provide a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, may implement:
obtaining a traditional Chinese medicine diagnosis text to be identified, and carrying out entity identification on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
obtaining a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary;
extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
performing cross entropy loss classification on the character feature vectors based on the standard traditional Chinese medicine word dictionary to obtain an analysis feature value of each standard traditional Chinese medicine word in the standard traditional Chinese medicine word dictionary;
and screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or devices recited in the system claims may also be implemented by one unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims (10)

1. A method for identifying terms of traditional Chinese medicine, the method comprising:
obtaining a traditional Chinese medicine diagnosis text to be identified, and carrying out entity identification on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
obtaining a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary;
extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
performing cross entropy loss classification on the character feature vectors based on the standard traditional Chinese medicine word dictionary to obtain an analysis feature value of each standard traditional Chinese medicine word in the standard traditional Chinese medicine word dictionary;
and screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
2. The method for recognizing the term of Chinese medicine according to claim 1, wherein the performing entity recognition on the text of diagnosis of Chinese medicine to obtain the entity of the term of Chinese medicine comprises:
converting each character in the Chinese medicine diagnosis text into a character vector;
combining all the character vectors according to the sequence of the corresponding characters in the Chinese medicine diagnosis text to obtain a text matrix;
performing double affine transformation on the text matrix to obtain a text initial feature matrix;
carrying out multi-channel convolution on the text initial feature matrix to obtain a first text feature matrix corresponding to each channel;
stacking all the first text feature matrixes as layers to obtain a second text feature matrix;
carrying out channel feature compression on the second text feature matrix to obtain a third text feature matrix;
performing cross entropy loss conversion on each element of the third text feature matrix to obtain a target text feature matrix, wherein the number of rows and the number of columns of the target text feature matrix are each equal to the number of characters in the Chinese medicine diagnosis text;
constructing a character sequence interval according to the row and column order corresponding to the elements larger than a preset screening threshold value in the target text feature matrix;
and extracting, from the Chinese medicine diagnosis text, the characters whose character order falls within the character sequence interval, so as to obtain the Chinese medicine term entity.
3. The method for recognizing terms of traditional Chinese medicine according to claim 2, wherein the constructing a character sequence interval according to the row and column order corresponding to the elements larger than a preset screening threshold value in the target text feature matrix comprises:
selecting elements larger than a preset screening threshold value in the target text feature matrix to obtain target elements;
acquiring the row order and the column order of the target element in the target text feature matrix;
taking the column order and the row order of the target element as the left end point and the right end point of an interval, respectively, to obtain an initial character sequence interval of the target element;
and carrying out interval alignment screening on all the initial character sequence intervals according to the endpoints of the initial character sequence intervals to obtain character sequence intervals.
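The threshold selection and end-point construction recited in claim 3 can be sketched as follows. This is an illustration only: indices are 0-based here, the matrix is a plain nested list, and the function name `initial_intervals` is hypothetical. The mapping of column order to left end point and row order to right end point follows the claim wording as translated.

```python
from typing import List, Sequence, Tuple


def initial_intervals(
    matrix: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (left, right) for every element above the screening threshold.

    Following the claim wording, the column order is taken as the left end
    point and the row order as the right end point (0-based in this sketch).
    """
    return [
        (col, row)
        for row, values in enumerate(matrix)
        for col, value in enumerate(values)
        if value > threshold
    ]


# Hypothetical 3 x 3 target text feature matrix for a three-character text.
target_matrix = [
    [0.1, 0.2, 0.1],
    [0.9, 0.1, 0.1],
    [0.8, 0.7, 0.1],
]
candidates = initial_intervals(target_matrix, threshold=0.5)
```

Every element above the threshold yields one candidate interval, so overlapping candidates are expected at this stage and are reduced by the interval alignment screening.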
4. The method for recognizing terms of traditional Chinese medicine according to claim 3, wherein the carrying out interval alignment screening on all the initial character sequence intervals according to the end points of the initial character sequence intervals to obtain character sequence intervals comprises:
summarizing all the initial character sequence intervals to obtain an initial character sequence interval set;
for each interval left end point in the initial character sequence interval set, extracting the initial character sequence interval with the longest interval length among all the initial character sequence intervals having that left end point, to obtain a target character sequence interval set;
and taking each initial character sequence interval in the target character sequence interval set as the character sequence interval.
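The interval alignment screening of claim 4 amounts to keeping, for each left end point, the longest candidate interval. The sketch below illustrates this under assumptions: intervals are 0-based with an inclusive right end point, and the names `longest_per_left` and `cut_entities` are hypothetical. The second function shows, for illustration only, how the surviving intervals could be used to cut entity text as recited in claim 2.

```python
from typing import Dict, Iterable, List, Tuple


def longest_per_left(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep the longest interval for each distinct left end point."""
    best: Dict[int, Tuple[int, int]] = {}
    for left, right in intervals:
        kept = best.get(left)
        if kept is None or right - left > kept[1] - kept[0]:
            best[left] = (left, right)
    return [best[left] for left in sorted(best)]


def cut_entities(text: str, intervals: Iterable[Tuple[int, int]]) -> List[str]:
    """Cut the characters whose position lies inside each interval (inclusive)."""
    return [text[left:right + 1] for left, right in intervals]


# Candidate intervals with two sharing the left end point 0.
kept = longest_per_left([(0, 1), (0, 2), (1, 2)])
entities = cut_entities("abcd", kept)
```

Intervals with different left end points are all retained, so nested entities that start at different characters survive the screening.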
5. The method for recognizing traditional Chinese medicine terms according to claim 2, wherein the step of extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity comprises the steps of:
extracting layer elements of the second text feature matrix according to the order, in the Chinese medicine diagnosis text, of each character in the Chinese medicine term entity to obtain an initial character vector of each character;
combining all the initial character vectors according to the sequence of the corresponding characters in the traditional Chinese medicine term entity to obtain an entity feature matrix;
inputting the entity feature matrix into the feature extraction model to obtain a feature extraction matrix;
and extracting columns in the feature extraction matrix based on the sequence of each character in the Chinese medicine term entity to obtain the character feature vector.
6. The method for recognizing traditional Chinese medicine terms according to any one of claims 1 to 5, wherein the screening all the standard traditional Chinese medicine terms based on the analysis feature values to obtain the target recognition result of the traditional Chinese medicine diagnosis text comprises the following steps:
sorting all the standard Chinese medicine term words in a descending order according to the size of the analysis characteristic value to obtain a Chinese medicine term sequence;
selecting all standard Chinese medicinal term words within a preset ordering range in the Chinese medicinal term sequence as standardized results of the Chinese medicinal term entities;
and summarizing all the standardized results to obtain the target identification result.
7. A device for recognizing terms of traditional Chinese medicine, comprising:
the entity recognition module is used for acquiring a traditional Chinese medicine diagnosis text to be recognized, and carrying out entity recognition on the traditional Chinese medicine diagnosis text to obtain a traditional Chinese medicine term entity;
the entity feature extraction module is used for obtaining a standard Chinese medicine term word dictionary and a feature extraction model, wherein the feature extraction model is a graph neural network model trained by a dictionary tree constructed by the standard Chinese medicine term word dictionary; extracting the characteristics of the traditional Chinese medicine term entity by using the characteristic extraction model to obtain character characteristic vectors corresponding to each character in the traditional Chinese medicine term entity;
the entity standardization module is used for carrying out cross entropy loss classification on the character feature vectors based on the standard traditional Chinese medicine word dictionary to obtain an analysis feature value of each standard traditional Chinese medicine word in the standard traditional Chinese medicine word dictionary; and screening all the standard Chinese medicine term words based on the analysis characteristic values to obtain a target recognition result of the Chinese medicine diagnosis text.
8. The apparatus for recognizing terms of traditional Chinese medicine according to claim 7, wherein the screening all the standard terms of traditional Chinese medicine based on the analysis feature value to obtain the target recognition result of the diagnosis text of traditional Chinese medicine comprises:
sorting all the standard Chinese medicine term words in a descending order according to the size of the analysis characteristic value to obtain a Chinese medicine term sequence;
selecting all standard Chinese medicinal term words within a preset ordering range in the Chinese medicinal term sequence as standardized results of the Chinese medicinal term entities;
and summarizing all the standardized results to obtain the target identification result.
9. An electronic device, the electronic device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for recognizing traditional Chinese medicine terms according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for recognizing traditional Chinese medicine terms according to any one of claims 1 to 6.
CN202310622149.2A 2023-05-30 2023-05-30 Method, device, equipment and storage medium for recognizing Chinese medicine terms Pending CN116644336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310622149.2A CN116644336A (en) 2023-05-30 2023-05-30 Method, device, equipment and storage medium for recognizing Chinese medicine terms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310622149.2A CN116644336A (en) 2023-05-30 2023-05-30 Method, device, equipment and storage medium for recognizing Chinese medicine terms

Publications (1)

Publication Number Publication Date
CN116644336A true CN116644336A (en) 2023-08-25

Family

ID=87614977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310622149.2A Pending CN116644336A (en) 2023-05-30 2023-05-30 Method, device, equipment and storage medium for recognizing Chinese medicine terms

Country Status (1)

Country Link
CN (1) CN116644336A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination