CN113204969A - Medical named entity recognition model generation method and device and computer equipment
- Publication number
- CN113204969A (CN202110605302.1A)
- Authority
- CN
- China
- Prior art keywords
- entity
- medical named
- medical
- named entity
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The application belongs to the technical field of digital medical treatment and provides a method, a device and computer equipment for generating a medical named entity recognition model, wherein the method comprises the following steps: extracting input sentences from an electronic medical record and acquiring manually labeled medical named entities; embedding the medical named entities among the words of the input sentence to obtain a target input sentence; inputting the target input sentence into a natural language generation model based on a Transformer architecture for training, and determining the initial entities of the target input sentence; randomly masking a preset proportion of the initial entities in the target input sentence with a mask symbol, and predicting the original entity of each masked initial entity; and when the original entity is consistent with the manually labeled medical named entity, obtaining the medical named entity recognition model. The application trains the natural language generation model on sentences and entities simultaneously, which improves the accuracy with which the medical named entity recognition model recognizes medical named entities.
Description
Technical Field
The application relates to the technical field of digital medical treatment, in particular to a method and a device for generating a medical named entity recognition model and computer equipment.
Background
Medical named entities in an electronic medical record identify the medical diagnostic process involving the patient. For example, disease names, clinical symptoms, drug names, medical methods, and the like are medical named entities with specific meanings. Because doctors' writing is often non-standard and electronic medical records contain a large number of abbreviations of professional terms, recognizing medical named entities in electronic medical records is difficult.
Existing medical named entity recognition for electronic medical records, based on traditional methods and machine learning methods, depends heavily on the quality and scale of the training data; when the training data is of poor quality or limited scale, the trained medical named entity recognition model performs poorly.
Disclosure of Invention
The application mainly aims to provide a method and a device for generating a medical named entity recognition model, and computer equipment, so as to improve the recognition effect of the medical named entity recognition model.
In order to achieve the above object, the present application provides a method for generating a medical named entity recognition model, which includes the following steps:
acquiring an electronic medical record, extracting input sentences from the electronic medical record, and acquiring manually labeled medical named entities from a database according to the input sentences; the input sentence is text data in which the medical named entities are not labeled;
embedding the medical named entities among the words of the input sentence to obtain a target input sentence;
inputting the target input sentence into a natural language generation model based on a Transformer architecture for training, and determining the initial entities of the target input sentence;
randomly masking a preset proportion of the initial entities in the target input sentence with a mask symbol, acquiring context information of each masked initial entity, and predicting the original entity of the masked initial entity according to the context information;
judging whether the original entity is consistent with the manually labeled medical named entity;
and when the original entity is determined to be consistent with the manually labeled medical named entity, finishing the training of the natural language generation model to obtain the medical named entity recognition model.
Preferably, the step of embedding the medical named entities between words of the input sentence comprises:
when the medical named entity is detected to comprise a plurality of words, calculating the similarity between each word of the entity and each word of the input sentence, and determining, according to the similarity, a word-embedding average value for embedding each word of the entity at each position of the input sentence; wherein the word-embedding average value is used to evaluate the reasonableness of embedding a word at each position of the input sentence;
determining the embedding position of each word of the entity according to the word-embedding average values of that word;
and embedding each word of the entity between the words of the input sentence according to its embedding position.
Preferably, the step of predicting an original entity of the masked initial entity according to the context information comprises:
predicting the masked initial entities by adopting a softmax function according to the context information to obtain a plurality of predicted entities and the probability value of each predicted entity;
and taking the predicted entity with the maximum probability value as the original entity.
Further, after the step of determining whether the original entity is consistent with the manually labeled medical named entity, the method further includes:
when the original entity is determined to be inconsistent with the manually labeled medical named entity, acquiring difference information between the original entity and the manually labeled medical named entity;
and adjusting the parameters of the natural language generation model according to the difference information, and training the natural language generation model again with the adjusted parameters until the predicted original entity is consistent with the manually labeled medical named entity.
Preferably, the step of inputting the target input sentence into a natural language generation model based on a Transformer architecture for training, and determining the initial entities of the target input sentence, includes:
in the training process of the natural language generation model, calculating attention scores of the target input sentence by using multiple attention mechanisms, and screening out the attention mechanism with the highest attention score;
and determining the initial entities of the target input sentence according to the attention mechanism with the highest attention score.
Preferably, the step of judging whether the original entity is consistent with the manually labeled medical named entity includes:
converting the original entity and the medical named entity into word vectors, respectively, by using a pre-trained Word2Vec word vector model;
calculating the cosine similarity between the word vector of the original entity and the word vector of the medical named entity;
judging whether the cosine similarity is larger than a preset similarity threshold;
and if so, determining that the original entity is consistent with the manually labeled medical named entity.
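The consistency check above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the vectors in `toy_vectors` stand in for the output of a pre-trained Word2Vec model, and the function names and the threshold value 0.9 are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def is_consistent(predicted_vec, labeled_vec, threshold=0.9):
    """True when the similarity exceeds the preset threshold (assumed value)."""
    return cosine_similarity(predicted_vec, labeled_vec) > threshold

# Toy vectors standing in for Word2Vec lookups (values are made up).
toy_vectors = {
    "heart disease": [0.9, 0.1, 0.3],
    "cardiac disease": [0.85, 0.15, 0.35],
    "asthma": [0.1, 0.9, 0.2],
}

print(is_consistent(toy_vectors["heart disease"], toy_vectors["cardiac disease"]))
print(is_consistent(toy_vectors["heart disease"], toy_vectors["asthma"]))
```

A strict "greater than" comparison is used because the text speaks of the similarity being larger than the threshold.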
Preferably, the step of extracting the input sentence from the electronic medical record includes:
extracting text information from the electronic medical record;
and performing data cleaning on the text information to remove punctuation marks or special characters, so as to obtain the input sentence.
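The cleaning step can be illustrated with a short sketch. The rule used here (keep ASCII letters, digits, CJK ideographs, and whitespace; replace everything else) is an assumption chosen for illustration, and `clean_text` is a hypothetical name; the application does not specify the exact character classes.

```python
import re

# Assumed rule: anything that is not a letter, digit, CJK ideograph,
# or whitespace counts as punctuation or a special character.
_NON_TEXT = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff\s]")

def clean_text(text: str) -> str:
    """Remove punctuation and special characters, then collapse whitespace."""
    without_symbols = _NON_TEXT.sub(" ", text)
    return " ".join(without_symbols.split())

record_text = "Preliminary diagnosis: chronic superficial gastritis; further detection suggested!!"
print(clean_text(record_text))
```

Replacing symbols with a space rather than deleting them keeps words that were separated only by punctuation from being fused together.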
The present application further provides a device for generating a medical named entity recognition model, which includes:
the acquisition module is used for acquiring an electronic medical record, extracting input sentences from the electronic medical record, and acquiring manually labeled medical named entities from a database according to the input sentences; the input sentence is text data in which the medical named entities are not labeled;
the embedding module is used for embedding the medical named entities among the words of the input sentence to obtain a target input sentence;
the training module is used for inputting the target input sentence into a natural language generation model based on a Transformer architecture for training and determining the initial entities of the target input sentence;
the prediction module is used for randomly masking a preset proportion of the initial entities in the target input sentence with a mask symbol, acquiring the context information of each masked initial entity, and predicting the original entity of the masked initial entity according to the context information;
the judging module is used for judging whether the original entity is consistent with the manually labeled medical named entity;
and the determining module is used for finishing the training of the natural language generation model to obtain the medical named entity recognition model when the original entity is determined to be consistent with the manually labeled medical named entity.
The present application further provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
According to the method, the device, the computer equipment, and the computer-readable storage medium for generating a medical named entity recognition model provided by the application, an electronic medical record is acquired, input sentences are extracted from it, and manually labeled medical named entities are acquired from a database according to the input sentences; the medical named entities are embedded among the words of the input sentence to obtain a target input sentence; the target input sentence is then input into a natural language generation model based on a Transformer architecture for training, and the initial entities of the target input sentence are determined; a preset proportion of the initial entities in the target input sentence is randomly masked with a mask symbol, and the original entity of each masked initial entity is predicted according to the context information; whether the original entity is consistent with the manually labeled medical named entity is judged; and when they are determined to be consistent, the training of the natural language generation model is finished and the medical named entity recognition model is obtained. Because the medical named entities are embedded among the words of the input sentence, the sentence and the entity are used together to train the natural language generation model, and training can be completed without massive training data. Because a preset proportion of the initial entities in the target input sentence is randomly masked and the original entity of each masked initial entity is predicted, the accuracy with which the trained medical named entity recognition model recognizes medical named entities is improved.
Drawings
FIG. 1 is a schematic flow chart of a method for generating a medical named entity recognition model according to an embodiment of the present application;
FIG. 2 is a block diagram illustrating the structure of a device for generating a medical named entity recognition model according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating the structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to FIG. 1, the present application provides a method for generating a medical named entity recognition model. Named entity recognition refers to recognizing specific entities in a text, for example person names, place names, and the like. In the field of electronic medical records, the aim is to automatically identify and classify the medical entities in a case, such as cold, hypertension, and diabetes in the disease field, or fever, headache, and the like in the symptom field. The medical named entity recognition model may be a deep learning network model obtained by training on electronic medical records and predetermined medical entity labels; after performing feature extraction on each input electronic medical record (text data), it outputs the named entity recognition results for that text, such as the person names, place names, examination and test names, symptoms and signs, and disease diagnosis results contained in the text data. During training, the medical named entities are embedded among the words of the input sentences, a preset proportion of the initial entities in the target input sentence is randomly masked with a mask symbol, and the original entity of each masked initial entity is predicted. This improves the accuracy with which the trained model recognizes medical named entities and enables it to recognize discontinuous and parallel medical named entities in the electronic medical record to be recognized.
In one embodiment, the method for generating the medical named entity recognition model comprises the following steps:
S11, acquiring an electronic medical record, extracting input sentences from the electronic medical record, and acquiring manually labeled medical named entities from a database according to the input sentences; the input sentence is text data in which the medical named entities are not labeled;
S12, embedding the medical named entities among the words of the input sentence to obtain a target input sentence;
S13, inputting the target input sentence into a natural language generation model based on a Transformer architecture for training, and determining the initial entities of the target input sentence;
S14, randomly masking a preset proportion of the initial entities in the target input sentence with a mask symbol, acquiring context information of each masked initial entity, and predicting the original entity of the masked initial entity according to the context information;
S15, judging whether the original entity is consistent with the manually labeled medical named entity;
and S16, finishing the training of the natural language generation model when the original entity is determined to be consistent with the manually labeled medical named entity, and obtaining the medical named entity recognition model.
As described in step S11, the electronic medical record is a medical diagnosis document issued by the doctor to the patient during the patient's visit, and may include the patient's name, identification number, disease name, hospital name, and the like. The disease name is also referred to herein as a medical named entity, for example heart disease, cardiovascular disease, tuberculosis, or bronchial asthma.
The input sentence is continuous text in the electronic medical record. It contains medical named entities, but these are not labeled, so the natural language generation model cannot identify them directly. For example, the input sentence may be "preliminary diagnosis of chronic superficial gastritis; further helicobacter pylori detection is suggested", in which the medical named entity is chronic superficial gastritis.
The medical named entities may also be labeled manually in advance. An expert mechanism can be introduced so that experts label the medical named entities, which improves the accuracy of the manual labels; the manually labeled medical named entities are stored in the database for reference. When an input sentence is obtained, the manually labeled medical named entities of the corresponding field are retrieved from the database according to the field to which the input sentence belongs.
As described in step S12, the medical named entity is embedded between the words of the input sentence to obtain the target input sentence. The manually labeled medical named entity may be embedded at the head, in the middle, or at the tail of the input sentence, or after the subject of the input sentence.
As described above in step S13, the Transformer is the first transduction model that relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned recurrent neural networks or convolutional neural networks. It handles the dependencies between input and output with an attention mechanism and dispenses with recurrence entirely.
The target input sentence is input into the natural language generation model based on the Transformer architecture for training. The model converts each word of the target input sentence entered by the user into a word embedding vector through an embedding layer and, at the same time, obtains the position embedding and the sentence category embedding of each word; the word embedding vector, the position embedding, and the sentence category embedding of each word together form the embedding vector. The hidden state sequence of the embedding vectors is weighted to obtain a weighted hidden state sequence, and each subsequent Transformer layer of the model processes the output of the previous layer in turn, yielding a hidden state sequence that has been weighted through multiple layers. This sequence is a high-dimensional matrix, from which the CRF layer of the natural language generation model outputs the word segmentation sequence of the target input sentence. The initial entities of the target input sentence are then determined from the word segmentation sequence. Because the sentence and the entity are used together to train the natural language generation model, training can be completed without massive training data.
Wherein the word embedding vector represents the information of each word itself; position embedding refers to encoding position information of each word into a feature vector; sentence category embedding is used to distinguish different sentences.
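How the three embeddings may be combined can be sketched as follows. The text only states that the word embedding, position embedding, and sentence category embedding form the embedding vector; the element-wise sum used here is an assumption borrowed from common Transformer encoders, and the table sizes, random values, and function names are hypothetical.

```python
import random

DIM = 4  # hypothetical embedding size

def make_table(num_rows, dim, seed):
    """Build a toy lookup table of random vectors (stands in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

def embed_sentence(token_ids, segment_ids, word_table, position_table, segment_table):
    """Sum word, position, and sentence category embeddings for each token."""
    vectors = []
    for position, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        word_vec = word_table[tok]
        pos_vec = position_table[position]
        seg_vec = segment_table[seg]
        vectors.append([w + p + s for w, p, s in zip(word_vec, pos_vec, seg_vec)])
    return vectors

word_table = make_table(10, DIM, seed=0)      # vocabulary of 10 toy words
position_table = make_table(8, DIM, seed=1)   # up to 8 positions
segment_table = make_table(2, DIM, seed=2)    # two sentence categories

# Three tokens; the last one belongs to a second sentence category.
vectors = embed_sentence([3, 1, 4], [0, 0, 1], word_table, position_table, segment_table)
print(len(vectors), len(vectors[0]))
```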
Because the medical system as a whole is highly specialized and medical terminology is complex, medical named entity recognition is needed to effectively extract entities with specific meanings, such as disease names, clinical symptoms, drug names, and medical methods. In the context of big data, this data extraction is significant for the analysis of diseases, and the analysis of related diseases is in turn significant for the early perception and prevention of diseases.
Contextualized word representations based on the deep neural network Transformer are not well suited to named entity tasks. Although the self-attention mechanism captures complex relationships between words by relating the words to one another many times, it is easier for such a model to predict the "heart" inside "heart disease" than to predict the entity "heart disease" as a whole. Therefore, during training, the scene information of the electronic medical record can be extracted and the natural language generation model can be trained in combination with it. For example, the scene information of an electronic medical record indicates that the record more probably describes the patient's disease type than an organ, so in medical named entity recognition for electronic medical records, medical named entities of the disease type are more likely to be hit; that is, the probability of predicting the medical named entity "heart disease" is higher than that of "heart".
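One way to picture the effect of scene information is to re-weight candidate probabilities by a prior over entity types. The sketch below is purely illustrative: the application does not say how scene information enters training, and the prior values, type labels, and the function `apply_scene_prior` are all assumptions.

```python
def apply_scene_prior(candidate_probs, candidate_types, type_prior):
    """Multiply each candidate's probability by its type prior and renormalize."""
    weighted = {
        cand: prob * type_prior.get(candidate_types[cand], 1.0)
        for cand, prob in candidate_probs.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("all candidates have zero weight")
    return {cand: w / total for cand, w in weighted.items()}

# Without scene information the toy model slightly prefers the organ.
model_probs = {"heart": 0.55, "heart disease": 0.45}
types = {"heart": "organ", "heart disease": "disease"}
# Assumed prior: a medical record more often names a disease than an organ.
record_prior = {"organ": 0.3, "disease": 0.7}

adjusted = apply_scene_prior(model_probs, types, record_prior)
print(max(adjusted, key=adjusted.get))
```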
As described in step S14, the initial entities in the target input sentence can be randomly replaced with the special mask token [MASK]; the context information of each masked initial entity is then obtained with that entity as the center, and the original entity of the masked initial entity is predicted according to the context information.
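The masking step can be sketched as follows, assuming the sentence has already been split into tokens and the positions of the initial entities are known. The token string "[MASK]", the window size, and the function names are illustrative assumptions rather than details given in the application.

```python
import random

MASK = "[MASK]"  # assumed spelling of the mask symbol

def mask_entities(tokens, entity_indices, ratio, seed=None):
    """Mask a preset proportion of the entity tokens; return the masked
    token list and a mapping from masked position to original entity."""
    rng = random.Random(seed)
    num_to_mask = max(1, round(len(entity_indices) * ratio)) if entity_indices else 0
    chosen = sorted(rng.sample(entity_indices, num_to_mask))
    masked = list(tokens)
    targets = {}
    for index in chosen:
        targets[index] = masked[index]
        masked[index] = MASK
    return masked, targets

def context_window(tokens, index, size=2):
    """Tokens to the left and right of a position, centered on that position."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left, right

tokens = ["preliminary", "diagnosis", "chronic superficial gastritis",
          "suggest", "helicobacter pylori", "detection"]
entity_indices = [2, 4]  # positions of the initial entities

masked, targets = mask_entities(tokens, entity_indices, ratio=0.5, seed=0)
for position in targets:
    print(position, context_window(masked, position))
```

The `targets` mapping keeps the original entity for each masked position so that the later prediction can be compared against it.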
As described in step S15, the original entity predicted by the natural language generation model is compared with the manually labeled medical named entity to determine whether the two are consistent.
As described in step S16, when it is determined that the original entity is consistent with the manually labeled medical named entity, the training of the natural language generation model is completed, and the trained natural language generation model is used as the medical named entity recognition model.
According to the method for generating a medical named entity recognition model described above, an electronic medical record is acquired, input sentences are extracted from it, and manually labeled medical named entities are acquired from a database according to the input sentences; the medical named entities are embedded among the words of the input sentence to obtain a target input sentence; the target input sentence is then input into a natural language generation model based on a Transformer architecture for training, and the initial entities of the target input sentence are determined; a preset proportion of the initial entities in the target input sentence is randomly masked with a mask symbol, and the original entity of each masked initial entity is predicted according to the context information; whether the original entity is consistent with the manually labeled medical named entity is judged; and when they are determined to be consistent, the training of the natural language generation model is finished and the medical named entity recognition model is obtained. Because the medical named entities are embedded among the words of the input sentence, the sentence and the entity are used together to train the natural language generation model, and training can be completed without massive training data. Because a preset proportion of the initial entities in the target input sentence is randomly masked and the original entity of each masked initial entity is predicted, the accuracy with which the trained medical named entity recognition model recognizes medical named entities is improved.
In an embodiment, as described in the step S12, the step of embedding the medical named entity between the words of the input sentence may specifically include:
S121, when the medical named entity is detected to comprise a plurality of words, calculating the similarity between each word of the entity and each word of the input sentence, and determining, according to the similarity, a word-embedding average value for embedding each word of the entity at each position of the input sentence; wherein the word-embedding average value is used to evaluate the reasonableness of embedding a word at each position of the input sentence;
S122, determining the embedding position of each word of the entity according to the word-embedding average values of that word;
and S123, embedding each word of the entity between the words of the input sentence according to its embedding position.
As described in step S121, if a medical named entity includes a plurality of words, a word-embedding average value is calculated for each word. In the calculation, each word of the entity and each word of the input sentence are converted into vectors, the cosine similarity between each entity word and each word of the input sentence is calculated, and the similarities are converted into an average value. The average value is used to evaluate how reasonable it is to embed the word at each position of the input sentence: the larger the similarity, the higher the average value, the more reasonable the corresponding embedding position, and the better the recognition effect of the trained medical named entity recognition model.
As described in step S122, the embedding position of each word is determined according to its word-embedding average values, namely as the position with the maximum average value. For example, if the average value for the medical named entity chronic superficial gastritis is greatest at the position after the subject, the entity is embedded after the subject.
As described in step S123, each word is embedded between the words of the input sentence at its embedding position, so that the recognition effect of the trained medical named entity recognition model is better.
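Steps S121 to S123 can be sketched under one possible reading of the "average value". Here the score of a gap between two sentence words is assumed to be the mean cosine similarity between the entity word and the sentence words adjacent to that gap; the application does not define the averaging precisely, so this reading, the toy vectors, and the function names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_insert_position(entity_vec, sentence_vecs):
    """Return (position, score); position i means 'insert before sentence
    word i', and len(sentence_vecs) means 'append at the tail'."""
    if not sentence_vecs:
        return 0, 0.0
    sims = [cosine(entity_vec, vec) for vec in sentence_vecs]
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(sentence_vecs) + 1):
        neighbours = sims[max(0, pos - 1):pos + 1]  # words adjacent to the gap
        score = sum(neighbours) / len(neighbours)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score

# Toy 2-dimensional vectors for a three-word sentence and one entity word.
sentence_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
entity_vec = [0.0, 1.0]

position, score = best_insert_position(entity_vec, sentence_vecs)
sentence = ["w0", "w1", "w2"]
sentence.insert(position, "ENTITY_WORD")
print(position, sentence)
```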
In an embodiment, in step S14, the step of predicting the original entity of the initial entity that is occluded according to the context information may specifically include:
S141, predicting the masked initial entities by adopting a softmax function according to the context information to obtain a plurality of predicted entities and probability values of the predicted entities;
and S142, taking the predicted entity with the maximum probability value as the original entity.
As described in step S141, the softmax function is also called the normalized exponential function. It generalizes the binary classification function sigmoid to multi-class classification and presents the multi-class result in the form of probabilities. Therefore, the softmax function can be used to predict the masked initial entities: in combination with the context information, a plurality of predicted entities and their probability values are obtained for each masked initial entity, and the predicted entity with the maximum probability value is screened out. The probability value characterizes the likelihood that each predicted entity is the original entity. For example, assuming that the predicted entities of a masked initial entity are "heart", "heart disease" and "cardiovascular", with probability values of 12%, 80% and 8% respectively, the predicted entity with the highest probability value is "heart disease". As described in step S142, the predicted entity with the highest probability value is used as the original entity in this step, so as to improve the prediction accuracy of the medical named entity.
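Steps S141 and S142 can be sketched as follows. The raw scores passed to the softmax function would in practice come from the natural language generation model; here they are hypothetical values chosen only for illustration, as are the function names.

```python
import math

def softmax(scores):
    """Normalized exponential function; the outputs are positive and sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_original_entity(candidates, scores):
    """Return the candidate with the largest softmax probability and all probabilities."""
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs

# Hypothetical raw scores for three predicted entities of one masked position.
entity, probs = predict_original_entity(
    ["heart", "heart disease", "cardiovascular"], [1.0, 3.0, 0.5]
)
```

Because the softmax outputs sum to 1, the probability values of the predicted entities for one masked position can be compared directly.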
In an embodiment, in step S15, the step of determining whether the original entity is consistent with the manually labeled medical named entity may further include:
when the original entity is determined to be inconsistent with the artificially labeled medical named entity, acquiring difference information of the original entity and the artificially labeled medical named entity;
and adjusting parameters of the natural language generation model according to the difference information, and training the natural language generation model after the parameters are adjusted again until the predicted original entity is consistent with the artificially labeled medical named entity.
In this embodiment, a self-checking mechanism may be introduced: when it is determined that the original entity is inconsistent with the manually labeled medical named entity, difference information between the two is obtained, the parameters of the natural language generation model are adjusted according to the difference information, and the model with adjusted parameters is retrained until the predicted original entity is consistent with the manually labeled medical named entity.
In addition, the training result of the medical named entity recognition model can be reviewed and corrected by experts and the corpus updated in time, which ensures the accuracy of the generated medical named entity recognition model, prevents the same error from being repeated, and makes the model increasingly intelligent.
In an embodiment, in step S13, the step of inputting the target input sentence into a natural language generating model based on a Transformer architecture for training, and determining an initial entity of the target input sentence may specifically include:
S131, in the training process of the natural language generation model, calculating the attention score of the target input sentence by using multiple attention mechanisms, and screening out the attention mechanism with the highest attention score;
and S132, determining the initial entity of the target input sentence according to the attention mechanism with the highest attention score.
In the present embodiment, an Attention Mechanism (Attention Mechanism) in a neural network is a resource allocation scheme that allocates computing resources to more important tasks while solving the information overload problem in the case of limited computing power. In neural network learning, generally speaking, the more parameters of a model, the stronger the expression ability of the model, and the larger the amount of information stored by the model, but this may cause a problem of information overload. By introducing an attention mechanism, information which is more critical to the current task is focused in a plurality of input information, the attention degree to other information is reduced, and even irrelevant information is filtered, so that the problem of information overload can be solved, and the efficiency and the accuracy of task processing are improved.
As described in step S131 above, in the training process of the natural language generation model, the attention score of the target input sentence may be calculated using a plurality of attention mechanisms, and the attention mechanism with the highest attention score is selected. The attention score is used to evaluate how well each attention mechanism suits the training of the natural language generation model.
As described in step S132, the present step screens out the attention mechanism with the highest attention score, and determines the initial entity of the target input sentence according to the attention mechanism with the highest attention score, so as to improve the recognition accuracy of the initial entity.
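The selection in steps S131 and S132 can be sketched as below. The description does not specify how the attention score of a mechanism is computed, so the sketch makes two assumptions: each attention mechanism is a scaled dot-product attention head, and the score of a head is its largest attention weight. The function names `attention_weights` and `select_head` are hypothetical.

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    shift = max(scores)  # numerical stability for the softmax below
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_head(heads):
    """Pick the head with the highest score.

    heads: list of (query, keys) pairs, one per attention mechanism.
    Hypothetical scoring rule: a head's score is its largest attention weight.
    """
    scored = [max(attention_weights(q, ks)) for q, ks in heads]
    best = max(range(len(heads)), key=lambda i: scored[i])
    return best, scored

heads = [
    ([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]]),  # spreads attention evenly
    ([2.0, 0.0], [[4.0, 0.0], [0.0, 0.0]]),  # focuses on the first key
]
best, scored = select_head(heads)
```

Other scoring rules are equally possible; the sketch only shows the general pattern of scoring several mechanisms and keeping the best one.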
In one embodiment, the step of determining whether the original entity is consistent with the manually labeled medical named entity includes:
respectively converting the original entity and the medical named entity into word vectors by using a Word2Vec word vector model trained in advance;
calculating the cosine similarity between the word vector of the original entity and the word vector of the medical named entity;
judging whether the cosine similarity is larger than a preset similarity threshold value or not;
and if so, determining that the original entity is consistent with the manually labeled medical named entity.
In this embodiment, the Word2Vec word vector model trained in advance can be used to convert the original entity and the medical named entity into word vectors respectively, the cosine similarity between the word vector of the original entity and the word vector of the medical named entity is calculated, and whether the cosine similarity is greater than a preset similarity threshold is judged; the preset similarity threshold is 0.9. When the cosine similarity is greater than the preset similarity threshold, the original entity is determined to be consistent with the manually labeled medical named entity, which improves the accuracy of the judgment on the original entity.
The Word2Vec word vector model learns semantic knowledge from a large amount of text in an unsupervised manner. It is trained on a large amount of text and represents each word in the text as a vector, called a word vector; the relation between two words can be inferred by calculating the distance between their word vectors.
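The consistency check of this embodiment can be sketched as follows. The vectors would normally be produced by the pre-trained Word2Vec word vector model; here they are hypothetical hand-written values, and the function names are illustrative only. The threshold 0.9 is the value given in the description.

```python
import math

SIMILARITY_THRESHOLD = 0.9  # preset similarity threshold from the description

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def is_consistent(original_vec, labeled_vec, threshold=SIMILARITY_THRESHOLD):
    """True when the predicted entity is close enough to the labeled entity."""
    return cosine_similarity(original_vec, labeled_vec) > threshold

# Hypothetical word vectors standing in for Word2Vec output.
close_pair = is_consistent([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
far_pair = is_consistent([1.0, 0.0], [0.0, 1.0])
```

Comparing by cosine similarity rather than exact string equality lets near-synonymous entities count as consistent, at the cost of depending on the quality of the word vectors.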
In one embodiment, the step of extracting the input sentences from the electronic medical record list includes:
extracting text information from the electronic medical record list;
and carrying out data cleaning treatment on the text information to remove punctuation marks or special characters to obtain the input sentence.
In this embodiment, a word segmentation tool may be used to split the text information into a set of phrases; all words in the corpus are traversed, stop words and synonyms in the set of phrases are obtained according to the corpus, the stop words are deleted, and data cleaning is performed on the text information to remove punctuation marks or special characters, so as to obtain the input sentence.
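The cleaning step can be sketched as follows. The sketch splits on whitespace, which is an assumption that suits English text only; for Chinese text a dedicated word segmentation tool would be needed instead. The stop word list and the function name `clean_text` are hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop word list

def clean_text(text, stop_words=STOP_WORDS):
    """Remove punctuation and special characters, then drop stop words."""
    # Replace every character that is not a letter, digit or whitespace.
    # The underscore counts as a word character in regexes, so remove it too.
    no_punct = re.sub(r"[^\w\s]|_", " ", text)
    tokens = [t for t in no_punct.split() if t.lower() not in stop_words]
    return " ".join(tokens)

sentence = clean_text("Diagnosis: chronic superficial gastritis; the test is suggested!")
```

Replacing punctuation with a space rather than deleting it keeps words that were separated only by a punctuation mark from being joined together.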
Referring to fig. 2, an embodiment of the present application further provides a device for generating a medical named entity recognition model, including:
the acquisition module 11 is configured to acquire an electronic medical record form, extract an input statement from the electronic medical record form, and acquire a manually labeled medical named entity from a database according to the input statement; the input statement is text data which is not marked with the medical named entity;
the embedding module 12 is configured to embed the medical named entity between words of the input sentence to obtain a target input sentence;
a training module 13, configured to input the target input sentence into a natural language generation model based on a Transformer architecture for training, and determine an initial entity of the target input sentence;
a prediction module 14, configured to randomly select an occlusion symbol to occlude an initial entity in the target input sentence at a predetermined ratio, obtain context information of the initial entity, and predict an original entity of the occluded initial entity according to the context information;
the judging module 15 is used for judging whether the original entity is consistent with the artificially labeled medical named entity;
and the determining module 16 is configured to complete training of the natural language generation model to obtain a medical named entity recognition model when it is determined that the original entity is consistent with the artificially labeled medical named entity.
The electronic medical record list is a diagnosis record issued by a doctor for a patient during a medical visit, and may contain the patient's name, identification number, disease name, hospital name and the like. The disease name is also referred to herein as a medical named entity, for example heart disease, cardiovascular disease, tuberculosis, bronchial asthma, etc.
The input sentence is continuous text information in the electronic medical record list. The input sentence contains medical named entities, but they are not labeled, so the natural language generation model cannot identify them directly. For example, the input sentence may be "preliminary diagnosis of chronic superficial gastritis, further Helicobacter pylori detection suggested", in which the medical named entity is "chronic superficial gastritis".
In the present application, the medical named entities may also be manually labeled in advance. An expert mechanism may be introduced so that experts label the medical named entities, which improves the accuracy of manual labeling, and the manually labeled medical named entities are stored in the database for reference. When the input sentence is obtained, the manually labeled medical named entity corresponding to the field of the input sentence is obtained from the database.
Further, the medical named entities are embedded between the words of the input sentence to obtain the target input sentence. When embedding, the manually labeled medical named entity can be embedded at the head, in the middle or at the tail of the input sentence, and can also be embedded after the subject of the input sentence.
The Transformer architecture is the first transduction model that relies entirely on self-attention to compute the representations of its input and output, without using a sequence-aligned recurrent neural network or a convolutional neural network. It handles the dependencies between input and output using an attention mechanism and dispenses with recurrence entirely. The target input sentence is input into the natural language generation model based on the Transformer architecture for training. Through an embedding layer, the model converts each word in the target input sentence input by the user into a word embedding vector and at the same time obtains the position embedding and the sentence category embedding of each word; the word embedding vector, the position embedding and the sentence category embedding of each word together form the embedding vector. The hidden state sequence of the embedding vector is weighted to obtain a weighted hidden state sequence, and the remaining Transformer layers of the model process, in sequence, the output of the previous Transformer layer, yielding a hidden state sequence that has undergone multiple layers of weighting and takes the form of a high-dimensional matrix. Based on this high-dimensional matrix, the CRF layer of the natural language generation model outputs the word segmentation sequence of the target input sentence, and the initial entity of the target input sentence is determined according to the word segmentation sequence. Because the sentence and the entity are used together to train the natural language generation model, training can be completed without massive training data.
Wherein the word embedding vector represents the information of each word itself; position embedding refers to encoding position information of each word into a feature vector; sentence category embedding is used to distinguish different sentences.
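How the three embeddings might form one embedding vector can be sketched as follows. The description does not say how they are combined; the sketch assumes an element-wise sum, as is common in Transformer-based models. The lookup tables, their values and the function names are hypothetical.

```python
def combine_embeddings(word_emb, position_emb, sentence_emb):
    """Element-wise sum of the three embeddings of one word (assumed combination)."""
    return [w + p + s for w, p, s in zip(word_emb, position_emb, sentence_emb)]

def embed_sentence(token_ids, sentence_ids, word_table, position_table, sentence_table):
    """Build one embedding vector per word from the three lookup tables."""
    return [
        combine_embeddings(word_table[tok], position_table[pos], sentence_table[sent])
        for pos, (tok, sent) in enumerate(zip(token_ids, sentence_ids))
    ]

# Hypothetical 2-dimensional lookup tables.
word_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}
position_table = [[0.5, 0.5], [0.25, 0.25]]
sentence_table = [[0.0, 0.0], [1.0, 1.0]]

vectors = embed_sentence([1, 0], [0, 1], word_table, position_table, sentence_table)
# vectors == [[0.5, 1.5], [2.25, 1.25]]
```

The position embedding depends only on where the word stands in the sentence, so the same word at two different positions receives two different embedding vectors.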
Given the specialized and complex nature of medical terminology across the medical system, medical named entity recognition can effectively extract entities with specific meanings, such as disease names, clinical symptoms, drug names and treatment methods. In the context of big data, such data extraction is significant for the analysis of diseases, and the analysis of related diseases is significant for the early detection and prevention of diseases.
The contextualized word representation based on the deep neural network Transformer is not well suited to the named entity task: although it can capture the complex relations between words by using the self-attention mechanism to associate words with each other multiple times, it predicts the "heart" inside "heart disease" more easily than the entity "heart disease" itself. Therefore, during training, the scene information of the electronic medical record list can be extracted and the natural language generation model trained in combination with this scene information. For example, the scene information of an electronic medical record list is more likely to describe the patient's disease type than an organ, so for medical named entity recognition on the electronic medical record list, a medical named entity of a disease type has a higher hit probability, that is, the probability of predicting the medical named entity "heart disease" is higher than that of "heart".
The present application may randomly replace a preset proportion of the entities in the target input sentence with the special mask symbol MASK, then obtain the context information of each initial entity with that initial entity as the center, and predict the original entity of each masked initial entity according to the context information.
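The masking and context extraction described above can be sketched as follows. The mask token string `[MASK]`, the window size, the rule of always masking at least one entity, and the function names are assumptions made for illustration; the description only states that a preset proportion of entities is replaced by the mask symbol and that the context is taken around each initial entity.

```python
import random

MASK = "[MASK]"  # assumed spelling of the special mask symbol

def mask_entities(tokens, entity_indices, ratio, seed=None):
    """Replace a preset proportion of entity tokens with the mask symbol.

    Returns the masked token list and a dict mapping each masked index
    to the original entity that the model should predict.
    """
    rng = random.Random(seed)
    indices = list(entity_indices)
    if not indices:
        return list(tokens), {}
    # Assumption: mask at least one entity and never more than exist.
    count = min(len(indices), max(1, round(len(indices) * ratio)))
    chosen = sorted(rng.sample(indices, count))
    masked = list(tokens)
    for i in chosen:
        masked[i] = MASK
    return masked, {i: tokens[i] for i in chosen}

def context_window(tokens, index, size=2):
    """Tokens to the left and right of the entity at `index`."""
    left = tokens[max(index - size, 0):index]
    right = tokens[index + 1:index + 1 + size]
    return left, right

tokens = ["patient", "has", "heart disease", "and", "gastritis"]
masked, targets = mask_entities(tokens, [2, 4], ratio=0.5, seed=0)
```

The returned dictionary keeps the original entities, so the prediction for each masked position can later be compared with the entity that was hidden.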
In addition, the original entity predicted by the natural language generation model is compared with the artificially labeled medical named entity, and whether the original entity is consistent with the artificially labeled medical named entity or not is judged.
And when the original entity is determined to be consistent with the artificially labeled medical named entity, the training of the natural language generation model is finished, and the trained natural language generation model is used as the medical named entity recognition model.
As described above, it can be understood that each component of the device for generating a medical named entity recognition model provided in the present application may implement the function of any one of the methods for generating a medical named entity recognition model described above, and the specific structure is not described in detail.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data such as a relation extraction model, a drug discovery model and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a method of generating a medical named entity recognition model.
The processor executes the method for generating the medical named entity recognition model, and the method comprises the following steps:
acquiring an electronic medical record list, extracting input sentences from the electronic medical record list, and acquiring artificially labeled medical named entities from a database according to the input sentences; the input statement is text data which is not marked with the medical named entity;
embedding the medical named entities among all words of the input sentence to obtain a target input sentence;
inputting the target input statement into a natural language generation model based on a Transformer architecture for training, and determining an initial entity of the target input statement;
randomly selecting a shielding symbol to shield an initial entity with a preset proportion in the target input statement, acquiring context information of the initial entity, and predicting an original entity of the shielded initial entity according to the context information;
judging whether the original entity is consistent with the artificially labeled medical named entity or not;
and when the original entity is determined to be consistent with the artificially labeled medical named entity, finishing the training of the natural language generation model to obtain a medical named entity recognition model.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing a method for generating a medical named entity recognition model, including the steps of:
acquiring an electronic medical record list, extracting input sentences from the electronic medical record list, and acquiring artificially labeled medical named entities from a database according to the input sentences; the input statement is text data which is not marked with the medical named entity;
embedding the medical named entities among all words of the input sentence to obtain a target input sentence;
inputting the target input statement into a natural language generation model based on a Transformer architecture for training, and determining an initial entity of the target input statement;
randomly selecting a shielding symbol to shield an initial entity with a preset proportion in the target input statement, acquiring context information of the initial entity, and predicting an original entity of the shielded initial entity according to the context information;
judging whether the original entity is consistent with the artificially labeled medical named entity or not;
and when the original entity is determined to be consistent with the artificially labeled medical named entity, finishing the training of the natural language generation model to obtain a medical named entity recognition model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
To sum up, the greatest beneficial effects of the present application are as follows:
According to the generation method and device for the medical named entity recognition model, the computer device and the computer-readable storage medium provided in the present application, an electronic medical record list is acquired, input sentences are extracted from the electronic medical record list, and manually labeled medical named entities are acquired from a database according to the input sentences; the medical named entities are embedded between the words of the input sentence to obtain a target input sentence; the target input sentence is then input into a natural language generation model based on a Transformer architecture for training, and an initial entity of the target input sentence is determined; a masking symbol is randomly used to mask a preset proportion of the initial entities in the target input sentence, and the original entity of each masked initial entity is predicted according to context information; whether the original entity is consistent with the manually labeled medical named entity is judged; and when the original entity is determined to be consistent with the manually labeled medical named entity, the training of the natural language generation model is completed to obtain the medical named entity recognition model. Because the medical named entities are embedded between the words of the input sentence, both the sentence and the entity are used to train the natural language generation model, and training can be completed without massive training data; and because a masking symbol is randomly used to mask a preset proportion of the initial entities in the target input sentence and the original entity of each masked initial entity is predicted and checked, the accuracy with which the trained medical named entity recognition model recognizes medical named entities is improved.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of protection of the present application.
Claims (10)
1. A generation method of a medical named entity recognition model is characterized by comprising the following steps:
acquiring an electronic medical record list, extracting input sentences from the electronic medical record list, and acquiring artificially labeled medical named entities from a database according to the input sentences; the input statement is text data which is not marked with the medical named entity;
embedding the medical named entities among all words of the input sentence to obtain a target input sentence;
inputting the target input statement into a natural language generation model based on a Transformer architecture for training, and determining an initial entity of the target input statement;
randomly selecting a shielding symbol to shield an initial entity with a preset proportion in the target input statement, acquiring context information of the initial entity, and predicting an original entity of the shielded initial entity according to the context information;
judging whether the original entity is consistent with the artificially labeled medical named entity or not;
and when the original entity is determined to be consistent with the artificially labeled medical named entity, finishing the training of the natural language generation model to obtain a medical named entity recognition model.
2. The method for generating a medical named entity recognition model according to claim 1, wherein the step of embedding the medical named entities between words of the input sentence comprises:
when the medical named entity is detected to comprise a plurality of words, respectively calculating the similarity between each word of the medical named entity and each word of the input sentence, and determining, according to the similarity, the word-embedding average value of each word at each position of the input sentence; wherein the word-embedding average value is used to evaluate the reasonableness of embedding a word at each position of the input sentence;
determining the embedding position of each word according to the word-embedding average value of that word;
and embedding each word between the words of the input sentence according to its embedding position.
3. The method for generating a medical named entity recognition model according to claim 1, wherein the step of predicting the original entity of the masked initial entity according to the context information comprises:
predicting the shielded initial entities by adopting a softmax function according to the context information to obtain a plurality of predicted entities and probability values of the predicted entities;
and taking the predicted entity with the maximum probability value as the original entity.
4. The method for generating a medical named entity recognition model according to claim 1, wherein after the step of determining whether the original entity is consistent with the manually labeled medical named entity, the method further comprises:
when the original entity is determined to be inconsistent with the artificially labeled medical named entity, acquiring difference information of the original entity and the artificially labeled medical named entity;
and adjusting parameters of the natural language generation model according to the difference information, and training the natural language generation model after the parameters are adjusted again until the predicted original entity is consistent with the artificially labeled medical named entity.
5. The method for generating a medical named entity recognition model according to claim 1, wherein the step of inputting the target input sentence into a natural language generation model based on a Transformer architecture for training and determining an initial entity of the target input sentence comprises:
in the training process of the natural language generation model, calculating the attention score of the target input statement by using various attention mechanisms, and screening out the attention mechanism with the highest attention score;
and determining an initial entity of the target input sentence according to the attention mechanism with the highest attention score.
6. The method for generating the medical named entity recognition model according to claim 1, wherein the step of determining whether the original entity is consistent with the manually labeled medical named entity comprises:
respectively converting the original entity and the medical named entity into Word vectors by using a Word2Vec Word vector model trained in advance;
calculating the cosine similarity between the word vector of the original entity and the word vector of the medical named entity;
judging whether the cosine similarity is larger than a preset similarity threshold value or not;
if so, judging that the original entity is consistent with the artificially labeled medical named entity.
7. The method for generating a medical named entity recognition model according to claim 1, wherein the step of extracting input sentences from the electronic medical record list comprises:
extracting text information from the electronic medical record list;
and carrying out data cleaning treatment on the text information to remove punctuation marks or special characters to obtain the input sentence.
8. An apparatus for generating a medical named entity recognition model, comprising:
the acquisition module is used for acquiring the electronic medical record list, extracting input sentences from the electronic medical record list and acquiring artificially labeled medical named entities from a database according to the input sentences; the input statement is text data which is not marked with the medical named entity;
the embedding module is used for embedding the medical named entities between the words of the input sentence to obtain a target input sentence;
the training module is used for inputting the target input statement into a natural language generation model based on a Transformer architecture for training and determining an initial entity of the target input statement;
the prediction module is used for randomly selecting and masking the initial entity with a preset proportion in the target input statement by using a masking symbol, acquiring the context information of the initial entity and predicting the original entity of the masked initial entity according to the context information;
the judging module is used for judging whether the original entity is consistent with the artificially marked medical named entity or not;
and the determining module is used for finishing the training of the natural language generation model to obtain a medical named entity recognition model when the original entity is determined to be consistent with the artificially marked medical named entity.
9. A computer device, comprising:
one or more processors;
a memory;
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs configured to perform the method of generating a medical named entity recognition model according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method for generating a medical named entity recognition model according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110605302.1A CN113204969A (en) | 2021-05-31 | 2021-05-31 | Medical named entity recognition model generation method and device and computer equipment |
PCT/CN2021/109362 WO2022252378A1 (en) | 2021-05-31 | 2021-07-29 | Method and apparatus for generating medical named entity recognition model, and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110605302.1A CN113204969A (en) | 2021-05-31 | 2021-05-31 | Medical named entity recognition model generation method and device and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113204969A (en) | 2021-08-03 |
Family
ID=77023836
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110605302.1A Pending CN113204969A (en) | 2021-05-31 | 2021-05-31 | Medical named entity recognition model generation method and device and computer equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113204969A (en) |
WO (1) | WO2022252378A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761899A (en) * | 2021-09-07 | 2021-12-07 | 卫宁健康科技集团股份有限公司 | Medical text generation method, device, equipment and storage medium |
CN115935992A (en) * | 2022-11-23 | 2023-04-07 | 贝壳找房(北京)科技有限公司 | Named entity recognition method, device and storage medium |
CN117423470A (en) * | 2023-10-30 | 2024-01-19 | 盐城市第三人民医院 | Chronic disease clinical decision support system and construction method |
CN118114675A (en) * | 2024-04-29 | 2024-05-31 | 支付宝(杭州)信息技术有限公司 | Medical named entity recognition method and device based on large language model |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115879473B (en) * | 2022-12-26 | 2023-12-01 | 淮阴工学院 | Chinese medical named entity recognition method based on improved graph attention network |
CN116757204B (en) * | 2023-08-22 | 2023-10-31 | 北京亚信数据有限公司 | Medical name mapping method, training device, medium and equipment |
CN116860706B (en) * | 2023-09-04 | 2023-11-24 | 南昌协达科技发展有限公司 | Experimental data text storage method and system |
CN116909991B (en) * | 2023-09-12 | 2023-12-12 | 中国人民解放军总医院第六医学中心 | NLP-based scientific research archive management method and system |
CN117077679B (en) * | 2023-10-16 | 2024-03-12 | 之江实验室 | Named entity recognition method and device |
CN117195877B (en) * | 2023-11-06 | 2024-01-30 | 中南大学 | Word vector generation method, system and equipment for electronic medical record and storage medium |
CN118095285A (en) * | 2024-03-18 | 2024-05-28 | 中国医学科学院医学信息研究所 | Electronic medical record named entity recognition method and system based on deep learning |
CN118297069B (en) * | 2024-06-06 | 2024-08-30 | 北方健康医疗大数据科技有限公司 | Data management system, method, equipment and medium based on natural language processing |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897919A (en) * | 2017-02-28 | 2017-06-27 | 百度在线网络技术(北京)有限公司 | With the foundation of car type prediction model, information providing method and device |
CN110705293A (en) * | 2019-08-23 | 2020-01-17 | 中国科学院苏州生物医学工程技术研究所 | Electronic medical record text named entity recognition method based on pre-training language model |
CN111126068A (en) * | 2019-12-25 | 2020-05-08 | 中电云脑(天津)科技有限公司 | Chinese named entity recognition method and device and electronic equipment |
KR20200084436A (en) * | 2018-12-26 | 2020-07-13 | 주식회사 와이즈넛 | Apparatus for coherence analyzing between each sentence in a text document and method thereof |
CN111798259A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Application recommendation method and device, storage medium and electronic equipment |
CN112001177A (en) * | 2020-08-24 | 2020-11-27 | 浪潮云信息技术股份公司 | Electronic medical record named entity identification method and system integrating deep learning and rules |
KR20200141419A (en) * | 2020-12-04 | 2020-12-18 | 넷마블 주식회사 | Method for extracting synonyms |
CN112307769A (en) * | 2019-07-29 | 2021-02-02 | 武汉Tcl集团工业研究院有限公司 | Natural language model generation method and computer equipment |
CN112464662A (en) * | 2020-12-02 | 2021-03-09 | 平安医疗健康管理股份有限公司 | Medical phrase matching method, device, equipment and storage medium |
WO2021051516A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Ancient poem generation method and apparatus based on artificial intelligence, and device and storage medium |
CN112818689A (en) * | 2019-11-15 | 2021-05-18 | 马上消费金融股份有限公司 | Entity identification method, model training method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150088511A1 (en) * | 2013-09-24 | 2015-03-26 | Verizon Patent And Licensing Inc. | Named-entity based speech recognition |
CN112434331B (en) * | 2020-11-20 | 2023-08-18 | 百度在线网络技术(北京)有限公司 | Data desensitization method, device, equipment and storage medium |
2021
- 2021-05-31: CN application CN202110605302.1A (publication CN113204969A), status: active, Pending
- 2021-07-29: WO application PCT/CN2021/109362 (publication WO2022252378A1), status: active, Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761899A (en) * | 2021-09-07 | 2021-12-07 | 卫宁健康科技集团股份有限公司 | Medical text generation method, device, equipment and storage medium |
CN115935992A (en) * | 2022-11-23 | 2023-04-07 | 贝壳找房(北京)科技有限公司 | Named entity recognition method, device and storage medium |
CN117423470A (en) * | 2023-10-30 | 2024-01-19 | 盐城市第三人民医院 | Chronic disease clinical decision support system and construction method |
CN117423470B (en) * | 2023-10-30 | 2024-04-23 | 盐城市第三人民医院 | Chronic disease clinical decision support system and construction method |
CN118114675A (en) * | 2024-04-29 | 2024-05-31 | 支付宝(杭州)信息技术有限公司 | Medical named entity recognition method and device based on large language model |
CN118114675B (en) * | 2024-04-29 | 2024-07-26 | 支付宝(杭州)信息技术有限公司 | Medical named entity recognition method and device based on large language model |
Also Published As
Publication number | Publication date |
---|---|
WO2022252378A1 (en) | 2022-12-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113204969A (en) | Medical named entity recognition model generation method and device and computer equipment | |
US11610678B2 (en) | Medical diagnostic aid and method | |
CN106874643B (en) | Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors | |
CN112800766B (en) | Active learning-based Chinese medical entity identification labeling method and system | |
CN109992664B (en) | Dispute focus label classification method and device, computer equipment and storage medium | |
CN111402979B (en) | Method and device for detecting consistency of disease description and diagnosis | |
Carchiolo et al. | Medical prescription classification: a NLP-based approach | |
CN108062978B (en) | Method for predicting main adverse cardiovascular events of patients with acute coronary syndrome | |
CN110931128B (en) | Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts | |
WO2020224433A1 (en) | Target object attribute prediction method based on machine learning and related device | |
CN109003677B (en) | Structured analysis processing method for medical record data | |
CN114550946A (en) | Medical data processing method, device and storage medium | |
CN113012774B (en) | Automatic medical record coding method and device, electronic equipment and storage medium | |
CN116911300A (en) | Language model pre-training method, entity recognition method and device | |
CN113080907B (en) | Pulse wave signal processing method and device | |
Gu et al. | Automatic generation of pulmonary radiology reports with semantic tags | |
CN117877660A (en) | Medical report acquisition method and system based on voice recognition | |
CN116595994A (en) | Contradictory information prediction method, device, equipment and medium based on prompt learning | |
CN117316369A (en) | Chest image diagnosis report automatic generation method for balancing cross-mode information | |
CN116719840A (en) | Medical information pushing method based on post-medical-record structured processing | |
CN115910327A (en) | Small sample cancer event analysis method, device, equipment and storage medium | |
CN116403706A (en) | Diabetes prediction method integrating knowledge expansion and convolutional neural network | |
CN115762721A (en) | Medical image quality control method and system based on computer vision technology | |
CN113408291B (en) | Training method, training device, training equipment and training storage medium for Chinese entity recognition model | |
CN113539520B (en) | Method, device, computer equipment and storage medium for realizing inquiry session |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||