CN117057350A - Chinese electronic medical record named entity recognition method and system - Google Patents
- Publication number
- CN117057350A (application CN202310995207.6A)
- Authority
- CN
- China
- Prior art keywords
- model
- medical record
- electronic medical
- chinese electronic
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application provides a method and a system for named entity recognition in Chinese electronic medical records, and relates to the technical field of Chinese named entity recognition. The Chinese electronic medical record knowledge graph is embedded into the input data of the named entity recognition model. The information enhancement provided by the knowledge graph effectively addresses problems such as polysemy (one word with several meanings), synonymy (several words with one meaning), and non-standardized abbreviations, so that the subsequent models can extract text features in a more targeted way, improving the efficiency and accuracy of Chinese electronic medical record named entity recognition.
Description
Technical Field
The application relates to the technical field of Chinese named entity recognition, and in particular to a method and a system for Chinese electronic medical record named entity recognition that fuse a knowledge graph with character and word features.
Background
An electronic medical record mainly comprises data generated in a medical institution, such as the patient's disease course records, symptoms, examination methods, and operation records. It is not limited to such static data: it also records an individual's lifelong health status and medical information, and it runs through the entire process of recording, storing, transmitting, sharing, and using patient information. At present, most electronic medical records in medical institutions are stored as unstructured text, and accurately extracting the substantive data from that text benefits clinical research at hospitals and public-welfare medical institutions. Named entity recognition on electronic medical records can mine the associations among diseases and between physical signs and diagnoses, supports comprehensive treatment of patients, and can provide decision support for doctors, thereby saving cost.
The main technical route for Chinese electronic medical record (CEMR) named entity recognition is roughly the same as that used abroad, but the languages differ greatly. English has clear word boundaries, prefixes and suffixes that are easy to separate, and a relatively fixed lexical and syntactic structure. Chinese sentences have no explicit word boundaries, components such as radicals cannot be split off directly, and the lexical and syntactic structure is complex. In the medical field in particular, problems such as the large number of specialized Chinese medical terms, long medical named entities, polysemy and synonymy, and non-standardized abbreviations have not been solved effectively. As a result, the accuracy of existing Chinese electronic medical record named entity recognition is low.
Disclosure of Invention
(I) Technical problem to be solved
In view of the deficiencies of the prior art, the application provides a method and a system for Chinese electronic medical record named entity recognition, which solve the technical problem that existing Chinese electronic medical record named entity recognition has low accuracy.
(II) Technical solution
To achieve the above purpose, the application is realized by the following technical solution:
In a first aspect, the application provides a Chinese electronic medical record named entity recognition method, including:
S1, acquiring a Chinese electronic medical record data set and preprocessing it;
S2, for the data in the preprocessed Chinese electronic medical record data set, inserting the corresponding triples from the Chinese electronic medical record knowledge graph into the original data to generate a knowledge-graph-fused Chinese electronic medical record data set;
S3, training an initial BiLSTM model on the knowledge-graph-fused Chinese electronic medical record data set; training an initial GAN model and an initial BiGRU model separately on the preprocessed Chinese electronic medical record data; training an initial GAT model on the output data of the trained BiLSTM, GAN, and BiGRU models; and training an initial CRF model on the output of the trained GAT model;
S4, integrating the trained BiLSTM, GAN, BiGRU, GAT, and CRF models to obtain a Chinese electronic medical record named entity recognition model, which is used to recognize named entities in the Chinese electronic medical record data to be tested.
Preferably, S2 includes:
for a given sentence $s = [x_1, x_2, \ldots, x_n]$, looking up each word $x_i$ in the knowledge graph and, if corresponding triples exist, inserting them at the corresponding position; if the triples of word $x_i$ in the knowledge graph are $K = [(x_i, r_{i0}, x_{i0}), \ldots, (x_i, r_{ik}, x_{ik})]$, the original sentence becomes a new sentence fused with the knowledge graph triples, of the form $s = [x_1, x_2, \ldots, x_i, (r_{i0}, x_{i0}), \ldots, (r_{ik}, x_{ik}), \ldots, x_n]$.
Preferably, S3 includes:
S301, training and testing an initial GAN model on the Chinese electronic medical record data set to obtain a trained GAN model, and extracting sequence vectors containing character features from the data set with the trained GAN model; training and testing an initial BiGRU model on the Chinese electronic medical record data set to obtain a trained BiGRU model, and extracting sequence vectors containing word features from the data set with the trained BiGRU model; and concatenating the sequence vectors containing character features with the sequence vectors containing word features to obtain sequence vectors containing character-word features;
S302, training and testing an initial BiLSTM model on the knowledge-graph-fused Chinese electronic medical record data set to obtain a trained BiLSTM model, and extracting sequence vectors containing knowledge-graph character features from that data set with the trained BiLSTM model;
S303, training an initial GAT model on the sequence vectors containing character-word features and the sequence vectors containing knowledge-graph character features to obtain a trained GAT model, and processing these two kinds of sequence vectors with the trained GAT model to obtain sequence vectors containing context features;
S304, training an initial CRF model on the sequence vectors containing context features to obtain a trained CRF model.
Preferably, processing the sequence vector containing character-word features and the sequence vector containing knowledge-graph character features with the trained GAT model to obtain the sequence vector containing context features includes:
computing weights with a multi-head attention mechanism: the sequence vector $h_i$ containing character-word features and the sequence vector $h_j$ containing knowledge-graph character features are each mapped to $K$ dimensions, and their similarity score $e_{ij}^{k}$ is computed:
where LeakyReLU is a ReLU function with a negative slope, $\Vert$ denotes vector concatenation, and $a_k$ is a learnable weight vector;
normalizing the scores with a softmax function to obtain the attention coefficients;
computing the weighted sum of the attention coefficients and the feature vectors of the neighbor nodes to obtain the representation $h'_i$ of node $i$:
where $W_k$ is the learnable weight matrix of the $k$-th attention head, and $h'_i$ is the sequence vector containing context features.
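The formulas themselves are not reproduced in this text. As a hedged reading only, the symbols defined above ($h_i$, $h_j$, $a_k$, $W_k$, LeakyReLU, $\Vert$, softmax) match the standard graph attention network formulation, which would give the following. The neighbor set $\mathcal{N}_i$, the activation $\sigma$, and the choice to concatenate (rather than average) the $K$ heads are assumptions and are not confirmed by the source.

```latex
\begin{align}
e_{ij}^{k} &= \mathrm{LeakyReLU}\!\left(a_k^{\top}\left[\,W_k h_i \,\Vert\, W_k h_j\,\right]\right) \\
\alpha_{ij}^{k} &= \mathrm{softmax}_j\!\left(e_{ij}^{k}\right)
  = \frac{\exp\!\left(e_{ij}^{k}\right)}{\sum_{l \in \mathcal{N}_i} \exp\!\left(e_{il}^{k}\right)} \\
h'_i &= \big\Vert_{k=1}^{K}\, \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\, W_k h_j\right)
\end{align}
```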
Preferably, using the Chinese electronic medical record named entity recognition model to recognize named entities in the Chinese electronic medical record data to be tested includes:
preprocessing the Chinese electronic medical record data to be predicted; inserting the corresponding triples from the Chinese electronic medical record knowledge graph into the data to be predicted to generate knowledge-graph-fused data to be predicted; and inputting both the preprocessed data to be predicted and the knowledge-graph-fused data to be predicted into the Chinese electronic medical record named entity recognition model to obtain the recognition result.
In a second aspect, the application provides a Chinese electronic medical record named entity recognition system, comprising:
a data acquisition module, used for acquiring and preprocessing a Chinese electronic medical record data set;
a knowledge graph embedding module, used for inserting, for the data in the preprocessed Chinese electronic medical record data set, the corresponding triples from the Chinese electronic medical record knowledge graph into the original data to generate a knowledge-graph-fused Chinese electronic medical record data set;
a model training module, used for training an initial BiLSTM model on the knowledge-graph-fused Chinese electronic medical record data set; training an initial GAN model and an initial BiGRU model separately on the preprocessed Chinese electronic medical record data; training an initial GAT model on the output data of the trained BiLSTM, GAN, and BiGRU models; and training an initial CRF model on the output of the trained GAT model;
an integration module, used for integrating the trained BiLSTM, GAN, BiGRU, GAT, and CRF models to obtain a Chinese electronic medical record named entity recognition model, which is used to recognize named entities in the Chinese electronic medical record data to be tested.
Preferably, inserting the corresponding triples from the Chinese electronic medical record knowledge graph into the original data to generate the knowledge-graph-fused Chinese electronic medical record data set includes:
for a given sentence $s = [x_1, x_2, \ldots, x_n]$, looking up each word $x_i$ in the knowledge graph and, if corresponding triples exist, inserting them at the corresponding position; if the triples of word $x_i$ in the knowledge graph are $K = [(x_i, r_{i0}, x_{i0}), \ldots, (x_i, r_{ik}, x_{ik})]$, the original sentence becomes a new sentence fused with the knowledge graph triples, of the form $s = [x_1, x_2, \ldots, x_i, (r_{i0}, x_{i0}), \ldots, (r_{ik}, x_{ik}), \ldots, x_n]$.
Preferably, the model training module includes:
a GAN and BiGRU training unit, used for training and testing an initial GAN model on the Chinese electronic medical record data set to obtain a trained GAN model, and extracting sequence vectors containing character features from the data set with the trained GAN model; training and testing an initial BiGRU model on the Chinese electronic medical record data set to obtain a trained BiGRU model, and extracting sequence vectors containing word features from the data set with the trained BiGRU model; and concatenating the sequence vectors containing character features with the sequence vectors containing word features to obtain sequence vectors containing character-word features;
a BiLSTM training unit, used for training and testing an initial BiLSTM model on the knowledge-graph-fused Chinese electronic medical record data set to obtain a trained BiLSTM model, and extracting sequence vectors containing knowledge-graph character features from that data set with the trained BiLSTM model;
a GAT training unit, used for training an initial GAT model on the sequence vectors containing character-word features and the sequence vectors containing knowledge-graph character features to obtain a trained GAT model, and processing these two kinds of sequence vectors with the trained GAT model to obtain sequence vectors containing context features;
a CRF training unit, used for training an initial CRF model on the sequence vectors containing context features to obtain a trained CRF model.
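As an illustration of the forward computation a GAT training unit of this kind would wrap, the following is a minimal NumPy sketch of one multi-head graph attention layer. It is not the patented implementation: the function name `gat_layer`, the adjacency matrix `adj`, the concatenation of heads, and all shapes are assumptions chosen for the example, and no activation is applied after aggregation.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """ReLU with a small negative slope (the slope value is an assumption)."""
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a):
    """One multi-head graph attention layer.

    h:   (N, F) node feature vectors
    adj: (N, N) 0/1 adjacency matrix that includes self-loops
    W:   (K, F, D) one projection matrix per attention head
    a:   (K, 2*D) one attention weight vector per head
    Returns (N, K*D): the K head outputs concatenated.
    """
    outputs = []
    for W_k, a_k in zip(W, a):
        z = h @ W_k                                   # (N, D) projected nodes
        d = z.shape[1]
        # a_k^T [z_i || z_j] splits into a source term plus a target term.
        src = z @ a_k[:d]                             # (N,)
        dst = z @ a_k[d:]                             # (N,)
        e = leaky_relu(src[:, None] + dst[None, :])   # (N, N) raw scores
        e = np.where(adj > 0, e, -np.inf)             # keep neighbors only
        e = e - e.max(axis=1, keepdims=True)          # numerical stability
        alpha = np.exp(e)
        alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax per row
        outputs.append(alpha @ z)                     # weighted neighbor sum
    return np.concatenate(outputs, axis=1)

rng = np.random.default_rng(0)
N, F, K, D = 4, 3, 2, 5
h = rng.normal(size=(N, F))
W = rng.normal(size=(K, F, D))
a = rng.normal(size=(K, 2 * D))
adj = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)   # chain graph with self-loops
context = gat_layer(h, adj, W, a)
print(context.shape)  # (4, 10)
```

In a real system the parameters `W` and `a` would be learned by backpropagation in a deep learning framework; the sketch only shows how attention scores, softmax normalization, and neighbor aggregation fit together.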
In a third aspect, the application provides a computer-readable storage medium storing a computer program for Chinese electronic medical record named entity recognition, wherein the computer program causes a computer to perform the Chinese electronic medical record named entity recognition method described above.
In a fourth aspect, the application provides an electronic device, comprising:
one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the Chinese electronic medical record named entity recognition method described above.
(III) Beneficial effects
The application provides a method and a system for Chinese electronic medical record named entity recognition. Compared with the prior art, it has the following beneficial effects:
The Chinese electronic medical record knowledge graph is embedded into the input data of the named entity recognition model. The information enhancement provided by the knowledge graph effectively addresses problems such as polysemy, synonymy, and non-standardized abbreviations, so that the subsequent models can extract text features in a more targeted way, improving the efficiency and accuracy of Chinese electronic medical record named entity recognition.
Drawings
To illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below are only some embodiments of the application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of a Chinese electronic medical record named entity recognition method according to an embodiment of the application;
FIG. 2 is a schematic structural diagram of the BiLSTM model according to an embodiment of the application;
FIG. 3 is a schematic structural diagram of the GAN model according to an embodiment of the application;
FIG. 4 is a schematic structural diagram of the BiGRU model according to an embodiment of the application;
FIG. 5 is a schematic structural diagram of the GAT model according to an embodiment of the application;
FIG. 6 is a schematic structural diagram of the CRF model according to an embodiment of the application;
FIG. 7 is a schematic structural diagram of the Chinese electronic medical record named entity recognition model according to an embodiment of the application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are some, but not all, embodiments of the application. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the application.
By providing a method and a system for Chinese electronic medical record named entity recognition, the embodiments of the application solve the technical problem that existing Chinese electronic medical record named entity recognition has low accuracy, and improve the accuracy of named entity recognition on electronic medical records.
To solve the above technical problem, the overall idea of the technical solution in the embodiments of the application is as follows:
Current Chinese electronic medical record named entity recognition techniques are mainly refinements of models proposed abroad, such as LSTM and BERT. Whether based on LSTM or BERT, however, they share two main drawbacks:
1. Electronic medical record data contains many specialized medical terms, so word embedding in Chinese medical text is inevitably exposed to word segmentation errors. Polysemy is also common: the same character or word often means quite different things in different contexts, so a model that later recognizes that character or word does so with low accuracy.
2. Chinese electronic medical record text contains not only conventional entities but also many entities with complex structures, such as nested entities. The prior art cannot accurately understand the entities and relations in the text, so recognition accuracy is low.
To solve these problems, the embodiments of the application provide a method and a system for Chinese electronic medical record named entity recognition that fuse a knowledge graph with character and word features. The Chinese electronic medical record knowledge graph is embedded into the input data of the named entity recognition model. Through the information enhancement provided by the knowledge graph, a large-scale Chinese electronic medical record corpus and the knowledge graph are used together to train an enhanced language representation model that exploits lexical, syntactic, and knowledge information at the same time. This effectively addresses problems such as polysemy, synonymy, and non-standardized abbreviations, so that the subsequent models can extract text features in a more targeted way, improving the efficiency and accuracy of Chinese electronic medical record named entity recognition.
For a better understanding of the above technical solution, it is described in detail below with reference to the drawings and specific embodiments.
An embodiment of the application provides a Chinese electronic medical record named entity recognition method which, as shown in FIG. 1, includes the following steps:
S1, acquiring a Chinese electronic medical record data set and preprocessing it;
S2, for the data in the preprocessed Chinese electronic medical record data set, inserting the corresponding triples from the Chinese electronic medical record knowledge graph into the original data to generate a knowledge-graph-fused Chinese electronic medical record data set;
S3, training an initial BiLSTM model on the knowledge-graph-fused Chinese electronic medical record data set; training an initial GAN model and an initial BiGRU model separately on the preprocessed Chinese electronic medical record data; training an initial GAT model on the output data of the trained BiLSTM, GAN, and BiGRU models; and training an initial CRF model on the output of the trained GAT model;
S4, integrating the trained BiLSTM, GAN, BiGRU, GAT, and CRF models to obtain a Chinese electronic medical record named entity recognition model, which is used to recognize named entities in the Chinese electronic medical record data to be tested.
In this embodiment, the Chinese electronic medical record knowledge graph is embedded into the input data of the named entity recognition model. The information enhancement provided by the knowledge graph effectively addresses problems such as polysemy, synonymy, and non-standardized abbreviations, so that the subsequent models can extract text features in a more targeted way, improving the efficiency and accuracy of Chinese electronic medical record named entity recognition.
The individual steps are described in detail below:
In step S1, a Chinese electronic medical record data set is acquired and preprocessed. The specific implementation process is as follows:
The preprocessing mainly comprises data cleaning and normalization, desensitization, and manual sequence labeling. Data cleaning and normalization mainly handle problems such as wrongly written characters and inconsistent wording within a record. Desensitization trims the content of the electronic medical record to reduce interference from entities unrelated to clinical information, without changing the semantics of the record and while preserving its authenticity. Because the electronic medical record contains private information such as the patient's name, age, and address, this information must be desensitized to protect patient privacy, yielding a real clinical medical record corpus with private information removed.
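The source does not say how desensitization is carried out. One simple possibility, shown here only as a sketch, is rule-based masking of labeled fields with regular expressions. The field labels (`姓名`, `年龄`, `住址`), the placeholder tags, and the sample record are all hypothetical; real records would need rules tuned to each hospital's templates.

```python
import re

# Hypothetical field patterns: each captures the field label and replaces
# the value that follows it with a placeholder tag.
PATTERNS = [
    (re.compile(r"(姓名[:：]\s*)[^\s，,。]+"), r"\1[NAME]"),
    (re.compile(r"(年龄[:：]\s*)\d+\s*岁?"), r"\1[AGE]"),
    (re.compile(r"(住址[:：]\s*)[^\s，,。]+"), r"\1[ADDRESS]"),
]

def desensitize(text: str) -> str:
    """Mask name, age, and address values while leaving clinical text intact."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

# Made-up record for illustration only.
record = "姓名：张三，年龄：45岁，住址：合肥市某小区。主诉：咳嗽3天。"
print(desensitize(record))
```

The clinical part of the record (here, the chief complaint) is left untouched, which matches the stated goal of removing private information without changing the semantics of the record.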
In step S2, for the data in the preprocessed Chinese electronic medical record data set, the corresponding triples from the Chinese electronic medical record knowledge graph are inserted into the original data to generate a knowledge-graph-fused Chinese electronic medical record data set. The specific implementation process is as follows:
Electronic medical record data contains many specialized medical terms and is to some degree heterogeneous and ambiguous. In this embodiment, the triples in the Chinese electronic medical record knowledge graph that correspond to the data in the Chinese electronic medical record data set are inserted into the original data, generating the knowledge-graph-fused Chinese electronic medical record data set.
This data enhancement also gives the word vectors of the electronic medical record data rich external knowledge. For a given sentence $s = [x_1, x_2, \ldots, x_n]$, each word $x_i$ is looked up in the knowledge graph, and if corresponding triples exist they are inserted at the corresponding position. Assuming the triples of the word in the knowledge graph are $K = [(x_i, r_{i0}, x_{i0}), \ldots, (x_i, r_{ik}, x_{ik})]$, the original sentence takes the form $s = [x_1, x_2, \ldots, x_i, (r_{i0}, x_{i0}), \ldots, (r_{ik}, x_{ik}), \ldots, x_n]$. This step produces a new sentence fused with the knowledge graph triples.
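The insertion rule can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the knowledge graph is modeled as a dictionary from a head word to its triples, the sentence is already segmented into words, and each inserted pair $(r, x)$ is flattened into two tokens. The names `inject_triples` and `toy_kg` and the English toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def inject_triples(sentence: List[str], kg: Dict[str, List[Triple]]) -> List[str]:
    """Return a new token list with each matched word followed by its
    (relation, tail) pairs, flattened into tokens."""
    fused: List[str] = []
    for word in sentence:
        fused.append(word)
        for _head, relation, tail in kg.get(word, []):
            fused.extend([relation, tail])
    return fused

# Toy knowledge graph; the entries are illustrative only.
toy_kg = {
    "aspirin": [("aspirin", "treats", "fever"), ("aspirin", "is_a", "NSAID")],
}
tokens = ["patient", "took", "aspirin", "yesterday"]
print(inject_triples(tokens, toy_kg))
# ['patient', 'took', 'aspirin', 'treats', 'fever', 'is_a', 'NSAID', 'yesterday']
```

Words without an entry in the knowledge graph pass through unchanged, so a sentence with no matches is returned as it was.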
In the specific implementation process, the preprocessed Chinese electronic medical record data and the Chinese electronic medical record data set fused with the knowledge graph are respectively divided into a training set and a testing set.
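The source does not specify the split ratio or procedure. A minimal sketch, assuming that the plain and knowledge-graph-fused versions of the same record should land in the same partition, shuffles one shared index list and applies it to both data sets. The function name `paired_split`, the 80/20 ratio, and the fixed seed are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Split = Tuple[List[str], List[str]]  # (train, test)

def paired_split(plain: Sequence[str], fused: Sequence[str],
                 test_ratio: float = 0.2, seed: int = 0) -> Tuple[Split, Split]:
    """Split both data sets with the same shuffled indices."""
    if len(plain) != len(fused):
        raise ValueError("both data sets must contain the same records")
    indices = list(range(len(plain)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data: Sequence[str], idx: List[int]) -> List[str]:
        return [data[i] for i in idx]

    return ((pick(plain, train_idx), pick(plain, test_idx)),
            (pick(fused, train_idx), pick(fused, test_idx)))

# Made-up records for illustration only.
plain_records = [f"record-{i}" for i in range(10)]
fused_records = [f"record-{i}+kg" for i in range(10)]
(plain_train, plain_test), (fused_train, fused_test) = paired_split(
    plain_records, fused_records)
print(len(plain_train), len(plain_test))
```

Sharing the indices keeps a test record from appearing in the training set of the other data set, which would otherwise leak test sentences into training.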
In step S3, training an initial BiLSTM model through a Chinese electronic medical record data set fused with a knowledge graph; respectively training an initial GAN model and a BiGRU model through the preprocessed Chinese electronic medical record data; training an initial GAT model through output data of the trained BiLSTM model, the GAN model and the BiGRU model; and training the initial CRF model through the trained GAT model. The specific implementation process is as follows:
the structures of the BiLSTM model, the GAN model, the BiGRU model, the GAT model and the CRF model are shown in FIGS. 2 to 6.
S301, training and testing an initial GAN model through a Chinese electronic medical record data set to obtain a trained GAN model, and extracting sequence vectors containing character features from the Chinese electronic medical record data set through the trained GAN model; training and testing an initial BiGRU model through a Chinese electronic medical record data set to obtain a trained BiGRU model, and extracting sequence vectors containing word features in the Chinese electronic medical record data set through the trained BiGRU model; and splicing the sequence vector containing the character features and the sequence vector containing the word features to obtain the sequence vector containing the word features. The method comprises the following steps:
training and testing an initial GAN model and a BiGRU model respectively through a training set and a testing set to obtain a trained GAN model and a trained BiGRU model, extracting sequence vectors containing character features and sequence vectors containing word features in the training set and the testing set through the trained GAN model and the trained BiGRU model, and then splicing to obtain the sequence vectors containing the word features as input of the next model.
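The splicing step can be sketched as a position-wise concatenation. This is a minimal sketch that assumes the two models emit one feature vector per sequence position and that both sequences are aligned; the source does not state the vector dimensions, and the name `splice` and the numbers below are hypothetical.

```python
from typing import List

Vector = List[float]


def splice(char_seq: List[Vector], word_seq: List[Vector]) -> List[Vector]:
    """Concatenate the two feature vectors at every sequence position."""
    if len(char_seq) != len(word_seq):
        raise ValueError("both sequences must have the same number of positions")
    return [c + w for c, w in zip(char_seq, word_seq)]


# Hypothetical 2-position example: 2-dim character features, 3-dim word features.
char_features = [[0.1, 0.2], [0.3, 0.4]]
word_features = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
spliced = splice(char_features, word_features)
print(spliced[0])
```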
GAN is an efficient method of extracting morphological information from characters and encoding it as a neural representation. The embodiment of the application uses a GAN model to extract embeddings of the glyph radicals and the character structure.
BiGRU combines two GRUs (gated recurrent units), one for each direction, to process both historical and future input information and thereby better capture contextual information in the sequence. A GRU is a recurrent neural network that updates its hidden state under the control of a reset gate and an update gate. The BiGRU contains two GRUs, a forward GRU and a reverse GRU, which process the input from left to right and from right to left, respectively.
Forward GRU:
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$n_t = \tanh(W_n x_t + r_t \odot U_n h_{t-1} + b_n)$$
$$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$$
where $r_t$ is the reset gate vector, $z_t$ is the update gate vector, $n_t$ is the new candidate hidden state, and $h_t$ is the hidden state of the current time step; $x_t$ is the input vector, $h_{t-1}$ is the hidden state of the previous time step, $W_r, W_z, W_n, U_r, U_z, U_n$ and $b_r, b_z, b_n$ are learnable parameters, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.
Reverse GRU:
$$r_t' = \sigma(W_r' x_t + U_r' h_{t+1}' + b_r')$$
$$z_t' = \sigma(W_z' x_t + U_z' h_{t+1}' + b_z')$$
$$n_t' = \tanh(W_n' x_t + r_t' \odot U_n' h_{t+1}' + b_n')$$
$$h_t' = (1 - z_t') \odot n_t' + z_t' \odot h_{t+1}'$$
where $r_t'$ is the reset gate vector of the reverse GRU, $z_t'$ is the update gate vector, $n_t'$ is the new candidate hidden state, and $h_t'$ is the hidden state of the current time step; $x_t$ is the input vector, $h_{t+1}'$ is the hidden state of the next time step, $W_r', W_z', W_n', U_r', U_z', U_n'$ and $b_r', b_z', b_n'$ are learnable parameters, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.
The BiGRU concatenates the outputs of the forward and reverse GRUs to form a combined output of dimension $2h$ (where $h$ is the hidden layer size). Specifically, the output of the BiGRU is:
$$y_t = [h_t; h_t']$$
where $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation operation.
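The forward and reverse recurrences and the final concatenation can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the parameters are random toy values, the initial hidden states are zero vectors, and the helper names (`matvec`, `init_params`, `gru_step`, `bigru`) are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_params(in_dim: int, hid: int, rng: random.Random) -> Dict[str, list]:
    """Random toy parameters W_*, U_*, b_* for the gates r, z and the candidate n."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params: Dict[str, list] = {}
    for gate in "rzn":
        params["W" + gate] = mat(hid, in_dim)
        params["U" + gate] = mat(hid, hid)
        params["b" + gate] = [0.0] * hid
    return params


def gru_step(p: Dict[str, list], x: Vector, h_prev: Vector) -> Vector:
    """One GRU update following the gate equations above."""
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev), p["br"])]
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev), p["bz"])]
    n = [math.tanh(a + ri * b + c)
         for a, ri, b, c in zip(matvec(p["Wn"], x), r, matvec(p["Un"], h_prev), p["bn"])]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]


def bigru(xs: List[Vector], fwd: Dict[str, list], bwd: Dict[str, list], hid: int) -> List[Vector]:
    h = [0.0] * hid
    forward = []
    for x in xs:                      # left to right
        h = gru_step(fwd, x, h)
        forward.append(h)
    h = [0.0] * hid
    backward = []
    for x in reversed(xs):            # right to left
        h = gru_step(bwd, x, h)
        backward.append(h)
    backward.reverse()                # realign with the original positions
    return [f + b for f, b in zip(forward, backward)]  # y_t = [h_t; h'_t]


rng = random.Random(0)
in_dim, hid = 3, 2
xs = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(4)]
fwd, bwd = init_params(in_dim, hid, rng), init_params(in_dim, hid, rng)
ys = bigru(xs, fwd, bwd, hid)
print(len(ys), len(ys[0]))  # sequence length and output dimension 2h
```

Each output vector has twice the hidden size because the forward and reverse hidden states for the same position are concatenated.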
S302, training and testing an initial BiLSTM model through a Chinese electronic medical record data set fused with the knowledge graph to obtain a trained BiLSTM model, and extracting a sequence vector containing character features of the knowledge graph from the Chinese electronic medical record data set through the trained BiLSTM model. BiLSTM is composed of two LSTM (Long Short-Term Memory) networks that process the input data in the forward and reverse directions respectively; their outputs are then combined to obtain the final output. The mathematical formulation of BiLSTM is as follows. Let the input sequence be $x_1, x_2, \ldots, x_T$, where $x_t$ is a vector and $T$ is the length of the sequence. The output of BiLSTM is $y_1, y_2, \ldots, y_T$, where $y_t$ is also a vector. The BiLSTM model can be expressed as:
$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_1, x_2, \ldots, x_t)$$
$$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_T, x_{T-1}, \ldots, x_t)$$
$$y_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$$
where $\overrightarrow{h}_t$ is the hidden state vector obtained by processing $x_1, x_2, \ldots, x_t$ from left to right, $\overleftarrow{h}_t$ is the hidden state vector obtained by processing $x_T, x_{T-1}, \ldots, x_t$ from right to left, and $y_t$ is the output vector obtained by concatenating the two vectors.
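The text does not spell out the update performed inside each LSTM. For reference, one widely used formulation of an LSTM cell is given below; it is an assumption that the embodiment uses this variant, and the symbols $i_t$, $f_t$, $o_t$, $g_t$, $c_t$ are introduced here only to mirror the notation of the GRU equations above.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The reverse LSTM would apply the same update with $h_{t+1}$ and $c_{t+1}$ in place of $h_{t-1}$ and $c_{t-1}$, using its own parameters.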
S303, training an initial GAT model through a sequence vector containing word features and a sequence vector containing character features and containing a knowledge graph to obtain a trained GAT model, and processing the sequence vector containing word features and the sequence vector containing character features through the trained GAT model to obtain a sequence vector containing context features. The method comprises the following steps:
dividing the sequence vectors containing the word features and the sequence vectors containing the character features and the knowledge graph obtained in S301 and S302 into a training set and a testing set, training and testing an initial GAT model to obtain a trained GAT model, and extracting the sequence vectors containing the word features and the sequence vectors containing the character features and the knowledge graph obtained in S301 and S302 through the trained GAT model to obtain the sequence vectors containing the context features.
The GAT (Graph Attention Network) model is a model based on a graph neural network. Using an attention mechanism, it assigns different weights to the relationships between different nodes, so that the features of the nodes can be better extracted. The core of the GAT model is to model the relationships between nodes as a graph and to apply the attention mechanism on the graph. Specifically, for each node $i$, the GAT model takes the feature vectors $h_j$ of its neighboring nodes as input and computes a weighted sum of these vectors using adaptive weights $\alpha_{ij}$, resulting in a representation $h_i'$ of node $i$:
$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j\Big)$$
where $\mathcal{N}_i$ is the neighbor set of node $i$, $W$ is a learnable weight matrix, and $\sigma$ is a nonlinear activation function.
The weight $\alpha_{ij}$ of the attention mechanism is calculated from the feature vectors of node $i$ and node $j$. Specifically, the GAT model uses a multi-head attention mechanism to calculate the weights: the sequence vector $h_i$ containing word features and the sequence vector $h_j$ containing character features and the knowledge graph are each mapped to $K$ dimensions, and their similarity score is calculated:
$$e_{ij}^k = \mathrm{LeakyReLU}\big(a_k^{\top} [W_k h_i \,\Vert\, W_k h_j]\big)$$
where LeakyReLU is a ReLU function with a negative slope, $\Vert$ denotes the vector concatenation operation, and $a_k$ is a learnable weight vector.
Then, the softmax function is used to normalize the scores over the neighbor set $\mathcal{N}_i$ of node $i$ to obtain the attention coefficients:
$$\alpha_{ij}^k = \frac{\exp(e_{ij}^k)}{\sum_{l \in \mathcal{N}_i} \exp(e_{il}^k)}$$
Finally, for each attention head $k$, the feature vectors of the neighbor nodes, transformed by $W_k$, are weighted by the attention coefficients $\alpha_{ij}^k$ and summed, and the results of the heads are combined to obtain the representation $h_i'$ of node $i$, where $W_k$ is a learnable weight matrix corresponding to the $k$-th attention head.
The GAT model has the advantage that it can adaptively model the relationships between nodes and can extract the effect of different relationships between nodes on the node representation. Meanwhile, the GAT model can process sparse graph structures and has high accuracy and high interpretability. In summary, the GAT model models the relationships between nodes through a multi-head attention mechanism, and performs weighted summation on the attention coefficients and feature vectors of neighboring nodes, thereby extracting feature representations of the nodes.
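The attention computation summarized above can be sketched for a single attention head in plain Python. This is a minimal sketch under stated assumptions: the graph, the feature vectors, and the parameters `W` and `a` are toy values, every node lists itself among its neighbors, the output activation is chosen as tanh, and the function name `gat_head` is hypothetical. How the patent combines several heads is not stated in the text, so only one head is shown.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_head(
    h: List[Vector], neighbors: Dict[int, List[int]], W: Matrix, a: Vector
) -> Tuple[List[Vector], Dict[int, Vector]]:
    """Single-head graph attention: returns new node vectors and attention coefficients."""
    Wh = [matvec(W, v) for v in h]                      # linear map of every node
    out, attention = [], {}
    for i in range(len(h)):
        nbrs = neighbors[i]
        # Similarity score of node i with each neighbor j on the concatenated pair.
        scores = [leaky_relu(sum(p * q for p, q in zip(a, Wh[i] + Wh[j]))) for j in nbrs]
        top = max(scores)                               # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        alpha = [e / sum(exps) for e in exps]           # softmax over the neighbors
        attention[i] = alpha
        agg = [sum(al * Wh[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(W))]
        out.append([math.tanh(v) for v in agg])         # assumed activation
    return out, attention


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]
h_new, attention = gat_head(h, neighbors, W, a)
print(attention[1])
```

The attention coefficients of each node sum to 1 over its neighbors, so the new representation is a convex combination of the transformed neighbor vectors before the activation is applied.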
S304, training an initial CRF model according to the sequence vector containing the context characteristics to obtain a trained CRF model. The method comprises the following steps:
The sequence vector containing the contextual features is input into the CRF (Conditional Random Field) model, which is a probabilistic graphical model for sequence labeling and structured prediction. It models a sequence or structure by defining a conditional probability distribution, and has strong modeling capability and flexibility. In the CRF model, it is assumed that there is an input sequence $x = (x_1, x_2, \ldots, x_n)$ and a corresponding output sequence $y = (y_1, y_2, \ldots, y_n)$. The aim of the embodiment of the application is to learn a conditional probability distribution $p(y \mid x)$, i.e. the probability of the output sequence $y$ given the input sequence $x$. The CRF model regards the output sequence $y$ as a Markov random field, i.e. a graph structure in which each node corresponds to an output position and each edge corresponds to a dependency between adjacent output positions. Assuming that the state space of the output sequence $y$ is $\mathcal{Y}$, the CRF model can be expressed as:
$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{i=1}^{n} \sum_{j} w_j f_j(y_i, y_{i-1}, x, i)\Big)$$
where $Z(x)$ is a normalization factor that ensures the probability distribution sums to 1; $w_j$ is a model parameter, and $f_j$ is a feature function representing an evaluation of the output sequence $y$ in some respect. The feature function $f_j$ is a real-valued function of the output sequence $y$, which can be expressed as:
$$f_j(y_i, y_{i-1}, x, i) = \begin{cases} 1, & \text{if feature } j \text{ is active at } (y_i, y_{i-1}, x, i) \\ 0, & \text{otherwise} \end{cases}$$
The feature function $f_j$ captures some pattern or regularity in the output sequence $y$, typically based on local or global observations such as the part of speech at the current position or the tag at the previous position. The most probable output sequence $y$ for a given input sequence $x$ is calculated using the Viterbi algorithm. Specifically, the Viterbi algorithm recursively calculates the maximum probability and the corresponding path for each output position, and finally finds the maximum probability and the corresponding optimal path for the entire sequence.
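The scoring and decoding described above can be sketched for a tiny linear-chain CRF. This is a minimal sketch under stated assumptions: the indicator features, weights, tag set (B, I, O), tokens, and the `<START>` tag standing in for the tag before the first position are all hypothetical, and the normalization factor is computed by brute-force enumeration, which is only feasible for toy inputs.

```python
import itertools
import math
from typing import Callable, List, Sequence

START = "<START>"  # hypothetical tag used as y_0 for the first position
Feature = Callable[[str, str, Sequence[str], int], float]  # f_j(y_i, y_prev, x, i)


def local_score(y_i, y_prev, x, i, features, weights) -> float:
    return sum(w * f(y_i, y_prev, x, i) for w, f in zip(weights, features))


def sequence_score(y, x, features, weights) -> float:
    return sum(local_score(y[i], y[i - 1] if i else START, x, i, features, weights)
               for i in range(len(x)))


def probability(y, x, labels, features, weights) -> float:
    """p(y | x) with Z(x) obtained by enumerating every tag sequence (toy sizes only)."""
    z = sum(math.exp(sequence_score(c, x, features, weights))
            for c in itertools.product(labels, repeat=len(x)))
    return math.exp(sequence_score(y, x, features, weights)) / z


def viterbi(x, labels, features, weights) -> List[str]:
    """Highest-scoring tag sequence for a non-empty input, via dynamic programming."""
    best = {y: local_score(y, START, x, 0, features, weights) for y in labels}
    back = []
    for i in range(1, len(x)):
        new_best, pointers = {}, {}
        for y in labels:
            cand = {p: best[p] + local_score(y, p, x, i, features, weights) for p in labels}
            prev = max(cand, key=cand.get)
            new_best[y], pointers[y] = cand[prev], prev
        best = new_best
        back.append(pointers)
    path = [max(best, key=best.get)]
    for pointers in reversed(back):   # follow the back-pointers from the last position
        path.append(pointers[path[-1]])
    return path[::-1]


features = [
    lambda y, yp, x, i: float(y == "B" and x[i] == "high"),
    lambda y, yp, x, i: float(y == "I" and yp == "B"),
    lambda y, yp, x, i: float(y == "O" and x[i] in ("patient", "has")),
    lambda y, yp, x, i: float(y == "I" and yp not in ("B", "I")),
]
weights = [2.0, 1.5, 1.0, -3.0]
labels = ["B", "I", "O"]
x = ["patient", "has", "high", "fever"]
best_path = viterbi(x, labels, features, weights)
print(best_path, round(probability(best_path, x, labels, features, weights), 3))
```

Because the normalization factor does not depend on $y$, maximizing the unnormalized score is enough to find the most probable sequence, which is why the decoder never needs to compute $Z(x)$.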
In step S4, the trained BiLSTM model, GAN model, BiGRU model, GAT model and CRF model are integrated to obtain a Chinese electronic medical record named entity recognition model, which is used to recognize the named entities in the Chinese electronic medical record data to be tested. The specific implementation process is as follows:
the structure diagram of the Chinese electronic medical record named entity recognition model is shown in fig. 7.
The BiLSTM model, the GAN model, the BiGRU model, the GAT model and the CRF model are integrated to obtain the Chinese electronic medical record named entity recognition model. It should be noted that, once trained, the Chinese electronic medical record named entity recognition model can be used repeatedly.
The Chinese electronic medical record data to be predicted is preprocessed; the triples in the Chinese electronic medical record knowledge graph that correspond to the data to be predicted are inserted into it, generating knowledge-graph-fused data to be predicted; and the knowledge-graph-fused data to be predicted and the preprocessed data to be predicted are input into the Chinese electronic medical record named entity recognition model to obtain a recognition result.
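The data flow of the integrated model at prediction time can be sketched as a composition of callables. This is only a sketch of the wiring described above: every argument of the hypothetical function `recognize` stands in for a trained component, the toy stand-ins below are not real models, and how the GAT aligns the knowledge-graph-fused sequence with the original sequence is an open detail that the text does not specify.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[List[str]], List[Vector]]


def recognize(
    tokens: List[str],
    fuse: Callable[[List[str]], List[str]],   # knowledge graph triple insertion (step S2)
    char_encoder: Encoder,                    # stands in for the trained GAN model
    word_encoder: Encoder,                    # stands in for the trained BiGRU model
    kg_encoder: Encoder,                      # stands in for the trained BiLSTM model
    gat: Callable[[List[Vector], List[Vector]], List[Vector]],
    crf_decode: Callable[[List[Vector]], List[str]],
) -> List[str]:
    fused_tokens = fuse(tokens)
    # Splice character-level and word-level features position by position.
    spliced = [c + w for c, w in zip(char_encoder(tokens), word_encoder(tokens))]
    kg_vectors = kg_encoder(fused_tokens)
    context = gat(spliced, kg_vectors)        # sequence vectors with context features
    return crf_decode(context)                # one tag per original token


# Toy stand-ins so the sketch runs end to end; none of these are real models.
toy_fuse = lambda toks: toks + ["<kg>"]
toy_encoder = lambda toks: [[float(len(t))] for t in toks]
toy_gat = lambda spliced, kg: [s + [sum(v[0] for v in kg)] for s in spliced]
toy_crf = lambda ctx: ["O" if v[0] < 5 else "B" for v in ctx]

tags = recognize(["has", "fever"], toy_fuse, toy_encoder, toy_encoder, toy_encoder, toy_gat, toy_crf)
print(tags)
```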
The embodiment of the application also provides a Chinese electronic medical record named entity recognition system, which comprises:
the data acquisition module is used for acquiring and preprocessing a Chinese electronic medical record data set;
the knowledge graph embedding module is used for inserting triples corresponding to the preprocessed data in the Chinese electronic medical record data set in the Chinese electronic medical record knowledge graph into the original data to generate a Chinese electronic medical record data set fused with the knowledge graph;
the model training module is used for training an initial BiLSTM model through a Chinese electronic medical record data set fused with a knowledge graph; respectively training an initial GAN model and a BiGRU model through the preprocessed Chinese electronic medical record data; training an initial GAT model through output data of the trained BiLSTM model, the GAN model and the BiGRU model; training an initial CRF model through the trained GAT model;
the integration module is used for integrating the trained BiLSTM model, the trained GAN model, the trained BiGRU model, the trained GAT model and the trained CRF model to obtain a Chinese electronic medical record named entity identification model, and the Chinese electronic medical record named entity identification model is used for identifying named entities in Chinese electronic medical record data to be tested.
It can be understood that the Chinese electronic medical record named entity recognition system provided by the embodiment of the present application corresponds to the Chinese electronic medical record named entity recognition method described above. For the explanation, examples, and beneficial effects of the relevant content, reference may be made to the corresponding content in the method, which is not repeated here.
The embodiment of the application also provides a computer readable storage medium which stores a computer program for identifying the named entities of the Chinese electronic medical record, wherein the computer program enables a computer to execute the named entity identification method of the Chinese electronic medical record.
The embodiment of the application also provides electronic equipment, which comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the chinese electronic medical record named entity recognition method as described above.
In summary, compared with the prior art, the method has the following beneficial effects:
1. According to the embodiment of the application, the Chinese electronic medical record knowledge graph is embedded into the input data of the Chinese electronic medical record named entity recognition model. The information enhancement provided by the knowledge graph can effectively solve problems such as polysemy (one word with several meanings), synonymy (several words with one meaning), and inconsistent, non-standard abbreviations, so that the subsequent models can extract text features in a more targeted way, improving the efficiency and accuracy of Chinese electronic medical record named entity recognition.
2. According to the embodiment of the application, the models of all the components in the Chinese electronic medical record named entity recognition model are separately trained and recombined, so that the parameter adjusting precision of the model in the training process can be effectively improved, the accuracy of Chinese electronic medical record named entity recognition can be further improved, and meanwhile, the training process of the model can be effectively accelerated.
3. In the embodiment of the application, the hidden vectors obtained from the sentences fused with the knowledge graph are combined with the sequence vectors obtained by the models from the original sentences. The weight of each knowledge graph hidden vector with respect to each original sentence hidden vector is calculated, normalized by softmax, and multiplied with the original hidden vector to obtain the final hidden representation. Through this step, sentences carry rich semantic information and external knowledge is fused more effectively.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A Chinese electronic medical record named entity recognition method is characterized by comprising the following steps:
s1, acquiring a Chinese electronic medical record data set and preprocessing;
s2, inserting triples corresponding to the preprocessed data in the Chinese electronic medical record data set in the Chinese electronic medical record knowledge graph into the original data to generate a Chinese electronic medical record data set fused with the knowledge graph;
s3, training an initial BiLSTM model through a Chinese electronic medical record data set fused with a knowledge graph; respectively training an initial GAN model and a BiGRU model through the preprocessed Chinese electronic medical record data; training an initial GAT model through output data of the trained BiLSTM model, the GAN model and the BiGRU model; training an initial CRF model through the trained GAT model;
s4, integrating the trained BiLSTM model, the trained GAN model, the trained BiGRU model, the trained GAT model and the trained CRF model to obtain a Chinese electronic medical record named entity recognition model, wherein the Chinese electronic medical record named entity recognition model is used for recognizing named entities in Chinese electronic medical record data to be detected.
2. The method for identifying a named entity of a Chinese electronic medical record according to claim 1, wherein S2 comprises:
for a given sentence $s = [x_1, x_2, \ldots, x_n]$, looking up each word $x_i$ therein in the knowledge graph and, if corresponding triples exist, inserting the triples at the corresponding position; if the triples of word $x_i$ in the knowledge graph are represented as $K = [(x_i, r_{i0}, x_{i0}), \ldots, (x_i, r_{ik}, x_{ik})]$, the original sentence is changed into a new sentence integrating the knowledge graph triples, in the form $s = [x_1, \ldots, x_i, (r_{i0}, x_{i0}), \ldots, (r_{ik}, x_{ik}), \ldots, x_n]$.
3. The method for identifying a named entity of a chinese electronic medical record according to claim 1 or 2, wherein S3 comprises:
s301, training and testing an initial GAN model through a Chinese electronic medical record data set to obtain a trained GAN model, and extracting sequence vectors containing character features from the Chinese electronic medical record data set through the trained GAN model; training and testing an initial BiGRU model through a Chinese electronic medical record data set to obtain a trained BiGRU model, and extracting sequence vectors containing word features in the Chinese electronic medical record data set through the trained BiGRU model; splicing the sequence vector containing the character features and the sequence vector containing the word features to obtain the sequence vector containing the word features;
s302, training and testing an initial BiLSTM model through a Chinese electronic medical record data set fused with a knowledge graph to obtain a trained BiLSTM model, and extracting a sequence vector containing character features of the knowledge graph from the Chinese electronic medical record data set through the trained BiLSTM model;
s303, training an initial GAT model through a sequence vector containing word features and a sequence vector containing character features and containing a knowledge graph to obtain a trained GAT model, and processing the sequence vector containing the word features and the sequence vector containing the character features through the trained GAT model to obtain a sequence vector containing context features;
s304, training an initial CRF model according to the sequence vector containing the context characteristics to obtain a trained CRF model.
4. The method for identifying a named entity of a Chinese electronic medical record as recited in claim 3, wherein said processing the sequence vector containing the word feature and the sequence vector containing the character feature through the trained GAT model to obtain the sequence vector containing the context feature comprises:
calculating weights using a multi-head attention mechanism, mapping the sequence vector $h_i$ containing word features and the sequence vector $h_j$ containing character features and the knowledge graph each to $K$ dimensions, and calculating their similarity score:
$$e_{ij}^k = \mathrm{LeakyReLU}\big(a_k^{\top} [W_k h_i \,\Vert\, W_k h_j]\big)$$
where LeakyReLU is a ReLU function with a negative slope, $\Vert$ denotes the vector concatenation operation, and $a_k$ is a learnable weight vector;
normalizing the scores over the neighbor set $\mathcal{N}_i$ of node $i$ using a softmax function to obtain the attention coefficients:
$$\alpha_{ij}^k = \frac{\exp(e_{ij}^k)}{\sum_{l \in \mathcal{N}_i} \exp(e_{il}^k)}$$
weighting the feature vectors of the neighbor nodes by the attention coefficients and summing them to obtain the representation $h_i'$ of node $i$,
where $W_k$ is a learnable weight matrix corresponding to the $k$-th attention head, and $h_i'$ is the sequence vector containing contextual features.
5. The method for identifying a named entity of a Chinese electronic medical record according to claim 1 or 2, wherein using the Chinese electronic medical record named entity recognition model to recognize named entities in the Chinese electronic medical record data to be tested comprises:
preprocessing the Chinese electronic medical record data to be predicted; inserting the triples in the Chinese electronic medical record knowledge graph that correspond to the data to be predicted into that data, generating knowledge-graph-fused data to be predicted; and inputting the knowledge-graph-fused data to be predicted and the preprocessed data to be predicted into the Chinese electronic medical record named entity recognition model to obtain a recognition result.
6. A system for identifying a named entity of a chinese electronic medical record, comprising:
the data acquisition module is used for acquiring and preprocessing a Chinese electronic medical record data set;
the knowledge graph embedding module is used for inserting triples corresponding to the preprocessed data in the Chinese electronic medical record data set in the Chinese electronic medical record knowledge graph into the original data to generate a Chinese electronic medical record data set fused with the knowledge graph;
the model training module is used for training an initial BiLSTM model through a Chinese electronic medical record data set fused with a knowledge graph; respectively training an initial GAN model and a BiGRU model through the preprocessed Chinese electronic medical record data; training an initial GAT model through output data of the trained BiLSTM model, the GAN model and the BiGRU model; training an initial CRF model through the trained GAT model;
the integration module is used for integrating the trained BiLSTM model, the trained GAN model, the trained BiGRU model, the trained GAT model and the trained CRF model to obtain a Chinese electronic medical record named entity identification model, and the Chinese electronic medical record named entity identification model is used for identifying named entities in Chinese electronic medical record data to be tested.
7. The system for identifying a named entity of a Chinese electronic medical record of claim 6, wherein inserting the triples in the Chinese electronic medical record knowledge graph that correspond to the data in the preprocessed Chinese electronic medical record data set into the original data, generating a Chinese electronic medical record data set fused with the knowledge graph, comprises:
for a given sentence $s = [x_1, x_2, \ldots, x_n]$, looking up each word $x_i$ therein in the knowledge graph and, if corresponding triples exist, inserting the triples at the corresponding position; if the triples of word $x_i$ in the knowledge graph are represented as $K = [(x_i, r_{i0}, x_{i0}), \ldots, (x_i, r_{ik}, x_{ik})]$, the original sentence is changed into a new sentence integrating the knowledge graph triples, in the form $s = [x_1, \ldots, x_i, (r_{i0}, x_{i0}), \ldots, (r_{ik}, x_{ik}), \ldots, x_n]$.
8. The chinese electronic medical record named entity recognition system of claim 6 or 7, wherein the model training module comprises:
the training unit of GAN and BiGRU is used for training and testing the initial GAN model through the Chinese electronic medical record data set to obtain a trained GAN model, and extracting sequence vectors containing character features in the Chinese electronic medical record data set through the trained GAN model; training and testing an initial BiGRU model through a Chinese electronic medical record data set to obtain a trained BiGRU model, and extracting sequence vectors containing word features in the Chinese electronic medical record data set through the trained BiGRU model; splicing the sequence vector containing the character features and the sequence vector containing the word features to obtain the sequence vector containing the word features;
the BiLSTM training unit is used for training and testing the initial BiLSTM model through a Chinese electronic medical record data set fused with the knowledge graph to obtain a trained BiLSTM model, and extracting a sequence vector containing character features of the knowledge graph from the Chinese electronic medical record data set through the trained BiLSTM model;
the GAT training unit is used for training an initial GAT model through the sequence vector containing the word features and the sequence vector containing the character features and comprising the knowledge graph to obtain a trained GAT model, and processing the sequence vector containing the word features and the sequence vector containing the character features through the trained GAT model to obtain the sequence vector containing the context features;
and the CRF training unit is used for training the initial CRF model according to the sequence vector containing the context characteristics to obtain a trained CRF model.
9. A computer-readable storage medium storing a computer program for identifying a named entity of a chinese electronic medical record, wherein the computer program causes a computer to perform the method of identifying a named entity of a chinese electronic medical record as claimed in any one of claims 1 to 5.
10. An electronic device, comprising:
one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the chinese electronic medical record named entity recognition method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310995207.6A CN117057350B (en) | 2023-08-07 | 2023-08-07 | Chinese electronic medical record named entity recognition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117057350A (en) | 2023-11-14 |
CN117057350B CN117057350B (en) | 2024-05-10 |
Family
ID=88661906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310995207.6A Active CN117057350B (en) | 2023-08-07 | 2023-08-07 | Chinese electronic medical record named entity recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117057350B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN117057350B (en) | 2024-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111984766B (en) | Missing semantic completion method and device | |
CN117076653B (en) | Knowledge base question-answering method based on thinking chain and visual lifting context learning | |
CN109800414B (en) | Method and system for recommending language correction | |
CN113987104B (en) | Generative event extraction method based on ontology guidance | |
CN109214006B (en) | Natural language reasoning method for image enhanced hierarchical semantic representation | |
WO2023029502A1 (en) | Method and apparatus for constructing user portrait on the basis of inquiry session, device, and medium | |
CN112151183A (en) | Entity identification method of Chinese electronic medical record based on Lattice LSTM model | |
CN113177412A (en) | Named entity identification method and system based on BERT, electronic equipment and storage medium | |
CN114818717B (en) | Chinese named entity recognition method and system integrating vocabulary and syntax information | |
CN113128203A (en) | Attention mechanism-based relationship extraction method, system, equipment and storage medium | |
CN112420151A (en) | Method, system, equipment and medium for structured analysis after ultrasonic report | |
CN117057350B (en) | Chinese electronic medical record named entity recognition method and system | |
Deng et al. | Self-attention-based BiGRU and capsule network for named entity recognition | |
WO2023116572A1 (en) | Word or sentence generation method and related device | |
CN110134950A (en) | Automatic text proofreading method combining characters and words | |
CN112349294A (en) | Voice processing method and device, computer readable medium and electronic equipment | |
CN115935959A (en) | Method for labeling low-resource glue word sequence | |
WO2022242074A1 (en) | Multi-feature fusion-based method for named entity recognition in chinese medical text | |
CN116842168B (en) | Cross-domain problem processing method and device, electronic equipment and storage medium | |
Göker et al. | Neural text normalization for turkish social media | |
CN113705207A (en) | Grammar error recognition method and device | |
CN116757195A (en) | Implicit emotion recognition method based on prompt learning | |
CN115391534A (en) | Text emotion reason identification method, system, equipment and storage medium | |
CN114582449A (en) | Electronic medical record named entity standardization method and system based on XLNet-BiGRU-CRF model | |
CN113971405A (en) | Medical named entity recognition system and method based on ALBERT model fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||