CN113361277A - Medical named entity recognition modeling method based on attention mechanism - Google Patents
- Publication number
- CN113361277A (Application CN202110667423.9A)
- Authority
- CN
- China
- Prior art keywords
- vector
- sentence
- medical
- sequence
- bgru
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a medical named entity recognition modeling method based on an attention mechanism. First, each word in an input medical text sentence is converted into a word vector using a vector representation technique; a bidirectional gated recurrent unit network (BGRU) then captures rich context information in the sentence; an attention mechanism next weights the context semantic information by importance; finally, a conditional random field (CRF) finds the globally optimal medical entity label sequence, completing the recognition of medical named entities. The invention constructs an attention-based medical named entity recognition model on an "RNN + CRF" network framework. The RNN part uses a BGRU network, which has a simpler structure, trains faster, and performs better than the commonly used BLSTM. The attention mechanism introduced on top of the "RNN + CRF" framework weights the context information by importance, improving entity recognition.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a medical named entity recognition modeling method based on an attention mechanism.
Background
With the advance of medical informatization, the medical field has accumulated a huge amount of unstructured text data containing much valuable information. Extracting effective information from these medical texts, and storing and managing it to construct large-scale, high-quality medical knowledge graphs, is of great significance for the development of medical informatization and is a research hotspot in natural language processing. Named entity recognition is one of the core tasks of structured information extraction from medical text; it aims to identify entities with specific meanings in unstructured text.
Traditional named entity recognition methods are mainly rule-based, dictionary-based, or machine-learning-based. Rule- and dictionary-based methods require domain experts to manually write rule templates, or recognize entities by string matching against a domain dictionary. Machine-learning-based methods treat named entity recognition as a sequence labeling problem and recognize entities with targeted feature engineering and a suitable model; commonly used models include maximum entropy, support vector machines (SVM), and conditional random fields (CRF). Although these methods achieve a certain effect, they rely on medical-domain experts to build rules or dictionaries, or on manually designed features to train the models. This not only costs much time and effort, but also limits the recognition quality to that of the hand-crafted rules, dictionaries, or features. In recent years, with the development of deep learning, neural-network-based methods have been applied to entity recognition tasks and have produced many research results. Such methods do not depend on hand-designed features; all relevant features are learned automatically by the neural network.
At present, the "RNN + CRF" network framework combining a recurrent neural network with a conditional random field is the mainstream model for named entity recognition. Owing to the particularity of the medical field, entities in medical text are highly specialized and full of abbreviations, so the desired medical entities can be extracted accurately and effectively only by relying on strongly associated, strongly dependent context information. However, a plain "RNN + CRF" framework can only learn a sentence's context information through the RNN; it cannot weight that information by importance.
Disclosure of Invention
Aiming at the poor entity recognition caused by the strong medical specialization and the many abbreviations of entities in medical text, the invention provides a medical named entity recognition modeling method based on an attention mechanism. First, each word in an input medical text sentence is converted into a word vector using a vector representation technique; a BGRU then captures rich context information in the sentence; an attention mechanism next weights the context semantic information by importance; finally, a CRF finds the globally optimal medical entity label sequence, completing the recognition of medical named entities.
The medical named entity recognition modeling method based on the attention mechanism comprises the following steps:
Step 1: vectorize the medical text sentence sequence X to obtain the input feature vector W, specifically:
Convert each word x_i in the sentence sequence X = (x_1, x_2, ..., x_n) of length n into a low-dimensional dense real-valued vector w_i; the word vectors are rows of a word embedding matrix W_char of size |V| × d, where |V| is the size of a fixed input vocabulary and d is the dimension of the word vectors, and i ∈ [1, 2, ..., n];
The input feature vector of the medical text sentence is then W = (w_1, w_2, ..., w_n);
Step 2: learning the context information of the medical text sentence from the input feature vector W by using a bidirectional gate control loop unit network BGRU to obtain a sentence vector H, which specifically comprises the following steps:
the BGRU obtains the state output of the hidden layer from the upper information and the lower information of the medical text sentence from the input feature vector W by a forward GRU network and a backward GRU network respectivelyAnd
whereinAndrespectively representing hidden layer state output of a forward GRU network and a backward GRU network at the time t, wherein t belongs to [1, 2];
BGRU splices hidden layer state outputs of forward GRU network and backward GRU network to obtain sentence vector H ═ (H)1,h2,...,hn) Wherein the hidden layer state output of BGRU at time t is:
Step 3: weight the importance of the context information in the sentence vector H with an attention mechanism to obtain the feature vector M of the sentence, specifically:
Perform the attention weight calculation on the sentence vector H to obtain the attention weight vector a:
a = softmax(w_a tanh(H));
where w_a is the weight vector to be learned and tanh(·) is the hyperbolic tangent function;
The sentence vector H is then weighted by the attention weight vector a to obtain the feature vector M of the sentence:
M = aH;
Step 4: decode the feature vector M with a conditional random field (CRF) to obtain the final output sequence Y* of the input sentence X, specifically:
For the obtained sentence feature vector M = (m_1, m_2, ..., m_n), calculate the conditional probability of each possible output label sequence Y:
P(Y|M) = CRF(M, Y);
where Y ∈ Y_X and Y_X denotes all possible output label sequences of the input sequence X;
Finally, take the output label sequence Y* with the maximum conditional probability as the final output sequence of the input sentence X:
Y* = argmax_{Y ∈ Y_X} P(Y|M).
The invention constructs an attention-based medical named entity recognition model on an "RNN + CRF" network framework. The RNN part uses a BGRU network, which has a simpler structure, trains faster, and performs better than the commonly used BLSTM. The attention mechanism introduced on top of the "RNN + CRF" framework weights the context information by importance, improving entity recognition.
Drawings
FIG. 1 is a structural diagram of a medical named entity recognition model based on an attention mechanism.
Detailed Description
The specific implementation steps are as follows:
Step 1: vectorize the medical text sentence using a vector representation technique to obtain the input feature vector:
Convert each word x_i in the medical text sentence sequence X = (x_1, x_2, ..., x_n) of length n into a low-dimensional dense real-valued vector w_i; the word vectors are rows of a word embedding matrix W_char of size |V| × d, where |V| is the size of a fixed input vocabulary and d is the dimension of the word vectors, and i ∈ [1, 2, ..., n];
The input feature vector of the medical text sentence can thus be expressed as W = (w_1, w_2, ..., w_n).
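As an illustrative sketch (not from the patent), the word-vector lookup of step 1 is a simple row lookup into the embedding matrix; the vocabulary size |V| = 50, the dimension d = 8, and the token ids below are assumed toy values.

```python
import numpy as np

# Toy sizes, assumed for illustration: |V| = 50, d = 8.
V, d = 50, 8
rng = np.random.default_rng(0)
W_char = rng.normal(scale=0.1, size=(V, d))  # embedding matrix W_char of size |V| x d

def embed(token_ids):
    # Map a sentence x_1..x_n of word ids to vectors w_1..w_n by row lookup.
    return W_char[token_ids]

sentence = np.array([3, 17, 42, 5])  # stand-in id sequence for a medical sentence
W = embed(sentence)                  # input feature vector W = (w_1, ..., w_n)
```

In a trained model W_char would be learned (or pretrained) rather than random; the lookup itself is unchanged.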
Step 2: and learning the context information of the medical text sentence from the input feature vector by using a bidirectional gating circulation unit network BGRU to obtain a sentence vector.
For named entity recognition, a sequence labeling problem, LSTM is well suited to learning the dependencies in sequence data. The GRU, a variant of the LSTM, learns sequence dependencies equally well and avoids the RNN vanishing-gradient problem, while having a simpler structure, training faster, and performing better than the LSTM. The input sequence data are therefore processed here with GRUs.
The BGRU learns the context information of a text sentence by combining a forward GRU network and a backward GRU network. Both networks control the flow of information through an update gate z and a reset gate r, realizing the updating, selection, and storage of historical information. The information flow of the forward GRU network comprises the input information w_t at the current time t and the hidden-layer state output h_{t-1} of the GRU at the previous time;
The update gate z_t and the reset gate r_t at time t are computed as:
z_t = σ(W_wz w_t + W_hz h_{t-1} + b_z);
r_t = σ(W_wr w_t + W_hr h_{t-1} + b_r);
where σ(·) denotes the sigmoid function, W_wz and W_hz are the weight matrices to be learned in the update gate, b_z is the bias vector of the update gate, W_wr and W_hr are the weight matrices to be learned in the reset gate, and b_r is the bias vector of the reset gate;
Then the reset gate r_t is used to obtain the candidate information h̃_t of the GRU hidden layer at the current time t:
h̃_t = tanh(W_wh w_t + W_hh (r_t ⊙ h_{t-1}) + b_h);
where tanh(·) denotes the hyperbolic tangent function, ⊙ denotes the Hadamard product, W_wh and W_hh are the weight matrices to be learned in the candidate information of the hidden layer at the current time, and b_h is the corresponding bias vector;
Finally, the update gate z_t is combined, via Hadamard products, with the hidden-layer state output of the GRU at the previous time and with the candidate information of the hidden layer at the current time to obtain the hidden-layer state output of the GRU at the current time:
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t;
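A minimal numeric sketch of one GRU step, using the gate equations above; the dimensions are toy values and the randomly initialized matrices stand in for learned parameters (the name `gru_step` is illustrative, not from the patent).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_in, d_h = 4, 3  # toy input and hidden dimensions, assumed for illustration
rng = np.random.default_rng(1)
P = {name: rng.normal(scale=0.1, size=shape) for name, shape in {
    "Wwz": (d_h, d_in), "Whz": (d_h, d_h), "bz": (d_h,),   # update-gate parameters
    "Wwr": (d_h, d_in), "Whr": (d_h, d_h), "br": (d_h,),   # reset-gate parameters
    "Wwh": (d_h, d_in), "Whh": (d_h, d_h), "bh": (d_h,),   # candidate parameters
}.items()}

def gru_step(w_t, h_prev, P):
    z = sigmoid(P["Wwz"] @ w_t + P["Whz"] @ h_prev + P["bz"])             # update gate z_t
    r = sigmoid(P["Wwr"] @ w_t + P["Whr"] @ h_prev + P["br"])             # reset gate r_t
    h_cand = np.tanh(P["Wwh"] @ w_t + P["Whh"] @ (r * h_prev) + P["bh"])  # candidate h~_t
    return (1.0 - z) * h_prev + z * h_cand                                 # hidden state h_t

h = gru_step(rng.normal(size=d_in), np.zeros(d_h), P)
```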
The forward GRU network learns the preceding context of the medical text sentence, and the backward GRU network learns the following context; the information flow of the backward GRU network comprises the input information w_t at the current time t and the hidden-layer state output h_{t+1} of the GRU at the following time, computed in the same way as in the forward GRU network;
The BGRU concatenates the hidden-layer state outputs of the forward and backward GRU networks to obtain the sentence vector H = (h_1, h_2, ..., h_n), where the hidden-layer output of the BGRU at time t is h_t = [→h_t ; ←h_t], with →h_t and ←h_t denoting the hidden-layer state outputs of the forward and backward GRU networks at time t.
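The bidirectional pass and concatenation can be sketched as follows; a plain tanh cell stands in for the full GRU step purely to keep the example short (in practice each direction would use the gated update above), and all sizes are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h, n = 4, 3, 5                       # toy dimensions and sentence length
Wx = rng.normal(scale=0.1, size=(d_h, d_in))
Wh = rng.normal(scale=0.1, size=(d_h, d_h))

def step(w_t, h_prev):
    # Stand-in recurrent cell; a full GRU step would be used in practice.
    return np.tanh(Wx @ w_t + Wh @ h_prev)

W_seq = rng.normal(size=(n, d_in))           # input vectors w_1..w_n

def run(seq):
    h, out = np.zeros(d_h), []
    for w in seq:
        h = step(w, h)
        out.append(h)
    return np.stack(out)

h_fwd = run(W_seq)                # forward pass over w_1..w_n
h_bwd = run(W_seq[::-1])[::-1]    # backward pass over w_n..w_1, realigned to positions
H = np.concatenate([h_fwd, h_bwd], axis=1)   # h_t = [->h_t ; <-h_t]
```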
Step 3: weight the importance of the context information in the sentence vector with an attention mechanism to obtain the feature vector of the sentence.
Using the BGRU, the context-dependency information of a medical text sentence can be learned fairly comprehensively, so the current character can be recognized effectively. However, not every piece of context information is equally important for recognizing the current character. Applying an attention mechanism after the BGRU therefore strengthens the attention paid to context information that is highly relevant to and strongly dependent on the current character, and weakens the attention paid to context information with low relevance and weak dependence, improving entity recognition.
Specifically, the attention weight calculation is performed on the sentence vector H output by the BGRU network in step 2, yielding the attention weight vector a:
a = softmax(w_a tanh(H)),
where w_a is the weight vector to be learned and tanh(·) is the hyperbolic tangent function;
The sentence vector H output by the BGRU network is then weighted by the attention weight vector a to obtain the feature vector M of the sentence:
M = aH.
Step 4: jointly decode the predicted labels with the CRF to obtain the final output sequence of the input sentence.
Named entity recognition is a sequence labeling problem in which the tags of a predicted tag sequence are not independent; accurate prediction must take the preceding and following tags into account. For example, for an entity composed of several characters, the tag of each character must be consistent with the entity's category tag. Predicting each character's tag independently loses this information and can cause prediction errors. Although the BGRU learns the context of the current character well, its tag predictions remain independent, producing the label bias problem. A CRF is therefore appended after the attention-augmented BGRU to decode the tag sequence jointly, so the tag at the current position is predicted with reference to the tags at neighboring positions and the globally optimal tag sequence is obtained.
Specifically, for the sentence feature vector M = (m_1, m_2, ..., m_n) obtained in step 3, the conditional probability of a possible output label sequence Y is computed as:
S(M, Y) = Σ_{i,k} λ_k t_k(y_{i-1}, y_i, M, i) + Σ_{i,l} μ_l s_l(y_i, M, i);
P(Y|M) = exp(S(M, Y)) / Σ_{Y'∈Y_X} exp(S(M, Y'));
where t_k and s_l are feature functions: t_k is a transition feature function extracting features of the label sequence, in which the state y_i at the current position is influenced by the state y_{i-1} at the previous position, and s_l is a state feature function extracting features of the observation sequence, in which the state y_i at the current position is influenced by the observation m_i at the current position. Each feature function takes only the values 0 and 1: 1 when its feature is satisfied and 0 otherwise. λ_k and μ_l are the weights of the two feature functions, measuring the importance of the corresponding feature. Y_X denotes all possible output label sequences of the input sequence X;
Finally, the output label sequence Y* with the maximum conditional probability is taken as the final output sequence of the input sentence X:
Y* = argmax_{Y ∈ Y_X} P(Y|M).
To verify the effectiveness of the invention, a comparison experiment was carried out against three named entity recognition models for the medical domain on the dataset of CHIP 2020 evaluation task 1, named entity recognition in Chinese medical text. The three comparison models are:
(1) CRF: after converting the input medical text into word vectors, entity recognition is performed with a CRF model.
(2) BLSTM-CRF: context-dependency information is learned from the input feature vectors with a BLSTM, and entities are then recognized by CRF joint decoding.
(3) BLSTM-att-CRF: an attention mechanism is introduced on top of BLSTM-CRF, weighting the importance of the context information by adding the attention mechanism after the BLSTM.
The evaluation metrics are precision (P), recall (R), and F1-score (F1). Precision is the proportion of predicted entities of a given medical entity type that are correct; recall is the proportion of true entities of that type that are correctly predicted; F1 is the harmonic mean of precision and recall, balancing these two often-conflicting metrics and thus giving a more comprehensive and overall evaluation of model performance.
Let J = (j_1, j_2, ..., j_n) be the set of true medical entity labels, and let K = (k_1, k_2, ..., k_m) be the set of medical entity labels predicted by the medical named entity recognition model of the invention. Each element of the two sets represents one medical entity and comprises four items: a sentence number, the entity start position, the entity end position, and the entity type. The sentence number is the index of the sentence containing the entity in the dataset; the start position is the position of the entity's first character in the sentence; the end position is the position of the entity's last character in the sentence; and the entity type is the category of the medical entity. Two elements j_i and k_l of the two sets are equal if and only if their sentence numbers, entity start positions, entity end positions, and entity types are all the same.
Based on this, the precision, recall, and F1 value are calculated as follows:
P = |J ∩ K| / |K|;
R = |J ∩ K| / |J|;
F1 = 2PR / (P + R);
where ∩ denotes the intersection of the two sets, i.e., the elements they share;
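A sketch of these entity-level metrics on toy gold and predicted sets; the entities shown are invented for illustration, not taken from the CHIP 2020 data.

```python
# Each entity is a (sentence number, start, end, type) tuple, matching the four
# items described in the text; the values below are toy data.
J = {(0, 2, 5, "disease"), (0, 8, 10, "drug"), (1, 0, 3, "symptom")}   # gold set J
K = {(0, 2, 5, "disease"), (1, 0, 3, "symptom"), (1, 5, 7, "drug")}    # predicted set K

tp = len(J & K)              # |J ∩ K|: predictions equal to a gold entity on all fields
P = tp / len(K)              # precision
R = tp / len(J)              # recall
F1 = 2 * P * R / (P + R)     # harmonic mean of precision and recall
```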
the results of the experiment are shown in table 1:
TABLE 1 results of the experiment
The experimental results show that the proposed BGRU-att-CRF model outperforms all comparison models, achieving the best precision, recall, and F1 value.
Claims (1)
1. The medical named entity recognition modeling method based on the attention mechanism is characterized by comprising the following steps:
Step 1: vectorize the medical text sentence sequence X to obtain the input feature vector W, specifically:
Convert each word x_i in the sentence sequence X = (x_1, x_2, ..., x_n) of length n into a low-dimensional dense real-valued vector w_i; the word vectors are rows of a word embedding matrix W_char of size |V| × d, where |V| is the size of a fixed input vocabulary and d is the dimension of the word vectors, and i ∈ [1, 2, ..., n];
The input feature vector of the medical text sentence is then W = (w_1, w_2, ..., w_n);
Step 2: learning the context information of the medical text sentence from the input feature vector W by using a bidirectional gate control loop unit network BGRU to obtain a sentence vector H, which specifically comprises the following steps:
the BGRU obtains the state output of the hidden layer from the upper information and the lower information of the medical text sentence from the input feature vector W by a forward GRU network and a backward GRU network respectivelyAnd
whereinAndrespectively representing hidden layer state output of a forward GRU network and a backward GRU network at the time t, wherein t belongs to [1, 2];
BGRU splices hidden layer state outputs of forward GRU network and backward GRU network to obtain sentence vector H ═ (H)1,h2,...,hn) Wherein the hidden layer state output of BGRU at time t is:
Step 3: weight the importance of the context information in the sentence vector H with an attention mechanism to obtain the feature vector M of the sentence, specifically:
Perform the attention weight calculation on the sentence vector H to obtain the attention weight vector a:
a = softmax(w_a tanh(H));
where w_a is the weight vector to be learned and tanh(·) is the hyperbolic tangent function;
The sentence vector H is then weighted by the attention weight vector a to obtain the feature vector M of the sentence:
M = aH;
Step 4: decode the feature vector M with a conditional random field (CRF) to obtain the final output sequence Y* of the input sentence X, specifically:
For the obtained sentence feature vector M = (m_1, m_2, ..., m_n), calculate the conditional probability of each possible output label sequence Y:
P(Y|M) = CRF(M, Y);
where Y ∈ Y_X and Y_X denotes all possible output label sequences of the input sequence X;
Finally, take the output label sequence Y* with the maximum conditional probability as the final output sequence of the input sentence X:
Y* = argmax_{Y ∈ Y_X} P(Y|M).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110667423.9A CN113361277A (en) | 2021-06-16 | 2021-06-16 | Medical named entity recognition modeling method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113361277A true CN113361277A (en) | 2021-09-07 |
Family
ID=77534506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110667423.9A Pending CN113361277A (en) | 2021-06-16 | 2021-06-16 | Medical named entity recognition modeling method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113361277A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115630649A (en) * | 2022-11-23 | 2023-01-20 | 南京邮电大学 | Medical Chinese named entity recognition method based on generative model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109284361A (en) * | 2018-09-29 | 2019-01-29 | 深圳追科技有限公司 | A kind of entity abstracting method and system based on deep learning |
CN112733541A (en) * | 2021-01-06 | 2021-04-30 | 重庆邮电大学 | Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210907 |