CN112487820B - Chinese medical named entity recognition method - Google Patents


Info

Publication number
CN112487820B
CN112487820B (application CN202110157254.4A)
Authority
CN
China
Prior art keywords
word
vector
medical
text
function
Prior art date
Legal status
Active
Application number
CN202110157254.4A
Other languages
Chinese (zh)
Other versions
CN112487820A (en)
Inventor
司逸晨
管有庆
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110157254.4A
Publication of CN112487820A
Application granted
Publication of CN112487820B
Legal status: Active

Classifications

    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese medical named entity recognition method. A feature vector for each character of a medical text is generated by an attention-based language preprocessing model, a final label sequence is generated by a medical entity recognition model built on a bidirectional gated recurrent network, and the medical named entities are recognized from the label sequence. Before entity recognition, the attention-based language preprocessing model produces semantically enhanced character vectors, and a multi-head attention layer added to the medical entity recognition model extracts the multiple senses of characters in the medical text.

Description

Chinese medical named entity recognition method
Technical Field
The invention relates to a medical named entity recognition method, and belongs to the technical field of named entity recognition in natural language processing.
Background
Natural language processing is a popular research direction in recent years; it aims to enable computers to understand human language and interact with people effectively. Named entity recognition is one of its key technologies: it identifies entities with specific meanings in a sentence, such as names of people, places, organizations and other proper nouns. Named entity recognition tasks can be divided into general-domain recognition and domain-specific recognition, for example in the financial, medical and military domains.
Early named entity recognition in the medical domain mainly relied on dictionaries and rules: entities were recognized with manually built medical dictionaries and hand-crafted recognition rules. Later, statistical machine learning methods were applied to medical named entity recognition, most commonly conditional random field models. In recent years, with the large increase in hardware computing power, methods based on deep neural networks have been widely applied to medical named entity recognition, the most common being a combined model of a bidirectional long short-term memory network and a conditional random field.
Disclosure of Invention
The purpose of the invention is as follows: to address the highly specialized terminology of named entities in medical texts, the nesting of entities within one another, and the polysemy of characters in the prior art, the invention provides a Chinese medical named entity recognition method. Because the medical domain lacks high-quality labeled data and a long short-term memory network has many parameters and long training times, the invention replaces the bidirectional long short-term memory network with a bidirectional gated recurrent network to increase the speed of entity recognition.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme.
A Chinese medical named entity recognition method comprises an attention-based language preprocessing model and a medical entity recognition model. In the language preprocessing model an attention mechanism is introduced so that the generated character vectors learn long-distance dependencies between characters and their semantic features are enhanced. For text containing Chinese medical information, such as electronic medical records, prescriptions and physical examination reports, the text is first segmented into characters, and a vector for each character is then generated by the attention-based language preprocessing model. In the medical entity recognition model, a bidirectional gated recurrent network replaces the bidirectional long short-term memory network to increase training speed, and a multi-head attention layer is added to further extract the multiple senses of characters and improve the accuracy of medical named entity recognition; finally, a conditional random field generates the final label sequence, and the medical named entities in the text are recognized from that label sequence. The method is mainly applied to medical information extraction and has important application value in fields such as Chinese medical robots and Chinese medical knowledge graphs. Traditional named entity recognition methods are generally based on a bidirectional long short-term memory network and a conditional random field; the bidirectional long short-term memory network cannot process data in parallel and trains slowly, and such methods lack a good way to cope with the strong domain specificity and mutual nesting of entities in Chinese medical text. The invention therefore replaces the bidirectional long short-term memory network with a bidirectional gated recurrent network to increase training speed, trains characters with an attention-based language preprocessing model to generate character vectors with enhanced semantic representations, and adds a multi-head attention layer after the bidirectional gated recurrent network layer of the medical entity recognition model to further mine the local features of the medical text and the multiple senses of characters, improving both the accuracy and the efficiency of Chinese medical named entity recognition. The method specifically comprises the following steps:
Step 1, performing character-level segmentation on the medical text for training to obtain the segmented characters of the training medical text, and performing character-level segmentation on the medical text for recognition to obtain the segmented characters of the medical text for recognition.
Step 2, labeling the segmented characters of the training medical text to obtain the labeled training medical text, wherein the first character of a medical named entity is labeled 'B', a non-initial character of a medical named entity is labeled 'I', and a character that does not belong to any entity is labeled 'O'.
Step 3, training the attention-based language preprocessing model with the labeled training medical text obtained in step 2 to obtain the trained attention-based language preprocessing model. The attention-based language preprocessing model comprises a word embedding layer, a position vector embedding layer and an attention mechanism layer connected in sequence.
Step 3.1, sending the labeled training medical text obtained in step 2, sentence by sentence, into the word embedding layer of the attention-based language preprocessing model. The word embedding layer generates a vector for each character with a skip-gram model, which uses a center character to predict the characters around it. For a medical text of length L, the character at index i of the text sequence is written w_i, and the probability that a given center character generates all of its background characters is maximized:

\prod_{i=1}^{L} \prod_{-m \le j \le m,\; j \ne 0} P(w_{i+j} \mid w_i)    (1)

where \prod_{i=1}^{L} indicates that the probability is computed starting from the first character of the text, \prod_{-m \le j \le m,\, j \ne 0} computes, for each center character, the probability of occurrence of every background character whose distance from it does not exceed m, m is the window size, and P(w_{i+j} \mid w_i) is the probability of the background character w_{i+j} given w_i as the center character with window size m. Maximizing equation (1) is equivalent to minimizing the first loss function:

-\sum_{i=1}^{L} \sum_{-m \le j \le m,\; j \ne 0} \log P(w_{i+j} \mid w_i)    (2)

where \log denotes the logarithmic loss function.

Suppose the center character w_i has index i in the text and the background character w_o has index o in the text. The conditional probability in the first loss function of the center character generating a background character, normalized by the normalized exponential function softmax, is:

P(w_o \mid w_i) = \frac{\exp(u_o^{\top} v_i)}{\sum_{k \in V} \exp(u_k^{\top} v_i)}    (3)

where v_i is the vector of the center character with index i, u_o is the vector of the background character with index o, u_o^{\top} is the transpose of the background word vector, u_o^{\top} v_i is the dot product of the two vectors, \sum_{k \in V} performs the dot product with each character k in the vocabulary V of the text, and \exp is the exponential function with the natural constant e as base. Stochastic gradient descent is used, so the gradient of the center word vector v_i in the above equation is solved:

\frac{\partial \log P(w_o \mid w_i)}{\partial v_i} = u_o - \sum_{j \in V} P(w_j \mid w_i)\, u_j    (4)

The attention-based language preprocessing model is trained iteratively with equation (4) until the first loss function value is smaller than a first threshold. After training, every character with index i in the medical text obtains its vector v_i as a center character.
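For illustration, a minimal NumPy sketch of the skip-gram objective of equations (1)-(4) is given below; the function names, window size and toy vocabulary are assumptions made for the example and are not specified by the method.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def skipgram_step(center_id, context_id, V, U, lr=0.05):
    """One SGD update for a (center character, background character) pair.
    V[i] is the vector of character i used as a center word and U[o] the
    vector of character o used as a background word (equation (3))."""
    scores = U @ V[center_id]            # u_k . v_i for every k in the vocabulary
    probs = softmax(scores)              # P(w_k | w_i), equation (3)
    loss = -np.log(probs[context_id])    # one term of the loss in equation (2)

    # Equation (4): d log P(w_o | w_i) / d v_i = u_o - sum_j P(w_j | w_i) u_j
    grad_v = -(U[context_id] - probs @ U)
    grad_U = np.outer(probs, V[center_id])
    grad_U[context_id] -= V[center_id]

    V[center_id] -= lr * grad_v
    U -= lr * grad_U
    return loss

# Toy usage: characters of one segmented sentence mapped to ids 0..5
rng = np.random.default_rng(0)
vocab_size, dim, m = 6, 8, 2             # m is the window size of equation (1)
V = rng.normal(scale=0.1, size=(vocab_size, dim))
U = rng.normal(scale=0.1, size=(vocab_size, dim))
sentence = [0, 1, 2, 3, 4, 5]
for _ in range(50):
    for i, c in enumerate(sentence):
        for j in range(max(0, i - m), min(len(sentence), i + m + 1)):
            if j != i:
                skipgram_step(c, sentence[j], V, U)
```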
Step 3.2, passing the character vectors generated by the word embedding layer to the position vector embedding layer. The position vector embedding layer uses position vectors to express the positional relation of each character, and the character vector and the position vector are superposed to obtain a new feature vector for the character. The position vector is computed as in equations (5) and (6):

PE_{(i,\,2k)} = \sin\!\left(i / 10000^{2k/d}\right)    (5)

PE_{(i,\,2k+1)} = \cos\!\left(i / 10000^{2k/d}\right)    (6)

where PE is a two-dimensional matrix whose number of columns equals the dimension of the previously generated character vectors: each row of PE corresponds to one character, each column is the value of that character's position vector in one dimension, and the total number of columns equals the total dimension of the character vector. d is the total dimension of the position vector, k indexes a specific dimension of the vector, PE_{(i, 2k)} is the value of the position vector of the character with index i in the even dimensions, computed with a sine function, and PE_{(i, 2k+1)} is the value of the position vector of the character with index i in the odd dimensions, computed with a cosine function. Finally, the position vector and the character vector are added to obtain the new feature vector of the character, as in equation (7):

x_i = v_i + PE_i    (7)

where PE_i is the position vector of the character with index i, v_i is the vector of the character with index i as a center word, and x_i is the new feature vector with the position information embedded.
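A short NumPy sketch of the sinusoidal position vectors of equations (5)-(7) follows; it assumes an even vector dimension, which is an assumption of the example rather than a requirement stated here.

```python
import numpy as np

def positional_encoding(seq_len, dim):
    """Position vectors from equations (5)-(6); dim is assumed to be even.
    PE[i, 2k] = sin(i / 10000**(2k/dim)), PE[i, 2k+1] = cos(i / 10000**(2k/dim))."""
    pe = np.zeros((seq_len, dim))
    pos = np.arange(seq_len)[:, None]          # character index i
    two_k = np.arange(0, dim, 2)[None, :]      # even dimension index 2k
    angle = pos / np.power(10000.0, two_k / dim)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# Equation (7): new feature vector = word vector + position vector
seq_len, dim = 6, 8
word_vectors = np.random.default_rng(0).normal(size=(seq_len, dim))
features = word_vectors + positional_encoding(seq_len, dim)
```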
Step 3.3, learning the long-distance dependencies between characters with the attention mechanism, so that each character vector contains information about all other characters in the sentence. The output of the attention mechanism layer is the finally generated character vector, which completes the training of the attention-based language preprocessing model.
Step 4, training the medical entity recognition model with the labeled training medical text obtained in step 2 to obtain the trained medical entity recognition model, wherein the medical entity recognition model comprises a bidirectional gated recurrent network layer, a multi-head attention layer and a conditional random field layer connected in sequence.
Step 4.1, encoding the character vectors in both directions with the bidirectional gated recurrent network layer. It comprises a forward gated recurrent network layer and a backward gated recurrent network layer; the forward layer learns features of the following text and the backward layer learns features of the preceding text, so the generated vectors better capture contextual semantic information and learn the context. The gated recurrent network layer consists only of an update gate and a reset gate: the update gate determines how much past information is passed on to the future, and the reset gate determines how much past information is forgotten. The gated recurrent network layer is computed as in equations (10)-(13):

z_t = \sigma(W_z \cdot [h_{t-1}, x_t])    (10)

r_t = \sigma(W_r \cdot [h_{t-1}, x_t])    (11)

\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])    (12)

h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t    (13)

where z_t is the output state of the update gate at time t, r_t is the output state of the reset gate at time t, \tilde{h}_t is the candidate state, h_t is the output state of the network at time t, x_t is the input state at the current time, h_{t-1} is the hidden state output by the gated recurrent network node at the previous time, \tanh denotes the hyperbolic tangent function, \sigma denotes the excitation function, W_z is the weight parameter trained for the update gate, W_r is the weight parameter trained for the reset gate, W_h is the weight parameter used to compute the candidate state \tilde{h}_t, and [\,\cdot\,,\,\cdot\,] denotes the concatenation of two vectors. The update gate z_t controls how much of the historical state h_{t-1} is kept in the output state h_t of the network at the current time, and the reset gate r_t determines how much the candidate state \tilde{h}_t depends on the hidden state h_{t-1} output by the gated recurrent network node at the previous time.
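The following is a minimal NumPy sketch of one gated recurrent unit step as reconstructed in equations (10)-(13); bias terms and the exact gate convention are assumptions of the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One gated recurrent unit step following equations (10)-(13);
    [a, b] denotes concatenation and bias terms are omitted."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                   # update gate, eq. (10)
    r_t = sigmoid(W_r @ concat)                                   # reset gate, eq. (11)
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate state, eq. (12)
    return z_t * h_prev + (1.0 - z_t) * h_cand                    # output state, eq. (13)

# Toy usage over a sequence of character feature vectors
rng = np.random.default_rng(0)
in_dim, hid_dim = 8, 16
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(hid_dim, hid_dim + in_dim)) for _ in range(3))
h = np.zeros(hid_dim)
for x in rng.normal(size=(6, in_dim)):
    h = gru_step(x, h, W_z, W_r, W_h)
```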
Step 4.2, using a multi-head attention layer to further extract multiple senses: a multi-head attention layer essentially performs several attention operations on the output state h_t of the bidirectional gated recurrent network layer at time t. First, a single attention head is computed with equation (16):

\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{(h_t W_i^{Q})(h_t W_i^{K})^{\top}}{\sqrt{d_k}}\right)(h_t W_i^{V})    (16)

where head_i is the result computed by the i-th attention head, h is the number of attention heads, W_i^{Q} is the weight parameter that generates the query vector, W_i^{K} is the weight parameter that generates the key vector, W_i^{V} is the weight parameter that generates the value vector, \sqrt{d_k} is a smoothing term that adjusts for the dimension d_k, and softmax is the normalized exponential function. Finally, the h results are concatenated and linearly transformed to obtain, for each time t, the multi-head attention result of the output state h_t of the bidirectional gated recurrent network layer, as in equation (17):

\mathrm{MultiHead}(h_t) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}    (17)

where MultiHead(h_t) is the result computed by the multi-head attention layer and W^{O} is a weight parameter;
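A compact NumPy sketch of the multi-head attention of equations (16)-(17) follows; it operates on the whole matrix of output states at once, and the weight shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(H, Wq, Wk, Wv, Wo):
    """Equations (16)-(17): H holds the bidirectional GRU output states for a
    sentence, shape (seq_len, d_model); Wq/Wk/Wv have shape (heads, d_model, d_k)."""
    d_k = Wq.shape[-1]
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(Wq, Wk, Wv):
        Q, K, V = H @ Wq_i, H @ Wk_i, H @ Wv_i
        scores = Q @ K.T / np.sqrt(d_k)            # scaled dot product, eq. (16)
        heads.append(softmax(scores) @ V)          # one attention head
    return np.concatenate(heads, axis=-1) @ Wo     # concatenate and project, eq. (17)

# Toy usage: 6 characters, model width 16, 4 heads of width 4
rng = np.random.default_rng(0)
seq_len, d_model, n_heads, d_k = 6, 16, 4, 4
H = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(n_heads, d_model, d_k)) for _ in range(3))
Wo = rng.normal(scale=0.1, size=(n_heads * d_k, d_model))
out = multi_head_attention(H, Wq, Wk, Wv, Wo)      # shape (seq_len, d_model)
```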
Step 4.3, obtaining the optimal label sequence with the conditional random field layer: for an input sentence x = (x_1, x_2, \ldots, x_n) and a sentence tag sequence y = (y_1, y_2, \ldots, y_n), the score is:

\mathrm{score}(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}    (18)

where score(x, y) is the scoring function of the input sentence x generating the tag sequence y, n is the sequence length, A is the transition score matrix, A_{y_i, y_{i+1}} is the score of transitioning from tag y_i to tag y_{i+1}, y_0 and y_{n+1} are the start and end tags of the sentence, and P_{i, y_i} is the probability that the i-th character is tagged y_i. Normalizing gives the probability of the tag sequence y with the maximum probability, as in equation (19):

P(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{\sum_{\tilde{y} \in Y_x} \exp(\mathrm{score}(x, \tilde{y}))}    (19)

where y is the real tag sequence and Y_x is the set of all possible tag sequences.

Maximum likelihood estimation is used, so the second loss function of the medical entity recognition model to be minimized is as in equation (20):

\mathrm{Loss} = -\log P(y \mid x)    (20)

where Loss denotes the second loss function value. The medical entity recognition model is trained iteratively until the second loss function value is smaller than a second threshold; then the globally optimal sequence is obtained with the Viterbi algorithm, and this globally optimal sequence is the final labeling result of medical-domain named entity recognition.
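For illustration, a NumPy sketch of the sequence scoring of equation (18) is given below; the tag-id mapping and the boundary-tag handling are assumptions of the example.

```python
import numpy as np

def crf_score(emissions, transitions, tags, start_tag, end_tag):
    """Score of one tag sequence, equation (18).
    emissions[i, t]   : P_{i,t}, score of labeling character i with tag t
    transitions[a, b] : A_{a,b}, score of moving from tag a to tag b
    start_tag/end_tag : boundary tags added temporarily during computation."""
    score = transitions[start_tag, tags[0]] + emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    return score + transitions[tags[-1], end_tag]

# Toy usage with the BIO tag set plus boundary tags
tag_ids = {"B": 0, "I": 1, "O": 2, "<start>": 3, "<end>": 4}
rng = np.random.default_rng(0)
emissions = rng.normal(size=(6, 5))          # 6 characters, 5 tags
transitions = rng.normal(size=(5, 5))
gold = [2, 2, 2, 2, 0, 1]                    # O O O O B I
s = crf_score(emissions, transitions, gold, tag_ids["<start>"], tag_ids["<end>"])
```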
Finally, the medical named entities in the text are recognized from the tag sequence: a character labeled (B) is the first character of a medical named entity, a character labeled (I) is a non-initial part of a medical named entity, and a character labeled (O) is not part of a medical named entity.
Step 5, during recognition, feeding the segmented characters of the medical text for recognition into the trained attention-based language preprocessing model to generate character vectors, and feeding the generated character vectors into the trained medical entity recognition model to recognize the medical named entities in the text.
Preferably: in step 3.3, the attention mechanism is computed as in equation (8):

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V    (8)

where Attention(Q, K, V) denotes the attention score, Q denotes the query vector, K denotes the key vector, V denotes the value vector, \sqrt{d_k} denotes the square root of the dimension of the key vector, and softmax is the normalized exponential function.
Preferably: the normalized exponential function is the softmax function:

\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}    (9)

where x denotes an array, x_i denotes the i-th element of the array x, and the value of softmax(x)_i is the ratio of the exponential of that element to the sum of the exponentials of all the elements of the array.
Preferably: in step 4.1, the value range of the tanh function is (-1, 1), and its expression is given by equation (14):

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (14)

where x represents the input to the function.
Preferably: in step 4.1, the excitation function \sigma is the sigmoid function, whose value range is (0, 1), with the expression given by equation (15):

\sigma(x) = \frac{1}{1 + e^{-x}}    (15)
preferably: in step 4.3, the global optimal sequence is obtained by using the viterbi algorithm, as shown in formula (21):
Figure 82909DEST_PATH_IMAGE103
(21)
wherein,
Figure 323398DEST_PATH_IMAGE104
the sequence of tags in the set that maximizes the score function.
Compared with the prior art, the invention has the following beneficial effects:
the method comprises the steps of preprocessing a text by using a language preprocessing model based on an attention mechanism and generating a corresponding word vector, bidirectionally encoding the word vector by using a bidirectional gating circulation network layer, further acquiring local features of the text and multiple semantics of an entity by using a multi-head attention layer, finally generating a final label sequence by using a conditional random field layer, and identifying a medical named entity in the text according to the label sequence, so that the problems of inaccurate identification and low identification speed of the Chinese medical named entity are solved. Semantic representation of words is enhanced by generating a word vector containing positional features of the words and associations between characters for each word in a medical text by an attention-based language pre-processing model. In the medical entity recognition model, a bidirectional gate control circulation network is used for replacing a bidirectional long-term and short-term memory network, the training overhead is reduced to a certain extent, the model training efficiency is improved, a multi-head attention layer is added, the local features of medical texts and the multiple semantics of characters are further learned, and the accuracy of medical named entity recognition is improved.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a language pre-processing model framework based on an attention mechanism.
FIG. 3 is a medical entity recognition model framework.
FIG. 4 is a schematic diagram of a gated loop network.
Detailed Description
The present invention is further illustrated by the following description in conjunction with the accompanying drawings and the specific embodiments. It is to be understood that these examples are given solely for the purpose of illustration and are not intended to limit the scope of the invention, since various equivalent modifications will occur to those skilled in the art upon reading the present invention and will fall within the scope of the appended claims.
A Chinese medical named entity recognition method: first, a medical text is segmented and labeled and used to train the language preprocessing model; then the medical text to be recognized is fed into the trained language preprocessing model to generate semantically enhanced character vectors; the trained medical entity recognition model then generates a label sequence from the character vectors; finally, the medical named entities are recognized from the label sequence. The method specifically comprises the following steps:
Step 1, performing character-level segmentation on the medical text for training to obtain the segmented characters of the training medical text. For example, if the medical text is 'no obvious fracture', the characters 'no', 'see', 'clear', 'obvious', 'bone', 'broken' are obtained after segmentation. Likewise, character-level segmentation is performed on the medical text for recognition to obtain its segmented characters; if the input text is 'continuously heated for four days', the processed text is the characters 'continuously', 'hair', 'hot', 'four', 'day'.
Step 2, labeling the segmented characters of the training medical text to obtain the labeled training medical text, wherein the first character of a medical named entity is labeled 'B', a non-initial character of a medical named entity is labeled 'I', and a character that is not part of an entity is labeled 'O'. For the medical text 'no obvious fracture' above, the final labeling sequence is 'no (O)', 'see (O)', 'clear (O)', 'obvious (O)', 'bone (B)', 'broken (I)'; the 'BIO' labels distinguish the medical named entity and prepare for the subsequent training of the medical entity recognition model.
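For illustration, a small Python sketch of this BIO labeling step is given below; the Chinese rendering of the 'no obvious fracture' example is assumed from the character glosses above.

```python
def bio_labels(characters, entity_spans):
    """Turn character-level entity spans into B/I/O labels; entity_spans is a
    list of (start, end) index pairs with end exclusive."""
    labels = ["O"] * len(characters)
    for start, end in entity_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return list(zip(characters, labels))

# The 'no obvious fracture' example: the last two characters form the entity
print(bio_labels(list("未见明显骨折"), [(4, 6)]))
# [('未', 'O'), ('见', 'O'), ('明', 'O'), ('显', 'O'), ('骨', 'B'), ('折', 'I')]
```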
Step 3, training the attention-based language preprocessing model with the labeled training medical text obtained in step 2 to obtain the trained attention-based language preprocessing model. As shown in fig. 2, the attention-based language preprocessing model includes a word embedding layer, a position vector embedding layer and an attention mechanism layer connected in sequence. For the segmented text, the word embedding layer first generates a character vector with a skip-gram model, the position vector embedding layer then learns the position information of each character by adding a position vector, and the attention mechanism layer finally learns the relation between each character and all the other characters, thereby strengthening the semantic representation of the character.
Step 3.1, sending the labeled training medical text obtained in step 2, sentence by sentence, into the word embedding layer of the attention-based language preprocessing model. The word embedding layer generates a vector for each character with a skip-gram model, which uses a center character to predict the characters around it. For a medical text of length L, the character at index i of the text sequence is written w_i, and the probability that a given center character generates all of its background characters is maximized:

\prod_{i=1}^{L} \prod_{-m \le j \le m,\; j \ne 0} P(w_{i+j} \mid w_i)    (1)

where \prod_{i=1}^{L} indicates that the probability is computed starting from the first character of the text, \prod_{-m \le j \le m,\, j \ne 0} means that for each center character the probability of occurrence of all background characters no more than m away from it is computed, m represents the window size, and the distance between a generated background character and the center character is not greater than m; P(w_{i+j} \mid w_i) is the probability of occurrence of the background character w_{i+j} given w_i as the center character with window size m. Maximizing equation (1) is equivalent to minimizing the first loss function:

-\sum_{i=1}^{L} \sum_{-m \le j \le m,\; j \ne 0} \log P(w_{i+j} \mid w_i)    (2)

where \log denotes the logarithmic loss function.

Suppose the center character w_i has index i in the text and the background character w_o has index o in the text. The conditional probability in the first loss function of the center character generating a background character is normalized by the normalized exponential function softmax as:

P(w_o \mid w_i) = \frac{\exp(u_o^{\top} v_i)}{\sum_{k \in V} \exp(u_k^{\top} v_i)}    (3)

where v_i is the vector of the center character with index i, u_o is the vector of the background character with index o, u_o^{\top} is the transpose of the background word vector, u_o^{\top} v_i is the dot product of the two vectors, \sum_{k \in V} performs the dot product with each character k in the vocabulary V of the text, and \exp is the exponential function with the natural constant e as base. Stochastic gradient descent is used, so the gradient of the center word vector v_i in the above equation is solved:

\frac{\partial \log P(w_o \mid w_i)}{\partial v_i} = u_o - \sum_{j \in V} P(w_j \mid w_i)\, u_j    (4)

The attention-based language preprocessing model is trained iteratively with equation (4) until the first loss function value is smaller than a first threshold, which is a preset constant. After training, every character with index i in the medical text obtains its vector v_i as a center character, and v_i is used as the final output vector of the word embedding layer.
Step 3.2, passing the character vectors generated by the word embedding layer to the position vector embedding layer. The position vector embedding layer uses position vectors to express the positional relation of each character, and the character vector and the position vector are superposed to obtain a new feature vector for the character. The position vector is computed as in equations (5) and (6):

PE_{(i,\,2k)} = \sin\!\left(i / 10000^{2k/d}\right)    (5)

PE_{(i,\,2k+1)} = \cos\!\left(i / 10000^{2k/d}\right)    (6)

where PE is a two-dimensional matrix whose number of columns equals the dimension of the previously generated character vectors: each row of PE corresponds to one character, each column is the value of that character's position vector in one dimension, and the total number of columns equals the total dimension of the character vector. i denotes the index of a character in the medical text, d is the total dimension of the position vector, k indexes a specific dimension of the vector, PE_{(i, 2k)} is the value of the position vector of the character with index i in the even dimensions, computed with a sine function, and PE_{(i, 2k+1)} is the value of the position vector of the character with index i in the odd dimensions, computed with a cosine function. Finally, the position vector and the character vector are added to obtain the new feature vector of the character, as in equation (7):

x_i = v_i + PE_i    (7)

where PE_i is the position vector of the character with index i, v_i is the vector of the character with index i as a center word, and x_i is the new feature vector with the position information embedded. The purpose of embedding the position vector in the character vector is to prepare for the subsequent attention calculation. If attention were computed between one character of the medical text and two other characters that have identical content but different positions, the result would be identical unless a position vector expressed the difference, even though the degree of association between that character and the two others is different; position vectors must therefore be used to express the positional relation of each character.
Step 3.3, learning the long-distance dependencies between characters with the attention mechanism, so that each character vector contains information about all other characters in the sentence. The character vectors generated by the word embedding layer are learned only from predicting the background characters near each center character, so they cannot capture dependencies between characters that are far apart; adding an attention mechanism lets a character vector learn its dependency on every other character in the sentence. The attention mechanism is computed as in equation (8):

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V    (8)

where Attention(Q, K, V) denotes the attention scoring function, Q denotes the query vector, K denotes the key vector, V denotes the value vector, and Q, K and V are obtained by multiplying the character vectors with the corresponding weight matrices. \sqrt{d_k} denotes the square root of the dimension of the key vector and is used to prevent the product from becoming too large; softmax is the normalized exponential function, whose mathematical expression is given by equation (9):

\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}    (9)

where x denotes an array, x_i denotes the i-th element of the array x, and the value of softmax(x)_i is the ratio of the exponential of that element to the sum of the exponentials of all the elements of the array.

The output of the attention mechanism layer is the character vector finally generated by the language preprocessing model. The softmax function scores all the characters in the text and normalizes the scores, so that the score of each character is positive and the scores sum to 1. Equation (8) is therefore essentially a weighted sum of the value vectors of the characters in the text, with the softmax scores as the weight coefficients of the corresponding value vectors. The output of the attention mechanism layer is the finally generated character vector, which completes the training of the attention-based language preprocessing model. The finally generated character vector contains the position information of the character and its dependencies on every other character in the sentence, which enhances the semantics of the character and improves the accuracy of the medical entity recognition model.
Step 4, training the medical entity recognition model with the labeled training medical text obtained in step 2 to obtain the trained medical entity recognition model. As shown in fig. 3, the medical entity recognition model comprises a bidirectional gated recurrent network layer, a multi-head attention layer and a conditional random field layer connected in sequence. The medical text first passes through the trained language preprocessing model to generate the corresponding character vectors. The bidirectional gated recurrent network layer consists of two gated recurrent network layers that encode the character vectors in both directions and fully learn the context. The multi-head attention layer performs several attention operations on the output of the bidirectional gated recurrent network layer to further learn the local features of the medical text and the multiple senses of characters; finally, the conditional random field layer generates the final label sequence, and the medical named entities are recognized from the label sequence.
Step 4.1, encoding the character vectors in both directions with the bidirectional gated recurrent network layer to fully learn the context. Named entities in the medical domain have complex structure: a subsequence of an entity may itself be an entity, such as 'splenectomy' and 'spleen', and adjacent characters are strongly related, so the relations of the character context must be fully considered when training with a neural network. Traditional named entity recognition models usually encode with a bidirectional long short-term memory network, but the long short-term memory network has many parameters and trains slowly. The bidirectional gated recurrent network layer comprises a forward gated recurrent network layer and a backward gated recurrent network layer; the forward layer learns features of the following text and the backward layer learns features of the preceding text, so the generated vectors better capture contextual semantic information and learn the context. The gated recurrent network is a variant of the long short-term memory network and consists only of an update gate and a reset gate: the update gate determines how much past information is passed on to the future, and the reset gate determines how much past information is forgotten. The specific structure of the gated recurrent network is shown in fig. 4, in which the operator nodes denote the weighting of vectors and the element-wise multiplication of a coefficient with a vector. The gated recurrent network is computed as in equations (10)-(13):

z_t = \sigma(W_z \cdot [h_{t-1}, x_t])    (10)

r_t = \sigma(W_r \cdot [h_{t-1}, x_t])    (11)

\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])    (12)

h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t    (13)

where z_t is the output state of the update gate at time t, r_t is the output state of the reset gate at time t, \tilde{h}_t is the candidate state, h_t is the output state of the network at time t, x_t is the input state at the current time, h_{t-1} is the hidden state output by the gated recurrent network node at the previous time, \tanh denotes the hyperbolic tangent function, \sigma denotes the excitation function, W_z is the weight parameter trained for the update gate, W_r is the weight parameter trained for the reset gate, W_h is the weight parameter used to compute the candidate state \tilde{h}_t, and [\,\cdot\,,\,\cdot\,] denotes the concatenation of two vectors. The update gate z_t controls how much of the historical state h_{t-1} is kept in the output state h_t of the network at the current time, and the reset gate r_t determines how much the candidate state \tilde{h}_t depends on the hidden state h_{t-1} output by the gated recurrent network node at the previous time.

The value range of the tanh function is (-1, 1), and its expression is given by equation (14):

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (14)

where x represents the input to the function.
The excitation function \sigma is the sigmoid function, whose value range is (0, 1), with the expression given by equation (15):

\sigma(x) = \frac{1}{1 + e^{-x}}    (15)
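A NumPy sketch of the bidirectional encoding follows: a forward and a backward gated recurrent pass whose hidden states are concatenated per character; weight shapes and initialization are assumptions of the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """Equations (10)-(13), as in the earlier single-cell sketch."""
    c = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ c)
    r = sigmoid(W_r @ c)
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    return z * h_prev + (1.0 - z) * h_cand

def bigru_encode(X, fwd_weights, bwd_weights, hid_dim):
    """Bidirectional encoding: a forward pass over the sentence and a backward
    pass over the reversed sentence, concatenated character by character."""
    def run(weights, seq):
        h, outs = np.zeros(hid_dim), []
        for x in seq:
            h = gru_step(x, h, *weights)
            outs.append(h)
        return outs
    fwd = run(fwd_weights, X)
    bwd = run(bwd_weights, X[::-1])[::-1]
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

# Toy usage: 6 character feature vectors of width 8, hidden width 16 per direction
rng = np.random.default_rng(0)
in_dim, hid_dim = 8, 16
make_weights = lambda: tuple(rng.normal(scale=0.1, size=(hid_dim, hid_dim + in_dim)) for _ in range(3))
H = bigru_encode(rng.normal(size=(6, in_dim)), make_weights(), make_weights(), hid_dim)  # (6, 32)
```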
Step 4.2, using the multi-head attention layer to further extract multiple senses: polysemy occurs in medical text, so a multi-head attention layer is added after the bidirectional gated recurrent network to further learn the dependencies of entities and capture the multiple senses of characters. A multi-head attention layer essentially performs several attention operations on the output state h_t of the bidirectional gated recurrent network layer at time t. First, a single attention head is computed with equation (16):

\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{(h_t W_i^{Q})(h_t W_i^{K})^{\top}}{\sqrt{d_k}}\right)(h_t W_i^{V})    (16)

where head_i is the result computed by the i-th attention head, h is the number of attention heads, that is, the computation is performed h times in total, W_i^{Q} is the weight parameter that generates the query vector, W_i^{K} is the weight parameter that generates the key vector, W_i^{V} is the weight parameter that generates the value vector, \sqrt{d_k} is a smoothing term that adjusts for the dimension d_k and prevents the vector product from becoming too large, and softmax is the normalized exponential function. Finally, the h results are concatenated and linearly transformed to obtain, for each time t, the multi-head attention result of the output state h_t of the bidirectional gated recurrent network layer, as in equation (17):

\mathrm{MultiHead}(h_t) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}    (17)

where MultiHead(h_t) is the result computed by the multi-head attention layer, h is the number of attention heads, and W^{O} is a weight parameter. The multi-head attention layer expands the ability of the medical entity recognition model to attend to different positions, so that the multiple senses of characters in the medical text are further extracted.
Step 4.3, obtaining the optimal label sequence with the conditional random field layer: in the medical named entity recognition model, the bidirectional gated recurrent network layer only produces character vectors that contain further context information, and even with the multi-head attention layer added the dependencies between labels are not considered; for example, the label (I) can only follow the label (B). The invention therefore uses a conditional random field layer to take the adjacency relations between labels into account and obtain the globally optimal label sequence. The conditional random field model is a classical discriminative probabilistic undirected graph model that is often applied to sequence labeling tasks. For an input sentence x = (x_1, x_2, \ldots, x_n) and a sentence tag sequence y = (y_1, y_2, \ldots, y_n), the score is:

\mathrm{score}(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}    (18)

where score(x, y) is the scoring function of the input sentence x generating the tag sequence y, n is the sequence length, A is the transition score matrix, A_{y_i, y_{i+1}} is the score of transitioning from tag y_i to tag y_{i+1}, y_0 and y_{n+1} are the start and end tags of the sentence, which are only added temporarily during the computation, and P_{i, y_i} is the probability that the i-th character is tagged y_i. Normalizing gives the probability of the tag sequence y with the maximum probability, as in equation (19):

P(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{\sum_{\tilde{y} \in Y_x} \exp(\mathrm{score}(x, \tilde{y}))}    (19)

where y is the real tag sequence and Y_x is the set of all possible tag sequences.

Maximum likelihood estimation is used, so the second loss function of the medical entity recognition model to be minimized is as in equation (20):

\mathrm{Loss} = -\log P(y \mid x)    (20)

where Loss denotes the second loss function value. The medical entity recognition model is trained iteratively until the second loss function value is smaller than a second threshold, which is a preset constant; then the globally optimal sequence, which is the final labeling result of medical-domain named entity recognition, is obtained with the Viterbi algorithm, as in equation (21):

y^{*} = \arg\max_{\tilde{y} \in Y_x} \mathrm{score}(x, \tilde{y})    (21)

where y^{*} is the tag sequence in the set that maximizes the scoring function.
Finally, the medical named entities in the text are recognized from the tag sequence: a character labeled (B) is the first character of a medical named entity, a character labeled (I) is a non-initial part of a medical named entity, and a character labeled (O) is not part of a medical named entity. For the input text 'continuously heated for four days' above, the characters that make up 'heated' (fever) are labeled (B) and (I) and all the other characters are labeled (O), so the medical named entity 'heated' is recognized from the label sequence.
Step 5, during recognition, feeding the segmented characters of the medical text for recognition into the trained attention-based language preprocessing model to generate character vectors, and feeding the generated character vectors into the trained medical entity recognition model to recognize the medical named entities in the text.
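A small Python sketch of this recognition step follows; the model callables are placeholders standing in for the trained preprocessing and entity recognition models, not fixed interfaces of the method.

```python
def recognize_entities(text, preprocess, bigru, attention, viterbi, id2tag):
    """End-to-end recognition as in step 5. The callables preprocess, bigru,
    attention and viterbi stand in for the trained components described above."""
    chars = list(text)                            # step 1: character-level segmentation
    vectors = preprocess(chars)                   # attention-based language preprocessing
    hidden = attention(bigru(vectors))            # bidirectional GRU + multi-head attention
    tags = [id2tag[t] for t in viterbi(hidden)]   # CRF layer / Viterbi tag sequence

    entities, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "B":                            # first character of an entity
            if current:
                entities.append(current)
            current = ch
        elif tag == "I" and current:              # continuation of the current entity
            current += ch
        else:                                     # 'O' or a stray 'I'
            if current:
                entities.append(current)
            current = ""
    if current:
        entities.append(current)
    return entities
```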
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (4)

1. A Chinese medical named entity recognition method is characterized by comprising the following steps:
step 1, performing character-level segmentation on a medical text for training to obtain segmentation characters of the medical text for training; performing character level segmentation on the medical text for identification to obtain medical text segmentation characters for identification;
step 2, labeling the segmentation characters of the medical text for training to obtain a labeled medical text for training, wherein the starting characters of the medical named entities are labeled as 'B', the non-starting characters of the medical named entities are labeled as 'I', and the characters which are not entities are labeled as 'O';
step 3, training the language preprocessing model based on the attention mechanism by using the labeled medical text for training obtained in the step 2 to obtain a trained language preprocessing model based on the attention mechanism; the language preprocessing model based on the attention mechanism comprises a word embedding layer, a position vector embedding layer and an attention mechanism layer which are sequentially connected;
step 3.1, sending the marked medical text for training obtained in step 2 into the word embedding layer of the attention-based language preprocessing model by taking a sentence as a unit; the word embedding layer generates a word vector of each word by using a skip-gram model; the skip-gram model predicts the surrounding words using a middle word, and for a medical text of length L the word with index i in the text sequence is expressed as w_i; the probability that a given random center word generates all its background words is maximized:

\prod_{i=1}^{L} \prod_{-m \le j \le m,\; j \ne 0} P(w_{i+j} \mid w_i)    (1)

wherein \prod_{i=1}^{L} indicates that the probability is calculated starting from the first word in the text, \prod_{-m \le j \le m,\, j \ne 0} means that for each central word the probability of occurrence of all background words whose distance from it does not exceed m is calculated, m indicates the window size, and P(w_{i+j} \mid w_i) is the probability of occurrence of the background word w_{i+j} with w_i as the central word and m as the window size; equation (1) is equivalent to minimizing the first loss function:

-\sum_{i=1}^{L} \sum_{-m \le j \le m,\; j \ne 0} \log P(w_{i+j} \mid w_i)    (2)

wherein \log represents the logarithmic loss function;

suppose the central word w_i has index i in the text and the background word w_o has index o in the text; the conditional probability of a given center word in the first loss function generating a background word, normalized by the normalized exponential function softmax, is:

P(w_o \mid w_i) = \frac{\exp(u_o^{\top} v_i)}{\sum_{k \in V} \exp(u_k^{\top} v_i)}    (3)

wherein v_i represents the vector of the center word with index i, u_o represents the vector of the background word with index o, u_o^{\top} represents the transpose of the background word vector, u_o^{\top} v_i represents the dot product of the two vectors, and \exp represents the exponential function with the natural constant e as base; stochastic gradient descent is used to solve the gradient of the center word vector v_i in the above equation:

\frac{\partial \log P(w_o \mid w_i)}{\partial v_i} = u_o - \sum_{j \in V} P(w_j \mid w_i)\, u_j    (4)

the attention-based language preprocessing model is iteratively trained using equation (4) until the first loss function value is less than a first threshold; after training, any word with index i in the medical text obtains its vector v_i as the center word;
Step 3.2, the word vectors generated by the word embedding layer are sent into the position vector embedding layer; the position vector embedding layer uses a position vector to represent the positional relation of each character, and the word vector and the position vector are added to obtain the new feature vector of the word. The position vector is computed as shown in formula (5) and formula (6):

$$PE_{(pos,\,2k)} = \sin\!\left(\frac{pos}{10000^{2k/d}}\right) \qquad (5)$$

$$PE_{(pos,\,2k+1)} = \cos\!\left(\frac{pos}{10000^{2k/d}}\right) \qquad (6)$$

where $PE$ is a two-dimensional matrix whose number of columns is the same as the dimension of the previously generated word vectors; each column of $PE$ represents the position vector of every word in one dimension, and the total number of columns equals the total dimension of the word vector; $d$ is the total dimension of the position vector, $k$ indexes a specific dimension of the vector, $PE_{(pos,2k)}$ represents the value of the position vector of the word with index $pos$ in the even dimensions, computed with the sine function, and $PE_{(pos,2k+1)}$ represents the value of the position vector of the word with index $pos$ in the odd dimensions, computed with the cosine function. Finally, the position vector and the word vector are added to obtain the new feature vector of the word, as shown in formula (7):

$$x_{pos} = v_{pos} + PE_{pos} \qquad (7)$$

where $PE_{pos}$ denotes the position vector of the word with index $pos$, $v_{pos}$ denotes the word vector of the word with index $pos$ used as the center word, and $x_{pos}$ denotes the new feature vector with the position information embedded;
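A rough sketch of the position embedding of formulas (5)-(7) follows; the sequence length, vector dimension and the function name position_encoding are illustrative assumptions.

```python
import numpy as np

def position_encoding(seq_len, d):
    """PE matrix of formulas (5)-(6): sine on even dimensions, cosine on odd ones."""
    pos = np.arange(seq_len)[:, None]                  # word index
    k = np.arange(d)[None, :]                          # dimension index
    angles = pos / np.power(10000.0, (2 * (k // 2)) / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions, formula (5)
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions, formula (6)
    return pe

word_vectors = np.random.default_rng(1).normal(size=(8, 16))   # 8 characters, 16-dim vectors
features = word_vectors + position_encoding(8, 16)              # addition of formula (7)
print(features.shape)
```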
step 3.3, an attention mechanism is used to learn the long-distance dependencies between characters, so that each character vector contains information about all the other characters in the sentence; the output of the attention mechanism layer is the finally generated word vector, which completes the training of the attention-based language preprocessing model;

the attention mechanism is computed as shown in formula (8):

$$\operatorname{Attention}(Q,K,V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \qquad (8)$$

where $\operatorname{Attention}(Q,K,V)$ denotes the attention score, $Q$ denotes the query vector, $K$ denotes the key vector, $V$ denotes the value vector, $\sqrt{d_k}$ denotes the square root of the dimension of the key vector, and $\operatorname{softmax}$ is the normalized exponential function;

the normalized exponential (softmax) function is:

$$\operatorname{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \qquad (9)$$

where $x$ denotes a data array, $x_i$ denotes the $i$-th element of the array $x$, and the value of $\operatorname{softmax}(x_i)$ is the ratio of the exponential of the $i$-th element of $x$ to the sum of the exponentials of all elements of the array;
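A compact sketch of the attention score of formula (8) and the softmax of formula (9), using random stand-ins for the query, key and value matrices; all names and shapes are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)          # formula (9), row-wise

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # QK^T / sqrt(d_k)
    return softmax(scores) @ V                           # formula (8)

rng = np.random.default_rng(2)
Q = K = V = rng.normal(size=(8, 16))                     # self-attention over 8 characters
print(attention(Q, K, V).shape)
```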
step 4, train the medical entity recognition model with the labeled training medical text obtained in step 2 to obtain a trained medical entity recognition model; the medical entity recognition model comprises a bidirectional gated recurrent network layer, a multi-head attention layer and a conditional random field layer connected in sequence;
step 4.1, the word vectors are encoded bidirectionally with the bidirectional gated recurrent network layer, which comprises a forward gated recurrent network layer and a backward gated recurrent network layer; the forward layer learns the features of the following text and the backward layer learns the features of the preceding text, so that the generated vectors better capture contextual semantic information. The gated recurrent network layer consists only of an update gate and a reset gate, where the update gate determines how much past information is passed on to the future and the reset gate determines how much past information is forgotten; the gated recurrent network layer is computed as shown in formulas (10)-(13):

$$z_t = \sigma\!\left(W_z\cdot[h_{t-1},\,x_t]\right) \qquad (10)$$

$$r_t = \sigma\!\left(W_r\cdot[h_{t-1},\,x_t]\right) \qquad (11)$$

$$\tilde{h}_t = \tanh\!\left(W_{\tilde{h}}\cdot[r_t\odot h_{t-1},\,x_t]\right) \qquad (12)$$

$$h_t = (1-z_t)\odot h_{t-1} + z_t\odot \tilde{h}_t \qquad (13)$$

where $z_t$ is the output state of the update gate at time $t$, $r_t$ is the output state of the reset gate at time $t$, $\tilde{h}_t$ is the candidate state, $h_t$ denotes the output state of the network at time $t$, $x_t$ denotes the input state at the current time, $h_{t-1}$ denotes the hidden state output by the gated recurrent network node at the previous time, $\tanh$ denotes the hyperbolic tangent function, $\sigma$ denotes the excitation function, $W_z$ is the weight parameter for training the update gate $z_t$, $W_r$ is the weight parameter for training the reset gate $r_t$, $W_{\tilde{h}}$ is the weight parameter used when computing the candidate state $\tilde{h}_t$, and $[\,\cdot\,,\,\cdot\,]$ denotes the concatenation of two vectors. The update gate $z_t$ controls how much of the historical state $h_{t-1}$ is retained in the output state $h_t$ of the network at the current time, and the reset gate $r_t$ determines how much the candidate state $\tilde{h}_t$ depends on the hidden state $h_{t-1}$ output by the gated recurrent network node at the previous time;
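The gate equations (10)-(13) can be traced with a single-step sketch; the weight shapes, and the use of the sigmoid function for the gates and tanh for the candidate state, follow the standard gated-recurrent-unit formulation and are assumptions where the text leaves them implicit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One gated recurrent unit update following formulas (10)-(13)."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                   # update gate, formula (10)
    r_t = sigmoid(W_r @ concat)                                   # reset gate, formula (11)
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate state, formula (12)
    return (1 - z_t) * h_prev + z_t * h_cand                      # new hidden state, formula (13)

rng = np.random.default_rng(3)
d_in, d_h = 16, 32
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for _ in range(3))
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), W_z, W_r, W_h)
print(h.shape)
```

A bidirectional layer would run this update once left-to-right and once right-to-left over the sentence and combine the two hidden states of each character.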
step 4.2, a multi-head attention layer is used to further extract multiple semantics: the multi-head attention layer essentially performs two or more attention-head operations on the output state $h_t$ of the bidirectional gated recurrent network layer at time $t$. First, a single attention head is computed by formula (16):

$$head_i = \operatorname{softmax}\!\left(\frac{\left(hW_i^{Q}\right)\left(hW_i^{K}\right)^{\top}}{\sqrt{d_k}}\right)\left(hW_i^{V}\right) \qquad (16)$$

where $head_i$ denotes the result of the $i$-th attention-head calculation, $k$ indicates that there are $k$ attention heads, $W_i^{Q}$ is the weight parameter for generating the query vector, $W_i^{K}$ is the weight parameter for generating the key vector, $W_i^{V}$ is the weight parameter for generating the value vector, $\sqrt{d_k}$ is a smoothing term that adjusts for the dimension $d_k$, and $\operatorname{softmax}$ is the normalized exponential function. Finally, the $k$ calculation results are concatenated and linearly transformed to obtain, for each time $t$, the multi-head attention result for the output state $h_t$ of the bidirectional gated recurrent network layer at time $t$, as shown in formula (17):

$$M_t = \operatorname{Concat}\!\left(head_1, head_2, \ldots, head_k\right) W^{O} \qquad (17)$$

where $M_t$ denotes the calculation result of the multi-head attention layer and $W^{O}$ is a weight parameter;
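A sketch of the multi-head attention of formulas (16)-(17) is shown below; the head count, projection matrix shapes and the name multi_head_attention are illustrative assumptions rather than the patented implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(H, Wq, Wk, Wv, Wo):
    """Formulas (16)-(17): per-head scaled dot-product attention, concatenation, output projection."""
    heads = []
    for W_q, W_k, W_v in zip(Wq, Wk, Wv):
        Q, K, V = H @ W_q, H @ W_k, H @ W_v
        d_k = K.shape[-1]
        heads.append(softmax(Q @ K.T / np.sqrt(d_k)) @ V)   # one head, formula (16)
    return np.concatenate(heads, axis=-1) @ Wo               # concat + linear map, formula (17)

rng = np.random.default_rng(4)
seq_len, d_model, n_heads, d_head = 8, 64, 4, 16
H = rng.normal(size=(seq_len, d_model))                      # stand-in for the BiGRU outputs h_t
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(n_heads, d_model, d_head)) for _ in range(3))
Wo = rng.normal(scale=0.1, size=(n_heads * d_head, d_model))
print(multi_head_attention(H, Wq, Wk, Wv, Wo).shape)         # (8, 64)
```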
step 4.3, the optimal tag sequence is obtained with the conditional random field layer: for an input sentence $x=(x_1,x_2,\ldots,x_n)$, the score of a sentence tag sequence $y=(y_1,y_2,\ldots,y_n)$ is:

$$score(x,y) = \sum_{i=0}^{n} A_{y_i,\,y_{i+1}} + \sum_{i=1}^{n} P_{i,\,y_i} \qquad (18)$$

where $score(x,y)$ denotes the scoring function for the input sentence $x$ generating the tag sequence $y$, $n$ is the sequence length, $A$ is the transition score matrix, $A_{y_i,y_{i+1}}$ denotes the score of the transition from tag $y_i$ to tag $y_{i+1}$, $y_0$ and $y_{n+1}$ denote the start and end tags of the sentence, and $P_{i,y_i}$ denotes the probability that the $i$-th word is labeled $y_i$. Normalization yields the probability of the tag sequence $y$, as in equation (19):

$$P(y\mid x) = \frac{\exp\!\big(score(x,y)\big)}{\sum_{\tilde{y}\in Y_x}\exp\!\big(score(x,\tilde{y})\big)} \qquad (19)$$

where $y$ denotes the actual tag sequence and $Y_x$ denotes the set of all possible tag sequences;

the minimized second loss function of the medical entity recognition model is obtained with maximum likelihood estimation, as in equation (20):

$$\mathcal{L}_2 = -\log P(y\mid x) \qquad (20)$$

where $\mathcal{L}_2$ denotes the second loss function value; the medical entity recognition model is trained iteratively until the second loss function value $\mathcal{L}_2$ is less than a second threshold $\varepsilon_2$; a globally optimal sequence is then obtained with the Viterbi algorithm, and this globally optimal sequence is the final labeling result of the medical-domain named entity recognition;

finally, the medical named entities in the text are identified according to the tag sequence, where a character labeled (B) is the first character of a medical named entity, a character labeled (I) is a non-initial part of a medical named entity, and a character labeled (O) is not part of a medical named entity;
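As an illustration of the path score of formula (18), a minimal scoring sketch over a hypothetical B/I/O tag set is shown below; the start/end handling, matrix names and tag indices are assumptions for the example.

```python
import numpy as np

TAGS = ["B", "I", "O"]                        # plus assumed <start>/<end> positions

def sequence_score(emissions, transitions, tag_ids, start, end):
    """Formula (18): sum of transition scores A and emission scores P along one tag path."""
    score = transitions[start, tag_ids[0]]                    # A_{y_0, y_1} from the start tag
    for i in range(len(tag_ids)):
        score += emissions[i, tag_ids[i]]                     # P_{i, y_i}
        nxt = tag_ids[i + 1] if i + 1 < len(tag_ids) else end
        score += transitions[tag_ids[i], nxt]                 # A_{y_i, y_{i+1}}
    return score

rng = np.random.default_rng(5)
n_tags = len(TAGS) + 2                        # B, I, O, <start>, <end>
emissions = rng.normal(size=(6, n_tags))      # scores for a 6-character sentence
transitions = rng.normal(size=(n_tags, n_tags))
print(sequence_score(emissions, transitions, [0, 1, 2, 0, 1, 2], start=3, end=4))
```

Exponentiating and normalizing such scores over all candidate paths gives the probability of formula (19).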
step 5, at recognition time, the medical text to be recognized is segmented into characters and imported into the trained attention-based language preprocessing model to generate word vectors; the generated word vectors are then imported into the trained medical entity recognition model to recognize the medical named entities in the text.
2. The Chinese medical named entity recognition method according to claim 1, wherein: in step 4.1, the value range of the $\tanh$ function is (-1, 1), and its expression is shown in formula (14):

$$\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} \qquad (14)$$

where $x$ represents the input to the function.
3. The Chinese medical named entity recognition method according to claim 2, wherein: in step 4.1, the excitation function $\sigma$ is the sigmoid function with value range (0, 1), and its expression is shown in formula (15):

$$\sigma(x) = \frac{1}{1+e^{-x}} \qquad (15).$$
4. The Chinese medical named entity recognition method according to claim 3, wherein: in step 4.3, the globally optimal sequence is obtained with the Viterbi algorithm, as shown in formula (21):

$$y^{*} = \underset{\tilde{y}\in Y_x}{\arg\max}\ score(x,\tilde{y}) \qquad (21)$$

where $y^{*}$ denotes the tag sequence in the set that maximizes the scoring function.
CN202110157254.4A 2021-02-05 2021-02-05 Chinese medical named entity recognition method Active CN112487820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110157254.4A CN112487820B (en) 2021-02-05 2021-02-05 Chinese medical named entity recognition method

Publications (2)

Publication Number Publication Date
CN112487820A CN112487820A (en) 2021-03-12
CN112487820B true CN112487820B (en) 2021-05-25

Family

ID=74912336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110157254.4A Active CN112487820B (en) 2021-02-05 2021-02-05 Chinese medical named entity recognition method

Country Status (1)

Country Link
CN (1) CN112487820B (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115721A (en) * 2020-09-28 2020-12-22 青岛海信网络科技股份有限公司 Named entity identification method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368541B (en) * 2018-12-06 2024-06-11 北京搜狗科技发展有限公司 Named entity identification method and device
CN110781683B (en) * 2019-11-04 2024-04-05 河海大学 Entity relation joint extraction method
CN111626056B (en) * 2020-04-11 2023-04-07 中国人民解放军战略支援部队信息工程大学 Chinese named entity identification method and device based on RoBERTA-BiGRU-LAN model
CN111783466A (en) * 2020-07-15 2020-10-16 电子科技大学 Named entity identification method for Chinese medical records




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant