CN112115721A - Named entity identification method and device - Google Patents

Named entity identification method and device

Info

Publication number
CN112115721A
CN112115721A CN202011039983.1A
Authority
CN
China
Prior art keywords
word
text
matrix
feature matrix
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011039983.1A
Other languages
Chinese (zh)
Other versions
CN112115721B (en)
Inventor
于腾
葛通
李晓雨
孙凯
徐文权
潘汉祺
胡永利
申彦明
陈维强
孙永良
于涛
王玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense TransTech Co Ltd
Original Assignee
Hisense TransTech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense TransTech Co Ltd filed Critical Hisense TransTech Co Ltd
Priority to CN202011039983.1A priority Critical patent/CN112115721B/en
Publication of CN112115721A publication Critical patent/CN112115721A/en
Application granted granted Critical
Publication of CN112115721B publication Critical patent/CN112115721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiment of the invention provides a named entity identification method and device, wherein the method comprises the following steps: inputting a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized; inputting a first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized, the dimension of the first character feature matrix being the same as the dimension of the first word feature matrix; processing the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix; and processing the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized. Because the character feature matrix and the word feature matrix are fused, and the fused character-word feature matrix is then further processed, the accuracy of the recognition result for the text to be recognized is improved.

Description

Named entity identification method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a named entity identification method and apparatus.
Background
Named Entity Recognition (NER), also called "proper name recognition", refers to recognizing entities with specific meanings in text, mainly including names of people, places and organizations, proper nouns, and so on. NER marks the position and type of the relevant entities in a piece of natural language text and extracts the required entities, such as organization names, person names, or identifiers of diseases and symptoms in the medical field. It is widely applied in tasks such as knowledge graph construction, information extraction, information retrieval, machine translation, automatic question answering and public opinion monitoring, and is a foundation of natural language processing.
NER generally uses sequence labeling to locate entity boundaries and determine entity types. However, the first step of NER is to determine the boundaries of words, i.e., segmentation, and Chinese text has no boundary markers such as the spaces that explicitly delimit words in English text. Chinese also contains special entity types beyond those defined in English, such as transliterated foreign person names and place names, and Chinese words are often polysemous. Existing Chinese named entity recognition methods therefore still have certain limitations.
In the prior art, characters, roots or words are mapped into single vectors, and NER is realized through training models such as a Convolutional Neural Network (CNN) or a Long Short-Term Memory network (LSTM). However, in order to strengthen the correlations between characters, or between characters and words, a great deal of manual intervention is needed to construct character or word features, which is time-consuming and labor-intensive. It is also difficult for such methods to guarantee the accuracy of named entity recognition in practice. Especially for sentences with long entities, identifying the entity boundaries is more difficult, resulting in lower accuracy of named entity recognition.
Therefore, there is a need for a method and an apparatus for identifying a named entity, which can improve the accuracy of identifying the named entity.
Disclosure of Invention
The embodiment of the invention provides a named entity identification method and device, which can improve the accuracy of named entity identification.
In a first aspect, an embodiment of the present invention provides a method for identifying a named entity, where the method includes:
inputting a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized; inputting a first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized, the dimension of the first character feature matrix being the same as the dimension of the first word feature matrix; processing the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix; and processing the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized.
In the method, a character feature matrix and a word feature matrix are obtained using the first training model and the second training model. Because the dimension of the character feature matrix is the same as that of the word feature matrix, the two matrices can be fused directly: on the one hand, this improves the accuracy of the recognition result for the text to be recognized; on the other hand, it avoids the excessively high dimensionality that would result from concatenating the character feature matrix and the word feature matrix, which can cause gradient explosion and reduce the running efficiency of the model. Processing the character-word fusion feature matrix through the third training model further improves the accuracy of the recognition result, as sketched below.
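A minimal sketch of the fusion step, using NumPy; the two feature matrices are random stand-ins for the outputs of the first and second training models, and all names and sizes are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

def fuse(char_features: np.ndarray, word_features: np.ndarray) -> np.ndarray:
    # Both matrices must share the same shape (seq_len x dim) so that
    # element-wise addition -- rather than concatenation -- can fuse them
    # without inflating the dimensionality.
    assert char_features.shape == word_features.shape
    return char_features + word_features  # character-word fusion feature matrix

C = np.random.randn(8, 128)  # first character feature matrix, 8 tokens x 128 dims
M = np.random.randn(8, 128)  # first word feature matrix, same dimensions
R = fuse(C, M)               # would then be passed to the third training model
```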
Optionally, before the first character sequence matrix of the text to be recognized is input into the first training model to obtain the first character feature matrix of the text to be recognized, the method further includes: setting a first parameter of the first training model, wherein the first parameter is used for obtaining the first character feature matrix with a preset dimensionality; and before the first word sequence matrix of the text to be recognized is input into the second training model to obtain the first word feature matrix of the text to be recognized, the method further includes: setting a second parameter of the second training model, wherein the second parameter is used for obtaining the first word feature matrix with the preset dimensionality.
In the method, the first parameter is set for the first training model and the second parameter is set for the second training model, so that the obtained character feature matrix and word feature matrix have the same dimension. This facilitates the fusion of the first character feature matrix and the first word feature matrix, and thereby helps to improve the accuracy of named entity recognition.
Optionally, before the first word sequence matrix of the text to be recognized is input into the second training model to obtain the first word feature matrix of the text to be recognized, the method further includes: determining, in a first manner, a first character vector corresponding to each character of the text to be recognized, the first character vectors of the characters constituting the first character sequence matrix; determining, in a second manner different from the first manner, a second character vector corresponding to each character of the text to be recognized; performing word segmentation on the text to be recognized to obtain the segmented words of the text to be recognized; and carrying out same-dimension processing on the second character vectors of the characters in each segmented word to determine the word vector of each segmented word, so as to obtain the first word sequence matrix.
In the method, the first character vectors and second character vectors of the text to be recognized are determined in the first manner and the second manner respectively. Then, according to the segmented words of the text to be recognized and their corresponding second character vectors, the second character vectors of the characters in each segmented word are subjected to same-dimension processing to determine the word vector of each segmented word, from which the first word sequence matrix is obtained. The first word sequence matrix therefore contains not only the word segmentation information of the text to be recognized but also the semantic information of each character, preserving the global information of the characters and improving the accuracy of the named entity recognition result. A sketch of this step follows.
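A minimal sketch of the same-dimension processing, assuming averaging is the chosen operation; the segmentation lengths and vector width are illustrative:

```python
import numpy as np

def word_sequence_matrix(second_char_vectors: np.ndarray, seg_lengths) -> np.ndarray:
    """Average the second character vectors of each segmented word
    (one possible same-dimension operation) to form its word vector."""
    rows, start = [], 0
    for n in seg_lengths:                      # e.g. [2, 2, 2, 2] for four two-character words
        rows.append(second_char_vectors[start:start + n].mean(axis=0))
        start += n
    return np.stack(rows)                      # one row per word, same width as a character vector

chars = np.random.randn(8, 128)                # second character vectors of an 8-character sentence
S = word_sequence_matrix(chars, [2, 2, 2, 2])  # first word sequence matrix, shape (4, 128)
```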
Optionally, the first training model is a BERT (Bidirectional Encoder Representations from Transformers) model, and the second training model is a CNN (Convolutional Neural Network) model.
Optionally, the third training model includes a bidirectional Long Short-Term Memory network (BiLSTM) model and a self-attention mechanism model. Processing the first character-word fusion feature matrix through the third training model to obtain the named entity recognition result of the text to be recognized includes: processing the first character-word fusion feature matrix through the BiLSTM model to increase the semantic information of the text to be recognized carried by the matrix, obtaining a first character-word feature matrix; processing the first character-word feature matrix through the self-attention mechanism model to increase the weight of the corresponding named entities in the matrix, obtaining a second character-word feature matrix; and obtaining the named entity recognition result of the text to be recognized according to the second character-word feature matrix.
In the method, the BiLSTM model increases the semantic information that the first character-word fusion feature matrix carries about the text to be recognized, improving the reliability and accuracy of that semantic information. Furthermore, the self-attention mechanism model increases the weight of the corresponding named entities in the first character-word feature matrix, so that the named entities occupy prominent positions in the second character-word feature matrix. When the named entities of the text to be recognized are obtained from the second character-word feature matrix, their recognition is therefore more definite, which increases the accuracy of the named entity recognition result.
Optionally, the third training model further includes a CRF (conditional random field) model. After processing the first character-word feature matrix through the self-attention mechanism model to increase the weight of the corresponding named entities and obtain the second character-word feature matrix, the method further includes: performing sequence optimization on the second character-word feature matrix through the CRF model to obtain a third character-word feature matrix; and obtaining, according to the third character-word feature matrix, the named entity recognition result of the text to be recognized in the optimal arrangement sequence.
In the method, the CRF model optimizes the sequence of the second character-word feature matrix, so that, on the premise of improving the accuracy of the recognition result, the sequence of named entities recognized from the text to be recognized is the optimal sequence.
Optionally, the method further includes: inputting a second character sequence matrix of a sample text into the first training model to obtain a second character feature matrix of the sample text, wherein the first training model is an already trained model; inputting a second word sequence matrix of the sample text into an initial second training model to obtain a second word feature matrix of the sample text, the dimension of the second character feature matrix being the same as the dimension of the second word feature matrix; processing the second character feature matrix and the second word feature matrix to obtain a second character-word fusion feature matrix; processing the second character-word fusion feature matrix through an initial third training model to obtain a second named entity recognition result of the sample text; and if the second named entity recognition result does not meet a set condition, adjusting the second training model and the third training model according to the second named entity recognition result.
In the method, the untrained second and third training models are trained with the help of the already mature first training model, which improves the accuracy of the parameters of the second and third training models and the degree to which the three models match one another. The recognition result for the text to be recognized is therefore more accurate.
In a second aspect, an embodiment of the present invention provides a named entity identifying apparatus, where the apparatus includes:
the acquisition module is used for inputting a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized, and for inputting a first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized, the dimension of the first character feature matrix being the same as the dimension of the first word feature matrix;
the processing module is used for processing the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix, and for processing the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized.
In a third aspect, an embodiment of the present application further provides a computing device, including: a memory for storing a program; a processor for calling the program stored in said memory and executing the method as described in the various possible designs of the first aspect according to the obtained program.
In a fourth aspect, embodiments of the present application further provide a computer-readable non-transitory storage medium including a computer-readable program which, when read and executed by a computer, causes the computer to perform the method as described in the various possible designs of the first aspect.
These and other implementations of the present application will be more readily understood from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic diagram of an architecture for named entity recognition according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a named entity identification method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the LSTM cell in the BiLSTM model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the BiLSTM model according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of a named entity recognition method according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a named entity recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a system architecture for named entity recognition according to an embodiment of the present invention. A text to be recognized is input into a character feature training model 101 and a word feature training model 102. The character feature training model 101 inputs the first character feature matrix of the text to be recognized into a character-word feature fusion model 103, and the word feature training model 102 inputs the first word feature matrix of the text to be recognized into the character-word feature fusion model 103. The character-word feature fusion model 103 fuses the first character feature matrix and the first word feature matrix, which have the same dimension, to obtain a first character-word fusion feature matrix, and inputs it into a global semantic training model 104. The global semantic training model 104 processes the first character-word fusion feature matrix to increase the global semantic information it carries about the text to be recognized, obtaining a first character-word feature matrix, which it inputs into a named entity weight training model 105. The named entity weight training model 105 processes the first character-word feature matrix to increase the weight of the named entities of the text to be recognized in the matrix, obtaining a second character-word feature matrix, which it inputs into a named entity sequence training model 106. The named entity sequence training model 106 processes the second character-word feature matrix to optimize the arrangement sequence of the named entities of the text to be recognized, obtaining a third character-word feature matrix, from which the named entity recognition result is obtained.
Based on this, an embodiment of the present application provides a process of a named entity identification method, as shown in fig. 2, including:
step 201, inputting a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized;
step 202, inputting the first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized; the dimension of the first character feature matrix is the same as the dimension of the first word feature matrix;
step 203, processing the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix;
step 204, processing the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized.
In the method, a character feature matrix and a word feature matrix are obtained using the first training model and the second training model. Because the dimension of the character feature matrix is the same as that of the word feature matrix, the two matrices can be fused directly: on the one hand, this improves the accuracy of the recognition result for the text to be recognized; on the other hand, it avoids the excessively high dimensionality that would result from concatenating the character feature matrix and the word feature matrix, which can cause gradient explosion and reduce the running efficiency of the model. Processing the character-word fusion feature matrix through the third training model further improves the accuracy of the recognition result.
Before the first character sequence matrix of the text to be recognized is input into the first training model to obtain the first character feature matrix of the text to be recognized, the embodiment of the application further provides a method of obtaining matching dimensions, which includes:
setting a first parameter of the first training model, the first parameter being used to obtain the first character feature matrix with a preset dimensionality; and, before the first word sequence matrix of the text to be recognized is input into the second training model to obtain the first word feature matrix of the text to be recognized, setting a second parameter of the second training model, the second parameter being used to obtain the first word feature matrix with the same preset dimensionality. That is, by setting the first parameter of the first training model and the second parameter of the second training model, the matrices output by the two models can be made to have the same dimension: the dimensionality of the first character feature matrix output by the first training model equals that of the first word feature matrix output by the second training model, for example as in the following sketch.
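A small illustrative sketch of such dimension matching, assuming the two encoders natively emit different widths and are projected onto a shared preset width; the sizes and the use of linear projection heads are assumptions, not the patent's prescription:

```python
import torch.nn as nn

PRESET_DIM = 256                        # the preset dimensionality shared by both matrices

# Illustrative "first parameter" / "second parameter": projection heads mapping
# each encoder's native output width onto the same preset width, so that the
# character and word feature matrices can later be fused element-wise.
char_head = nn.Linear(768, PRESET_DIM)  # e.g. on top of a 768-dim BERT output
word_head = nn.Linear(300, PRESET_DIM)  # e.g. on top of 300-dim word features
```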
The embodiment of the application further provides a method for obtaining the first character sequence matrix and the first word sequence matrix. Before the first word sequence matrix of the text to be recognized is input into the second training model to obtain the first word feature matrix of the text to be recognized, the method further includes: determining, in a first manner, the first character vector corresponding to each character of the text to be recognized, the first character vectors constituting the first character sequence matrix; determining, in a second manner different from the first manner, the second character vector corresponding to each character of the text to be recognized; performing word segmentation on the text to be recognized to obtain its segmented words; and carrying out same-dimension processing on the second character vectors of the characters in each segmented word to determine the word vector of each segmented word, so as to obtain the first word sequence matrix. That is, the first character vectors are obtained in the first manner, and the first character sequence matrix is obtained from the first character vectors of the text to be recognized:

C_i = {c_i^1, c_i^2, …, c_i^n}

where C_i denotes the set of first character vectors of the i-th sentence and c_i^n denotes the first character vector of the n-th character of the i-th sentence. The second character vectors e_i^1, …, e_i^n are obtained in the second manner, and word segmentation is performed on the text to be recognized to obtain its segmented words, whose word vectors form

S_i = {s_i^1, s_i^2, …, s_i^m}

where S_i denotes the set of word vectors of the i-th sentence and s_i^m denotes the word vector of the m-th segmented word. If the segmented word corresponding to s_i^j consists of the characters with second character vectors e_i^p, …, e_i^q, then

s_i^j = (e_i^p + … + e_i^q) / (q − p + 1), or s_i^j = e_i^p + … + e_i^q.

The manner of obtaining the word vector from the second character vectors here may be averaging, addition, or another dimension-preserving operation; it is not specifically limited. Because the word vectors are obtained from the second character vectors, the first word sequence matrix obtained from them contains not only the word segmentation information of the text to be recognized but also the semantic information of each character, preserving the global information of the characters. The accuracy of the named entity recognition result of the text to be recognized is thereby improved.
For example, suppose the text to be recognized is "Jiangsu Suzhou disease control center" (eight characters). The first character vectors obtained in the first manner are, respectively: Jiang (159), Su (357), Zhou (489), Ji (621), Kong (741), Zhong (963), Xin (452); these first character vectors form the first character sequence matrix. The second character vectors obtained in the second manner are, respectively: Jiang (321), Su (355), Su (557), Zhou (499), Ji (622), Kong (451), Zhong (564), Xin (877). Word segmentation of the text to be recognized gives the segmented words "Jiangsu, Suzhou, disease control, center", and the word vector of each segmented word is obtained by adding and averaging the second character vectors of its characters: Jiangsu (321 + 355) / 2 = 338, Suzhou (557 + 499) / 2 = 528, disease control (622 + 451) / 2 = 536.5, center (564 + 877) / 2 = 720.5; these word vectors form the first word sequence matrix. The text to be recognized here is only an example and may also contain dates, symbols, and so on; it is not specifically limited. Likewise, averaging the second character vectors is only one example of same-dimension processing for determining the word vectors of the segmented words; any operation that leaves the dimension unchanged, such as subtraction, may be used, and the processing method is not specifically limited here. The arithmetic of this example is reproduced in the sketch below.
The embodiment of the application provides a named entity identification method in which the first training model is a BERT (Bidirectional Encoder Representations from Transformers) model and the second training model is a CNN (Convolutional Neural Network) model.
Here, the first training model is a BERT model, whose training process mainly encodes the input vectors using a bidirectional Transformer as the encoder. Specifically, the BERT model divides the first character sequence matrix C_i by characters; if the sequence exceeds max_length, the excess data is truncated, and if it does not reach max_length, it is padded with [PAD]. Here max_length may refer to the row length or column length of the first character sequence matrix and is set according to specific needs. The first character sequence matrix is then labeled, marking the beginning, middle and end of the sentence, the characters or words, and so on, so as to memorize the structure of the text to be recognized. Further, to train a deep bidirectional representation, a cloze-style masked prediction task is performed: samples are constructed by randomly masking a proportion of the text to be recognized, the first character sequence matrix of the masked sample (with some masked positions replaced by random characters) together with the original first character sequence matrix is fed to an output softmax (logistic regression) layer, and the masked content is then predicted. For example, assume the original sentence is "my dog is hairy" and 15% of the token positions in the sentence are randomly selected for masking, say the fourth token, "hairy". Because each sample is fed into the model repeatedly over multiple training epochs, the masking of the selected position is varied as follows:
80% of the time: replace the target word with [MASK], for example: my dog is hairy --> my dog is [MASK].
10% of the time: replace the target word with a random word, for example: my dog is hairy --> my dog is apple.
10% of the time: leave the target word unchanged, for example: my dog is hairy --> my dog is hairy.
In this way, the Transformer encoder in the BERT model must maintain a distributional contextual representation of every input token: if the encoder already knew which word it had to predict, it would stop learning from the context, whereas if it cannot tell which word is to be predicted during training, it must judge the word from the information in the token's context. A model trained this way acquires feature-expression capability for the sentence.
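A sketch of this 80/10/10 masking rule; the tokenization, rates and vocabulary here are illustrative:

```python
import random

def mask_tokens(tokens, vocab, pick_rate=0.15):
    """Select ~15% of positions; of those, 80% become [MASK],
    10% become a random word, 10% stay unchanged."""
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < pick_rate:
            targets[i] = tok                   # the model must predict the original token here
            r = random.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = random.choice(vocab)  # random replacement
            # else: keep the token as-is
    return out, targets

masked, targets = mask_tokens("my dog is hairy".split(), vocab=["apple", "blue", "run"])
```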
The timing and position information of the samples is characterized in the BERT model by the following formulas:

PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

where pos denotes the position index of the token in the matrix, i denotes the dimension index, 2i denotes the even dimensions and 2i + 1 the odd dimensions, and d_model denotes the encoding width; the even dimensions are encoded using sine and the odd dimensions using cosine.
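A sketch of this sinusoidal encoding, assuming an even d_model:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encoding: sine on even dimensions, cosine on odd."""
    pos = np.arange(max_len)[:, None]             # position index of each token
    i = np.arange(0, d_model, 2)[None, :]         # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(max_len=128, d_model=256)   # one row per position
```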
Wherein, softmax (logistic regression model) multiplies the input matrix by three parameters, Wq, Wv, Wk, respectively, to transform the matrix to obtain query, key, and values. Performing linear transformation on the input matrix according to Wq and Wk respectively through a Transformer encoder to obtain a matrix K corresponding to a matrix Q, Key corresponding to Query; and performing linear transformation on the input matrix according to Wv through a Transformer Decoder to obtain a matrix V corresponding to the Values. The softmax normalization process was further performed by the following formula, and the data was processed to be between 0 and 1. The importance of each word is adjusted using these correlations to obtain a new expression for each word:
Figure BDA0002706315080000111
wherein the content of the first and second substances,
Figure BDA0002706315080000112
is a regulatory factor.
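A sketch of this scaled dot-product attention for a single sentence (no batching), with random matrices standing in for Q, K and V:

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (seq_len, d)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # token-to-token correlations
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows are softmax distributions
    return weights @ V

Q = K = V = np.random.randn(8, 64)                  # self-attention: Q, K, V from the same input
out = attention(Q, K, V)                            # new expression for each token
```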
Then, in order to increase semantic information of the output first word feature matrix, a Transformal "multi-head" mode is adopted, and the formula is as follows:
MultiHead(Q,K,V)=Concat(head1,head2,…,headk)Wo (1)
Figure BDA0002706315080000113
as in formula (5), different attition results of Q, K, V are obtained by changing the W parameters (three parameters Wq, Wv, Wk), and the obtained result is used as a head. In the formula (1), k heads are spliced and then the parameters W are usedoThe value obtained by performing a linear transformation once is obtained as a result of multi-head attention, i.e., MultiHead (Q, K, V). Finally, a fitting calculation is performed on the MultiHead (Q, K, V) through the fully connected feedforward network, and the application formula is as follows:
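A sketch of the multi-head computation, reusing the `attention` function sketched above; the per-head parameter matrices are random stand-ins:

```python
import numpy as np

def multi_head(X, heads_params, Wo):
    """One (Wq, Wk, Wv) triple per head; heads are concatenated and
    transformed by Wo, as in formulas (1) and (2)."""
    heads = [attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in heads_params]
    return np.concatenate(heads, axis=-1) @ Wo

X = np.random.randn(8, 64)
params = [tuple(np.random.randn(64, 16) for _ in range(3)) for _ in range(4)]  # 4 heads
Wo = np.random.randn(4 * 16, 64)
out = multi_head(X, params, Wo)   # shape (8, 64)
```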
Finally, a fitting calculation is performed on MultiHead(Q, K, V) through a fully connected feed-forward network, applying the formula:

FFN(Z) = max(0, Z · W1 + b1) · W2 + b2

where Z is the input MultiHead(Q, K, V), W1 and W2 are the parameters of the fully connected feed-forward network, b1 and b2 are bias vectors, and the output is the first character feature matrix C_i. The second training model is a CNN model, which further extracts the features of the first word sequence matrix S_i and outputs the first word feature matrix M_i processed by the CNN model. Finally, the first character-word fusion feature matrix R_i is obtained by summing the C_i output by the BERT model and the M_i output by the CNN model:

R_i = C_i + M_i
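A sketch of the CNN branch and of the fusion, assuming a 1-D convolution along the sequence whose output width is chosen to match the BERT feature width; the layer sizes and equal sequence lengths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class WordCNN(nn.Module):
    """Illustrative second training model: a 1-D convolution over the
    first word sequence matrix S_i."""
    def __init__(self, in_dim=256, out_dim=256, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel, padding=kernel // 2)

    def forward(self, S):                  # S: (batch, seq_len, in_dim)
        M = self.conv(S.transpose(1, 2))   # convolve along the sequence axis
        return M.transpose(1, 2)           # (batch, seq_len, out_dim)

S = torch.randn(1, 8, 256)    # first word sequence matrix
C = torch.randn(1, 8, 256)    # first character feature matrix (BERT output stand-in)
R = C + WordCNN()(S)          # first character-word fusion feature matrix R_i
```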
The embodiment of the application provides a named entity identification method in which the third training model includes a bidirectional Long Short-Term Memory network (BiLSTM) model and a self-attention mechanism model. Processing the first character-word fusion feature matrix through the third training model to obtain the named entity recognition result of the text to be recognized includes: processing the first character-word fusion feature matrix through the BiLSTM model to increase the semantic information of the text to be recognized carried by the matrix, obtaining the first character-word feature matrix.
Here, the first character-word fusion feature matrix R_i is processed by the BiLSTM model. Fig. 3 shows the internal structure of the LSTM cell underlying the BiLSTM model, which mainly includes three gates, namely a forget gate, an input gate and an output gate; the cell in the middle, called the memory cell, stores the current memory state.
(1) Forget gate: the forget gate determines which information is discarded from the memory cell. A sigmoid activation function normalizes the value, setting the weight to a value between 0 and 1; its input comes from the current input, the hidden state at the previous time step and the memory cell at the previous time step. The forward-propagation formula is:

f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)

where f_t, taking a value of 0 or 1 in the extreme (0 meaning complete discarding and 1 complete retention), is the output of the forget gate layer at time t, h_{t-1} denotes the hidden-layer output vector at time t-1, x_t denotes the input at time t, W_f denotes the weight matrix of the forget gate for the input, and b_f denotes its bias vector.
(2) Input gate: the input gate determines what content needs to be added. A sigmoid activation function performs normalization, and a new candidate vector C̃_t is then created through a tanh function. The forward-propagation formulas are:

i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)

where i_t, taking a value of 0 or 1 in the extreme (0 meaning the current content is not added and 1 that it is added), is the output of the input gate layer at time t, W_i denotes the weight matrix of the input gate for the input, b_i its bias vector, W_c the weight matrix of the candidate transform, b_c its bias vector, and C̃_t the candidate vector generated at time t. It should be noted that the forget gate and the input gate receive the same input data; what distinguishes the two are their respective weight matrices and biases.
(3) Memory cell: the memory cell stores the memorized content. After the candidate vector is determined, the state is updated based on the previously obtained outputs of the forget gate (whether the past memory is retained at the current time, i.e. the value of f_t) and the input gate (whether the new content is remembered, i.e. the value of i_t), where C_{t-1} is the state vector at time t-1 and C_t the state vector at time t. The formula is:

C_t = f_t * C_{t-1} + i_t * C̃_t

The cell-update formula can be understood as follows: C_{t-1} represents what the LSTM model remembers at time t-1; at time t the model faces two questions — should it continue to remember the previous (time t-1) content, and should it remember the new content? There are therefore four cases:
I. when f_t = 0 and i_t = 0, C_t = 0, i.e. all past content is forgotten and no new content is remembered;
II. when f_t = 0 and i_t = 1, C_t = C̃_t, i.e. the past content is completely forgotten, but the new content is remembered;
III. when f_t = 1 and i_t = 0, C_t = C_{t-1}, i.e. the previous content is retained and the new content is ignored;
IV. when f_t = 1 and i_t = 1, C_t = C_{t-1} + C̃_t, i.e. the previous content is retained and the new content is remembered as well.
Since the sigmoid function is not binary (it takes values between 0 and 1), f_t and i_t in fact decide how much of the past content to keep remembering and how much of the new content to remember, respectively; for example, f_t = 1 indicates that all past content is retained, and f_t = 0.5 indicates forgetting half of the past content, i.e. fading the past memory.
(4) Output gate: the output gate determines what content is output at the current time t: O_t = 0 indicates no output and O_t = 1 indicates output, the third sigmoid function determining which part of the information needs to be output. The memorized content is then processed through a tanh function to obtain a value between -1 and 1, which is multiplied by the output of the sigmoid function to produce the final output:

O_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o)
h_t = O_t * tanh(C_t)

where tanh(C_t) processes the content memorized in the memory cell at the current time so that its values lie between -1 and 1, O_t is the output of the output gate at time t, W_o denotes the weight matrix of the output gate for the input, b_o denotes its bias vector, and h_t is the hidden-layer vector at time t. Because of these three gating mechanisms, the LSTM can effectively handle long-term dependencies and, to a certain extent, solves the problems of vanishing and exploding gradients.
To summarize the principle of the LSTM model: at time t, it first decides whether to retain the past memory content, then decides whether new content needs to be added, and finally, after the memory cell has been updated, decides whether the content at the current time needs to be output. A sketch of one such step follows.
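A minimal sketch of one LSTM step following the gate equations above; the parameter shapes are illustrative, and a real implementation would use a library cell:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One step of the LSTM cell: forget gate, input gate, memory update, output gate."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # how much past memory to keep
    i_t = sigmoid(W["i"] @ z + b["i"])       # how much new content to add
    C_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde       # updated memory cell
    o_t = sigmoid(W["o"] @ z + b["o"])       # what to output
    h_t = o_t * np.tanh(C_t)                 # hidden state at time t
    return h_t, C_t

H, D = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(H, H + D)) for k in "fico"}
b = {k: np.zeros(H) for k in "fico"}
h, C = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```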
Finally, by the above method, two opposite LSTM layers, forward and backward, are set up, as shown in fig. 4: the forward LSTM layer processes the sequence in order and its output represents the past information, while the backward LSTM layer processes the sequence in reverse and its output represents the future information. Combining the forward and backward directions gives the output of the BiLSTM layer, the first character-word feature matrix. In this way, the BiLSTM model increases the semantic information that the first character-word fusion feature matrix carries about the text to be recognized, improving the reliability and accuracy of that semantic information.
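A sketch of such a BiLSTM layer using the library implementation, with the forward and backward outputs combined by concatenation; the sizes are illustrative:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=256, hidden_size=128,
                 bidirectional=True, batch_first=True)
R = torch.randn(1, 8, 256)   # first character-word fusion feature matrix R_i
G, _ = bilstm(R)             # G: (1, 8, 256) = forward 128 dims + backward 128 dims
```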
Then, the first character-word feature matrix is further processed through the self-attention mechanism model to increase the weight of the corresponding named entities, obtaining the second character-word feature matrix. That is, the first character-word feature matrix G_i is input into the scaled dot-product attention formula given above, with the query, key and value matrices all derived from G_i, to obtain the output second character-word feature matrix X_i. Finally, the named entity recognition result of the text to be recognized is obtained according to the second character-word feature matrix X_i. Because the self-attention mechanism model increases the weight of the named entities in the first character-word feature matrix G_i, the named entities occupy prominent positions in the second character-word feature matrix X_i, so that when the named entities of the text to be recognized are obtained from X_i, their recognition is more definite and the accuracy of the recognition result is improved.
The embodiment of the application also provides a method for performing sequence optimization on the second character-word feature matrix X_i; to this end, the third training model further includes a CRF (conditional random field) model.
After the first character-word feature matrix has been processed through the self-attention mechanism model to increase the weight of the corresponding named entities and the second character-word feature matrix has been obtained, the method further includes: performing sequence optimization on the second character-word feature matrix through the CRF model to obtain the third character-word feature matrix; and obtaining, according to the third character-word feature matrix, the named entity recognition result of the text to be recognized in the optimal arrangement sequence. For the second character-word feature matrix X, let K be the score matrix output through the self-attention mechanism, of size n × k, where n is the number of tokens and k the number of labels, and K_{i,j} is the score of the j-th label of the i-th token. For a predicted label sequence Y = (y_1, y_2, …, y_n), its score function is:

s(X, Y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} K_{i, y_i}

where A denotes the transition score matrix and A_{i,j} the score of label i transitioning to label j. The probability of generating the prediction sequence Y is:

P(Y | X) = exp(s(X, Y)) / Σ_{Y′ ∈ Y_X} exp(s(X, Y′))

Taking the logarithm of both sides gives the likelihood function of the prediction sequence:

log P(Y | X) = s(X, Y) − log Σ_{Y′ ∈ Y_X} exp(s(X, Y′))

where Y denotes the actual tag sequence and Y_X denotes all possible tag sequences. The final output optimal sequence is:

Y* = argmax_{Y′ ∈ Y_X} s(X, Y′)

Therefore, by performing sequence optimization on the second character-word feature matrix through the CRF model, the sequence of named entities recognized from the text to be recognized is the optimal sequence Y*, on the premise of improving the accuracy of the named entity recognition result.
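A sketch of the CRF score and of Viterbi decoding of the optimal sequence Y* (boundary transitions are omitted for brevity):

```python
import numpy as np

def crf_score(K, A, y):
    """s(X, Y): emission scores K[i, y_i] plus transition scores A[y_{i-1}, y_i]."""
    y = np.asarray(y)
    return K[np.arange(len(y)), y].sum() + A[y[:-1], y[1:]].sum()

def viterbi_decode(K, A):
    """Return the highest-scoring label sequence Y* by dynamic programming."""
    n, k = K.shape
    score = K[0].copy()                             # best score ending in each label
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + A + K[t][None, :]  # all (previous, current) label pairs
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    y = [int(score.argmax())]
    for t in range(n - 1, 0, -1):                   # backtrack
        y.append(int(back[t][y[-1]]))
    return y[::-1]

K = np.random.randn(8, 5)    # scores of 5 labels for 8 tokens
A = np.random.randn(5, 5)    # label-to-label transition scores
best = viterbi_decode(K, A)  # optimal sequence Y*
```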
The embodiment of the application also provides a model training method, which includes: inputting a second character sequence matrix of a sample text into the first training model to obtain a second character feature matrix of the sample text, wherein the first training model is an already trained model; inputting a second word sequence matrix of the sample text into an initial second training model to obtain a second word feature matrix of the sample text, the dimension of the second character feature matrix being the same as the dimension of the second word feature matrix; processing the second character feature matrix and the second word feature matrix to obtain a second character-word fusion feature matrix; processing the second character-word fusion feature matrix through an initial third training model to obtain a second named entity recognition result of the sample text; and if the second named entity recognition result does not meet a set condition, adjusting the second training model and the third training model according to the second named entity recognition result. That is, if texts to be recognized are to be recognized through the combined model of the first, second and third training models, sample texts may first be recognized through the already trained first training model together with the untrained second and third training models, and the relevant parameters of the second and third training models are continuously adjusted during this process, completing the training of the combined model formed by the three models.
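A hypothetical training step illustrating this arrangement, with the first model frozen and only the second and third models updated; all module and function names are assumptions for the sketch:

```python
import torch

def train_step(bert, cnn, third_model, optimizer, loss_fn,
               char_ids, word_seq, gold_tags):
    with torch.no_grad():             # first training model: already trained, kept fixed
        C = bert(char_ids)            # second character feature matrix
    M = cnn(word_seq)                 # second word feature matrix (same dimension as C)
    R = C + M                         # second character-word fusion feature matrix
    scores = third_model(R)           # BiLSTM + self-attention (+ CRF) scores
    loss = loss_fn(scores, gold_tags)
    optimizer.zero_grad()
    loss.backward()                   # gradients flow only into cnn and third_model
    optimizer.step()
    return loss.item()
```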
Based on the above method flow, an embodiment of the present application provides a flow of a named entity identification method, as shown in fig. 5, including:
step 501, obtaining a trained first training model, inputting a second word sequence matrix of a sample text into the first training model, and obtaining a second word feature matrix of the sample text, wherein a first parameter for adjusting dimensions in the first training model is set as a parameter value capable of obtaining a preset dimension.
Step 502, obtaining an untrained second training model, inputting a second word sequence matrix of the sample text into the initial second training model, and obtaining a second word feature matrix of the sample text, wherein a second parameter for adjusting the dimension in the second training model is set as a parameter value capable of obtaining a preset dimension.
Step 503, obtaining a second character-word fusion feature matrix according to the second character feature matrix and the second word feature matrix, which have the same dimension.
Step 504, inputting the second character-word fusion feature matrix into an untrained third training model, and obtaining a second named entity recognition result of the sample text.
Step 505, adjusting the relevant parameters of the second training model and the third training model according to the second named entity recognition result, and re-executing steps 501 to 505 until the obtained second named entity recognition result reaches a preset accuracy rate.
Step 506, obtaining the first character vector of each character of the text to be recognized in the first manner, forming the first character sequence matrix from the first character vectors of the text to be recognized, and inputting the first character sequence matrix into the first training model to obtain the first character feature matrix of the text to be recognized.
Step 507, obtaining the second character vector of each character of the text to be recognized in the second manner, and performing word segmentation on the text to be recognized to obtain its segmented words; according to the segmented words of the text to be recognized, carrying out same-dimension processing on the second character vectors of the characters in each segmented word, and determining the word vector of each segmented word, so as to obtain the first word sequence matrix; and inputting the first word sequence matrix into the second training model to obtain the first word feature matrix.
Step 508, fusing the first character feature matrix and the first word feature matrix, which have the same dimension, to obtain the first character-word fusion feature matrix.
Step 509, inputting the first character-word fusion feature matrix into the third training model to obtain the named entity recognition result.
It should be noted that, in the above flow, steps 501 to 505 train the second training model and the third training model through the already trained first training model, so as to obtain a mature combined model of the first, second and third training models. Steps 501 to 505 may be executed in a loop until the recognition accuracy of the combined model of the current first, second and third training models reaches the required accuracy. Steps 506 to 509 are then executed with the models obtained in steps 501 to 505 to obtain the named entity recognition result of the text to be recognized.
The accuracy of the recognition results of several types of named entities under the above method is reported here, covering activity name (activity_name), address (address), index data (data), organization name (organization_name) and time (time). The evaluation indices used are precision (P), recall (R) and the F1 value, with the following formulas:

P = correct / (correct + spurious)

R = correct / (correct + missing)

F1 = 2 × P × R / (P + R)
where P is the ratio of the named entities correctly labeled by the method to the total number of entities recognized in the text to be recognized; correct is the number of entities labeled correctly; spurious is the number of entities recognized in error; missing is the number of real entities that were not recognized; R is the ratio of the correctly labeled named entities to the total number of entities in the test set; and F1 is the harmonic mean of P and R. The application also reports the precision (P), recall (R) and F1 value of the named entities identified in government affairs reports, as shown in the following table:
[Table: precision (P), recall (R) and F1 values per entity type]
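A sketch of computing these metrics from entity counts:

```python
def ner_metrics(correct: int, spurious: int, missing: int):
    """Precision, recall and F1 from counts of correct, spurious
    (wrongly recognized) and missing (unrecognized) entities."""
    p = correct / (correct + spurious) if correct + spurious else 0.0
    r = correct / (correct + missing) if correct + missing else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(ner_metrics(correct=90, spurious=10, missing=20))  # (0.9, 0.818..., 0.857...)
```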
based on the same concept, an embodiment of the present invention provides a named entity recognition apparatus, and fig. 6 is a schematic diagram of the named entity recognition apparatus provided in the embodiment of the present application, as shown in fig. 6, including:
an obtaining module 601, configured to input a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized, and to input a first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized, the dimension of the first character feature matrix being the same as the dimension of the first word feature matrix;
a processing module 602, configured to process the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix, and to process the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized.
Optionally, the processing module 602 is further configured to: set a first parameter of the first training model, the first parameter being used to obtain the first character feature matrix with a preset dimensionality; and, before the first word sequence matrix of the text to be recognized is input into the second training model to obtain the first word feature matrix of the text to be recognized, set a second parameter of the second training model, the second parameter being used to obtain the first word feature matrix with the preset dimensionality.
Optionally, the processing module 602 is further configured to: determine, in a first manner, the first character vector corresponding to each character of the text to be recognized, the first character vectors constituting the first character sequence matrix; determine, in a second manner different from the first manner, the second character vector corresponding to each character of the text to be recognized; perform word segmentation on the text to be recognized to obtain its segmented words; and carry out same-dimension processing on the second character vectors of the characters in each segmented word to determine the word vector of each segmented word, so as to obtain the first word sequence matrix.
Optionally, the first training model is a BERT (Bidirectional Encoder Representations from Transformers) model, and the second training model is a CNN (Convolutional Neural Network) model.
Optionally, the third training model includes a bidirectional Long Short-Term Memory network (BiLSTM) model and a self-attention mechanism model, and the processing module 602 is specifically configured to: process the first character-word fusion feature matrix through the BiLSTM model to increase the semantic information of the text to be recognized carried by the matrix, obtaining the first character-word feature matrix; process the first character-word feature matrix through the self-attention mechanism model to increase the weight of the corresponding named entities, obtaining the second character-word feature matrix; and obtain the named entity recognition result of the text to be recognized according to the second character-word feature matrix.
Optionally, the third training model further includes a CRF (conditional random field) model, and the processing module 602 is further configured to: perform sequence optimization on the second character-word feature matrix through the CRF model to obtain the third character-word feature matrix; and obtain, according to the third character-word feature matrix, the named entity recognition result of the text to be recognized in the optimal arrangement sequence.
Optionally, the processing module 602 is further configured to: input a second character sequence matrix of a sample text into the first training model to obtain a second character feature matrix of the sample text, wherein the first training model is an already trained model; input a second word sequence matrix of the sample text into an initial second training model to obtain a second word feature matrix of the sample text, the dimension of the second character feature matrix being the same as the dimension of the second word feature matrix; process the second character feature matrix and the second word feature matrix to obtain a second character-word fusion feature matrix; process the second character-word fusion feature matrix through an initial third training model to obtain a second named entity recognition result of the sample text; and, if the second named entity recognition result does not meet a set condition, adjust the second training model and the third training model according to the second named entity recognition result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A named entity recognition method, comprising:
inputting a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized;
inputting a first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized; wherein the dimension of the first character feature matrix is the same as the dimension of the first word feature matrix;
processing the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix; and
processing the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized.
2. The method of claim 1, wherein before inputting the first character sequence matrix of the text to be recognized into the first training model to obtain the first character feature matrix of the text to be recognized, the method further comprises:
setting a first parameter of the first training model, wherein the first parameter is used for obtaining the first character feature matrix with a preset dimension; and
wherein before inputting the first word sequence matrix of the text to be recognized into the second training model to obtain the first word feature matrix of the text to be recognized, the method further comprises:
setting a second parameter of the second training model, wherein the second parameter is used for obtaining the first word feature matrix with the preset dimension.
3. The method of claim 1, wherein before inputting the first word sequence matrix of the text to be recognized into the second training model to obtain the first word feature matrix of the text to be recognized, the method further comprises:
determining, in a first manner, a first character vector corresponding to each character of the text to be recognized, the first character vectors of the characters constituting the first character sequence matrix;
determining, in a second manner, a second character vector corresponding to each character of the text to be recognized, the second manner being different from the first manner;
performing word segmentation on the text to be recognized to obtain the segmented words of the text to be recognized; and
performing same-dimension processing on the second character vectors of the characters in each segmented word to determine the word vector of each segmented word, thereby obtaining the first word sequence matrix.
4. The method of claim 1, wherein the first training model is a BERT (Bidirectional Encoder Representations from Transformers) model, and the second training model is a CNN (Convolutional Neural Network) model.
5. The method of any one of claims 1 to 4, wherein the third training model comprises a bidirectional long short-term memory (BiLSTM, Bi-directional Long Short-Term Memory) model and a self-attention mechanism model; and
processing the first character-word fusion feature matrix through the third training model to obtain the named entity recognition result of the text to be recognized comprises:
processing the first character-word fusion feature matrix through the BiLSTM model to enrich the semantic information of the text to be recognized carried by the matrix, to obtain a first character-word feature matrix;
processing the first character-word feature matrix through the self-attention mechanism model to increase the weights of the named entities in the matrix, to obtain a second character-word feature matrix; and
obtaining the named entity recognition result of the text to be recognized according to the second character-word feature matrix.
6. The method of claim 5, wherein the third training model further comprises a CRF (Conditional Random Field) model; and
after processing the first character-word feature matrix through the self-attention mechanism model to obtain the second character-word feature matrix, the method further comprises:
performing sequence optimization on the second character-word feature matrix through the CRF model to obtain a third character-word feature matrix; and
obtaining, according to the third character-word feature matrix, the named entity recognition result of the text to be recognized with the optimal label sequence.
7. The method of any one of claims 1 to 6, further comprising:
inputting a second character sequence matrix of a sample text into the first training model to obtain a second character feature matrix of the sample text, wherein the first training model is an already trained model;
inputting a second word sequence matrix of the sample text into an initial second training model to obtain a second word feature matrix of the sample text, wherein the dimension of the second character feature matrix is the same as the dimension of the second word feature matrix;
processing the second character feature matrix and the second word feature matrix to obtain a second character-word fusion feature matrix;
processing the second character-word fusion feature matrix through an initial third training model to obtain a second named entity recognition result of the sample text; and
if the second named entity recognition result does not meet a set condition, adjusting the second training model and the third training model according to the second named entity recognition result.
8. A named entity recognition apparatus, comprising:
an acquisition module, configured to input a first character sequence matrix of a text to be recognized into a first training model to obtain a first character feature matrix of the text to be recognized, and to input a first word sequence matrix of the text to be recognized into a second training model to obtain a first word feature matrix of the text to be recognized, wherein the dimension of the first character feature matrix is the same as the dimension of the first word feature matrix; and
a processing module, configured to process the first character feature matrix and the first word feature matrix to obtain a first character-word fusion feature matrix, and to process the first character-word fusion feature matrix through a third training model to obtain a named entity recognition result of the text to be recognized.
9. A computer-readable storage medium, characterized in that the storage medium stores a program which, when run on a computer, causes the computer to carry out the method of any one of claims 1 to 7.
10. A computer device, comprising:
a memory for storing a computer program;
a processor, configured to call the computer program stored in the memory and execute the method of any one of claims 1 to 7 according to the obtained program.
CN202011039983.1A 2020-09-28 2020-09-28 Named entity recognition method and device Active CN112115721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011039983.1A CN112115721B (en) 2020-09-28 2020-09-28 Named entity recognition method and device

Publications (2)

Publication Number Publication Date
CN112115721A 2020-12-22
CN112115721B CN112115721B (en) 2024-05-17

Family

ID=73798679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011039983.1A Active CN112115721B (en) 2020-09-28 2020-09-28 Named entity recognition method and device

Country Status (1)

Country Link
CN (1) CN112115721B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017162134A1 (en) * 2016-03-22 2017-09-28 索尼公司 Electronic device and method for text processing
CN108536679A (en) * 2018-04-13 2018-09-14 腾讯科技(成都)有限公司 Name entity recognition method, device, equipment and computer readable storage medium
WO2020133039A1 (en) * 2018-12-27 2020-07-02 深圳市优必选科技有限公司 Entity identification method and apparatus in dialogue corpus, and computer device
CN111191453A (en) * 2019-12-25 2020-05-22 中国电子科技集团公司第十五研究所 Named entity recognition method based on confrontation training
CN111310470A (en) * 2020-01-17 2020-06-19 西安交通大学 Chinese named entity recognition method fusing word and word features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
谢腾; 杨俊安; 刘辉: "Chinese Entity Recognition Based on the BERT-BiLSTM-CRF Model", 计算机系统应用 (Computer Systems &amp; Applications), no. 07 *
赵平; 孙连英; 万莹; 葛娜: "Named Entity Recognition of Chinese Scenic Spots Based on BERT+BiLSTM+CRF", 计算机系统应用 (Computer Systems &amp; Applications), no. 06 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699683A (en) * 2020-12-31 2021-04-23 大唐融合通信股份有限公司 Named entity identification method and device fusing neural network and rule
CN112487820A (en) * 2021-02-05 2021-03-12 南京邮电大学 Chinese medical named entity recognition method
CN112487820B (en) * 2021-02-05 2021-05-25 南京邮电大学 Chinese medical named entity recognition method
CN112802570A (en) * 2021-02-07 2021-05-14 成都延华西部健康医疗信息产业研究院有限公司 Named entity recognition system and method for electronic medical record
CN112949310A (en) * 2021-03-01 2021-06-11 创新奇智(上海)科技有限公司 Model training method, traditional Chinese medicine name recognition method and device and network model
CN113051500A (en) * 2021-03-25 2021-06-29 武汉大学 Phishing website identification method and system fusing multi-source data
CN113051500B (en) * 2021-03-25 2022-08-16 武汉大学 Phishing website identification method and system fusing multi-source data
CN113449524A (en) * 2021-04-01 2021-09-28 山东英信计算机技术有限公司 Named entity identification method, system, equipment and medium
CN112989834A (en) * 2021-04-15 2021-06-18 杭州一知智能科技有限公司 Named entity identification method and system based on flat grid enhanced linear converter
CN113268538A (en) * 2021-05-17 2021-08-17 哈尔滨工业大学(威海) Complex equipment fault tracing method and system based on domain knowledge graph
CN114417873A (en) * 2022-01-17 2022-04-29 软通动力信息技术(集团)股份有限公司 Few-sample entity identification method, device, medium and equipment
CN114970666A (en) * 2022-03-29 2022-08-30 北京百度网讯科技有限公司 Spoken language processing method and device, electronic equipment and storage medium
CN114970666B (en) * 2022-03-29 2023-08-29 北京百度网讯科技有限公司 Spoken language processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112115721B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
CN112115721B (en) Named entity recognition method and device
CN111444726B (en) Chinese semantic information extraction method and device based on long-short-term memory network of bidirectional lattice structure
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
Badjatiya et al. Attention-based neural text segmentation
CN111310471B (en) Travel named entity identification method based on BBLC model
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN112270193A (en) Chinese named entity identification method based on BERT-FLAT
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN112487820B (en) Chinese medical named entity recognition method
CN112163429B (en) Sentence correlation obtaining method, system and medium combining cyclic network and BERT
CN114548101B (en) Event detection method and system based on backtracking sequence generation method
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN115587594B (en) Unstructured text data extraction model training method and system for network security
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN112685561A (en) Small sample clinical medical text post-structuring processing method across disease categories
CN115238026A (en) Medical text subject segmentation method and device based on deep learning
CN113836891A (en) Method and device for extracting structured information based on multi-element labeling strategy
CN113641809A (en) XLNET-BiGRU-CRF-based intelligent question answering method
Chao et al. Variational connectionist temporal classification
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN116306606A (en) Financial contract term extraction method and system based on incremental learning
CN116108840A (en) Text fine granularity emotion analysis method, system, medium and computing device
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN112733526B (en) Extraction method for automatically identifying tax collection object in financial file
CN114925695A (en) Named entity identification method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant