CN109657239A - Chinese named entity recognition method based on attention mechanism and language model learning - Google Patents

Chinese named entity recognition method based on attention mechanism and language model learning

Info

Publication number
CN109657239A
Authority
CN
China
Prior art keywords
word
language model
lstm
entity recognition
name entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811517779.9A
Other languages
Chinese (zh)
Other versions
CN109657239B (en)
Inventor
廖伟智
马攀
王宇
阴艳超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201811517779.9A priority Critical patent/CN109657239B/en
Publication of CN109657239A publication Critical patent/CN109657239A/en
Application granted granted Critical
Publication of CN109657239B publication Critical patent/CN109657239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Abstract

The invention discloses a Chinese named entity recognition method based on attention mechanism and language model learning. The method includes constructing a character-based dictionary, converting the ID number corresponding to each element into a vector, combining the vectors through a restricted self-attention layer, splicing the outputs of a first Bi-LSTM layer and using them to train a language model, splicing the outputs of a second Bi-LSTM layer and performing label prediction training with the maximum matching method, randomly shuffling the data set and training for multiple rounds with the Adam optimization method, and performing Chinese named entity recognition on the text data to be recognized with the neural network. The invention is based only on character features and needs no word segmentation or other hand-crafted features such as part of speech and syntax, which improves the robustness of the method; moreover, the invention performs well on out-of-vocabulary words, has excellent performance, and can markedly improve the performance of downstream tasks.

Description

Chinese named entity recognition method based on attention mechanism and language model learning
Technical field
The invention belongs to the field of entity recognition technology, and in particular relates to a Chinese named entity recognition method based on attention mechanism and language model learning.
Background art
Chinese named entity recognition is one of the most common problems in the field of natural language processing. Its main task is to tag the characters or words in unstructured text so that the effective information in the text can be extracted.
The Chinese named entity recognition task identifies the entities in Chinese text data and extracts the effective information in the text. Specifically, the object of recognition is Chinese text data, such as a sentence or a passage; the requirement of recognition is to mark the entity names in the text, such as person names, place names, organization names, and titles.
The current methods for Chinese named entity recognition fall broadly into three classes:
1. Rule-based unsupervised methods:
These methods rely mainly on the linguistic form in which the target named entities appear: rules are set manually to match the syntactic structure of sentences and mark the named entities. Rule-based methods mostly use rule templates hand-built by linguistic experts, selecting features such as statistical information, punctuation marks, keywords, indicator words and direction words, position words (e.g. suffix characters), and head words, with pattern and string matching as the main means; such systems mostly depend on the construction of knowledge bases and dictionaries. The effect of such methods depends largely on the level of the linguistic experts who set the rules, and different fields require different rules, so they consume considerable time and manpower.
2. Methods based on probability statistics:
Statistical machine learning methods treat named entity recognition as a sequence labeling task and learn a language model from a large corpus so as to label each position of a sentence. The common models of such methods include the generative hidden Markov model (HMM) and the discriminative conditional random field (CRF).
3. Neural-network-based methods:
The rapid development of neural networks, especially of recurrent neural networks, has brought great performance improvements to sequence tasks; together with the development of word embeddings, this has made neural network processing of text data a practical approach. In addition, the powerful feature-extraction ability of neural networks can achieve good performance without extra hand-crafted features. The most outstanding model for sequence labeling problems is LSTM+CRF, which achieves good results in English named entity recognition; owing to the characteristics of the Chinese language, however, this model does not perform as outstandingly on the Chinese named entity recognition task. Chinese research has therefore always been divided between character-based features, word-based features, and combined character-word features, and papers have pointed out that named entity recognition based on character features outperforms that based on word features, and that character-based methods handle out-of-vocabulary words better than word-based methods.
Defects of the prior art:
1. Rule-based unsupervised methods require linguistic experts to set the rules, and different fields, and even different text styles, require different rules, so their extensibility is very low. If the rules are set strictly, a great deal of effective information is missed; if they are too loose, the recognition effect is poor.
2. Methods based on probability statistics mainly learn a language model from a large corpus; the main models are the hidden Markov model, the maximum entropy model, and the conditional random field. Such methods depend on the quality of the corpus, perform very poorly on some samples, generalize insufficiently, and have a relatively low recall rate, so their effect is not very good.
3. The current mainstream neural network methods use recurrent neural networks (RNN) or their variants, such as LSTM and GRU, to convert the named entity recognition problem into a sequence labeling problem. These methods first perform word segmentation and then sequence labeling, but their effect is largely influenced by the segmentation quality, and their effect on out-of-vocabulary words is not very good. Other research improves the performance of the neural network model simply by manually adding various features (such as part-of-speech features and grammatical features) to the input of the neural network; although such methods improve the effect, the manually added features increase the human workload and complicate the method, which also contradicts the essential characteristic of neural networks, namely that they extract features automatically.
Summary of the invention
The purpose of the invention is as follows: in order to overcome the defects of existing Chinese named entity recognition methods and improve the recognition effect, the invention proposes a Chinese named entity recognition method based on attention mechanism and language model learning.
The technical scheme of the invention is a Chinese named entity recognition method based on attention mechanism and language model learning, comprising the following steps:
A. Obtain a labeled data set for Chinese named entity recognition and construct a character-based dictionary;
B. Convert the ID number corresponding to each element in the dictionary constructed in step A into a vector;
C. Combine the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the other character vectors in the nearby window, thereby mining the latent word information within the window around the center character;
D. Process the character vectors obtained in step C with a first Bi-LSTM layer, obtain the hidden-layer output of each time step in both directions, splice the obtained outputs, and use the hidden-layer outputs of each time step to train a language model;
E. Process the spliced result obtained in step D with a second Bi-LSTM layer, obtain the second spliced output, and perform label prediction training with the maximum matching method;
F. Randomly shuffle the labeled data set for Chinese named entity recognition, and train the neural network for multiple rounds by cycling through steps A-E with the Adam optimization method;
G. Process the text data to be recognized with the neural network to complete Chinese named entity recognition.
Further, step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically:
an ID number is assigned to every character and symbol in the labeled data set for Chinese named entity recognition, and markers are added at the beginning and end of each sentence.
Further, step A also includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the label of the named-entity class corresponding to each character.
Further, step C combines the character vectors converted in step B through the restricted self-attention layer to obtain the word information composed of characters, specifically:
a window size is set; each character in the window region around the center character is compared with the center character for relevance, and the relevance value of each character and the center character is calculated; the characters in the region are then weighted and combined according to their relevance to the center character, the center character vector is calculated, and the latent word information composed of characters is obtained.
Further, the formula for calculating the relevance value of each character and the center character is
$f(x_i, q) = w^T \sigma(W^{(1)} x_i + W^{(2)} q)$
wherein q denotes the center character, $x_i$ denotes the i-th character within the window of size window_size around the center character, $W^{(1)}$ denotes the weight matrix of the fully connected layer through which $x_i$ passes, $W^{(2)}$ denotes the weight matrix of the fully connected layer through which q passes, and $w^T$ denotes the relevance vector.
Further, the formula for calculating the center character vector is
$p(z \mid x, q) = \mathrm{softmax}(a)$
$s = \sum_{i=1}^{n} p(z = i \mid x, q)\, x_i$
wherein a denotes the vector obtained by splicing the relevance values computed for each $x_i$ with q, $p(z \mid x, q)$ denotes the output of passing all relevance values of the $x_i$ and q through the softmax function, $p(z = i \mid x, q)$ denotes the weight of the relevance value of $x_i$ and q within the whole sequence, z denotes the set of weights with respect to q of all $x_i$ in the window of size window_size around the center character, x denotes the sequence containing $x_i$, and s denotes the value of the weighted combination.
Further, in step D the character vectors obtained in step C are processed by the first Bi-LSTM layer to obtain the hidden-layer output of each time step in both directions, and the computation model for splicing the obtained outputs is
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$d_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$
$\overrightarrow{m_t} = \tanh(\overrightarrow{W}\, \overrightarrow{h_t})$
$\overleftarrow{m_t} = \tanh(\overleftarrow{W}\, \overleftarrow{h_t})$
wherein $\overrightarrow{h_t}$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h_t}$ denotes the hidden-layer output of the backward LSTM at time t, $d_t$ denotes the splicing of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, $\overrightarrow{m_t}$ denotes the result of passing the forward hidden-layer output through the activation function tanh, $\overleftarrow{m_t}$ denotes the result of passing the backward hidden-layer output through the activation function tanh, $\overrightarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{h_t}$ passes, and $\overleftarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{h_t}$ passes.
Further, the computation model in step D that uses the hidden-layer output of each time step to train the language model is
$\overrightarrow{p_t} = \mathrm{softmax}(\overrightarrow{W_p}\, \overrightarrow{m_t})$
$\overleftarrow{p_t} = \mathrm{softmax}(\overleftarrow{W_p}\, \overleftarrow{m_t})$
$\overrightarrow{L} = -\sum_{t=1}^{T} \log \overrightarrow{p_t}(w_{t+1})$
$\overleftarrow{L} = -\sum_{t=1}^{T} \log \overleftarrow{p_t}(w_{t-1})$
wherein $\overrightarrow{p_t}$ denotes the probability vector that uses $\overrightarrow{m_t}$ at time t to predict the character of the next moment, $\overleftarrow{p_t}$ denotes the probability vector that uses $\overleftarrow{m_t}$ at time t to predict the character of the preceding moment, $\overrightarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{m_t}$ passes, $\overleftarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{m_t}$ passes, $\overrightarrow{L}$ denotes the loss function of the forward LSTM language model, $\overleftarrow{L}$ denotes the loss function of the backward LSTM language model, and T denotes the total number of time steps of the sequence.
Further, step D also includes computing the cross entropy between the predicted character IDs and the true following and preceding character IDs, and calculating the loss with the forward LSTM labels and backward LSTM labels constructed in step A, thereby optimizing the language model.
Further, performing label prediction training with the maximum matching method in step E is specifically:
the output obtained in step D is input to the second Bi-LSTM layer to obtain the spliced output of the second Bi-LSTM layer; the maximum likelihood estimate of the label sequence is then computed inside a conditional random field, and the loss of the conditional random field is calculated with the named-entity class labels of each character constructed in step A; finally the loss of the conditional random field is added to the language-model loss of the first Bi-LSTM layer to obtain the loss of the entire neural network.
The beneficial effects of the invention are: the invention constructs a character-based dictionary and processes the data successively through a restricted self-attention layer, a first Bi-LSTM layer, and a second Bi-LSTM layer, carries out language model learning at the first Bi-LSTM layer, and finally performs label prediction training with the maximum matching method. The invention is based only on character features and needs no word segmentation or other hand-crafted features such as part of speech and syntax, which improves the robustness of the method; moreover, the invention performs well on out-of-vocabulary words, has excellent performance, and can markedly improve the performance of downstream tasks.
Description of the drawings
Fig. 1 is a flow diagram of the Chinese named entity recognition method of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Fig. 1 shows the flow diagram of the Chinese named entity recognition method of the invention. A Chinese named entity recognition method based on attention mechanism and language model learning comprises the following steps:
A. Obtain a labeled data set for Chinese named entity recognition and construct a character-based dictionary;
B. Convert the ID number corresponding to each element in the dictionary constructed in step A into a vector;
C. Combine the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the other character vectors in the nearby window, thereby mining the latent word information within the window around the center character;
D. Process the character vectors obtained in step C with a first Bi-LSTM layer, obtain the hidden-layer output of each time step in both directions, splice the obtained outputs, and use the hidden-layer outputs of each time step to train a language model;
E. Process the spliced result obtained in step D with a second Bi-LSTM layer, obtain the second spliced output, and perform label prediction training with the maximum matching method;
F. Randomly shuffle the labeled data set for Chinese named entity recognition, and train the neural network for multiple rounds by cycling through steps A-E with the Adam optimization method;
G. Process the text data to be recognized with the neural network to complete Chinese named entity recognition.
The invention combines the attention mechanism (restricted self-attention) technique, analyses the shortcomings of the current mainstream LSTM+CRF model, and designs a two-layer bidirectional LSTM+CRF structure in which language model learning is added to the output of the first bidirectional LSTM layer, improving the expressive power of the model.
Steps A-C above form the first stage of Chinese named entity recognition, i.e. converting the Chinese text data into processable numeric vectors and then mining the latent word information with the restricted self-attention technique; step D is the second stage, i.e. processing by the first Bi-LSTM layer and learning of the language model; steps E-G form the third stage, i.e. prediction by the second Bi-LSTM layer plus CRF.
In an optional embodiment of the invention, step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically: an ID number is assigned to every character and symbol in the labeled data set for Chinese named entity recognition, and markers are added at the beginning and end of each sentence.
The character-based dictionary contains all characters and symbols in the data set, together with the identifiers <UNK> and <PAD> and the sentence start and end markers <S> and </S>, and each element in the dictionary is assigned an ID number. For the convenience of subsequent batch processing, the invention sets the ID of <PAD> to 0, and the other elements are assigned IDs in order, yielding a dictionary for the data set in which every element has a one-to-one corresponding ID number. With this dictionary, Chinese text data can be converted into numeric information that can be fed to the neural network. For example, for the Chinese sentence "我爱北京天安门。" ("I love Beijing Tiananmen."), each character and symbol is looked up in the dictionary one by one: the ID of "我" ("I") is 1, the ID of "爱" ("love") is 4, the ID of "北" ("north") is 16, ..., and the ID corresponding to "。" is 10, so the ID sequence after conversion is [1, 4, 16, 45, 46, 153, 86, 10].
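As a minimal illustration of this dictionary construction (a sketch, not the code of the patent), the following Python fragment follows the stated rule that <PAD> receives ID 0 and assigns the remaining IDs in order of first appearance; the concrete ID values will therefore differ from those of the example above:

```python
def build_vocab(sentences):
    # <PAD> is fixed to ID 0 as stated above; the other markers and all
    # characters receive IDs in order of first appearance (assumed order).
    vocab = {"<PAD>": 0, "<UNK>": 1, "<S>": 2, "</S>": 3}
    for sent in sentences:
        for ch in sent:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def to_ids(sentence, vocab):
    # Unknown characters fall back to the <UNK> identifier.
    return [vocab.get(ch, vocab["<UNK>"]) for ch in sentence]

vocab = build_vocab(["我爱北京天安门。", "我是中国人。"])
print(to_ids("我爱北京天安门。", vocab))
```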
If the method of batch training is used, the dimension of the input is [batch_size * max_len], where batch_size denotes the number of samples fed in per batch and max_len denotes the length of the longest sample in the batch; all shorter samples are padded to the longest length with '0'. For example, with a batch of 2 samples, padding is applied up to the longest sequence:
"I am Chinese."
"I am very proud."
After conversion, the batch of IDs becomes the two-dimensional matrix [[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10, 0]], in which the sequence shorter than the longest sentence is padded with 0.
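The padding step can be sketched as follows (the function name pad_batch is ours); applied to the example above it reproduces the two-dimensional matrix:

```python
def pad_batch(batch_ids, pad_id=0):
    # Pad every sequence with 0 up to the longest length in the batch,
    # producing a [batch_size, max_len] matrix.
    max_len = max(len(seq) for seq in batch_ids)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch_ids]

print(pad_batch([[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10]]))
# -> [[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10, 0]]
```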
Step A above also includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the label of the named-entity class corresponding to each character.
The forward LSTM labels above are used for forward language model learning. The method of constructing a forward LSTM label is to delete the first ID of a sequence sample (e.g. a sentence) and append the ID of the end marker "</S>" at the end of the sequence. For example, for "我爱北京天安门。", the first character "我" is removed and "</S>" is appended at the end; the ID corresponding to "</S>" in the dictionary here is 489, so the converted ID label is [4, 16, 45, 46, 153, 86, 10, 489].
The method of constructing a backward LSTM label is to remove the last character of the sequence and prepend the start marker "<S>" at the beginning. For example, for "我爱北京天安门。", "。" is removed and "<S>" is prepended; the ID corresponding to "<S>" is 789, so the converted result is [789, 1, 4, 16, 45, 46, 153, 86].
The label of the named-entity class corresponding to each character above is the raw annotation of the labeled data set; the invention converts each label into a specified numeric class. For example, "B-PER" in the original annotated data denotes the beginning of a person name, "I-PER" denotes a middle character of a person name, and "E-PER" denotes the ending character of a person name; they can be put into one-to-one correspondence with numbers, with "B-PER" corresponding to the number 1, "I-PER" to the number 2, and "E-PER" to the number 3, so that the string-type labels in the original data are converted into processable numeric information.
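The construction of the two language-model label sequences can be sketched as below; the marker IDs 489 and 789 follow the examples above, while the function name is ours:

```python
def lm_labels(ids, start_id, end_id):
    # Forward LM label: drop the first ID and append the end marker, so
    # that position t is labelled with the character at position t+1.
    forward = ids[1:] + [end_id]
    # Backward LM label: drop the last ID and prepend the start marker,
    # so that position t is labelled with the character at position t-1.
    backward = [start_id] + ids[:-1]
    return forward, backward

ids = [1, 4, 16, 45, 46, 153, 86, 10]          # "我爱北京天安门。"
fwd, bwd = lm_labels(ids, start_id=789, end_id=489)
print(fwd)   # [4, 16, 45, 46, 153, 86, 10, 489]
print(bwd)   # [789, 1, 4, 16, 45, 46, 153, 86]
```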
In an optional embodiment of the invention, step B above converts the ID number corresponding to each element in the character-based dictionary constructed in step A into a vector, specifically converting each ID into a vector with Google's word2vec tool, expressed as [batch_size * max_len * emb_size], where emb_size denotes the chosen vector dimension.
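A sketch of this conversion with a PyTorch embedding layer; the random w2v_weights matrix below merely stands in for vectors actually produced by the word2vec tool:

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 5000, 100
w2v_weights = torch.randn(vocab_size, emb_size)   # placeholder for word2vec vectors
embedding = nn.Embedding.from_pretrained(w2v_weights, freeze=False, padding_idx=0)

batch = torch.tensor([[1, 8, 87, 89, 62, 10],
                      [1, 54, 465, 875, 10, 0]])  # [batch_size, max_len]
vectors = embedding(batch)                        # [batch_size, max_len, emb_size]
print(vectors.shape)                              # torch.Size([2, 6, 100])
```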
In an optional embodiment of the invention, step C above combines the character vectors converted in step B through the restricted self-attention layer to obtain the word information composed of characters. Specifically, a window size is set; each character in the window region around the center character is compared with the center character for relevance, and the relevance value of each character and the center character is calculated; the characters in the region are then weighted and combined according to their relevance to the center character, the center character vector is calculated, and the latent word information composed of characters is obtained.
The invention passes the character vectors converted in step B through the restricted self-attention layer, letting the neural network learn the word information latently composed of characters. Here "restricted" means that the window size (window_size) is limited: self-attention is not performed over the entire sentence sequence, but only over a region of fixed window size near each character itself.
The formula with which the invention calculates the relevance value of each character and the center character is
$f(x_i, q) = w^T \sigma(W^{(1)} x_i + W^{(2)} q)$
where q denotes the center character and $x_i$ denotes the i-th character within the window of size window_size around the center character, there being window_size such characters in total, i.e. i runs from 1 to window_size. Each nearby character is compared for relevance with the center character by adding the linearly transformed vector of the center character to the linearly transformed vector of the nearby character; the activation function σ, which may be the sigmoid function or the tanh function, is applied, and the result is multiplied by $w^T$. Since $w^T$ is a vector of dimension emb_size, the product is a single value, and this value is the relevance of the nearby character $x_i$ and the center character q. $W^{(1)}$ denotes the weight matrix of the fully connected layer through which $x_i$ passes, and $W^{(2)}$ denotes the weight matrix of the fully connected layer through which q passes.
The invention compares all characters of the nearby window with the center character for relevance, obtains the relevance score $f(x_i, q)$ of each nearby character and the center character, splices them all together, converts the scores into corresponding probabilities with a softmax activation function, multiplies the converted probabilities with the corresponding character vectors, and adds up the results to obtain the vector representation of the center character. This weighted combination of character vectors lets the neural network discover the latent word information composed of characters.
The formula for calculating the center character vector is
$p(z \mid x, q) = \mathrm{softmax}(a)$
$s = \sum_{i=1}^{n} p(z = i \mid x, q)\, x_i$
where a denotes the vector obtained by splicing the relevance values computed for each $x_i$ with q, i.e. a list of scores; $p(z \mid x, q)$ denotes the output of passing all the relevance values of the $x_i$ and q through the softmax function, which converts each value of the sequence into a corresponding weight such that the weights of all $x_i$ in the sequence sum to 1; $p(z = i \mid x, q)$ denotes the weight of the relevance value of $x_i$ and q within the whole sequence; z denotes the set of weights with respect to q of all $x_i$ in the window of size window_size around the center character; x denotes the sequence of n characters containing $x_i$, x = {x_1, x_2, x_3, ..., x_i, ..., x_n}; and s denotes the value of the weighted combination, i.e. the product of each $x_i$ with its corresponding weight, summed over all i.
The invention performs Chinese named entity recognition on the basis of characters, which handles out-of-vocabulary words well, and uses the self-attention technique to mine the word information latently composed of characters, improving the expressive power of the model. Moreover, the invention limits the window size around the center character instead of applying self-attention over the entire sequence, which improves the prediction accuracy: the information of most center characters is related only to the characters of the nearby window, and the attention technique used by the invention lets the neural network automatically discover, within the window, the weights of the characters related to the center character, with high weights for characters strongly correlated with the center character and low weights for weakly correlated ones.
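The restricted self-attention computation described by the two formulas above can be sketched as a small PyTorch module. The module and parameter names are ours, and σ is taken here to be tanh (the description allows either sigmoid or tanh):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RestrictedSelfAttention(nn.Module):
    """Each center character attends only to characters inside a fixed
    window, scored by f(x_i, q) = w^T tanh(W1 x_i + W2 q)."""
    def __init__(self, emb_size, window_size=3):
        super().__init__()
        self.W1 = nn.Linear(emb_size, emb_size, bias=False)
        self.W2 = nn.Linear(emb_size, emb_size, bias=False)
        self.w = nn.Linear(emb_size, 1, bias=False)
        self.window = window_size

    def forward(self, x):                   # x: [batch, seq_len, emb_size]
        seq_len = x.size(1)
        out = []
        for t in range(seq_len):
            lo = max(0, t - self.window)
            hi = min(seq_len, t + self.window + 1)
            ctx = x[:, lo:hi, :]            # neighbours x_i of the center char
            q = x[:, t:t + 1, :]            # center character q
            scores = self.w(torch.tanh(self.W1(ctx) + self.W2(q)))  # f(x_i, q)
            p = F.softmax(scores, dim=1)    # p(z | x, q)
            out.append((p * ctx).sum(dim=1))  # weighted combination s
        return torch.stack(out, dim=1)      # [batch, seq_len, emb_size]

layer = RestrictedSelfAttention(emb_size=100)
print(layer(torch.randn(2, 6, 100)).shape)  # torch.Size([2, 6, 100])
```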
In an optional embodiment of the invention, step D above processes the character vectors obtained in step C with the first Bi-LSTM layer, obtains the hidden-layer output of each time step in both directions, splices the obtained outputs, and uses the hidden-layer outputs of each time step to train the language model.
For a sentence such as "I love you", the hidden-layer output after the first Bi-LSTM layer can predict the characters before and after each character, which makes the hidden-layer output of the Bi-LSTM better conform to a language model of the text. The specific computation model is
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$d_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$
$\overrightarrow{m_t} = \tanh(\overrightarrow{W}\, \overrightarrow{h_t})$
$\overleftarrow{m_t} = \tanh(\overleftarrow{W}\, \overleftarrow{h_t})$
where $\overrightarrow{h_t}$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h_t}$ denotes the hidden-layer output of the backward LSTM at time t, and $x_t$ denotes the input at time t, i.e. one character vector; $d_t$ denotes the splicing of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, i.e. the result of splicing together the hidden-layer outputs of the forward and backward LSTMs at time t, so that the dimension of $d_t$ is the sum of the dimensions of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$; $\overrightarrow{m_t}$ denotes the result of passing the forward hidden-layer output through the activation function tanh, i.e. the forward hidden-layer output multiplied by a weight matrix and then passed through a tanh activation; $\overleftarrow{m_t}$ denotes the corresponding result for the backward hidden-layer output; $\overrightarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{h_t}$ passes, and $\overleftarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{h_t}$ passes. Because the outputs of the Bi-LSTM hidden layer pass through a linear transformation followed by a tanh activation, they can be used directly to predict the preceding and following characters.
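A sketch of the first Bi-LSTM layer and the tanh projections in PyTorch; the module and variable names are assumptions, and nn.LSTM handles the two directions internally:

```python
import torch
import torch.nn as nn

emb_size, hidden_size = 100, 128
bilstm1 = nn.LSTM(emb_size, hidden_size, batch_first=True, bidirectional=True)
proj_fwd = nn.Linear(hidden_size, hidden_size)   # weight matrix for forward states
proj_bwd = nn.Linear(hidden_size, hidden_size)   # weight matrix for backward states

x = torch.randn(2, 6, emb_size)                  # output of the self-attention layer
h, _ = bilstm1(x)                                # [batch, seq, 2*hidden_size]
h_fwd, h_bwd = h[..., :hidden_size], h[..., hidden_size:]
d = torch.cat([h_fwd, h_bwd], dim=-1)            # d_t, input to the second layer
m_fwd = torch.tanh(proj_fwd(h_fwd))              # tanh-projected forward output
m_bwd = torch.tanh(proj_bwd(h_bwd))              # tanh-projected backward output
```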
The computation model that uses the hidden-layer output of each time step to train the language model is
$\overrightarrow{p_t} = \mathrm{softmax}(\overrightarrow{W_p}\, \overrightarrow{m_t})$
$\overleftarrow{p_t} = \mathrm{softmax}(\overleftarrow{W_p}\, \overleftarrow{m_t})$
$\overrightarrow{L} = -\sum_{t=1}^{T} \log \overrightarrow{p_t}(w_{t+1})$
$\overleftarrow{L} = -\sum_{t=1}^{T} \log \overleftarrow{p_t}(w_{t-1})$
where $\overrightarrow{p_t}$ denotes the probability vector that uses $\overrightarrow{m_t}$ at time t to predict the character of the next moment, each element of the vector being a probability and all elements of the vector summing to 1; $\overleftarrow{p_t}$ denotes the probability vector that uses $\overleftarrow{m_t}$ at time t to predict the character of the preceding moment, likewise with each element a probability and all elements summing to 1; $\overrightarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{m_t}$ passes, and $\overleftarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{m_t}$ passes; $\overrightarrow{L}$ denotes the loss function of the forward LSTM language model and $\overleftarrow{L}$ the loss function of the backward LSTM language model, both in the form of maximum likelihood estimation; and T denotes the total number of time steps of the sequence, i.e. the sequence length.
The values of $\overrightarrow{L}$ and $\overleftarrow{L}$ are computed; the smaller these two values are, the better the learned language model. The cross entropy between the predicted character IDs and the true following and preceding character IDs is computed, and the loss is calculated with the forward LSTM labels and backward LSTM labels constructed in step A so as to optimize the language model.
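These two losses amount to cross entropy against the forward and backward label sequences built in step A; a minimal sketch under assumed names, ignoring <PAD> positions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_size = 5000, 128
lm_head_fwd = nn.Linear(hidden_size, vocab_size)  # predicts the next character
lm_head_bwd = nn.Linear(hidden_size, vocab_size)  # predicts the preceding character

def lm_loss(m_fwd, m_bwd, fwd_labels, bwd_labels, pad_id=0):
    logits_fwd = lm_head_fwd(m_fwd)               # [batch, seq, vocab_size]
    logits_bwd = lm_head_bwd(m_bwd)
    # cross_entropy expects [batch, classes, seq]; <PAD> positions are ignored
    loss_fwd = F.cross_entropy(logits_fwd.transpose(1, 2), fwd_labels,
                               ignore_index=pad_id)
    loss_bwd = F.cross_entropy(logits_bwd.transpose(1, 2), bwd_labels,
                               ignore_index=pad_id)
    return loss_fwd + loss_bwd
```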
In an optional embodiment of the invention, step E above processes the vectors obtained in step D with the second Bi-LSTM layer. Specifically, the output of the first Bi-LSTM layer serves as the input of the second Bi-LSTM layer and is processed with the same computation graph model as the first Bi-LSTM layer, except that the second Bi-LSTM layer performs no language model learning but instead carries out label prediction training directly with the conditional random field method after splicing its hidden-layer outputs. Specifically, the output obtained in step D is input to the second Bi-LSTM layer and the spliced output of the second Bi-LSTM layer is obtained; the maximum likelihood estimate of the label sequence is then computed inside a conditional random field, and the loss of the conditional random field is calculated with the named-entity class labels of each character constructed in step A; finally the loss of the conditional random field is added to the language-model loss to obtain the loss of the entire neural network.
The invention adds the learning of a language model after the first Bi-LSTM layer, so that the final loss computation of the neural network includes not only the label prediction loss but also the language model loss, forming a multi-task learning setup. This makes the output generated by the first Bi-LSTM layer better conform to the characteristics of the language and gives the input received by the second Bi-LSTM layer better sequential structure, thereby improving the recognition effect.
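The multi-task objective can be sketched as below; crf_nll stands in for the negative log-likelihood of a linear-chain CRF over the tag sequence and is an assumed helper, not a particular library API:

```python
import torch
import torch.nn as nn

hidden_size, target_num = 128, 7                   # 7 tag classes, an assumed count
bilstm2 = nn.LSTM(2 * hidden_size, hidden_size,
                  batch_first=True, bidirectional=True)
tag_proj = nn.Linear(2 * hidden_size, target_num)  # fully connected layer to tags

def total_loss(d, tags, lm_loss_value, crf_nll):
    """d: spliced first-layer output [batch, seq, 2*hidden_size];
    tags: gold tag IDs [batch, seq]; lm_loss_value: loss from step D."""
    h2, _ = bilstm2(d)                             # second spliced output
    logits = tag_proj(h2)                          # [batch, seq, target_num]
    return crf_nll(logits, tags) + lm_loss_value   # CRF loss + language-model loss
```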
In an optional embodiment of the invention, step F above randomly shuffles the labeled data set for Chinese named entity recognition and trains the neural network for multiple rounds by cycling through steps A-E with the Adam optimization method, one round meaning one pass of training over the entire data set, thereby continually reducing the loss of the entire neural network and optimizing its parameters.
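A minimal sketch of this training loop with a stand-in model (all names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)                        # placeholder for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))

for epoch in range(5):                          # several full passes ("rounds")
    perm = torch.randperm(len(data))            # random re-ordering of the samples
    for i in range(0, len(data), 20):           # mini-batches of 20
        idx = perm[i:i + 20]
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data[idx]), labels[idx])
        loss.backward()
        optimizer.step()
```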
In an optional embodiment of the invention, after the training of the neural network is completed, step G above processes the text data to be recognized with the neural network, and the neural network returns the transition probability matrix generated by the conditional random field and the generated logit values. The logit values are generated as follows: after the bidirectional outputs of the second Bi-LSTM layer are spliced, the dimension is [batch_size * max_len * hidden_size], where hidden_size is a hyperparameter set when defining the LSTM; a fully connected layer then converts the last dimension, hidden_size, into target_num, the number of tag classes that finally need to be predicted, so that the final dimension of the logits is [batch_size * max_len * target_num]. Given the transition probability matrix and the logit values, decoding can be performed with the Viterbi algorithm to mark the final class of each character.
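The Viterbi decoding of step G can be implemented directly from the logit scores and the transition probability matrix; this is a generic sketch of the standard algorithm, not code from the patent:

```python
import numpy as np

def viterbi(logits, transitions):
    """logits: per-position tag scores [seq_len, target_num];
    transitions: CRF transition matrix [target_num, target_num].
    Returns the highest-scoring tag sequence."""
    seq_len, n_tags = logits.shape
    score = logits[0].copy()
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # cand[i, j]: score of reaching tag j at t coming from tag i at t-1
        cand = score[:, None] + transitions + logits[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print(viterbi(np.random.randn(6, 4), np.random.randn(4, 4)))
```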
The invention converts the characters of Chinese text into vector inputs with the open-source word2vec pre-trained by Google, and combines the input character information through restricted self-attention, letting the neural network discover the latent word information composed of characters; the result then passes through the first Bi-LSTM layer, and the two obtained unidirectional outputs are spliced together. Here the invention adds an additional learning task: a language model is learned from the output of the first Bi-LSTM layer, making the output of the first layer better conform to the characteristics of the language. The spliced output of the first Bi-LSTM layer is then regarded as the input of the second Bi-LSTM layer; the LSTM outputs obtained for the two directions are spliced again and passed through a fully connected layer, and the labels are predicted with the conditional random field (CRF) method.
Aimed at the problems of current Chinese named entity recognition methods and combined with neural network techniques, the invention directly recognizes the entity names in Chinese text by constructing feature inputs based only on characters. The method is based only on character features and needs no word segmentation or other hand-crafted features such as part of speech and syntax, which benefits the robustness of the method; moreover, the invention performs well on out-of-vocabulary words, has excellent performance, and can markedly improve the performance of downstream tasks (such as information retrieval and keyword recognition).
Those of ordinary skill in the art will understand that the embodiments described herein are intended to help the reader understand the principle of the invention, and it should be understood that the protection scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can, according to the technical teachings disclosed by the invention, make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations still fall within the protection scope of the invention.

Claims (10)

1. A Chinese named entity recognition method based on attention mechanism and language model learning, characterized by comprising the following steps:
A. obtaining a labeled data set for Chinese named entity recognition and constructing a character-based dictionary;
B. converting the ID number corresponding to each element in the dictionary constructed in step A into a vector;
C. combining the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the other character vectors in the nearby window, and mining the latent word information within the window around the center character;
D. processing the character vectors obtained in step C with a first Bi-LSTM layer, obtaining the hidden-layer output of each time step in both directions, splicing the obtained outputs, and training a language model with the hidden-layer outputs of each time step;
E. processing the spliced result obtained in step D with a second Bi-LSTM layer, obtaining the second spliced output, and performing label prediction training with the maximum matching method;
F. randomly shuffling the labeled data set for Chinese named entity recognition, and performing multiple rounds of training of the neural network by cycling through steps A-E with the Adam optimization method;
G. processing the text data to be recognized with the neural network to complete Chinese named entity recognition.
2. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 1, characterized in that step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically:
assigning an ID number to every character and symbol in the labeled data set for Chinese named entity recognition, and adding markers at the beginning and end of each sentence.
3. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 2, characterized in that step A further includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the label of the named-entity class corresponding to each character.
4. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 3, characterized in that step C combines the character vectors converted in step B through the restricted self-attention layer, obtains for each center character the weighted combination of its vector with the other character vectors in the nearby window, and mines the latent word information within the window around the center character, specifically:
setting a window size; comparing each character in the window region around the center character with the center character for relevance and calculating the relevance value of each character and the center character; then weighting and combining the characters in the region according to their relevance to the center character, calculating the center character vector, and obtaining the latent word information composed of characters.
5. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 4, characterized in that the formula for calculating the relevance value of each character and the center character is
$f(x_i, q) = w^T \sigma(W^{(1)} x_i + W^{(2)} q)$
wherein q denotes the center character, $x_i$ denotes the i-th character within the window of size window_size around the center character, $W^{(1)}$ denotes the weight matrix of the fully connected layer through which $x_i$ passes, $W^{(2)}$ denotes the weight matrix of the fully connected layer through which q passes, and $w^T$ denotes the relevance vector.
6. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 5, characterized in that the formula for calculating the center character vector is
$p(z \mid x, q) = \mathrm{softmax}(a)$
$s = \sum_{i=1}^{n} p(z = i \mid x, q)\, x_i$
wherein a denotes the vector obtained by splicing the relevance values computed for each $x_i$ with q, $p(z \mid x, q)$ denotes the output of passing all relevance values of the $x_i$ and q through the softmax function, $p(z = i \mid x, q)$ denotes the weight of the relevance value of $x_i$ and q within the whole sequence, z denotes the set of weights with respect to q of all $x_i$ in the window of size window_size around the center character, x denotes the sequence containing $x_i$, and s denotes the value of the weighted combination.
7. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 6, characterized in that in step D the character vectors obtained in step C are processed by the first Bi-LSTM layer to obtain the hidden-layer output of each time step in both directions, and the computation model for splicing the obtained outputs is
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$d_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$
$\overrightarrow{m_t} = \tanh(\overrightarrow{W}\, \overrightarrow{h_t})$
$\overleftarrow{m_t} = \tanh(\overleftarrow{W}\, \overleftarrow{h_t})$
wherein $\overrightarrow{h_t}$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h_t}$ denotes the hidden-layer output of the backward LSTM at time t, $d_t$ denotes the splicing of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, $\overrightarrow{m_t}$ denotes the result of passing the forward hidden-layer output through the activation function tanh, $\overleftarrow{m_t}$ denotes the result of passing the backward hidden-layer output through the activation function tanh, $\overrightarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{h_t}$ passes, and $\overleftarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{h_t}$ passes.
8. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 7, characterized in that the computation model in step D that uses the hidden-layer output of each time step to train the language model is
$\overrightarrow{p_t} = \mathrm{softmax}(\overrightarrow{W_p}\, \overrightarrow{m_t})$
$\overleftarrow{p_t} = \mathrm{softmax}(\overleftarrow{W_p}\, \overleftarrow{m_t})$
$\overrightarrow{L} = -\sum_{t=1}^{T} \log \overrightarrow{p_t}(w_{t+1})$
$\overleftarrow{L} = -\sum_{t=1}^{T} \log \overleftarrow{p_t}(w_{t-1})$
wherein $\overrightarrow{p_t}$ denotes the probability vector that uses $\overrightarrow{m_t}$ at time t to predict the character of the next moment, $\overleftarrow{p_t}$ denotes the probability vector that uses $\overleftarrow{m_t}$ at time t to predict the character of the preceding moment, $\overrightarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{m_t}$ passes, $\overleftarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{m_t}$ passes, $\overrightarrow{L}$ denotes the loss function of the forward LSTM language model, $\overleftarrow{L}$ denotes the loss function of the backward LSTM language model, and T denotes the total number of time steps of the sequence.
9. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 8, characterized in that step D further includes computing the cross entropy between the predicted character IDs and the true following and preceding character IDs, calculating the loss with the forward LSTM labels and backward LSTM labels constructed in step A, and optimizing the language model.
10. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 9, characterized in that performing label prediction training with the maximum matching method in step E is specifically:
inputting the output obtained in step D into the second Bi-LSTM layer and obtaining the spliced output of the second Bi-LSTM layer; then computing the maximum likelihood estimate of the label sequence inside a conditional random field, and calculating the loss of the conditional random field with the named-entity class labels of each character constructed in step A; and finally adding the loss of the conditional random field to the language-model loss of the first Bi-LSTM layer to obtain the loss of the entire neural network.
CN201811517779.9A 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning Active CN109657239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811517779.9A CN109657239B (en) 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811517779.9A CN109657239B (en) 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning

Publications (2)

Publication Number Publication Date
CN109657239A true CN109657239A (en) 2019-04-19
CN109657239B CN109657239B (en) 2020-04-21

Family

ID=66113875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811517779.9A Active CN109657239B (en) 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning

Country Status (1)

Country Link
CN (1) CN109657239B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017045453A (en) * 2015-08-27 2017-03-02 ゼロックス コーポレイションXerox Corporation Document-specific gazetteers for named entity recognition
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN107797992A (en) * 2017-11-10 2018-03-13 北京百分点信息科技有限公司 Name entity recognition method and device
CN107977361A (en) * 2017-12-06 2018-05-01 哈尔滨工业大学深圳研究生院 The Chinese clinical treatment entity recognition method represented based on deep semantic information
CN108628823A (en) * 2018-03-14 2018-10-09 中山大学 In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUL KHAN SAFI QAMAS et al.: "Named Entity Recognition Based on Deep Neural Networks", Technology Research *
CHEN Yanyu et al.: "Insurance Named Entity Recognition Based on CRF and Bi-LSTM", Intelligent Computer and Applications *

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147551A (en) * 2019-05-14 2019-08-20 腾讯科技(深圳)有限公司 Multi-class entity recognition model training, entity recognition method, server and terminal
CN110147551B (en) * 2019-05-14 2023-07-11 腾讯科技(深圳)有限公司 Multi-category entity recognition model training, entity recognition method, server and terminal
CN111160467A (en) * 2019-05-31 2020-05-15 北京理工大学 Image description method based on conditional random field and internal semantic attention
CN111160467B (en) * 2019-05-31 2021-12-10 北京理工大学 Image description method based on conditional random field and internal semantic attention
CN110298043A (en) * 2019-07-03 2019-10-01 吉林大学 A kind of vehicle name entity recognition method and system
CN110334189A (en) * 2019-07-11 2019-10-15 河南大学 Method is determined based on the long microblog topic label in short-term and from attention neural network
CN110705272A (en) * 2019-08-28 2020-01-17 昆明理工大学 Named entity identification method for automobile engine fault diagnosis
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium
CN110619124A (en) * 2019-09-19 2019-12-27 成都数之联科技有限公司 Named entity identification method and system combining attention mechanism and bidirectional LSTM
CN111079418A (en) * 2019-11-06 2020-04-28 科大讯飞股份有限公司 Named body recognition method and device, electronic equipment and storage medium
CN111079418B (en) * 2019-11-06 2023-12-05 科大讯飞股份有限公司 Named entity recognition method, device, electronic equipment and storage medium
CN110826334A (en) * 2019-11-08 2020-02-21 中山大学 Chinese named entity recognition model based on reinforcement learning and training method thereof
CN110826334B (en) * 2019-11-08 2023-04-21 中山大学 Chinese named entity recognition model based on reinforcement learning and training method thereof
CN110969020B (en) * 2019-11-21 2022-10-11 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN110969020A (en) * 2019-11-21 2020-04-07 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN113033192B (en) * 2019-12-09 2024-04-26 株式会社理光 Training method and device for sequence annotation and computer readable storage medium
CN113033192A (en) * 2019-12-09 2021-06-25 株式会社理光 Training method and device for sequence labels and computer readable storage medium
CN111223489A (en) * 2019-12-20 2020-06-02 厦门快商通科技股份有限公司 Specific keyword identification method and system based on Attention mechanism
CN111222339B (en) * 2020-01-13 2023-05-23 华南理工大学 Medical consultation named entity recognition method based on countermeasure multitask learning
CN111222339A (en) * 2020-01-13 2020-06-02 华南理工大学 Medical consultation named entity identification method based on anti-multitask learning
CN113139382A (en) * 2020-01-20 2021-07-20 北京国双科技有限公司 Named entity identification method and device
CN111274829A (en) * 2020-02-07 2020-06-12 中国科学技术大学 Sequence labeling method using cross-language information
CN111274829B (en) * 2020-02-07 2023-06-16 中国科学技术大学 Sequence labeling method utilizing cross-language information
CN111209754B (en) * 2020-02-25 2023-06-02 桂林电子科技大学 Data set construction method for Vietnam entity recognition
CN111209754A (en) * 2020-02-25 2020-05-29 桂林电子科技大学 Data set construction method for Vietnamese entity recognition
CN111368544B (en) * 2020-02-28 2023-09-19 中国工商银行股份有限公司 Named entity identification method and device
CN111368544A (en) * 2020-02-28 2020-07-03 中国工商银行股份有限公司 Named entity identification method and device
CN111522964A (en) * 2020-04-17 2020-08-11 电子科技大学 Tibetan medicine literature core concept mining method
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN113743116A (en) * 2020-05-28 2021-12-03 株式会社理光 Training method and device for named entity recognition and computer readable storage medium
CN111651995A (en) * 2020-06-07 2020-09-11 上海建科工程咨询有限公司 Accident information automatic extraction method and system based on deep circulation neural network
CN111738169A (en) * 2020-06-24 2020-10-02 北方工业大学 Handwriting formula recognition method based on end-to-end network model
CN112989811A (en) * 2021-03-01 2021-06-18 哈尔滨工业大学 BilSTM-CRF-based historical book reading auxiliary system and control method thereof
CN112883737B (en) * 2021-03-03 2022-06-14 山东大学 Robot language instruction analysis method and system based on Chinese named entity recognition
CN112883737A (en) * 2021-03-03 2021-06-01 山东大学 Robot language instruction analysis method and system based on Chinese named entity recognition
CN113158658B (en) * 2021-04-26 2023-09-19 中国电子科技集团公司第二十八研究所 Knowledge embedding-based structured control instruction extraction method
CN113158658A (en) * 2021-04-26 2021-07-23 中国电子科技集团公司第二十八研究所 Knowledge embedding-based structured control instruction extraction method
WO2023004528A1 (en) * 2021-07-26 2023-02-02 深圳市检验检疫科学研究院 Distributed system-based parallel named entity recognition method and apparatus

Also Published As

Publication number Publication date
CN109657239B (en) 2020-04-21

Similar Documents

Publication Publication Date Title
CN109657239A (en) The Chinese name entity recognition method learnt based on attention mechanism and language model
CN108460013B (en) Sequence labeling model and method based on fine-grained word representation model
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN108268444B (en) Chinese word segmentation method based on bidirectional LSTM, CNN and CRF
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN109086267B (en) Chinese word segmentation method based on deep learning
McCallum Efficiently inducing features of conditional random fields
CN112528676B (en) Document-level event argument extraction method
CN109543181B (en) Named entity model and system based on combination of active learning and deep learning
Zhang et al. Semi-supervised structured prediction with neural CRF autoencoder
CN111738007A (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN110263325A (en) Chinese automatic word-cut
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN108647191A (en) It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method
CN111460824A (en) Unmarked named entity identification method based on anti-migration learning
CN111651983A (en) Causal event extraction method based on self-training and noise model
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN112347269A (en) Method for recognizing argument pairs based on BERT and Att-BilSTM
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN113312498B (en) Text information extraction method for embedding knowledge graph by undirected graph
CN111062214A (en) Integrated entity linking method and system based on deep learning
CN112699685B (en) Named entity recognition method based on label-guided word fusion
CN114356990A (en) Base named entity recognition system and method based on transfer learning
CN117094325B (en) Named entity identification method in rice pest field

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant