CN109657239A - Chinese named entity recognition method based on attention mechanism and language model learning - Google Patents

Chinese named entity recognition method based on attention mechanism and language model learning

Info

Publication number
CN109657239A
Authority
CN
China
Prior art keywords
word
language model
lstm
entity recognition
name entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811517779.9A
Other languages
Chinese (zh)
Other versions
CN109657239B (en)
Inventor
廖伟智
马攀
王宇
阴艳超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201811517779.9A priority Critical patent/CN109657239B/en
Publication of CN109657239A publication Critical patent/CN109657239A/en
Application granted granted Critical
Publication of CN109657239B publication Critical patent/CN109657239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Abstract

The invention discloses a Chinese named entity recognition method based on attention mechanism and language model learning. The method includes constructing a character-based dictionary, converting the ID number corresponding to each element into a vector, combining the vectors through a restricted self-attention layer, splicing the outputs of a first Bi-LSTM layer and using them to train a language model, splicing the outputs of a second Bi-LSTM layer and performing label prediction training with the maximum matching method, randomly shuffling the data set and training for multiple rounds with the Adam optimization method, and performing Chinese named entity recognition on the text data to be recognized with the neural network. The invention is based only on character features and needs no word segmentation or other hand-crafted features such as part of speech and syntax, which improves the robustness of the method; moreover, the invention performs well on out-of-vocabulary words, has excellent performance, and can markedly improve the performance of downstream tasks.

Description

Chinese named entity recognition method based on attention mechanism and language model learning
Technical field
The invention belongs to the field of entity recognition technology, and in particular relates to a Chinese named entity recognition method based on attention mechanism and language model learning.
Background art
Chinese named entity recognition is one of the most common problems in the field of natural language processing. Its main task is to tag the characters or words in unstructured text so that the effective information in the text can be extracted.
The Chinese named entity recognition task identifies the entities in Chinese text data and extracts the effective information in the text. Specifically, the object of recognition is Chinese text data, such as a sentence or a passage; the requirement of recognition is to mark the entity names in the text, such as person names, place names, organization names, and titles.
The current methods for Chinese named entity recognition fall broadly into three classes:
1. Rule-based unsupervised methods:
These methods rely mainly on the linguistic form in which the target named entities appear: rules are set manually to match the syntactic structure of sentences and mark the named entities. Rule-based methods mostly use rule templates hand-built by linguistic experts, selecting features such as statistical information, punctuation marks, keywords, indicator words and direction words, position words (e.g. suffix characters), and head words, with pattern and string matching as the main means; such systems mostly depend on the construction of knowledge bases and dictionaries. The effect of such methods depends largely on the level of the linguistic experts who set the rules, and different fields require different rules, so they consume considerable time and manpower.
2. Methods based on probability statistics:
Statistical machine learning methods treat named entity recognition as a sequence labeling task and learn a language model from a large corpus so as to label each position of a sentence. The common models of such methods include the generative hidden Markov model (HMM) and the discriminative conditional random field (CRF).
3. Neural-network-based methods:
The rapid development of neural networks, especially of recurrent neural networks, has brought great performance improvements to sequence tasks; together with the development of word embeddings, this has made neural network processing of text data a practical approach. In addition, the powerful feature-extraction ability of neural networks can achieve good performance without extra hand-crafted features. The most outstanding model for sequence labeling problems is LSTM+CRF, which achieves good results in English named entity recognition; owing to the characteristics of the Chinese language, however, this model does not perform as outstandingly on the Chinese named entity recognition task. Chinese research has therefore always been divided between character-based features, word-based features, and combined character-word features, and papers have pointed out that named entity recognition based on character features outperforms that based on word features, and that character-based methods handle out-of-vocabulary words better than word-based methods.
Defects of the prior art:
1. Rule-based unsupervised methods require linguistic experts to set the rules, and different fields, and even different text styles, require different rules, so their extensibility is very low. If the rules are set strictly, a great deal of effective information is missed; if they are too loose, the recognition effect is poor.
2. Methods based on probability statistics mainly learn a language model from a large corpus; the main models are the hidden Markov model, the maximum entropy model, and the conditional random field. Such methods depend on the quality of the corpus, perform very poorly on some samples, generalize insufficiently, and have a relatively low recall rate, so their effect is not very good.
3. The current mainstream neural network methods use recurrent neural networks (RNN) or their variants, such as LSTM and GRU, to convert the named entity recognition problem into a sequence labeling problem. These methods first perform word segmentation and then sequence labeling, but their effect is largely influenced by the segmentation quality, and their effect on out-of-vocabulary words is not very good. Other research improves the performance of the neural network model simply by manually adding various features (such as part-of-speech features and grammatical features) to the input of the neural network; although such methods improve the effect, the manually added features increase the human workload and complicate the method, which also contradicts the essential characteristic of neural networks, namely that they extract features automatically.
Summary of the invention
The purpose of the invention is as follows: in order to overcome the defects of existing Chinese named entity recognition methods and improve the recognition effect, the invention proposes a Chinese named entity recognition method based on attention mechanism and language model learning.
The technical scheme of the invention is a Chinese named entity recognition method based on attention mechanism and language model learning, comprising the following steps:
A. Obtain a labeled data set for Chinese named entity recognition and construct a character-based dictionary;
B. Convert the ID number corresponding to each element in the dictionary constructed in step A into a vector;
C. Combine the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the other character vectors in the nearby window, thereby mining the latent word information within the window around the center character;
D. Process the character vectors obtained in step C with a first Bi-LSTM layer, obtain the hidden-layer output of each time step in both directions, splice the obtained outputs, and use the hidden-layer outputs of each time step to train a language model;
E. Process the spliced result obtained in step D with a second Bi-LSTM layer, obtain the second spliced output, and perform label prediction training with the maximum matching method;
F. Randomly shuffle the labeled data set for Chinese named entity recognition, and train the neural network for multiple rounds by cycling through steps A-E with the Adam optimization method;
G. Process the text data to be recognized with the neural network to complete Chinese named entity recognition.
Further, step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically:
an ID number is assigned to every character and symbol in the labeled data set for Chinese named entity recognition, and markers are added at the beginning and end of each sentence.
Further, step A also includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the label of the named-entity class corresponding to each character.
Further, step C combines the character vectors converted in step B through the restricted self-attention layer to obtain the word information composed of characters, specifically:
a window size is set; each character in the window region around the center character is compared with the center character for relevance, and the relevance value of each character and the center character is calculated; the characters in the region are then weighted and combined according to their relevance to the center character, the center character vector is calculated, and the latent word information composed of characters is obtained.
Further, the formula for calculating the relevance value of each character and the center character is
$f(x_i, q) = w^T \sigma(W^{(1)} x_i + W^{(2)} q)$
wherein q denotes the center character, $x_i$ denotes the i-th character within the window of size window_size around the center character, $W^{(1)}$ denotes the weight matrix of the fully connected layer through which $x_i$ passes, $W^{(2)}$ denotes the weight matrix of the fully connected layer through which q passes, and $w^T$ denotes the relevance vector.
Further, the formula for calculating the center character vector is
$p(z \mid x, q) = \mathrm{softmax}(a)$
$s = \sum_{i=1}^{n} p(z = i \mid x, q)\, x_i$
wherein a denotes the vector obtained by splicing the relevance values computed for each $x_i$ with q, $p(z \mid x, q)$ denotes the output of passing all relevance values of the $x_i$ and q through the softmax function, $p(z = i \mid x, q)$ denotes the weight of the relevance value of $x_i$ and q within the whole sequence, z denotes the set of weights with respect to q of all $x_i$ in the window of size window_size around the center character, x denotes the sequence containing $x_i$, and s denotes the value of the weighted combination.
Further, in step D the character vectors obtained in step C are processed by the first Bi-LSTM layer to obtain the hidden-layer output of each time step in both directions, and the computation model for splicing the obtained outputs is
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$d_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$
$\overrightarrow{m_t} = \tanh(\overrightarrow{W}\, \overrightarrow{h_t})$
$\overleftarrow{m_t} = \tanh(\overleftarrow{W}\, \overleftarrow{h_t})$
wherein $\overrightarrow{h_t}$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h_t}$ denotes the hidden-layer output of the backward LSTM at time t, $d_t$ denotes the splicing of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, $\overrightarrow{m_t}$ denotes the result of passing the forward hidden-layer output through the activation function tanh, $\overleftarrow{m_t}$ denotes the result of passing the backward hidden-layer output through the activation function tanh, $\overrightarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{h_t}$ passes, and $\overleftarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{h_t}$ passes.
Further, the computation model in step D that uses the hidden-layer output of each time step to train the language model is
$\overrightarrow{p_t} = \mathrm{softmax}(\overrightarrow{W_p}\, \overrightarrow{m_t})$
$\overleftarrow{p_t} = \mathrm{softmax}(\overleftarrow{W_p}\, \overleftarrow{m_t})$
$\overrightarrow{L} = -\sum_{t=1}^{T} \log \overrightarrow{p_t}(w_{t+1})$
$\overleftarrow{L} = -\sum_{t=1}^{T} \log \overleftarrow{p_t}(w_{t-1})$
wherein $\overrightarrow{p_t}$ denotes the probability vector that uses $\overrightarrow{m_t}$ at time t to predict the character of the next moment, $\overleftarrow{p_t}$ denotes the probability vector that uses $\overleftarrow{m_t}$ at time t to predict the character of the preceding moment, $\overrightarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{m_t}$ passes, $\overleftarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{m_t}$ passes, $\overrightarrow{L}$ denotes the loss function of the forward LSTM language model, $\overleftarrow{L}$ denotes the loss function of the backward LSTM language model, and T denotes the total number of time steps of the sequence.
Further, step D also includes computing the cross entropy between the predicted character IDs and the true following and preceding character IDs, and calculating the loss with the forward LSTM labels and backward LSTM labels constructed in step A, thereby optimizing the language model.
Further, performing label prediction training with the maximum matching method in step E is specifically:
the output obtained in step D is input to the second Bi-LSTM layer to obtain the spliced output of the second Bi-LSTM layer; the maximum likelihood estimate of the label sequence is then computed inside a conditional random field, and the loss of the conditional random field is calculated with the named-entity class labels of each character constructed in step A; finally the loss of the conditional random field is added to the language-model loss of the first Bi-LSTM layer to obtain the loss of the entire neural network.
The beneficial effects of the invention are: the invention constructs a character-based dictionary and processes the data successively through a restricted self-attention layer, a first Bi-LSTM layer, and a second Bi-LSTM layer, carries out language model learning at the first Bi-LSTM layer, and finally performs label prediction training with the maximum matching method. The invention is based only on character features and needs no word segmentation or other hand-crafted features such as part of speech and syntax, which improves the robustness of the method; moreover, the invention performs well on out-of-vocabulary words, has excellent performance, and can markedly improve the performance of downstream tasks.
Description of the drawings
Fig. 1 is a flow diagram of the Chinese named entity recognition method of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Fig. 1 shows the flow diagram of the Chinese named entity recognition method of the invention. A Chinese named entity recognition method based on attention mechanism and language model learning comprises the following steps:
A. Obtain a labeled data set for Chinese named entity recognition and construct a character-based dictionary;
B. Convert the ID number corresponding to each element in the dictionary constructed in step A into a vector;
C. Combine the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the other character vectors in the nearby window, thereby mining the latent word information within the window around the center character;
D. Process the character vectors obtained in step C with a first Bi-LSTM layer, obtain the hidden-layer output of each time step in both directions, splice the obtained outputs, and use the hidden-layer outputs of each time step to train a language model;
E. Process the spliced result obtained in step D with a second Bi-LSTM layer, obtain the second spliced output, and perform label prediction training with the maximum matching method;
F. Randomly shuffle the labeled data set for Chinese named entity recognition, and train the neural network for multiple rounds by cycling through steps A-E with the Adam optimization method;
G. Process the text data to be recognized with the neural network to complete Chinese named entity recognition.
The invention combines the attention mechanism (restricted self-attention) technique, analyses the shortcomings of the current mainstream LSTM+CRF model, and designs a two-layer bidirectional LSTM+CRF structure in which language model learning is added to the output of the first bidirectional LSTM layer, improving the expressive power of the model.
Steps A-C above form the first stage of Chinese named entity recognition, i.e. converting the Chinese text data into processable numeric vectors and then mining the latent word information with the restricted self-attention technique; step D is the second stage, i.e. processing by the first Bi-LSTM layer and learning of the language model; steps E-G form the third stage, i.e. prediction by the second Bi-LSTM layer plus CRF.
In an optional embodiment of the invention, step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically: an ID number is assigned to every character and symbol in the labeled data set for Chinese named entity recognition, and markers are added at the beginning and end of each sentence.
The character-based dictionary contains all characters and symbols in the data set, together with the identifiers <UNK> and <PAD> and the sentence start and end markers <S> and </S>, and each element in the dictionary is assigned an ID number. For the convenience of subsequent batch processing, the invention sets the ID of <PAD> to 0, and the other elements are assigned IDs in order, yielding a dictionary for the data set in which every element has a one-to-one corresponding ID number. With this dictionary, Chinese text data can be converted into numeric information that can be fed to the neural network. For example, for the Chinese sentence "我爱北京天安门。" ("I love Beijing Tiananmen."), each character and symbol is looked up in the dictionary one by one: the ID of "我" ("I") is 1, the ID of "爱" ("love") is 4, the ID of "北" ("north") is 16, ..., and the ID corresponding to "。" is 10, so the ID sequence after conversion is [1, 4, 16, 45, 46, 153, 86, 10].
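As a minimal illustration of this dictionary construction (a sketch, not the code of the patent), the following Python fragment follows the stated rule that <PAD> receives ID 0 and assigns the remaining IDs in order of first appearance; the concrete ID values will therefore differ from those of the example above:

```python
def build_vocab(sentences):
    # <PAD> is fixed to ID 0 as stated above; the other markers and all
    # characters receive IDs in order of first appearance (assumed order).
    vocab = {"<PAD>": 0, "<UNK>": 1, "<S>": 2, "</S>": 3}
    for sent in sentences:
        for ch in sent:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def to_ids(sentence, vocab):
    # Unknown characters fall back to the <UNK> identifier.
    return [vocab.get(ch, vocab["<UNK>"]) for ch in sentence]

vocab = build_vocab(["我爱北京天安门。", "我是中国人。"])
print(to_ids("我爱北京天安门。", vocab))
```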
If the method of batch training is used, the dimension of the input is [batch_size * max_len], where batch_size denotes the number of samples fed in per batch and max_len denotes the length of the longest sample in the batch; all shorter samples are padded to the longest length with '0'. For example, with a batch of 2 samples, padding is applied up to the longest sequence:
"I am Chinese."
"I am very proud."
After conversion, the batch of IDs becomes the two-dimensional matrix [[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10, 0]], in which the sequence shorter than the longest sentence is padded with 0.
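The padding step can be sketched as follows (the function name pad_batch is ours); applied to the example above it reproduces the two-dimensional matrix:

```python
def pad_batch(batch_ids, pad_id=0):
    # Pad every sequence with 0 up to the longest length in the batch,
    # producing a [batch_size, max_len] matrix.
    max_len = max(len(seq) for seq in batch_ids)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch_ids]

print(pad_batch([[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10]]))
# -> [[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10, 0]]
```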
Step A above also includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the label of the named-entity class corresponding to each character.
The forward LSTM labels above are used for forward language model learning. The method of constructing a forward LSTM label is to delete the first ID of a sequence sample (e.g. a sentence) and append the ID of the end marker "</S>" at the end of the sequence. For example, for "我爱北京天安门。", the first character "我" is removed and "</S>" is appended at the end; the ID corresponding to "</S>" in the dictionary here is 489, so the converted ID label is [4, 16, 45, 46, 153, 86, 10, 489].
The method of constructing a backward LSTM label is to remove the last character of the sequence and prepend the start marker "<S>" at the beginning. For example, for "我爱北京天安门。", "。" is removed and "<S>" is prepended; the ID corresponding to "<S>" is 789, so the converted result is [789, 1, 4, 16, 45, 46, 153, 86].
The label of the named-entity class corresponding to each character above is the raw annotation of the labeled data set; the invention converts each label into a specified numeric class. For example, "B-PER" in the original annotated data denotes the beginning of a person name, "I-PER" denotes a middle character of a person name, and "E-PER" denotes the ending character of a person name; they can be put into one-to-one correspondence with numbers, with "B-PER" corresponding to the number 1, "I-PER" to the number 2, and "E-PER" to the number 3, so that the string-type labels in the original data are converted into processable numeric information.
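The construction of the two language-model label sequences can be sketched as below; the marker IDs 489 and 789 follow the examples above, while the function name is ours:

```python
def lm_labels(ids, start_id, end_id):
    # Forward LM label: drop the first ID and append the end marker, so
    # that position t is labelled with the character at position t+1.
    forward = ids[1:] + [end_id]
    # Backward LM label: drop the last ID and prepend the start marker,
    # so that position t is labelled with the character at position t-1.
    backward = [start_id] + ids[:-1]
    return forward, backward

ids = [1, 4, 16, 45, 46, 153, 86, 10]          # "我爱北京天安门。"
fwd, bwd = lm_labels(ids, start_id=789, end_id=489)
print(fwd)   # [4, 16, 45, 46, 153, 86, 10, 489]
print(bwd)   # [789, 1, 4, 16, 45, 46, 153, 86]
```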
In an optional embodiment of the invention, step B above converts the ID number corresponding to each element in the character-based dictionary constructed in step A into a vector, specifically converting each ID into a vector with Google's word2vec tool, expressed as [batch_size * max_len * emb_size], where emb_size denotes the chosen vector dimension.
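A sketch of this conversion with a PyTorch embedding layer; the random w2v_weights matrix below merely stands in for vectors actually produced by the word2vec tool:

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 5000, 100
w2v_weights = torch.randn(vocab_size, emb_size)   # placeholder for word2vec vectors
embedding = nn.Embedding.from_pretrained(w2v_weights, freeze=False, padding_idx=0)

batch = torch.tensor([[1, 8, 87, 89, 62, 10],
                      [1, 54, 465, 875, 10, 0]])  # [batch_size, max_len]
vectors = embedding(batch)                        # [batch_size, max_len, emb_size]
print(vectors.shape)                              # torch.Size([2, 6, 100])
```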
In an optional embodiment of the invention, step C above combines the character vectors converted in step B through the restricted self-attention layer to obtain the word information composed of characters. Specifically, a window size is set; each character in the window region around the center character is compared with the center character for relevance, and the relevance value of each character and the center character is calculated; the characters in the region are then weighted and combined according to their relevance to the center character, the center character vector is calculated, and the latent word information composed of characters is obtained.
The invention passes the character vectors converted in step B through the restricted self-attention layer, letting the neural network learn the word information latently composed of characters. Here "restricted" means that the window size (window_size) is limited: self-attention is not performed over the entire sentence sequence, but only over a region of fixed window size near each character itself.
The formula with which the invention calculates the relevance value of each character and the center character is
$f(x_i, q) = w^T \sigma(W^{(1)} x_i + W^{(2)} q)$
where q denotes the center character and $x_i$ denotes the i-th character within the window of size window_size around the center character, there being window_size such characters in total, i.e. i runs from 1 to window_size. Each nearby character is compared for relevance with the center character by adding the linearly transformed vector of the center character to the linearly transformed vector of the nearby character; the activation function σ, which may be the sigmoid function or the tanh function, is applied, and the result is multiplied by $w^T$. Since $w^T$ is a vector of dimension emb_size, the product is a single value, and this value is the relevance of the nearby character $x_i$ and the center character q. $W^{(1)}$ denotes the weight matrix of the fully connected layer through which $x_i$ passes, and $W^{(2)}$ denotes the weight matrix of the fully connected layer through which q passes.
The invention compares all characters of the nearby window with the center character for relevance, obtains the relevance score $f(x_i, q)$ of each nearby character and the center character, splices them all together, converts the scores into corresponding probabilities with a softmax activation function, multiplies the converted probabilities with the corresponding character vectors, and adds up the results to obtain the vector representation of the center character. This weighted combination of character vectors lets the neural network discover the latent word information composed of characters.
The formula for calculating the center character vector is
$p(z \mid x, q) = \mathrm{softmax}(a)$
$s = \sum_{i=1}^{n} p(z = i \mid x, q)\, x_i$
where a denotes the vector obtained by splicing the relevance values computed for each $x_i$ with q, i.e. a list of scores; $p(z \mid x, q)$ denotes the output of passing all the relevance values of the $x_i$ and q through the softmax function, which converts each value of the sequence into a corresponding weight such that the weights of all $x_i$ in the sequence sum to 1; $p(z = i \mid x, q)$ denotes the weight of the relevance value of $x_i$ and q within the whole sequence; z denotes the set of weights with respect to q of all $x_i$ in the window of size window_size around the center character; x denotes the sequence of n characters containing $x_i$, x = {x_1, x_2, x_3, ..., x_i, ..., x_n}; and s denotes the value of the weighted combination, i.e. the product of each $x_i$ with its corresponding weight, summed over all i.
The invention performs Chinese named entity recognition on the basis of characters, which handles out-of-vocabulary words well, and uses the self-attention technique to mine the word information latently composed of characters, improving the expressive power of the model. Moreover, the invention limits the window size around the center character instead of applying self-attention over the entire sequence, which improves the prediction accuracy: the information of most center characters is related only to the characters of the nearby window, and the attention technique used by the invention lets the neural network automatically discover, within the window, the weights of the characters related to the center character, with high weights for characters strongly correlated with the center character and low weights for weakly correlated ones.
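The restricted self-attention computation described by the two formulas above can be sketched as a small PyTorch module. The module and parameter names are ours, and σ is taken here to be tanh (the description allows either sigmoid or tanh):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RestrictedSelfAttention(nn.Module):
    """Each center character attends only to characters inside a fixed
    window, scored by f(x_i, q) = w^T tanh(W1 x_i + W2 q)."""
    def __init__(self, emb_size, window_size=3):
        super().__init__()
        self.W1 = nn.Linear(emb_size, emb_size, bias=False)
        self.W2 = nn.Linear(emb_size, emb_size, bias=False)
        self.w = nn.Linear(emb_size, 1, bias=False)
        self.window = window_size

    def forward(self, x):                   # x: [batch, seq_len, emb_size]
        seq_len = x.size(1)
        out = []
        for t in range(seq_len):
            lo = max(0, t - self.window)
            hi = min(seq_len, t + self.window + 1)
            ctx = x[:, lo:hi, :]            # neighbours x_i of the center char
            q = x[:, t:t + 1, :]            # center character q
            scores = self.w(torch.tanh(self.W1(ctx) + self.W2(q)))  # f(x_i, q)
            p = F.softmax(scores, dim=1)    # p(z | x, q)
            out.append((p * ctx).sum(dim=1))  # weighted combination s
        return torch.stack(out, dim=1)      # [batch, seq_len, emb_size]

layer = RestrictedSelfAttention(emb_size=100)
print(layer(torch.randn(2, 6, 100)).shape)  # torch.Size([2, 6, 100])
```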
In an optional embodiment of the invention, step D above processes the character vectors obtained in step C with the first Bi-LSTM layer, obtains the hidden-layer output of each time step in both directions, splices the obtained outputs, and uses the hidden-layer outputs of each time step to train the language model.
For a sentence such as "I love you", the hidden-layer output after the first Bi-LSTM layer can predict the characters before and after each character, which makes the hidden-layer output of the Bi-LSTM better conform to a language model of the text. The specific computation model is
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$d_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$
$\overrightarrow{m_t} = \tanh(\overrightarrow{W}\, \overrightarrow{h_t})$
$\overleftarrow{m_t} = \tanh(\overleftarrow{W}\, \overleftarrow{h_t})$
where $\overrightarrow{h_t}$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h_t}$ denotes the hidden-layer output of the backward LSTM at time t, and $x_t$ denotes the input at time t, i.e. one character vector; $d_t$ denotes the splicing of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, i.e. the result of splicing together the hidden-layer outputs of the forward and backward LSTMs at time t, so that the dimension of $d_t$ is the sum of the dimensions of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$; $\overrightarrow{m_t}$ denotes the result of passing the forward hidden-layer output through the activation function tanh, i.e. the forward hidden-layer output multiplied by a weight matrix and then passed through a tanh activation; $\overleftarrow{m_t}$ denotes the corresponding result for the backward hidden-layer output; $\overrightarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{h_t}$ passes, and $\overleftarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{h_t}$ passes. Because the outputs of the Bi-LSTM hidden layer pass through a linear transformation followed by a tanh activation, they can be used directly to predict the preceding and following characters.
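A sketch of the first Bi-LSTM layer and the tanh projections in PyTorch; the module and variable names are assumptions, and nn.LSTM handles the two directions internally:

```python
import torch
import torch.nn as nn

emb_size, hidden_size = 100, 128
bilstm1 = nn.LSTM(emb_size, hidden_size, batch_first=True, bidirectional=True)
proj_fwd = nn.Linear(hidden_size, hidden_size)   # weight matrix for forward states
proj_bwd = nn.Linear(hidden_size, hidden_size)   # weight matrix for backward states

x = torch.randn(2, 6, emb_size)                  # output of the self-attention layer
h, _ = bilstm1(x)                                # [batch, seq, 2*hidden_size]
h_fwd, h_bwd = h[..., :hidden_size], h[..., hidden_size:]
d = torch.cat([h_fwd, h_bwd], dim=-1)            # d_t, input to the second layer
m_fwd = torch.tanh(proj_fwd(h_fwd))              # tanh-projected forward output
m_bwd = torch.tanh(proj_bwd(h_bwd))              # tanh-projected backward output
```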
The computation model that uses the hidden-layer output of each time step to train the language model is
$\overrightarrow{p_t} = \mathrm{softmax}(\overrightarrow{W_p}\, \overrightarrow{m_t})$
$\overleftarrow{p_t} = \mathrm{softmax}(\overleftarrow{W_p}\, \overleftarrow{m_t})$
$\overrightarrow{L} = -\sum_{t=1}^{T} \log \overrightarrow{p_t}(w_{t+1})$
$\overleftarrow{L} = -\sum_{t=1}^{T} \log \overleftarrow{p_t}(w_{t-1})$
where $\overrightarrow{p_t}$ denotes the probability vector that uses $\overrightarrow{m_t}$ at time t to predict the character of the next moment, each element of the vector being a probability and all elements of the vector summing to 1; $\overleftarrow{p_t}$ denotes the probability vector that uses $\overleftarrow{m_t}$ at time t to predict the character of the preceding moment, likewise with each element a probability and all elements summing to 1; $\overrightarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{m_t}$ passes, and $\overleftarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{m_t}$ passes; $\overrightarrow{L}$ denotes the loss function of the forward LSTM language model and $\overleftarrow{L}$ the loss function of the backward LSTM language model, both in the form of maximum likelihood estimation; and T denotes the total number of time steps of the sequence, i.e. the sequence length.
The values of $\overrightarrow{L}$ and $\overleftarrow{L}$ are computed; the smaller these two values are, the better the learned language model. The cross entropy between the predicted character IDs and the true following and preceding character IDs is computed, and the loss is calculated with the forward LSTM labels and backward LSTM labels constructed in step A so as to optimize the language model.
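These two losses amount to cross entropy against the forward and backward label sequences built in step A; a minimal sketch under assumed names, ignoring <PAD> positions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_size = 5000, 128
lm_head_fwd = nn.Linear(hidden_size, vocab_size)  # predicts the next character
lm_head_bwd = nn.Linear(hidden_size, vocab_size)  # predicts the preceding character

def lm_loss(m_fwd, m_bwd, fwd_labels, bwd_labels, pad_id=0):
    logits_fwd = lm_head_fwd(m_fwd)               # [batch, seq, vocab_size]
    logits_bwd = lm_head_bwd(m_bwd)
    # cross_entropy expects [batch, classes, seq]; <PAD> positions are ignored
    loss_fwd = F.cross_entropy(logits_fwd.transpose(1, 2), fwd_labels,
                               ignore_index=pad_id)
    loss_bwd = F.cross_entropy(logits_bwd.transpose(1, 2), bwd_labels,
                               ignore_index=pad_id)
    return loss_fwd + loss_bwd
```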
In an optional embodiment of the invention, step E above processes the vectors obtained in step D with the second Bi-LSTM layer. Specifically, the output of the first Bi-LSTM layer serves as the input of the second Bi-LSTM layer and is processed with the same computation graph model as the first Bi-LSTM layer, except that the second Bi-LSTM layer performs no language model learning but instead carries out label prediction training directly with the conditional random field method after splicing its hidden-layer outputs. Specifically, the output obtained in step D is input to the second Bi-LSTM layer and the spliced output of the second Bi-LSTM layer is obtained; the maximum likelihood estimate of the label sequence is then computed inside a conditional random field, and the loss of the conditional random field is calculated with the named-entity class labels of each character constructed in step A; finally the loss of the conditional random field is added to the language-model loss to obtain the loss of the entire neural network.
The invention adds the learning of a language model after the first Bi-LSTM layer, so that the final loss computation of the neural network includes not only the label prediction loss but also the language model loss, forming a multi-task learning setup. This makes the output generated by the first Bi-LSTM layer better conform to the characteristics of the language and gives the input received by the second Bi-LSTM layer better sequential structure, thereby improving the recognition effect.
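The multi-task objective can be sketched as below; crf_nll stands in for the negative log-likelihood of a linear-chain CRF over the tag sequence and is an assumed helper, not a particular library API:

```python
import torch
import torch.nn as nn

hidden_size, target_num = 128, 7                   # 7 tag classes, an assumed count
bilstm2 = nn.LSTM(2 * hidden_size, hidden_size,
                  batch_first=True, bidirectional=True)
tag_proj = nn.Linear(2 * hidden_size, target_num)  # fully connected layer to tags

def total_loss(d, tags, lm_loss_value, crf_nll):
    """d: spliced first-layer output [batch, seq, 2*hidden_size];
    tags: gold tag IDs [batch, seq]; lm_loss_value: loss from step D."""
    h2, _ = bilstm2(d)                             # second spliced output
    logits = tag_proj(h2)                          # [batch, seq, target_num]
    return crf_nll(logits, tags) + lm_loss_value   # CRF loss + language-model loss
```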
In an optional embodiment of the invention, step F above randomly shuffles the labeled data set for Chinese named entity recognition and trains the neural network for multiple rounds by cycling through steps A-E with the Adam optimization method, one round meaning one pass of training over the entire data set, thereby continually reducing the loss of the entire neural network and optimizing its parameters.
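A minimal sketch of this training loop with a stand-in model (all names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)                        # placeholder for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))

for epoch in range(5):                          # several full passes ("rounds")
    perm = torch.randperm(len(data))            # random re-ordering of the samples
    for i in range(0, len(data), 20):           # mini-batches of 20
        idx = perm[i:i + 20]
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data[idx]), labels[idx])
        loss.backward()
        optimizer.step()
```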
In an optional embodiment of the invention, after the training of the neural network is completed, step G above processes the text data to be recognized with the neural network, and the neural network returns the transition probability matrix generated by the conditional random field and the generated logit values. The logit values are generated as follows: after the bidirectional outputs of the second Bi-LSTM layer are spliced, the dimension is [batch_size * max_len * hidden_size], where hidden_size is a hyperparameter set when defining the LSTM; a fully connected layer then converts the last dimension, hidden_size, into target_num, the number of tag classes that finally need to be predicted, so that the final dimension of the logits is [batch_size * max_len * target_num]. Given the transition probability matrix and the logit values, decoding can be performed with the Viterbi algorithm to mark the final class of each character.
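The Viterbi decoding of step G can be implemented directly from the logit scores and the transition probability matrix; this is a generic sketch of the standard algorithm, not code from the patent:

```python
import numpy as np

def viterbi(logits, transitions):
    """logits: per-position tag scores [seq_len, target_num];
    transitions: CRF transition matrix [target_num, target_num].
    Returns the highest-scoring tag sequence."""
    seq_len, n_tags = logits.shape
    score = logits[0].copy()
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # cand[i, j]: score of reaching tag j at t coming from tag i at t-1
        cand = score[:, None] + transitions + logits[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print(viterbi(np.random.randn(6, 4), np.random.randn(4, 4)))
```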
The invention converts the characters of Chinese text into vector inputs with the open-source word2vec pre-trained by Google, and combines the input character information through restricted self-attention, letting the neural network discover the latent word information composed of characters; the result then passes through the first Bi-LSTM layer, and the two obtained unidirectional outputs are spliced together. Here the invention adds an additional learning task: a language model is learned from the output of the first Bi-LSTM layer, making the output of the first layer better conform to the characteristics of the language. The spliced output of the first Bi-LSTM layer is then regarded as the input of the second Bi-LSTM layer; the LSTM outputs obtained for the two directions are spliced again and passed through a fully connected layer, and the labels are predicted with the conditional random field (CRF) method.
Aimed at the problems of current Chinese named entity recognition methods and combined with neural network techniques, the invention directly recognizes the entity names in Chinese text by constructing feature inputs based only on characters. The method is based only on character features and needs no word segmentation or other hand-crafted features such as part of speech and syntax, which benefits the robustness of the method; moreover, the invention performs well on out-of-vocabulary words, has excellent performance, and can markedly improve the performance of downstream tasks (such as information retrieval and keyword recognition).
Those of ordinary skill in the art will understand that the embodiments described herein are intended to help the reader understand the principle of the invention, and it should be understood that the protection scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can, according to the technical teachings disclosed by the invention, make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations still fall within the protection scope of the invention.

Claims (10)

1. A Chinese named entity recognition method based on attention mechanism and language model learning, characterized by comprising the following steps:
A. obtaining a labeled data set for Chinese named entity recognition and constructing a character-based dictionary;
B. converting the ID number corresponding to each element in the dictionary constructed in step A into a vector;
C. combining the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the other character vectors in the nearby window, and mining the latent word information within the window around the center character;
D. processing the character vectors obtained in step C with a first Bi-LSTM layer, obtaining the hidden-layer output of each time step in both directions, splicing the obtained outputs, and training a language model with the hidden-layer outputs of each time step;
E. processing the spliced result obtained in step D with a second Bi-LSTM layer, obtaining the second spliced output, and performing label prediction training with the maximum matching method;
F. randomly shuffling the labeled data set for Chinese named entity recognition, and performing multiple rounds of training of the neural network by cycling through steps A-E with the Adam optimization method;
G. processing the text data to be recognized with the neural network to complete Chinese named entity recognition.
2. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 1, characterized in that step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically:
assigning an ID number to every character and symbol in the labeled data set for Chinese named entity recognition, and adding markers at the beginning and end of each sentence.
3. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 2, characterized in that step A further includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the label of the named-entity class corresponding to each character.
4. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 3, characterized in that step C combines the character vectors converted in step B through the restricted self-attention layer, obtains for each center character the weighted combination of its vector with the other character vectors in the nearby window, and mines the latent word information within the window around the center character, specifically:
setting a window size; comparing each character in the window region around the center character with the center character for relevance and calculating the relevance value of each character and the center character; then weighting and combining the characters in the region according to their relevance to the center character, calculating the center character vector, and obtaining the latent word information composed of characters.
5. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 4, characterized in that the formula for calculating the relevance value of each character and the center character is
$f(x_i, q) = w^T \sigma(W^{(1)} x_i + W^{(2)} q)$
wherein q denotes the center character, $x_i$ denotes the i-th character within the window of size window_size around the center character, $W^{(1)}$ denotes the weight matrix of the fully connected layer through which $x_i$ passes, $W^{(2)}$ denotes the weight matrix of the fully connected layer through which q passes, and $w^T$ denotes the relevance vector.
6. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 5, characterized in that the formula for calculating the center character vector is
$p(z \mid x, q) = \mathrm{softmax}(a)$
$s = \sum_{i=1}^{n} p(z = i \mid x, q)\, x_i$
wherein a denotes the vector obtained by splicing the relevance values computed for each $x_i$ with q, $p(z \mid x, q)$ denotes the output of passing all relevance values of the $x_i$ and q through the softmax function, $p(z = i \mid x, q)$ denotes the weight of the relevance value of $x_i$ and q within the whole sequence, z denotes the set of weights with respect to q of all $x_i$ in the window of size window_size around the center character, x denotes the sequence containing $x_i$, and s denotes the value of the weighted combination.
7. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 6, characterized in that in step D the character vectors obtained in step C are processed by the first Bi-LSTM layer to obtain the hidden-layer output of each time step in both directions, and the computation model for splicing the obtained outputs is
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$d_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$
$\overrightarrow{m_t} = \tanh(\overrightarrow{W}\, \overrightarrow{h_t})$
$\overleftarrow{m_t} = \tanh(\overleftarrow{W}\, \overleftarrow{h_t})$
wherein $\overrightarrow{h_t}$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h_t}$ denotes the hidden-layer output of the backward LSTM at time t, $d_t$ denotes the splicing of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, $\overrightarrow{m_t}$ denotes the result of passing the forward hidden-layer output through the activation function tanh, $\overleftarrow{m_t}$ denotes the result of passing the backward hidden-layer output through the activation function tanh, $\overrightarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{h_t}$ passes, and $\overleftarrow{W}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{h_t}$ passes.
8. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 7, characterized in that the computation model in step D that uses the hidden-layer output of each time step to train the language model is
$\overrightarrow{p_t} = \mathrm{softmax}(\overrightarrow{W_p}\, \overrightarrow{m_t})$
$\overleftarrow{p_t} = \mathrm{softmax}(\overleftarrow{W_p}\, \overleftarrow{m_t})$
$\overrightarrow{L} = -\sum_{t=1}^{T} \log \overrightarrow{p_t}(w_{t+1})$
$\overleftarrow{L} = -\sum_{t=1}^{T} \log \overleftarrow{p_t}(w_{t-1})$
wherein $\overrightarrow{p_t}$ denotes the probability vector that uses $\overrightarrow{m_t}$ at time t to predict the character of the next moment, $\overleftarrow{p_t}$ denotes the probability vector that uses $\overleftarrow{m_t}$ at time t to predict the character of the preceding moment, $\overrightarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overrightarrow{m_t}$ passes, $\overleftarrow{W_p}$ denotes the weight matrix of the fully connected layer through which $\overleftarrow{m_t}$ passes, $\overrightarrow{L}$ denotes the loss function of the forward LSTM language model, $\overleftarrow{L}$ denotes the loss function of the backward LSTM language model, and T denotes the total number of time steps of the sequence.
9. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 8, characterized in that step D further includes computing the cross entropy between the predicted character IDs and the true following and preceding character IDs, calculating the loss with the forward LSTM labels and backward LSTM labels constructed in step A, and optimizing the language model.
10. The Chinese named entity recognition method based on attention mechanism and language model learning of claim 9, characterized in that performing label prediction training with the maximum matching method in step E is specifically:
inputting the output obtained in step D into the second Bi-LSTM layer and obtaining the spliced output of the second Bi-LSTM layer; then computing the maximum likelihood estimate of the label sequence inside a conditional random field, and calculating the loss of the conditional random field with the named-entity class labels of each character constructed in step A; and finally adding the loss of the conditional random field to the language-model loss of the first Bi-LSTM layer to obtain the loss of the entire neural network.
CN201811517779.9A 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning Active CN109657239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811517779.9A CN109657239B (en) 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811517779.9A CN109657239B (en) 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning

Publications (2)

Publication Number Publication Date
CN109657239A true CN109657239A (en) 2019-04-19
CN109657239B CN109657239B (en) 2020-04-21

Family

ID=66113875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811517779.9A Active CN109657239B (en) 2018-12-12 2018-12-12 Chinese named entity recognition method based on attention mechanism and language model learning

Country Status (1)

Country Link
CN (1) CN109657239B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017045453A (en) * 2015-08-27 2017-03-02 ゼロックス コーポレイションXerox Corporation Document-specific gazetteers for named entity recognition
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN107797992A (en) * 2017-11-10 2018-03-13 北京百分点信息科技有限公司 Name entity recognition method and device
CN107977361A (en) * 2017-12-06 2018-05-01 哈尔滨工业大学深圳研究生院 The Chinese clinical treatment entity recognition method represented based on deep semantic information
CN108628823A (en) * 2018-03-14 2018-10-09 中山大学 In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUL KHAN SAFI QAMAS et al.: "Named Entity Recognition Based on Deep Neural Networks", Technology Research *
CHEN Yanyu et al.: "Insurance Named Entity Recognition Based on CRF and Bi-LSTM", Intelligent Computer and Applications *

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147551A (en) * 2019-05-14 2019-08-20 腾讯科技(深圳)有限公司 Multi-class entity recognition model training, entity recognition method, server and terminal
CN110147551B (en) * 2019-05-14 2023-07-11 腾讯科技(深圳)有限公司 Multi-category entity recognition model training, entity recognition method, server and terminal
CN111160467A (en) * 2019-05-31 2020-05-15 北京理工大学 Image description method based on conditional random field and internal semantic attention
CN111160467B (en) * 2019-05-31 2021-12-10 北京理工大学 Image description method based on conditional random field and internal semantic attention
CN110298043A (en) * 2019-07-03 2019-10-01 吉林大学 A kind of vehicle name entity recognition method and system
CN110334189A (en) * 2019-07-11 2019-10-15 河南大学 Method is determined based on the long microblog topic label in short-term and from attention neural network
CN110705272A (en) * 2019-08-28 2020-01-17 昆明理工大学 Named entity identification method for automobile engine fault diagnosis
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium
CN110619124A (en) * 2019-09-19 2019-12-27 成都数之联科技有限公司 Named entity identification method and system combining attention mechanism and bidirectional LSTM
CN111079418A (en) * 2019-11-06 2020-04-28 科大讯飞股份有限公司 Named body recognition method and device, electronic equipment and storage medium
CN111079418B (en) * 2019-11-06 2023-12-05 科大讯飞股份有限公司 Named entity recognition method, device, electronic equipment and storage medium
CN110826334A (en) * 2019-11-08 2020-02-21 中山大学 Chinese named entity recognition model based on reinforcement learning and training method thereof
CN110826334B (en) * 2019-11-08 2023-04-21 中山大学 Chinese named entity recognition model based on reinforcement learning and training method thereof
CN110969020B (en) * 2019-11-21 2022-10-11 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN110969020A (en) * 2019-11-21 2020-04-07 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN113033192B (en) * 2019-12-09 2024-04-26 株式会社理光 Training method and device for sequence annotation and computer readable storage medium
CN113033192A (en) * 2019-12-09 2021-06-25 株式会社理光 Training method and device for sequence labels and computer readable storage medium
CN111223489A (en) * 2019-12-20 2020-06-02 厦门快商通科技股份有限公司 Specific keyword identification method and system based on Attention mechanism
CN111222339B (en) * 2020-01-13 2023-05-23 华南理工大学 Medical consultation named entity recognition method based on countermeasure multitask learning
CN111222339A (en) * 2020-01-13 2020-06-02 华南理工大学 Medical consultation named entity identification method based on anti-multitask learning
CN113139382A (en) * 2020-01-20 2021-07-20 北京国双科技有限公司 Named entity identification method and device
CN111274829A (en) * 2020-02-07 2020-06-12 中国科学技术大学 Sequence labeling method using cross-language information
CN111274829B (en) * 2020-02-07 2023-06-16 中国科学技术大学 Sequence labeling method utilizing cross-language information
CN111209754B (en) * 2020-02-25 2023-06-02 桂林电子科技大学 Data set construction method for Vietnam entity recognition
CN111209754A (en) * 2020-02-25 2020-05-29 桂林电子科技大学 Data set construction method for Vietnamese entity recognition
CN111368544B (en) * 2020-02-28 2023-09-19 中国工商银行股份有限公司 Named entity identification method and device
CN111368544A (en) * 2020-02-28 2020-07-03 中国工商银行股份有限公司 Named entity identification method and device
CN111522964A (en) * 2020-04-17 2020-08-11 电子科技大学 Tibetan medicine literature core concept mining method
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN113743116A (en) * 2020-05-28 2021-12-03 株式会社理光 Training method and device for named entity recognition and computer readable storage medium
CN111651995A (en) * 2020-06-07 2020-09-11 上海建科工程咨询有限公司 Accident information automatic extraction method and system based on deep circulation neural network
CN111738169A (en) * 2020-06-24 2020-10-02 北方工业大学 Handwriting formula recognition method based on end-to-end network model
CN112989811A (en) * 2021-03-01 2021-06-18 哈尔滨工业大学 BilSTM-CRF-based historical book reading auxiliary system and control method thereof
CN112883737B (en) * 2021-03-03 2022-06-14 山东大学 Robot language instruction analysis method and system based on Chinese named entity recognition
CN112883737A (en) * 2021-03-03 2021-06-01 山东大学 Robot language instruction analysis method and system based on Chinese named entity recognition
CN113158658B (en) * 2021-04-26 2023-09-19 中国电子科技集团公司第二十八研究所 Knowledge embedding-based structured control instruction extraction method
CN113158658A (en) * 2021-04-26 2021-07-23 中国电子科技集团公司第二十八研究所 Knowledge embedding-based structured control instruction extraction method
WO2023004528A1 (en) * 2021-07-26 2023-02-02 深圳市检验检疫科学研究院 Distributed system-based parallel named entity recognition method and apparatus

Also Published As

Publication number Publication date
CN109657239B (en) 2020-04-21

Similar Documents

Publication Publication Date Title
CN109657239A (en) The Chinese name entity recognition method learnt based on attention mechanism and language model
CN108460013B (en) Sequence labeling model and method based on fine-grained word representation model
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN108268444B (en) Chinese word segmentation method based on bidirectional LSTM, CNN and CRF
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN109086267B (en) Chinese word segmentation method based on deep learning
McCallum Efficiently inducing features of conditional random fields
CN112528676B (en) Document-level event argument extraction method
CN109543181B (en) Named entity model and system based on combination of active learning and deep learning
Zhang et al. Semi-supervised structured prediction with neural CRF autoencoder
CN111738007A (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN110263325A (en) Chinese automatic word-cut
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN108647191A (en) It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method
CN111460824A (en) Unmarked named entity identification method based on anti-migration learning
CN111651983A (en) Causal event extraction method based on self-training and noise model
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN112347269A (en) Method for recognizing argument pairs based on BERT and Att-BilSTM
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN113312498B (en) Text information extraction method for embedding knowledge graph by undirected graph
CN111062214A (en) Integrated entity linking method and system based on deep learning
CN112699685B (en) Named entity recognition method based on label-guided word fusion
CN114356990A (en) Base named entity recognition system and method based on transfer learning
CN117094325B (en) Named entity identification method in rice pest field

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant