CN109657239A - Chinese named entity recognition method based on attention mechanism and language model learning - Google Patents
- Publication number
- CN109657239A (application number CN201811517779.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- language model
- lstm
- entity recognition
- name entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Abstract
The invention discloses a Chinese named entity recognition method based on an attention mechanism and language model learning. The method comprises: constructing a character-based dictionary; converting the ID number corresponding to each element into a vector; combining the character vectors through a restricted self-attention layer; splicing the outputs of a first Bi-LSTM layer and training a language model on them; splicing the outputs of a second Bi-LSTM layer and performing label-prediction training with maximum-likelihood estimation; randomly shuffling the data set and running multiple rounds of training with the Adam optimizer; and applying the trained neural network to the text data to be recognized to perform Chinese named entity recognition. The invention relies only on character features and needs no word segmentation or hand-crafted part-of-speech or syntactic features, which improves the robustness of the method; it also performs well on out-of-vocabulary words, and its excellent performance can markedly improve downstream tasks.
Description
Technical field
The invention belongs to the field of entity recognition techniques, and in particular relates to a Chinese named entity recognition method based on an attention mechanism and language model learning.
Background art
Chinese named entity recognition is one of the most common problems in the field of natural language processing. Its main task is to tag the words or characters in unstructured text so that the useful information in the text can be extracted.
The Chinese named entity recognition task identifies the entities in Chinese text data and extracts the useful information in the text. Specifically, the object of recognition is Chinese text data, such as a sentence or a passage; the requirement is to mark the entity names in that text, such as person names, place names, organization names, and titles.
Current methods for Chinese named entity recognition fall broadly into three classes:
1. Rule-based unsupervised methods:
According to the linguistic form of the named entities to be recognized, rules are manually designed to match the syntactic structure of sentences and mark the named entities. Rule-based methods mostly use rule templates hand-built by linguistic experts; the selected features include statistical information, punctuation marks, keywords, indicator words, direction words, position words (such as suffix characters), and head words, with pattern and string matching as the main means. Such systems mostly depend on knowledge bases and dictionaries. The effectiveness of these methods depends largely on the skill of the linguistic experts who write the rules, and different fields require different rules, so they cost considerable time and labor.
2. Methods based on probability statistics:
Statistical machine learning methods treat named entity recognition as a sequence labeling task and learn a language model from a large-scale corpus in order to label each position of a sentence. Common models of this kind include the generative hidden Markov model (HMM) and the discriminative conditional random field (CRF).
3. Neural network based methods:
The rapid development of neural networks, especially recurrent neural networks, has brought great performance gains to sequence processing tasks; together with the development of word vectors, this has made neural networks a practical approach to processing text data. Thanks to the strong feature-extraction ability of neural networks, good performance can be reached without extra hand-crafted features. For sequence labeling problems, the outstanding model is LSTM+CRF, which achieves good results in English named entity recognition; owing to the characteristics of the Chinese language, however, this model does not perform as well on the Chinese named entity recognition task. Research on Chinese has therefore continually explored character-based features, word-based features, and combined character-word features, and papers have pointed out that character-based named entity recognition outperforms word-based features, and that character-based methods handle out-of-vocabulary words better than word-based ones.
Defects of the prior art:
1. Rule-based unsupervised methods require linguistic experts to design the rules, and different fields, or even different text styles, require different rules, so extensibility is very low. If the rules are strict, much useful information is missed; if they are too loose, recognition quality suffers.
2. Methods based on probability statistics mainly learn a language model from a large-scale corpus; the main models are the hidden Markov model, the maximum entropy model, and the conditional random field. Such methods depend on the quality of the corpus, perform very poorly on certain samples, generalize insufficiently, have a relatively low recall rate, and do not express features well.
3. Mainstream neural network methods use recurrent neural networks (RNN) or their variants, such as LSTM and GRU, to convert the named entity recognition problem into a sequence labeling problem. These approaches first segment words and then label the sequence, but their effect is largely determined by segmentation quality, and they do not handle out-of-vocabulary words well. Other research improves neural network performance simply by manually adding various features (part-of-speech features, grammar features, etc.) to the network input; although this improves the results, the hand-added features increase the workload and complicate the method, which also contradicts the essential strength of neural networks, namely automatic feature extraction.
Summary of the invention
The purpose of the invention is to overcome the defects of existing Chinese named entity recognition methods and improve recognition quality; to that end, the invention proposes a Chinese named entity recognition method based on an attention mechanism and language model learning.
The technical scheme of the invention is a Chinese named entity recognition method based on an attention mechanism and language model learning, comprising the following steps:
A. Obtain a labeled data set for Chinese named entity recognition and construct a character-based dictionary;
B. Convert the ID number corresponding to each element of the character-based dictionary constructed in step A into a vector;
C. Combine the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the vectors of the other characters within a nearby window, thereby mining the latent word information around the center character;
D. Process the character vectors obtained in step C with a first Bi-LSTM layer, obtain the hidden-layer output of each time step in both directions, splice the outputs together, and use the hidden-layer outputs of each time step to train a language model;
E. Process the splicing result of step D with a second Bi-LSTM layer, obtain the second spliced output, and perform label-prediction training using maximum-likelihood estimation;
F. Randomly shuffle the labeled data set for Chinese named entity recognition and run multiple rounds of training of the neural network with the Adam optimizer, cycling through steps A-E;
G. Process the text data to be recognized with the trained neural network to complete Chinese named entity recognition.
Further, step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically:
assign an ID number to every character and symbol in the labeled data set, and add markers at the beginning and end of each sentence.
Further, step A also includes constructing label sequences; the label sequences include the forward language model labels, the backward language model labels, and the named-entity class label of each character.
Further, in step C the character vectors converted in step B are combined through the restricted self-attention layer to obtain the word information formed by characters, specifically:
set a window size; compare each character in the window region around the center character with the center character, computing the relevance value of each character with the center character; then weight and combine the characters in the region according to their relevance to the center character, computing the center character vector and obtaining the latent word information formed by the characters.
Further, the formula for computing the relevance value of each character with the center character is

f(x_i, q) = w^T σ(W^(1) x_i + W^(2) q)

where q denotes the center character, x_i denotes the i-th character within the window of size window_size around the center character, W^(1) denotes the weight matrix of the fully connected layer applied to x_i, W^(2) denotes the weight matrix of the fully connected layer applied to q, and w^T denotes the relevance vector.
Further, the formulas for computing the center character vector are

p(z | x, q) = softmax(a)
s = Σ_i p(z = i | x, q) · x_i

where a denotes the vector obtained by splicing the relevance values computed between each x_i and q; p(z | x, q) denotes the output of all the relevance values of the x_i with q after the softmax function; p(z = i | x, q) denotes the weight of the relevance value of x_i with q within the entire sequence; z denotes the set of weights, over q, of all x_i in the window of size window_size around the center character; x denotes the sequence containing the x_i; and s denotes the computed weighted combination.
Further, in step D the character vectors obtained in step C are processed by the first Bi-LSTM layer to obtain the hidden-layer outputs of each time step in both directions, and the computation model that splices the outputs together is

d_t = [h_t^fw ; h_t^bw]
m_t^fw = tanh(W_m^fw · h_t^fw)
m_t^bw = tanh(W_m^bw · h_t^bw)

where h_t^fw denotes the hidden-layer output of the forward LSTM at time t, h_t^bw denotes the hidden-layer output of the backward LSTM at time t, d_t denotes the splicing result of h_t^fw and h_t^bw, m_t^fw denotes the result of passing the forward hidden-layer output through the tanh activation function, m_t^bw denotes the result of passing the backward hidden-layer output through the tanh activation function, W_m^fw denotes the weight matrix of the fully connected layer applied to h_t^fw, and W_m^bw denotes the weight matrix of the fully connected layer applied to h_t^bw.
Further, in step D the computation model that uses the hidden-layer output of each time step to train the language model is

y_t^fw = softmax(W_p^fw · m_t^fw)
y_t^bw = softmax(W_p^bw · m_t^bw)
L^fw = -(1/T) Σ_{t=1..T} log y_t^fw[x_{t+1}]
L^bw = -(1/T) Σ_{t=1..T} log y_t^bw[x_{t-1}]

where y_t^fw denotes the probability vector used at time t, via m_t^fw, to predict the character at the next moment, y_t^bw denotes the probability vector used at time t, via m_t^bw, to predict the character at the previous moment, W_p^fw denotes the weight matrix of the fully connected layer applied to m_t^fw, W_p^bw denotes the weight matrix of the fully connected layer applied to m_t^bw, L^fw denotes the loss function of the forward LSTM language model training, L^bw denotes the loss function of the backward LSTM language model training, and T denotes the total number of time steps of the sequence.
Further, step D also includes computing the cross entropy between the predicted character IDs and the true next/previous character IDs, and computing the loss using the forward LSTM labels and backward LSTM labels constructed in step A, so as to optimize the language model.
Further, performing label-prediction training with maximum-likelihood estimation in step E is specifically:
feed the output obtained in step D into the second Bi-LSTM layer and obtain its spliced output; then compute the maximum-likelihood estimate of the label sequence inside a conditional random field, and compute the loss of the conditional random field using the named-entity class label of each character constructed in step A; finally, add the loss of the conditional random field to the loss of the language model to obtain the loss of the entire neural network.
The beneficial effects of the invention are: by constructing a character-based dictionary and processing the input successively through a restricted self-attention layer, a first Bi-LSTM layer, and a second Bi-LSTM layer, with language model learning carried out at the first Bi-LSTM layer and label-prediction training finally carried out by maximum-likelihood estimation, the invention relies only on character features and needs no word segmentation or hand-crafted part-of-speech or syntactic features, which improves the robustness of the method; it also performs well on out-of-vocabulary words, and its excellent performance can markedly improve downstream tasks.
Detailed description of the invention
Fig. 1 is the flow diagram of the Chinese named entity recognition method of the invention.
Specific embodiments
In order to make the objectives, technical solutions, and advantages of the invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described here are only intended to explain the invention, not to limit it.
Fig. 1 shows the flow diagram of the Chinese named entity recognition method of the invention. A Chinese named entity recognition method based on an attention mechanism and language model learning comprises the following steps:
A. Obtain a labeled data set for Chinese named entity recognition and construct a character-based dictionary;
B. Convert the ID number corresponding to each element of the character-based dictionary constructed in step A into a vector;
C. Combine the character vectors converted in step B through a restricted self-attention layer, obtaining for each center character the weighted combination of its vector with the vectors of the other characters within a nearby window, thereby mining the latent word information around the center character;
D. Process the character vectors obtained in step C with a first Bi-LSTM layer, obtain the hidden-layer output of each time step in both directions, splice the outputs together, and use the hidden-layer outputs of each time step to train a language model;
E. Process the splicing result of step D with a second Bi-LSTM layer, obtain the second spliced output, and perform label-prediction training using maximum-likelihood estimation;
F. Randomly shuffle the labeled data set for Chinese named entity recognition and run multiple rounds of training of the neural network with the Adam optimizer, cycling through steps A-E;
G. Process the text data to be recognized with the trained neural network to complete Chinese named entity recognition.
The invention combines the attention mechanism (restricted self-attention) technique and, after analyzing the shortcomings of the current mainstream LSTM+CRF model, designs a two-layer bidirectional LSTM+CRF structure in which language model learning is added to the output of the first bidirectional LSTM layer, improving the expressive power of the model.
Steps A-C above form the first stage of Chinese named entity recognition: converting Chinese text data into processable numeric vectors and then using the restricted self-attention technique to mine latent word information. Step D forms the second stage: processing through the first Bi-LSTM layer and learning a language model. Steps E-G form the third stage: prediction through the second Bi-LSTM layer plus CRF.
In an optional embodiment of the invention, step A obtains the labeled data set for Chinese named entity recognition and constructs the character-based dictionary, specifically: assign an ID number to every character and symbol in the labeled data set, and add markers at the beginning and end of each sentence.
The character-based dictionary includes all characters and symbols in the data set, plus the identifiers <UNK> and <PAD> and the sentence start and end markers <S> and </S>; an ID number can then be assigned to each element of the dictionary. To ease the subsequent batch processing, the invention sets the ID of <PAD> to 0, and the other elements are assigned IDs in order, yielding a dictionary over the data set in which each element has a unique corresponding ID number. With this dictionary, Chinese text data can be converted into numeric information that can be input to the neural network. For example, for the Chinese sentence "我爱北京天安门。" ("I love Beijing Tiananmen."), look up the ID corresponding to each character and symbol in the dictionary: the ID of "我" ("I") is 1, the ID of "爱" ("love") is 4, the ID of "北" ("north") is 16, ..., and the ID of "。" is 10, giving the converted ID sequence [1, 4, 16, 45, 46, 153, 86, 10].
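The dictionary construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the resulting IDs are hypothetical (real IDs depend on the data set's character order), but the conventions match the text, with <PAD> fixed to ID 0 and every other element numbered in order.

```python
def build_char_vocab(sentences):
    """Build a character-level dictionary: <PAD> gets ID 0 (to ease batch
    padding), the special markers come next, and every character/symbol
    seen in the data set is assigned an ID in order of first appearance."""
    vocab = {"<PAD>": 0, "<UNK>": 1, "<S>": 2, "</S>": 3}
    for sent in sentences:
        for ch in sent:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def to_ids(sentence, vocab):
    """Convert a sentence into the ID sequence the network consumes;
    unseen characters map to <UNK>."""
    return [vocab.get(ch, vocab["<UNK>"]) for ch in sentence]

vocab = build_char_vocab(["我爱北京天安门。"])
ids = to_ids("我爱北京天安门。", vocab)   # one ID per character/symbol
```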
If the batch training method is used, the input dimension is [batch_size * max_len], where batch_size denotes the number of samples input per batch and max_len denotes the length of the longest sample in the batch; the other, shorter samples are padded with '0' up to that length. For example, with a batch of 2 samples padded according to the longest sequence:
"我是中国人。" ("I am Chinese.")
"我很自豪。" ("I am very proud.")
the batch after ID conversion becomes the two-dimensional matrix [[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10, 0]], where sequences shorter than the longest sentence are padded with 0.
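The padding step above can be sketched as follows; the function name is hypothetical, and the example reuses the ID matrix from the text (padding with 0, the <PAD> ID).

```python
def pad_batch(id_seqs, pad_id=0):
    """Pad every ID sequence in the batch with pad_id (the <PAD> ID, 0)
    up to the length of the longest sequence in the batch."""
    max_len = max(len(s) for s in id_seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in id_seqs]

batch = pad_batch([[1, 8, 87, 89, 62, 10],      # "我是中国人。"
                   [1, 54, 465, 875, 10]])      # "我很自豪。" (shorter)
# batch == [[1, 8, 87, 89, 62, 10], [1, 54, 465, 875, 10, 0]]
```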
Step A also includes constructing the label sequences; the label sequences include the forward language model labels, the backward language model labels, and the named-entity class label of each character.
The forward LSTM labels are used for forward language model learning. The method for constructing a forward LSTM label is to delete the first ID of a sequence sample (such as a sentence) and append the ID of the end marker at the end of the sequence; for example, for "我爱北京天安门。", remove the first character "我" and append the end marker at the end of the sequence. Here the end marker's ID in the dictionary is 489, so the converted ID label is [4, 16, 45, 46, 153, 86, 10, 489].
The method for constructing a backward LSTM label is to remove the last character of the sequence and insert the start marker "<S>" at its beginning; for example, for "我爱北京天安门。", remove "。" and insert "<S>" at the beginning. The ID of "<S>" being 789, the converted result is [789, 1, 4, 16, 45, 46, 153, 86].
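The two label constructions above amount to shifting the ID sequence by one position in each direction; a minimal sketch (function name hypothetical, marker IDs taken from the text's example) is:

```python
def lm_labels(ids, start_id, end_id):
    """Forward LM label: drop the first ID, append the end-marker ID.
    Backward LM label: drop the last ID, prepend the start-marker ID."""
    forward = ids[1:] + [end_id]
    backward = [start_id] + ids[:-1]
    return forward, backward

ids = [1, 4, 16, 45, 46, 153, 86, 10]           # "我爱北京天安门。"
fwd, bwd = lm_labels(ids, start_id=789, end_id=489)
# fwd == [4, 16, 45, 46, 153, 86, 10, 489]
# bwd == [789, 1, 4, 16, 45, 46, 153, 86]
```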
The named-entity class label of each character is the raw annotation of the labeled data set; the invention converts each label to a designated numeric class. For example, "B-PER" in the original annotated data denotes the beginning of a person name, "I-PER" denotes a middle character of a person name, and "E-PER" denotes the ending character of a person name; a one-to-one correspondence can be established between them — "B-PER" corresponding to the number 1, "I-PER" to the number 2, "E-PER" to the number 3 — so that the string-typed labels in the original data are converted into processable numeric information.
In an optional embodiment of the invention, step B converts the ID number corresponding to each element of the character-based dictionary constructed in step A into a vector, specifically converting each ID into a vector with Google's word2vec tool, expressed as [batch_size * max_len * emb_size], where emb_size denotes the configured vector dimension.
In an optional embodiment of the invention, step C combines the character vectors converted in step B through the restricted self-attention layer and obtains the word information formed by characters, specifically: set a window size; compare each character in the window region around the center character with the center character, computing the relevance value of each character with the center character; then weight and combine the characters in the region according to their relevance to the center character, computing the center character vector and obtaining the latent word information formed by the characters.
The invention passes the character vectors converted in step B through the restricted self-attention layer, letting the neural network learn the word information latently formed by characters. Here "restricted" means limiting the window (window_size): self-attention is not applied over the entire sentence sequence, but only over a fixed-size window region around each character.
The formula by which the invention computes the relevance value of each character with the center character is

f(x_i, q) = w^T σ(W^(1) x_i + W^(2) q)

where q denotes the center character and x_i is the i-th character within the window of size window_size around the center character; there are window_size such characters in total, i.e. i runs from 1 to window_size. Each nearby character is compared with the center character for relevance; this is realized by adding the vector of the center character after a linear transformation to the vector of a nearby character after a linear transformation. A σ activation function is applied — σ can be the sigmoid function or the tanh function — and the result is multiplied by w^T. Since w^T is a vector of dimension emb_size, the product is a scalar, and this value is exactly the relevance of the nearby character x_i with the center character q. W^(1) denotes the weight matrix of the fully connected layer applied to x_i, and W^(2) denotes the weight matrix of the fully connected layer applied to q.
The invention compares all characters of the nearby window with the center character, obtains the relevance score f(x_i, q) of each nearby character with the center character, splices all the scores together, converts them into corresponding probabilities with a softmax activation function, multiplies each probability by the corresponding character vector, and adds the results, obtaining the vector representation of the center character. This weighted combination of character vectors lets the neural network discover the latent word information formed by character combinations.
The formulas for computing the center character vector are

p(z | x, q) = softmax(a)
s = Σ_i p(z = i | x, q) · x_i

where a denotes the vector obtained by splicing the relevance values computed between each x_i and q, as a list. p(z | x, q) denotes the output of all the relevance values of the x_i with q after the softmax function; the softmax function converts each value of the sequence into a corresponding weight, and the weights of all x_i of the entire sequence sum to 1. p(z = i | x, q) denotes the weight of the relevance value of x_i with q within the entire sequence; z denotes the set of weights, over q, of all x_i in the window of size window_size around the center character; x denotes the sequence of n characters containing x_i, x = {x_1, x_2, x_3, ..., x_i, ..., x_n}; and s denotes the weighted combination, i.e. the product of each x_i with its corresponding weight, with all the products summed up.
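As a plain-numpy sketch of the restricted self-attention described above (all names hypothetical; tanh is chosen here as the σ activation, and window boundaries are simply clipped at the sequence ends — the patent does not specify boundary handling):

```python
import numpy as np

def restricted_self_attention(X, W1, W2, w, window):
    """For each center character q = X[t], score the characters x_i in a
    +/- window region with f(x_i, q) = w^T tanh(W1 x_i + W2 q), turn the
    scores into weights p(z=i|x,q) via softmax, and return the weighted
    sum s = sum_i p_i * x_i as the new center-character vector."""
    T, d = X.shape
    out = np.zeros((T, d))
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        ctx = X[lo:hi]                                   # nearby characters
        scores = np.tanh(ctx @ W1.T + X[t] @ W2.T) @ w   # f(x_i, q), one scalar each
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                         # softmax -> probabilities
        out[t] = weights @ ctx                           # s = weighted combination
    return out
```

Because the weights are a softmax, each output row is a convex combination of the window's character vectors, which is exactly the "weighted combination" the text describes.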
The invention performs Chinese named entity recognition on a character basis, which handles out-of-vocabulary words well, and uses the self-attention technique to mine the word information latently formed by characters, improving the expressive power of the model. Moreover, the invention limits the window size around the center character instead of applying self-attention over the entire sequence, which improves prediction precision: the information of most center characters is related only to the characters within the nearby window, and the attention technique used by the invention lets the neural network automatically learn, within the window, large weights for characters highly correlated with the center character and small weights for characters weakly correlated with it.
In an optional embodiment of the invention, step D processes the character vectors obtained in step C through the first Bi-LSTM layer, obtains the hidden-layer outputs of each time step in both directions, splices the outputs together, and uses the hidden-layer output of each time step to train the language model.
For a sentence such as "我爱你" ("I love you"), the hidden-layer output after the first Bi-LSTM layer can predict the characters before and after each character, which makes the hidden-layer output of the Bi-LSTM better conform to a language model of the text. The specific computation model is

d_t = [h_t^fw ; h_t^bw]
m_t^fw = tanh(W_m^fw · h_t^fw)
m_t^bw = tanh(W_m^bw · h_t^bw)

where h_t^fw denotes the hidden-layer output of the forward LSTM at time t, h_t^bw denotes the hidden-layer output of the backward LSTM at time t, and x_t denotes the input at time t, i.e. a character vector. d_t denotes the splicing result of h_t^fw and h_t^bw, i.e. the result of splicing together the hidden-layer outputs of the forward and backward LSTMs at time t; the dimension of d_t is the sum of the dimensions of h_t^fw and h_t^bw. m_t^fw denotes the result of passing the forward hidden-layer output through the tanh activation function, i.e. the forward hidden-layer output is multiplied by a weight matrix and the final result is taken through a tanh activation; m_t^bw denotes the same for the backward hidden-layer output. W_m^fw denotes the weight matrix of the fully connected layer applied to h_t^fw, and W_m^bw the one applied to h_t^bw. The hidden-layer output of the Bi-LSTM thus goes through a linear change followed by the tanh activation, so that m_t^fw and m_t^bw can be used directly to predict the neighboring characters. The computation model that uses the hidden-layer output of each time step to train the language model is
y_t^fw = softmax(W_p^fw · m_t^fw)
y_t^bw = softmax(W_p^bw · m_t^bw)
L^fw = -(1/T) Σ_{t=1..T} log y_t^fw[x_{t+1}]
L^bw = -(1/T) Σ_{t=1..T} log y_t^bw[x_{t-1}]

where y_t^fw denotes the probability vector used at time t, via m_t^fw, to predict the character at the next moment — each element of the vector is a probability and the probabilities of all elements of the entire vector sum to 1; y_t^bw denotes the probability vector used at time t, via m_t^bw, to predict the character at the previous moment, each element likewise a probability summing to 1 over the vector; W_p^fw denotes the weight matrix of the fully connected layer applied to m_t^fw, and W_p^bw the one applied to m_t^bw. L^fw denotes the loss function of the forward LSTM language model training, in maximum-likelihood form; L^bw denotes the loss function of the backward LSTM language model training, also in maximum-likelihood form; and T denotes the total number of time steps of the sequence, i.e. the sequence length.
The values of L^fw and L^bw are computed; the smaller the two values, the better the learned language model. The cross entropy between the predicted character IDs and the true next/previous character IDs is computed, and the loss is calculated using the forward LSTM labels and backward LSTM labels constructed in step A, so as to optimize the language model.
In an optional embodiment of the invention, step E processes the vectors obtained in step D through the second Bi-LSTM layer, specifically using the output of the first Bi-LSTM layer as the input of the second Bi-LSTM layer, processed with the same computation graph as the first Bi-LSTM layer; however, the second Bi-LSTM layer performs no language model learning, but instead splices its hidden-layer outputs and directly performs label-prediction training with the conditional random field method. Specifically, the output obtained in step D is fed into the second Bi-LSTM layer to obtain its spliced output; then the maximum-likelihood estimate of the label sequence is computed inside a conditional random field, and the loss of the conditional random field is computed using the named-entity class labels of each character constructed in step A; finally, the loss of the conditional random field is added to the loss of the language model to obtain the loss of the entire neural network.
The invention applies language model learning after the first Bi-LSTM layer; the final loss computation of the neural network therefore adds not only the label-prediction loss but also the language model loss, forming a multi-task learning setup. This makes the output produced by the first Bi-LSTM layer conform better to the characteristics of the language and gives the input received by the second Bi-LSTM layer a stronger sequential structure, thereby improving recognition quality.
In an optional embodiment of the invention, step F randomly shuffles the labeled data set for Chinese named entity recognition and performs multiple rounds of training of the neural network with the Adam optimizer, cycling through steps A-E; one round denotes one pass of training over the entire data set, so that the loss of the entire neural network is continuously reduced and the parameters optimized.
In an optional embodiment of the present invention, in the above step G, after the training of the neural network is completed, the neural network is used to process the text data to be recognized; the neural network returns the transition probability matrix generated by the conditional random field and the generated logit values. The logit values are generated as follows: after the bidirectional outputs of the second-layer Bi-LSTM are spliced, the dimension is [batch_size × max_len × hidden_size], where hidden_size is a hyperparameter set when defining the LSTM; a fully connected layer is then applied to convert the last dimension, hidden_size, into the number of tag classes to be predicted, target_num, so the final logit dimension is [batch_size × max_len × target_num]. With the transition probability matrix and the logit values, decoding can be performed by the Viterbi algorithm to label the final class of each word.
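The Viterbi decoding step can be sketched for a single sentence (the batch dimension is dropped and the names are illustrative assumptions):

```python
import numpy as np

def viterbi_decode(logits, transitions):
    """Best tag path under emission logits and a transition score matrix.

    logits:      [seq_len, num_tags] emission scores from the network
    transitions: [num_tags, num_tags] transition scores from the CRF
    """
    seq_len, num_tags = logits.shape
    score = logits[0].copy()
    backpointers = []
    for t in range(1, seq_len):
        # score[i] + transitions[i, j] + logits[t, j] for every tag pair (i, j)
        total = score[:, None] + transitions + logits[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    # follow backpointers from the best final tag
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))
```

The returned list gives the final class index of each word in the sentence.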
The present invention converts the words of a Chinese text into vector inputs using Google's open-source, pre-trained word2vec, and combines the input character information through restricted Self-attention, allowing the neural network to discover potential word information composed of characters. The two unidirectional outputs obtained from the first-layer Bi-LSTM are then spliced and combined. Here the present invention adds an additional learning task: a language model is learned from the input of the first-layer Bi-LSTM, making the output of the first layer conform better to the characteristics of the language. The spliced output of the first-layer Bi-LSTM is then taken as the input of the second-layer Bi-LSTM; the LSTM outputs obtained in the two directions are spliced again, passed through a fully connected layer, and the labels are predicted by the conditional random field (CRF) method.
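The windowed combination step can be sketched as follows. For brevity this sketch scores relevance with a plain dot product rather than the additive form f(x_i, q) = ω^T σ(W^(1) x_i + W^(2) q) described in the claims, so the weight matrices are omitted — an illustrative simplification, not the patented computation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def restricted_self_attention(chars, window=2):
    """Combine each center character with its neighbors in a fixed window.

    chars: [seq_len, dim] character vectors; returns an array of the same
    shape where each row is a relevance-weighted combination of its window.
    """
    seq_len, dim = chars.shape
    out = np.zeros_like(chars)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        ctx = chars[lo:hi]
        scores = ctx @ chars[i]     # simplified relevance to the center char
        weights = softmax(scores)   # normalized weights over the window
        out[i] = weights @ ctx      # weighted combination of window vectors
    return out
```

Restricting attention to a small window is what lets the network surface potential multi-character words around each center character without attending over the whole sentence.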
Aimed at the problems of existing Chinese named entity recognition methods, and combining neural network techniques, the present invention directly recognizes entity names in Chinese text by constructing feature inputs based only on words. Because the method relies only on word features, it requires no word segmentation and no manual features such as part of speech or syntax, which improves the robustness of the method. The present invention also performs well on out-of-vocabulary words; its strong performance can greatly improve downstream tasks (e.g., information retrieval and keyword recognition).
Those of ordinary skill in the art will understand that the embodiments described herein are intended to help the reader understand the principles of the present invention, and it should be understood that the scope of protection of the present invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can, according to the technical teachings disclosed by the present invention, make various other specific variations and combinations that do not depart from the essence of the present invention, and these variations and combinations remain within the scope of protection of the present invention.
Claims (10)
1. A Chinese named entity recognition method based on attention mechanism and language model learning, characterized by comprising the following steps:
A. obtaining a dataset for Chinese named entity recognition with annotated labels, and constructing a word-based dictionary;
B. performing vector conversion on the ID number corresponding to each element in the word-based dictionary constructed in step A;
C. combining the word vectors converted in step B through a restricted Self-attention layer to obtain, for each center word, a weighted combination of its vector with the vectors of other words within a nearby window, mining the potential word information of the center word within the nearby window;
D. processing the word vectors obtained in step C through a first-layer Bi-LSTM to obtain the hidden-layer output of each time step in both directions, splicing and combining the obtained outputs, and training a language model using the hidden-layer output of each time step;
E. processing the splicing result obtained in step D through a second-layer Bi-LSTM to obtain a second spliced and combined output, and performing tag prediction training using the maximum likelihood method;
F. randomly shuffling the dataset for Chinese named entity recognition with annotated labels, and using the Adam optimization method to cycle through steps A-E for multiple rounds of training of the neural network;
G. processing the text data to be recognized using the neural network to complete Chinese named entity recognition.
2. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 1, wherein the step A of obtaining the dataset for Chinese named entity recognition with annotated labels and constructing the word-based dictionary specifically comprises:
assigning an ID number to each word and symbol in the dataset for Chinese named entity recognition with annotated labels, and adding markers at the beginning and end of each sentence, respectively.
3. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 2, wherein step A further includes constructing label sequences; the label sequences include forward language model labels, backward language model labels, and the label of the named-entity class corresponding to each word.
4. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 3, wherein the step C of combining the word vectors converted in step B through the restricted Self-attention layer to obtain, for each center word, the weighted combination of its vector with the vectors of other words within a nearby window, mining the potential word information of the center word within the nearby window, specifically comprises:
setting a window size; comparing each word within the window-size region of the center word with the center word for relevance, and calculating the relevance value of each word with the center word; then performing a weighted combination according to the relevance of each word in the region with the center word, and calculating the center word vector to obtain the potential word information composed of words.
5. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 4, wherein the formula for calculating the relevance value of each word with the center word is specifically
f(x_i, q) = ω^T σ(W^(1) x_i + W^(2) q)
wherein q denotes the center word, x_i denotes the i-th word within a window of size window-size near the center word, W^(1) denotes the weight matrix of the fully connected layer that x_i passes through, W^(2) denotes the weight matrix of the fully connected layer that q passes through, and ω^T denotes the relevance vector.
6. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 5, wherein the formula for calculating the center word vector is specifically
p(z | x, q) = softmax(a)
wherein a denotes the vector obtained by splicing the calculated relevance values of each x_i with q, p(z | x, q) denotes the output of all the relevance values of x_i with q after the softmax function, p(z = i | x, q) denotes the weight of the relevance value of x_i with q over the entire sequence, z denotes the set of weights, with respect to q, of all x_i in the window of size window-size near the center word, x denotes the sequence containing x_i, and s denotes the calculated value of the weighted combination.
7. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 6, wherein in step D the word vectors obtained in step C are processed through the first-layer Bi-LSTM to obtain the hidden-layer output of each time step in both directions, and the computation model for splicing and combining the obtained outputs is
d_t = [h_t^f ; h_t^b]
g_t^f = tanh(W^f h_t^f)
g_t^b = tanh(W^b h_t^b)
wherein h_t^f denotes the hidden-layer output of the forward LSTM at time t, h_t^b denotes the hidden-layer output of the backward LSTM at time t, d_t denotes the splicing result of h_t^f and h_t^b, g_t^f denotes the result of passing the forward hidden-layer output through the activation function tanh, g_t^b denotes the result of passing the backward hidden-layer output through the activation function tanh, W^f denotes the weight matrix of the fully connected layer that h_t^f passes through, and W^b denotes the weight matrix of the fully connected layer that h_t^b passes through.
8. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 7, wherein in step D the computation model for training the language model using the hidden-layer output of each time step is
p_{t+1}^f = softmax(U^f g_t^f)
p_{t-1}^b = softmax(U^b g_t^b)
L^f = -(1/T) Σ_{t=1}^{T} log p_{t+1}^f
L^b = -(1/T) Σ_{t=1}^{T} log p_{t-1}^b
wherein p_{t+1}^f denotes the probability vector, predicted at time t using g_t^f, of the word at the next moment; p_{t-1}^b denotes the probability vector, predicted at time t using g_t^b, of the word at the previous moment; U^f denotes the weight matrix of the fully connected layer that g_t^f passes through; U^b denotes the weight matrix of the fully connected layer that g_t^b passes through; L^f denotes the loss function for training the forward LSTM language model; L^b denotes the loss function for training the backward LSTM language model; and T denotes the total number of time steps of the sequence.
9. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 8, wherein step D further includes calculating the cross entropy between the predicted word IDs and the true following and preceding word IDs, respectively, and calculating the loss using the forward LSTM labels and backward LSTM labels constructed in step A, so as to optimize the language model.
10. The Chinese named entity recognition method based on attention mechanism and language model learning as claimed in claim 9, wherein performing tag prediction training using the maximum likelihood method in step E specifically comprises:
inputting the output obtained in step D to the second-layer Bi-LSTM to obtain the spliced output of the second-layer Bi-LSTM; then calculating the maximum-likelihood estimate of the label sequence inside a conditional random field, and calculating the loss of the conditional random field using the label of the named-entity class corresponding to each word constructed in step A; finally, adding the loss of the conditional random field to the loss of the language model to obtain the loss of the entire neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811517779.9A CN109657239B (en) | 2018-12-12 | 2018-12-12 | Chinese named entity recognition method based on attention mechanism and language model learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109657239A true CN109657239A (en) | 2019-04-19 |
CN109657239B CN109657239B (en) | 2020-04-21 |
Family
ID=66113875
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811517779.9A Active CN109657239B (en) | 2018-12-12 | 2018-12-12 | Chinese named entity recognition method based on attention mechanism and language model learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109657239B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017045453A (en) * | 2015-08-27 | 2017-03-02 | ゼロックス コーポレイションXerox Corporation | Document-specific gazetteers for named entity recognition |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107797992A (en) * | 2017-11-10 | 2018-03-13 | 北京百分点信息科技有限公司 | Name entity recognition method and device |
CN107977361A (en) * | 2017-12-06 | 2018-05-01 | 哈尔滨工业大学深圳研究生院 | The Chinese clinical treatment entity recognition method represented based on deep semantic information |
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
Non-Patent Citations (2)
Title |
---|
GUL KHAN SAFI QAMAS et al.: "Named Entity Recognition Based on Deep Neural Networks", 《技术研究》 (Technology Research) * |
陈彦妤 (CHEN Yanyu) et al.: "Insurance Name Entity Recognition Based on CRF and Bi-LSTM", 《智能计算机与应用》 (Intelligent Computer and Applications) * |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147551A (en) * | 2019-05-14 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Multi-class entity recognition model training, entity recognition method, server and terminal |
CN110147551B (en) * | 2019-05-14 | 2023-07-11 | 腾讯科技(深圳)有限公司 | Multi-category entity recognition model training, entity recognition method, server and terminal |
CN111160467A (en) * | 2019-05-31 | 2020-05-15 | 北京理工大学 | Image description method based on conditional random field and internal semantic attention |
CN111160467B (en) * | 2019-05-31 | 2021-12-10 | 北京理工大学 | Image description method based on conditional random field and internal semantic attention |
CN110298043A (en) * | 2019-07-03 | 2019-10-01 | 吉林大学 | A kind of vehicle name entity recognition method and system |
CN110334189A (en) * | 2019-07-11 | 2019-10-15 | 河南大学 | Method is determined based on the long microblog topic label in short-term and from attention neural network |
CN110705272A (en) * | 2019-08-28 | 2020-01-17 | 昆明理工大学 | Named entity identification method for automobile engine fault diagnosis |
CN110598213A (en) * | 2019-09-06 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Keyword extraction method, device, equipment and storage medium |
CN110619124A (en) * | 2019-09-19 | 2019-12-27 | 成都数之联科技有限公司 | Named entity identification method and system combining attention mechanism and bidirectional LSTM |
CN111079418A (en) * | 2019-11-06 | 2020-04-28 | 科大讯飞股份有限公司 | Named body recognition method and device, electronic equipment and storage medium |
CN111079418B (en) * | 2019-11-06 | 2023-12-05 | 科大讯飞股份有限公司 | Named entity recognition method, device, electronic equipment and storage medium |
CN110826334A (en) * | 2019-11-08 | 2020-02-21 | 中山大学 | Chinese named entity recognition model based on reinforcement learning and training method thereof |
CN110826334B (en) * | 2019-11-08 | 2023-04-21 | 中山大学 | Chinese named entity recognition model based on reinforcement learning and training method thereof |
CN110969020B (en) * | 2019-11-21 | 2022-10-11 | 中国人民解放军国防科技大学 | CNN and attention mechanism-based Chinese named entity identification method, system and medium |
CN110969020A (en) * | 2019-11-21 | 2020-04-07 | 中国人民解放军国防科技大学 | CNN and attention mechanism-based Chinese named entity identification method, system and medium |
CN113033192B (en) * | 2019-12-09 | 2024-04-26 | 株式会社理光 | Training method and device for sequence annotation and computer readable storage medium |
CN113033192A (en) * | 2019-12-09 | 2021-06-25 | 株式会社理光 | Training method and device for sequence labels and computer readable storage medium |
CN111223489A (en) * | 2019-12-20 | 2020-06-02 | 厦门快商通科技股份有限公司 | Specific keyword identification method and system based on Attention mechanism |
CN111222339B (en) * | 2020-01-13 | 2023-05-23 | 华南理工大学 | Medical consultation named entity recognition method based on countermeasure multitask learning |
CN111222339A (en) * | 2020-01-13 | 2020-06-02 | 华南理工大学 | Medical consultation named entity identification method based on anti-multitask learning |
CN113139382A (en) * | 2020-01-20 | 2021-07-20 | 北京国双科技有限公司 | Named entity identification method and device |
CN111274829A (en) * | 2020-02-07 | 2020-06-12 | 中国科学技术大学 | Sequence labeling method using cross-language information |
CN111274829B (en) * | 2020-02-07 | 2023-06-16 | 中国科学技术大学 | Sequence labeling method utilizing cross-language information |
CN111209754B (en) * | 2020-02-25 | 2023-06-02 | 桂林电子科技大学 | Data set construction method for Vietnam entity recognition |
CN111209754A (en) * | 2020-02-25 | 2020-05-29 | 桂林电子科技大学 | Data set construction method for Vietnamese entity recognition |
CN111368544B (en) * | 2020-02-28 | 2023-09-19 | 中国工商银行股份有限公司 | Named entity identification method and device |
CN111368544A (en) * | 2020-02-28 | 2020-07-03 | 中国工商银行股份有限公司 | Named entity identification method and device |
CN111522964A (en) * | 2020-04-17 | 2020-08-11 | 电子科技大学 | Tibetan medicine literature core concept mining method |
CN111444721A (en) * | 2020-05-27 | 2020-07-24 | 南京大学 | Chinese text key information extraction method based on pre-training language model |
CN113743116A (en) * | 2020-05-28 | 2021-12-03 | 株式会社理光 | Training method and device for named entity recognition and computer readable storage medium |
CN111651995A (en) * | 2020-06-07 | 2020-09-11 | 上海建科工程咨询有限公司 | Accident information automatic extraction method and system based on deep circulation neural network |
CN111738169A (en) * | 2020-06-24 | 2020-10-02 | 北方工业大学 | Handwriting formula recognition method based on end-to-end network model |
CN112989811A (en) * | 2021-03-01 | 2021-06-18 | 哈尔滨工业大学 | BilSTM-CRF-based historical book reading auxiliary system and control method thereof |
CN112883737B (en) * | 2021-03-03 | 2022-06-14 | 山东大学 | Robot language instruction analysis method and system based on Chinese named entity recognition |
CN112883737A (en) * | 2021-03-03 | 2021-06-01 | 山东大学 | Robot language instruction analysis method and system based on Chinese named entity recognition |
CN113158658B (en) * | 2021-04-26 | 2023-09-19 | 中国电子科技集团公司第二十八研究所 | Knowledge embedding-based structured control instruction extraction method |
CN113158658A (en) * | 2021-04-26 | 2021-07-23 | 中国电子科技集团公司第二十八研究所 | Knowledge embedding-based structured control instruction extraction method |
WO2023004528A1 (en) * | 2021-07-26 | 2023-02-02 | 深圳市检验检疫科学研究院 | Distributed system-based parallel named entity recognition method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN109657239B (en) | 2020-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109657239A (en) | The Chinese name entity recognition method learnt based on attention mechanism and language model | |
CN108460013B (en) | Sequence labeling model and method based on fine-grained word representation model | |
CN108628823B (en) | Named entity recognition method combining attention mechanism and multi-task collaborative training | |
CN108268444B (en) | Chinese word segmentation method based on bidirectional LSTM, CNN and CRF | |
CN111241294B (en) | Relationship extraction method of graph convolution network based on dependency analysis and keywords | |
CN109086267B (en) | Chinese word segmentation method based on deep learning | |
McCallum | Efficiently inducing features of conditional random fields | |
CN112528676B (en) | Document-level event argument extraction method | |
CN109543181B (en) | Named entity model and system based on combination of active learning and deep learning | |
Zhang et al. | Semi-supervised structured prediction with neural CRF autoencoder | |
CN111738007A (en) | Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network | |
CN110263325A (en) | Chinese automatic word-cut | |
CN112966525B (en) | Law field event extraction method based on pre-training model and convolutional neural network algorithm | |
CN108647191A (en) | It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method | |
CN111460824A (en) | Unmarked named entity identification method based on anti-migration learning | |
CN111651983A (en) | Causal event extraction method based on self-training and noise model | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN112347269A (en) | Method for recognizing argument pairs based on BERT and Att-BilSTM | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN110569355B (en) | Viewpoint target extraction and target emotion classification combined method and system based on word blocks | |
CN113312498B (en) | Text information extraction method for embedding knowledge graph by undirected graph | |
CN111062214A (en) | Integrated entity linking method and system based on deep learning | |
CN112699685B (en) | Named entity recognition method based on label-guided word fusion | |
CN114356990A (en) | Base named entity recognition system and method based on transfer learning | |
CN117094325B (en) | Named entity identification method in rice pest field |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||