CN109766523A - Part-of-speech tagging method and labeling system - Google Patents


Info

Publication number
CN109766523A
CN109766523A (application CN201711095902.8A)
Authority
CN
China
Prior art keywords
model
input text
word
bgru
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711095902.8A
Other languages
Chinese (zh)
Inventor
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Putian Information Technology Co Ltd
Original Assignee
Putian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Putian Information Technology Co Ltd filed Critical Putian Information Technology Co Ltd
Priority to CN201711095902.8A priority Critical patent/CN109766523A/en
Publication of CN109766523A publication Critical patent/CN109766523A/en
Pending legal-status Critical Current


Landscapes

  • Machine Translation (AREA)

Abstract

The present invention provides a part-of-speech tagging method and labeling system. The method comprises: step A-1: splitting the text to be tagged into sentences and segmenting it into words to form a first input text; step A-2: detecting whether the first input text contains rare words; if so, replacing the rare words of the first input text with preset characters to form a second input text; if not, letting the second input text equal the first input text; step A-3: converting the first input text into word vectors V1 and the second input text into word vectors V2; step A-4: feeding V1 into a CNN model, which outputs word feature vector V1'; step A-5: feeding V2 into a BGRU model, which outputs word feature vector V2'; step A-6: concatenating V1, V1' and V2' to obtain V3, feeding V3 into a BLSTM model, feeding the BLSTM output into a CRF model, and having the CRF model output the part-of-speech label of every word in the text to be tagged. The part-of-speech tagging method of the present invention improves tagging accuracy for both normal words and rare words.

Description

Part-of-speech tagging method and labeling system
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a part-of-speech tagging method and labeling system.
Background technique
Part-of-speech tagging (POS tagging) determines and marks the part of speech of each word in a given sentence sequence. It is a cornerstone of deeper natural language processing and provides the foundation for higher-level tasks such as machine translation, speech recognition, and information retrieval.
As neural network techniques have developed, new models have been proposed continually, and the introduction of neural networks has further improved the accuracy of part-of-speech tagging. Notably, Yoav Goldberg made research progress on tagging rare words and out-of-vocabulary words with a BLSTM (bidirectional long short-term memory) model. Today, the model most widely used in the part-of-speech tagging field is the CNN (convolutional neural network) + BLSTM + CRF (conditional random field) model.
However, the CNN+BLSTM+CRF model has lower tagging accuracy for rare words and out-of-vocabulary words, where a rare word is a word that appears infrequently in the corpus.
The CNN+BLSTM+CRF model reads the features of normal words and rare words together without distinguishing them, yet the parts of speech of rare words tend to cluster in a few categories such as nouns. This hurts the part-of-speech tagging accuracy of both rare words and normal words.
Summary of the invention
The present invention provides a part-of-speech tagging method and labeling system that improve tagging accuracy for both normal words and rare words.
The part-of-speech tagging method of the present invention uses a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) model, and a conditional random field (CRF) model, and comprises the following steps:
Step A-1: split the text to be tagged into sentences and segment it into words, forming a first input text;
Step A-2: detect whether the first input text contains rare words; if so, replace the rare words of the first input text with preset characters to form a second input text; if not, let the second input text equal the first input text;
Step A-3: convert the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4: feed V1 into the CNN model; the CNN model outputs word feature vector V1';
Step A-5: feed V2 into the BGRU model; the BGRU model outputs word feature vector V2';
Step A-6: concatenate V1, V1' and V2' to obtain V3, feed V3 into the BLSTM model, feed the BLSTM output into the CRF model, and have the CRF model output the part-of-speech label of every word in the text to be tagged.
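The data flow of steps A-1 through A-6 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `embed`, `cnn`, `bgru`, and `blstm_crf` callables are hypothetical placeholders that a real system would back with trained networks, and `RARE_MARK` is one possible choice of preset character.

```python
RARE_MARK = "<RARE>"  # illustrative "preset character" used to mask rare words

def is_rare(word, vocab_counts, threshold=3):
    # Step A-2 criterion: fewer than `threshold` occurrences in the corpus.
    return vocab_counts.get(word, 0) < threshold

def tag(words, vocab_counts, embed, cnn, bgru, blstm_crf, threshold=3):
    # `words` is the first input text (already split and segmented, step A-1).
    second = [RARE_MARK if is_rare(w, vocab_counts, threshold) else w
              for w in words]                      # step A-2: second input text
    v1 = [embed(w) for w in words]                 # step A-3: word vectors V1
    v2 = [embed(w) for w in second]                # step A-3: word vectors V2
    v1p = cnn(v1)                                  # step A-4: V1' from the CNN
    v2p = bgru(v2)                                 # step A-5: V2' from the BGRU
    v3 = [a + b + c for a, b, c in zip(v1, v1p, v2p)]  # step A-6: concatenation
    return blstm_crf(v3)                           # POS label per word
```

Any stand-ins with the same call shapes can exercise the pipeline end to end.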
The invention also includes a rare-word part-of-speech feature separation method, comprising:
Step A-1: split the text to be separated into sentences and segment it into words, forming a first input text;
Step A-2: detect whether the first input text contains rare words; if so, replace the rare words of the first input text with preset characters to form a second input text; if not, let the second input text equal the first input text;
Step A-3: convert the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4: feed V1 into the convolutional neural network (CNN) model; the CNN model outputs word feature vector V1';
Step A-5: feed V2 into the bidirectional gated recurrent unit (BGRU) model; the BGRU model outputs word feature vector V2';
Step B: concatenate V1, V1' and V2' to obtain V3; the vector units of V3 that contain preset characters are rare-word feature vector units, and the vector units of V3 that do not contain preset characters are normal-word feature vector units.
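Step B above can be sketched with a small helper. The sketch assumes the preset character survives in the second input text alongside the concatenated units of V3; the marker name is illustrative only.

```python
def separate_features(second_input, v3, preset="<RARE>"):
    # Split the concatenated feature units V3 into rare-word and normal-word
    # groups, using the preset character left in the second input text as the
    # marker (step B of the separation method).
    rare_units, normal_units = [], []
    for word, unit in zip(second_input, v3):
        (rare_units if word == preset else normal_units).append(unit)
    return rare_units, normal_units
```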
The invention also includes a training method for the part-of-speech tagging model, which comprises the convolutional neural network (CNN) model, bidirectional gated recurrent unit (BGRU) model, bidirectional long short-term memory (BLSTM) model, and conditional random field (CRF) model;
The training method includes:
Step C-1: convert sample data of the training corpus into a first input text;
Step A-2: detect whether the first input text contains rare words; if so, replace the rare words of the first input text with preset characters to form a second input text; if not, let the second input text equal the first input text;
Step A-3: convert the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4: feed V1 into the CNN model; the CNN model outputs word feature vector V1';
Step A-5: feed V2 into the BGRU model; the BGRU model outputs word feature vector V2';
Step A-6: concatenate V1, V1' and V2' to obtain V3, feed V3 into the BLSTM model, feed the BLSTM output into the CRF model, and have the CRF model output the part-of-speech label of every word in the training corpus sample data;
Step C-2: compute the error between the part-of-speech labels output by the CRF model and the gold part-of-speech labels of the training corpus sample data, and update the CNN, BGRU, BLSTM, and CRF models according to the error.
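The error-and-update loop of step C-2 can be illustrated with a deliberately simplified stand-in: a structured-perceptron-style update over per-(word, tag) scores. The real method backpropagates through the CNN, BGRU, BLSTM, and CRF weights; this toy only shows the "compare prediction to gold label, then adjust" cycle.

```python
from collections import defaultdict

def train_tagger(samples, tags, epochs=5):
    # samples: list of (word list, gold tag list) pairs from the corpus.
    # Toy stand-in for step C-2: nudge per-(word, tag) scores toward the
    # gold labels whenever the current prediction disagrees with them.
    score = defaultdict(float)
    for _ in range(epochs):
        for words, gold in samples:
            pred = [max(tags, key=lambda t: score[(w, t)]) for w in words]
            for w, p, g in zip(words, pred, gold):
                if p != g:                  # error between output and gold label
                    score[(w, g)] += 1.0    # reward the gold tag
                    score[(w, p)] -= 1.0    # penalize the wrong prediction
    return score
```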
The invention also includes a part-of-speech tagging system, comprising:
a text preprocessing module, which splits the text to be tagged into sentences and segments it into words, forming a first input text;
a rare-word processing module, which detects whether the first input text contains rare words; if so, it replaces the rare words of the first input text with preset characters to form a second input text; if not, it lets the second input text equal the first input text;
a word vector generation module, which converts the first input text into word vectors V1 and the second input text into word vectors V2;
a CNN model, which takes V1 as input and outputs word feature vector V1';
a BGRU model, which takes V2 as input and outputs word feature vector V2';
a vector concatenation module, which concatenates V1, V1' and V2' to obtain V3;
a BLSTM model, which takes V3 as input and passes its output to the CRF model;
a CRF model, which outputs the part-of-speech label of every word in the text to be tagged.
The invention also includes a rare-word part-of-speech feature separation system, comprising:
a text preprocessing module, which splits the text to be separated into sentences and segments it into words, forming a first input text;
a rare-word processing module, which detects whether the first input text contains rare words; if so, it replaces the rare words of the first input text with preset characters to form a second input text; if not, it lets the second input text equal the first input text;
a word vector generation module, which converts the first input text into word vectors V1 and the second input text into word vectors V2;
a CNN model, which takes V1 as input and outputs word feature vector V1';
a BGRU model, which takes V2 as input and outputs word feature vector V2';
a vector concatenation module, which concatenates V1, V1' and V2' to obtain V3;
a feature separation module, for which the vector units of V3 that contain preset characters are rare-word feature vector units and the vector units of V3 that do not contain preset characters are normal-word feature vector units.
The invention also includes a training system for the part-of-speech tagging model, comprising:
a text conversion module, which converts sample data of the training corpus into a first input text;
a rare-word processing module, which detects whether the first input text contains rare words; if so, it replaces the rare words of the first input text with preset characters to form a second input text; if not, it lets the second input text equal the first input text;
a word vector generation module, which converts the first input text into word vectors V1 and the second input text into word vectors V2;
a CNN model, which takes V1 as input and outputs word feature vector V1';
a BGRU model, which takes V2 as input and outputs word feature vector V2';
a vector concatenation module, which concatenates V1, V1' and V2' to obtain V3;
a BLSTM model, which takes V3 as input and passes its output to the CRF model;
a CRF model, which outputs the part-of-speech label of every word in the training corpus sample data;
an update module, which computes the error between the part-of-speech labels output by the CRF model and the gold part-of-speech labels of the training corpus sample data and updates the CNN, BGRU, BLSTM, and CRF models according to the error.
The part-of-speech tagging method of the present invention adds a BGRU model to the CNN+BLSTM+CRF baseline. Compared with the CNN-only front end mentioned in the background, the added BGRU improves the accuracy of extracting the part-of-speech features of normal words, and the input to BLSTM+CRF now contains the outputs of both the CNN and the BGRU. Because the BGRU output carries the rare-word marker (the preset characters), the BLSTM+CRF can separate rare words from normal words, further improving learning and recognition for both rare words and normal words.
Detailed description of the invention
Fig. 1 shows the structure of a single LSTM network;
Fig. 2 shows the structure of a GRU network;
Fig. 3 is a computational framework diagram of the GRU neuron state;
Fig. 4 is a flowchart of the part-of-speech tagging method of the present invention;
Fig. 5 shows the neural network of the CNN+BLSTM+CRF model;
Fig. 6 shows the neural network of the CNN+BGRU+BLSTM+CRF model of the present invention;
Fig. 7 is a flowchart of the rare-word part-of-speech feature separation method of the present invention;
Fig. 8 is a flowchart of the training method for the part-of-speech tagging model of the present invention;
Fig. 9 is a structural diagram of the part-of-speech tagging system of the present invention;
Fig. 10 is a structural diagram of the rare-word part-of-speech feature separation system of the present invention;
Fig. 11 is a structural diagram of the training system for the part-of-speech tagging model of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings are used to distinguish similar objects and do not describe a specific order or precedence. It should be understood that data so labeled are interchangeable where appropriate, so that the embodiments of the present invention described herein can be practiced in sequences other than those illustrated or described herein.
In the part-of-speech tagging field, an artificial neural network produces, for a text input, the corresponding part-of-speech recognition result. The network learns a mapping between input patterns and output patterns and outputs a learning result that represents this mapping. Based on the learning result, the network generates outputs for the inputs it is intended to handle.
The part-of-speech tagging method of the present invention comprises a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) model, and a conditional random field (CRF) model. The four models are introduced below.
The convolutional neural network (CNN) model is commonly used for feature extraction. Its conventional parts are an input layer, convolutional layers, pooling layers, and an output layer.
The input layer can hold raw data or feature maps. A convolutional layer contains learnable convolution kernels and an activation function: the input is convolved with the kernels, the convolution results are passed through the activation function, and feature maps are output, so this layer is also the feature extraction layer. A pooling layer divides the input signal into non-overlapping regions and applies a pooling operation to each region; max pooling and mean pooling are the common operations, and pooling helps eliminate offset and distortion in the signal. A CNN typically uses a deep structure of alternating convolutional and pooling layers. The fully connected layer of the CNN combines the groups of features produced by the stacked convolution-pooling operations into one signal and derives a label probability distribution from the input, extracting the internal information of words and phrases and generating a character-based representation of each word.
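The convolution-plus-max-pooling operation described above can be sketched for a single filter in the scalar case. This is an illustrative simplification (one filter, flat dot product, no activation), not the patent's exact layer.

```python
def conv_maxpool(char_vecs, kernel, width=2):
    # char_vecs: per-character embedding vectors; kernel: one learned filter,
    # flattened to length width * dim. Slide the window, take the dot product
    # at each position, then keep the maximum (max pooling), which makes the
    # feature insensitive to where in the word the pattern occurred.
    feats = []
    for i in range(len(char_vecs) - width + 1):
        window = [x for v in char_vecs[i:i + width] for x in v]
        feats.append(sum(w * x for w, x in zip(kernel, window)))
    return max(feats)
```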
The bidirectional long short-term memory (BLSTM) model differs from a plain LSTM network: the BLSTM model has two parallel LSTM layers running in opposite directions, which share the same structure and differ only in the order in which they read the text. The structure of a single LSTM network is shown in Fig. 1.
The memory cell of the BLSTM model contains three kinds of gate units. The sigmoid input gate decides whether an input value may be added to the current state. The state cell has a linear self-loop whose weight is controlled by the forget gate. The output of the cell can be shut off by the output gate.
The update formulas are summarized as:
i_t = σ(W_i h_{t-1} + U_i x_t + b_i)
f_t = σ(W_f h_{t-1} + U_f x_t + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c h_{t-1} + U_c x_t + b_c)
o_t = σ(W_o h_{t-1} + U_o x_t + b_o)
h_t = o_t ⊙ tanh(c_t)
where σ denotes the sigmoid activation function, x_t is the input vector at time t, h_t is the hidden state, U_i, U_f, U_c, U_o are the weight matrices applied to x_t for the different gates, W_i, W_f, W_c, W_o are the weight matrices applied to h_{t-1}, b_i, b_f, b_c, b_o are the biases of the gates, and i_t, f_t, c_t, o_t denote the input gate, forget gate, memory cell, and output gate respectively.
The output of the BLSTM is the concatenation of the forward and backward hidden states:
y_t = [hf_t, hb_t]
The final fully connected layer of the BLSTM model is the output layer.
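One LSTM cell update following the per-gate formulas above can be written out directly. This sketch uses scalars for readability (real layers use matrices), and the parameter dictionary keys are illustrative names only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    # One LSTM step: p holds scalar weights W_* (on h_{t-1}), U_* (on x_t),
    # and biases b_* for the input, forget, output gates and the cell update.
    i = sigmoid(p["Wi"] * h_prev + p["Ui"] * x_t + p["bi"])   # input gate
    f = sigmoid(p["Wf"] * h_prev + p["Uf"] * x_t + p["bf"])   # forget gate
    o = sigmoid(p["Wo"] * h_prev + p["Uo"] * x_t + p["bo"])   # output gate
    c_tilde = math.tanh(p["Wc"] * h_prev + p["Uc"] * x_t + p["bc"])
    c = f * c_prev + i * c_tilde                              # memory cell c_t
    h = o * math.tanh(c)                                      # hidden state h_t
    return h, c
```

A BLSTM runs two such recurrences, one over the sequence and one over its reverse, and concatenates their hidden states.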
The CRF serves as the output for part-of-speech labels. Let x = {x_1, ..., x_n} denote the input sequence, where x_i is the vector of the i-th word, and let y = {y_1, ..., y_n} denote the output sequence of part-of-speech labels, with Y(x) the set of label sequences for x. The CRF defines the conditional probability p(y | z; W, b):
p(y | z; W, b) = ∏_{i=1}^{n} ψ_i(y_{i-1}, y_i, z) / Σ_{y' ∈ Y(x)} ∏_{i=1}^{n} ψ_i(y'_{i-1}, y'_i, z)
where ψ_i(y', y, z) = exp(W_{y',y}ᵀ z_i + b_{y',y}) is the potential function and W and b are the weight and bias vectors. (The potential function is reconstructed here in the standard linear-chain form; the original formula was garbled.) When the model is trained, the CRF network is optimized by minimizing its negative log-likelihood.
The negative log-likelihood of the CRF serves as the loss function of the model:
L_CRF(W, b) = -Σ_i log p(y⁽ⁱ⁾ | z⁽ⁱ⁾; W, b)
The likelihood function expresses the probability of observing the data under different parameter vectors. Under a Gaussian-noise assumption, minimizing the negative log-likelihood is equivalent to minimizing the sum-of-squares error, i.e., minimizing the difference between the model's predictions and the actual values.
The BGRU (bidirectional gated recurrent unit) model is an improvement on the LSTM model. As shown in Fig. 2, r and z denote the reset and update gate mechanisms of the GRU model. Thanks to this gating optimization, the BGRU model has fewer parameters: it simplifies the model while preserving its effectiveness. The present invention uses the BGRU to extract features, minimizing the total number of model parameters on the premise that features are still extracted effectively.
The computational framework of the GRU neuron state is shown in Fig. 3. At time t, the state of the GRU is computed by the following equations:
z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t ⊙ h_{t-1}, x_t])
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where z_t and r_t are the update and reset gate activations respectively, ⊙ denotes element-wise multiplication of matrices, σ denotes the sigmoid function, and W denotes the shared parameters of the GRU model.
Suppose a sentence S_i contains T words, the t-th word being w_it. Treating S_i as a sequence whose components are its words, the forward GRU and the backward GRU yield, for each word, a forward representation h→_it and a backward representation h←_it respectively. Concatenating the two, h_it = [h→_it, h←_it], gives the semantic representation of the sentence S_i.
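The GRU update and the bidirectional pass can be sketched in the scalar case. The parameter names are illustrative; real layers use weight matrices over the concatenated [h_{t-1}, x_t].

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, p):
    # One GRU update following the formulas above (scalar sketch).
    z = sigmoid(p["Wz_h"] * h_prev + p["Wz_x"] * x_t)   # update gate z_t
    r = sigmoid(p["Wr_h"] * h_prev + p["Wr_x"] * x_t)   # reset gate r_t
    h_tilde = math.tanh(p["W_h"] * (r * h_prev) + p["W_x"] * x_t)
    return (1 - z) * h_prev + z * h_tilde               # h_t

def bgru_encode(xs, p):
    # Forward GRU over xs and backward GRU over reversed(xs); each position's
    # representation is the pair (h_forward, h_backward), as in h_it above.
    fwd, h = [], 0.0
    for x in xs:
        h = gru_step(x, h, p)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):
        h = gru_step(x, h, p)
        bwd.append(h)
    bwd.reverse()
    return list(zip(fwd, bwd))
```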
The part-of-speech tagging method proposed by the present invention adds a BGRU model to the CNN+BLSTM+CRF baseline. The specific algorithm, shown in Fig. 4, comprises the following steps:
Step A-1 (S101): split the text to be tagged into sentences and segment it into words, forming a first input text;
Step A-2 (S102): detect whether the first input text contains rare words; if so, replace the rare words of the first input text with preset characters to form a second input text; if not, let the second input text equal the first input text;
Step A-3 (S103): convert the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4 (S104): feed V1 into the CNN model; the CNN model outputs word feature vector V1';
Step A-5 (S105): feed V2 into the BGRU model; the BGRU model outputs word feature vector V2';
Step A-6 (S106): concatenate V1, V1' and V2' to obtain V3, feed V3 into the BLSTM model, feed the BLSTM output into the CRF model, and have the CRF model output the part-of-speech label of every word in the text to be tagged.
On top of the CNN+BLSTM+CRF baseline, which merely scans the word vectors with a CNN, the part-of-speech tagging method of the present invention adds a BGRU network targeted at the rare-word problem of the part-of-speech tagging field: the BGRU processes the preprocessed normal-word word vectors and obtains the part-of-speech features of normal words. Fig. 5 shows the neural network of the prior-art CNN+BLSTM+CRF model; Fig. 6 shows the neural network of the CNN+BGRU+BLSTM+CRF model of the present invention.
The BGRU has a forward GRU and a backward GRU: in the hidden layer, the forward GRU captures the text's information in reading order while the backward GRU captures it in reverse order, so more feature information is captured than with a one-directional network. Moreover, because the rare-word parts are removed from the BGRU's input, extracting features with the BGRU weakens the discontinuity caused by their removal and extracts the part-of-speech features of normal words as fully as possible.
The word features from the CNN are then concatenated with the text word vectors and fed as input to the BLSTM+CRF network. Compared with the CNN-only front end mentioned in the background, the added BGRU improves the extraction accuracy of normal-word part-of-speech features, and the input to BLSTM+CRF contains the outputs of both CNN and BGRU. Because the BGRU output carries the rare-word marker (the preset characters), the BLSTM+CRF can separate rare words from normal words, further improving learning and recognition for both rare words and normal words.
Further, in the part-of-speech tagging method of the present invention, the preset characters of step A-2 may be a null character, 0, NaN, any legal vector character defined by the programming language, or a pre-defined custom character. The preset characters allow the BLSTM+CRF to distinguish rare words from normal words.
In step A-3 of the method of the present invention, a text vectorization tool such as Word2Vec can convert the first input text into word vectors V1 and the second input text into word vectors V2. The word embeddings produced by Word2Vec effectively express the relations between words; it is one of the common word embedding algorithms.
In the method for the invention, CNN model optimization uses the operation of maximum value pond, and the extraction part of speech of maximum possible is special Sign.
In step A-2 of the method of the present invention, the criterion for a rare word is: its number of occurrences in the reference corpus is below a preset value. The preset value is set empirically, generally a number between 1 and 6.
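The frequency-based rare-word criterion can be implemented with a simple word count over the reference corpus. A minimal sketch (function and threshold names are illustrative):

```python
from collections import Counter

def rare_words(reference_corpus, threshold=3):
    # reference_corpus: list of word-segmented sentences. A word is rare when
    # its occurrence count falls below the preset value (typically 1-6).
    counts = Counter(w for sentence in reference_corpus for w in sentence)
    return {w for w, c in counts.items() if c < threshold}
```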
The reference corpus may be the PFR corpus.
The PFR corpus was produced by performing word segmentation and part-of-speech tagging on the plain-text corpus of the People's Daily for the first half of 1998, organized strictly by the newspaper's date, edition order, and article order. Every word in each article carries a part-of-speech label. The current tag set contains 26 basic tags (noun n, time word t, place word s, locative f, numeral m, classifier q, distinguishing word b, pronoun r, verb v, adjective a, descriptive word z, adverb d, preposition p, conjunction c, particle u, modal particle y, interjection e, onomatopoeia o, idiom i, fixed expression l, abbreviation j, preceding component h, following component k, morpheme g, non-morpheme character x, punctuation w). From the perspective of corpus applications, proper-noun tags were added (person name nr, place name ns, organization name nt, other proper noun nz); further tags were added from a linguistic perspective, for a total of more than 40 tags.
Optionally, in step A-2 of the method of the present invention, the rare-word criterion may instead be: any word that does not appear in a normal-word dictionary is a rare word, where the normal-word dictionary contains all normal words.
The invention also includes a rare-word part-of-speech feature separation method, which, as shown in Fig. 7, comprises the following steps:
Step A-1 (S201): split the text to be separated into sentences and segment it into words, forming a first input text;
Step A-2 (S202): detect whether the first input text contains rare words; if so, replace the rare words of the first input text with preset characters to form a second input text; if not, let the second input text equal the first input text;
Step A-3 (S203): convert the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4 (S204): feed V1 into the CNN model; the CNN model outputs word feature vector V1';
Step A-5 (S205): feed V2 into the BGRU model; the BGRU model outputs word feature vector V2';
Step B (S206): concatenate V1, V1' and V2' to obtain V3; the vector units of V3 that contain preset characters are rare-word feature vector units, and the vector units of V3 that do not contain preset characters are normal-word feature vector units.
Preferably, in Fig. 7, the CNN model uses the max pooling operation.
Preferably, in Fig. 7, the rare-word criterion is: the number of occurrences in the reference corpus is below a preset value.
The V3, rare-word feature vector units, and normal-word feature vector units obtained in step B of Fig. 7 can be fed to other neural network models, to improve those models' part-of-speech tagging accuracy for normal words and rare words.
The invention also includes a training method for the part-of-speech tagging model, which comprises the convolutional neural network (CNN) model, bidirectional gated recurrent unit (BGRU) model, bidirectional long short-term memory (BLSTM) model, and conditional random field (CRF) model.
As shown in Fig. 8, the training method includes:
Step C-1 (S301): convert sample data of the training corpus into a first input text;
Step A-2 (S302): detect whether the first input text contains rare words; if so, replace the rare words of the first input text with preset characters to form a second input text; if not, let the second input text equal the first input text;
Step A-3 (S303): convert the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4 (S304): feed V1 into the CNN model; the CNN model outputs word feature vector V1';
Step A-5 (S305): feed V2 into the BGRU model; the BGRU model outputs word feature vector V2';
Step A-6 (S306): concatenate V1, V1' and V2' to obtain V3, feed V3 into the BLSTM model, feed the BLSTM output into the CRF model, and have the CRF model output the part-of-speech label of every word in the training corpus sample data;
Step C-2 (S307): compute the error between the part-of-speech labels output by the CRF model and the gold part-of-speech labels of the training corpus sample data, and update the CNN, BGRU, BLSTM, and CRF models according to the error.
The training corpus in Fig. 8 is preferably the PFR corpus.
In S307 (step C-2) of Fig. 8, the Adam algorithm can be used to control the update process when updating the CNN, BGRU, BLSTM, and CRF models.
Adam stands for "adaptive moment estimation". In probability theory, the "moment" means: if a random variable X obeys some distribution, the first moment of X is E(X), i.e. the sample mean, and the second moment of X is E(X²), i.e. the mean of the squared samples. The Adam algorithm dynamically adjusts a per-parameter learning rate from first- and second-moment estimates of each parameter's gradient of the loss function. Adam is based on gradient descent, but the iterative step size of each parameter stays within a determined range per iteration, so a very large gradient does not cause a very large step and the parameter values are more stable. Preferably, the present invention applies an exponential learning-rate decay once every 3000 steps with a decay base of 0.1, leaving the remaining parameters at their default settings.
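The stepwise learning-rate decay and a single Adam update can both be written out compactly. A hedged sketch under the stated settings (initial rate 0.01, decay base 0.1 every 3000 steps); the Adam defaults beta1=0.9, beta2=0.999, eps=1e-8 are the algorithm's usual choices, not values given in the text.

```python
import math

def learning_rate(step, initial=0.01, decay=0.1, every=3000):
    # Exponential decay applied once every `every` steps, base 0.1.
    return initial * (decay ** (step // every))

def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update for a scalar parameter: exponential moving averages of
    # the gradient (first moment) and squared gradient (second moment), with
    # bias correction, then a bounded step on theta.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

Note how the bias-corrected ratio keeps the very first step close to the learning rate regardless of the gradient's magnitude.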
When training the model, the hyperparameters of the CNN model and the BLSTM model also need to be set. For the CNN model, the hyperparameters include the filter window size, preferably set to 2*50, and the number of filters, preferably set to 50. For the BLSTM model, the hidden-unit hyperparameters include the number of units and the number of layers; preferably, the number of units is set to 100 and the number of layers to 2.
In addition, the present invention uses the dropout technique to reduce overfitting; the dropout rate is preferably 0.5.
By setting the above parameters (learning rate, CNN window size, and so on), the present invention makes the loss function converge while giving the network optimal performance under this architecture.
For example, the learning parameters of the invention may be set as follows: batch size 64 to accelerate convergence, initial learning rate 0.01, decay rate 0.1, and a training total of 20000 iterations, to obtain a well-trained part-of-speech tagging (CNN+BGRU+BLSTM+CRF) model.
The embodiment of Fig. 9 of the present invention provides a part-of-speech tagging system comprising a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) model, and a conditional random field (CRF) model. The system further includes:
a text preprocessing module, which splits the text to be tagged into sentences and segments it into words, forming a first input text;
a rare-word processing module, which detects whether the first input text contains rare words; if so, it replaces the rare words of the first input text with preset characters to form a second input text; if not, it lets the second input text equal the first input text;
a word vector generation module, which converts the first input text into word vectors V1 and the second input text into word vectors V2;
the CNN model, which takes V1 as input and outputs word feature vector V1';
the BGRU model, which takes V2 as input and outputs word feature vector V2';
a vector concatenation module, which concatenates V1, V1' and V2' to obtain V3;
the BLSTM model, which takes V3 as input and passes its output to the CRF model;
the CRF model, which outputs the part-of-speech label of every word in the text to be tagged.
The CNN model in Fig. 9 may use the max pooling operation.
In Fig. 9, the rare-word criterion may be set as: the number of occurrences in the reference corpus is below a preset value.
The part-of-speech tagging system of the present invention adds a BGRU model to the CNN+BLSTM+CRF baseline. Compared with the CNN-only front end mentioned in the background, the added BGRU improves the accuracy of extracting the part-of-speech features of normal words, and the input to BLSTM+CRF contains the outputs of both the CNN and the BGRU. Because the BGRU output carries the rare-word marker (the preset characters), the BLSTM+CRF can separate rare words from normal words, further improving learning and recognition for both rare words and normal words.
The embodiment of Figure 10 of the present invention provides a rare-word part-of-speech feature separation system that includes a convolutional neural network (CNN) model and a bidirectional gated recurrent unit (BGRU) model. The system further includes:
Text preprocessing module: performs sentence splitting and word segmentation on the text to be separated, forming a first input text;
Rare word processing module: detects whether the first input text contains rare words; if so, replaces the rare words of the first input text with a preset character to form a second input text; if not, sets the second input text equal to the first input text;
Word vector generation module: converts the first input text into word vectors V1 and the second input text into word vectors V2;
CNN model: V1 is fed into the CNN model, which outputs a word feature vector V1';
BGRU model: V2 is fed into the BGRU model, which outputs a word feature vector V2';
Vector concatenation module: concatenates V1, V1', and V2' to obtain V3;
Feature separation module: vector units in V3 that contain the preset character are rare-word feature vector units; vector units in V3 that do not contain the preset character are normal-word feature vector units.
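A minimal sketch of the separation rule, assuming each vector unit is tracked alongside its token in the second input text (the `<RARE>` placeholder is a hypothetical choice of preset character):

```python
PRESET_CHAR = "<RARE>"  # hypothetical preset character

def separate_features(second_tokens, v3_units):
    """Split per-word vector units of V3 into rare-word units (the
    positions where the second input text carries the preset character)
    and normal-word units."""
    rare, normal = [], []
    for token, unit in zip(second_tokens, v3_units):
        (rare if token == PRESET_CHAR else normal).append(unit)
    return rare, normal

tokens = ["the", "<RARE>", "cat"]
units = [[0.1], [0.2], [0.3]]   # illustrative per-word units of V3
rare_units, normal_units = separate_features(tokens, units)
```

The point of the marker is exactly this separability: downstream components can treat the rare-word units and normal-word units differently without re-examining the raw text.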
The CNN model in Figure 10 may use a max-pooling operation.
In Figure 10, the decision condition for a rare word may be set as: its number of occurrences in a reference corpus is below a preset value.
The embodiment of Figure 11 of the present invention provides a training system for a part-of-speech tagging model, the model including a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) network model, and a conditional random field (CRF) model;
The training system includes:
Text conversion module: converts sample data of a training corpus into a first input text;
Rare word processing module: detects whether the first input text contains rare words; if so, replaces the rare words of the first input text with a preset character to form a second input text; if not, sets the second input text equal to the first input text;
Word vector generation module: converts the first input text into word vectors V1 and the second input text into word vectors V2;
CNN model: V1 is fed into the CNN model, which outputs a word feature vector V1';
BGRU model: V2 is fed into the BGRU model, which outputs a word feature vector V2';
Vector concatenation module: concatenates V1, V1', and V2' to obtain V3;
BLSTM model: V3 is fed into the BLSTM model, and the output of the BLSTM model is fed into the CRF model;
CRF model: the CRF model outputs the part-of-speech tags of all segmented words of the training-corpus sample data.
Update module: computes the error between the part-of-speech tags output by the CRF model and the part-of-speech tags of the training-corpus sample data, and updates the CNN, BGRU, BLSTM, and CRF models according to the error.
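The comparison performed by the update module can be illustrated simply as a tag-disagreement rate. (In actual training the error would be a differentiable loss back-propagated through the CRF, BLSTM, BGRU, and CNN; this sketch only illustrates the comparison between predicted and gold tags.)

```python
def tag_error(predicted_tags, gold_tags):
    """Fraction of segmented words whose predicted part-of-speech tag
    differs from the gold tag in the training sample."""
    wrong = sum(p != g for p, g in zip(predicted_tags, gold_tags))
    return wrong / len(gold_tags)

predicted = ["n", "v", "n", "adv"]    # CRF output (illustrative tag set)
gold = ["n", "v", "adj", "adv"]       # training-corpus annotation
err = tag_error(predicted, gold)
```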
When updating the CNN, BGRU, BLSTM, and CRF models in Figure 11, the Adam algorithm may also be used to control the model update process.
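A single Adam update step, following the standard formulation; the hyperparameter values below are the commonly used defaults and are an assumption here, not values given by the patent:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and
    of its square, bias correction, then a scaled parameter step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([2.0]), m, v, t=1)
# on the first step the parameter moves by roughly lr in the
# negative gradient direction
```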
The training corpus in Figure 11 is preferably the PFR corpus.
It should be noted that the embodiments of the part-of-speech tagging system of the invention follow the same principles as the embodiments of the part-of-speech tagging method of the invention; the related parts may be cross-referenced.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the technical solution of the present invention shall fall within the protection scope of the present invention.

Claims (18)

1. A part-of-speech tagging method, characterized in that it employs a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) network model, and a conditional random field (CRF) model, the method comprising:
Step A-1: performing sentence splitting and word segmentation on text to be tagged to form a first input text;
Step A-2: detecting whether the first input text contains rare words; if so, replacing the rare words of the first input text with a preset character to form a second input text; if not, setting the second input text equal to the first input text;
Step A-3: converting the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4: feeding V1 into the CNN model, the CNN model outputting a word feature vector V1';
Step A-5: feeding V2 into the BGRU model, the BGRU model outputting a word feature vector V2';
Step A-6: concatenating V1, V1', and V2' to obtain V3, feeding V3 into the BLSTM model, and feeding the output of the BLSTM model into the CRF model, the CRF model outputting the part-of-speech tags of all segmented words of the text to be tagged.
2. The method according to claim 1, characterized in that the CNN model uses a max-pooling operation.
3. The method according to claim 1, characterized in that the decision condition for a rare word is: its number of occurrences in a reference corpus is below a preset value.
4. A rare-word part-of-speech feature separation method, characterized in that the method comprises:
Step A-1: performing sentence splitting and word segmentation on text to be separated to form a first input text;
Step A-2: detecting whether the first input text contains rare words; if so, replacing the rare words of the first input text with a preset character to form a second input text; if not, setting the second input text equal to the first input text;
Step A-3: converting the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4: feeding V1 into a convolutional neural network (CNN) model, the CNN model outputting a word feature vector V1';
Step A-5: feeding V2 into a bidirectional gated recurrent unit (BGRU) model, the BGRU model outputting a word feature vector V2';
Step B: concatenating V1, V1', and V2' to obtain V3, wherein vector units in V3 that contain the preset character are rare-word feature vector units, and vector units in V3 that do not contain the preset character are normal-word feature vector units.
5. The method according to claim 4, characterized in that the CNN model uses a max-pooling operation.
6. The method according to claim 4, characterized in that the decision condition for a rare word is: its number of occurrences in a reference corpus is below a preset value.
7. A training method for a part-of-speech tagging model, characterized in that the part-of-speech tagging model includes a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) network model, and a conditional random field (CRF) model;
the method comprising:
Step C-1: converting sample data of a training corpus into a first input text;
Step A-2: detecting whether the first input text contains rare words; if so, replacing the rare words of the first input text with a preset character to form a second input text; if not, setting the second input text equal to the first input text;
Step A-3: converting the first input text into word vectors V1 and the second input text into word vectors V2;
Step A-4: feeding V1 into the CNN model, the CNN model outputting a word feature vector V1';
Step A-5: feeding V2 into the BGRU model, the BGRU model outputting a word feature vector V2';
Step A-6: concatenating V1, V1', and V2' to obtain V3, feeding V3 into the BLSTM model, and feeding the output of the BLSTM model into the CRF model, the CRF model outputting the part-of-speech tags of all segmented words of the training-corpus sample data;
Step C-2: computing the error between the part-of-speech tags output by the CRF model and the part-of-speech tags of the training-corpus sample data, and updating the CNN, BGRU, BLSTM, and CRF models according to the error.
8. The method according to claim 7, characterized in that, when updating the CNN, BGRU, BLSTM, and CRF models, an Adam algorithm is used to control the update process of the models.
9. The method according to claim 7, characterized in that the training corpus is the PFR corpus.
10. A part-of-speech tagging system, characterized in that the system comprises:
Text preprocessing module: performs sentence splitting and word segmentation on the text to be tagged, forming a first input text;
Rare word processing module: detects whether the first input text contains rare words; if so, replaces the rare words of the first input text with a preset character to form a second input text; if not, sets the second input text equal to the first input text;
Word vector generation module: converts the first input text into word vectors V1 and the second input text into word vectors V2;
CNN model: V1 is fed into a convolutional neural network (CNN) model, which outputs a word feature vector V1';
BGRU model: V2 is fed into a bidirectional gated recurrent unit (BGRU) model, which outputs a word feature vector V2';
Vector concatenation module: concatenates V1, V1', and V2' to obtain V3;
BLSTM model: V3 is fed into a bidirectional long short-term memory (BLSTM) network model, and the output of the BLSTM model is fed into a conditional random field (CRF) model;
CRF model: the CRF model outputs the part-of-speech tags of all segmented words of the text to be tagged.
11. The system according to claim 10, characterized in that the CNN model uses a max-pooling operation.
12. The system according to claim 10, characterized in that the decision condition for a rare word is: its number of occurrences in a reference corpus is below a preset value.
13. A rare-word part-of-speech feature separation system, characterized in that the system comprises:
Text preprocessing module: performs sentence splitting and word segmentation on the text to be separated, forming a first input text;
Rare word processing module: detects whether the first input text contains rare words; if so, replaces the rare words of the first input text with a preset character to form a second input text; if not, sets the second input text equal to the first input text;
Word vector generation module: converts the first input text into word vectors V1 and the second input text into word vectors V2;
CNN model: V1 is fed into a convolutional neural network (CNN) model, which outputs a word feature vector V1';
BGRU model: V2 is fed into a bidirectional gated recurrent unit (BGRU) model, which outputs a word feature vector V2';
Vector concatenation module: concatenates V1, V1', and V2' to obtain V3;
Feature separation module: vector units in V3 that contain the preset character are rare-word feature vector units; vector units in V3 that do not contain the preset character are normal-word feature vector units.
14. The system according to claim 13, characterized in that the CNN model uses a max-pooling operation.
15. The system according to claim 13, characterized in that the decision condition for a rare word is: its number of occurrences in a reference corpus is below a preset value.
16. A training system for a part-of-speech tagging model, characterized in that the part-of-speech tagging model includes a convolutional neural network (CNN) model, a bidirectional gated recurrent unit (BGRU) model, a bidirectional long short-term memory (BLSTM) network model, and a conditional random field (CRF) model;
the system comprising:
Text conversion module: converts sample data of a training corpus into a first input text;
Rare word processing module: detects whether the first input text contains rare words; if so, replaces the rare words of the first input text with a preset character to form a second input text; if not, sets the second input text equal to the first input text;
Word vector generation module: converts the first input text into word vectors V1 and the second input text into word vectors V2;
CNN model: V1 is fed into the CNN model, which outputs a word feature vector V1';
BGRU model: V2 is fed into the BGRU model, which outputs a word feature vector V2';
Vector concatenation module: concatenates V1, V1', and V2' to obtain V3;
BLSTM model: V3 is fed into the BLSTM model, and the output of the BLSTM model is fed into the CRF model;
CRF model: the CRF model outputs the part-of-speech tags of all segmented words of the training-corpus sample data.
Update module: computes the error between the part-of-speech tags output by the CRF model and the part-of-speech tags of the training-corpus sample data, and updates the CNN, BGRU, BLSTM, and CRF models according to the error.
17. The system according to claim 16, characterized in that, when updating the CNN, BGRU, BLSTM, and CRF models, an Adam algorithm is used to control the update process of the models.
18. The system according to claim 16, characterized in that the training corpus is the PFR corpus.
CN201711095902.8A 2017-11-09 2017-11-09 Part-of-speech tagging method and labeling system Pending CN109766523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711095902.8A CN109766523A (en) 2017-11-09 2017-11-09 Part-of-speech tagging method and labeling system


Publications (1)

Publication Number Publication Date
CN109766523A true CN109766523A (en) 2019-05-17

Family

ID=66449760


Country Status (1)

Country Link
CN (1) CN109766523A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010250814A (en) * 2009-04-14 2010-11-04 Nec (China) Co Ltd Part-of-speech tagging system, training device and method of part-of-speech tagging model
CN106960003A (en) * 2017-02-15 2017-07-18 黑龙江工程学院 Plagiarize the query generation method of the retrieval of the source based on machine learning in detection
CN107291795A (en) * 2017-05-03 2017-10-24 华南理工大学 A kind of dynamic word insertion of combination and the file classification method of part-of-speech tagging


Non-Patent Citations (1)

Title
HU Jie et al., "A Chinese word segmentation model based on bidirectional recurrent networks", Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN110415683A (en) * 2019-07-10 2019-11-05 上海麦图信息科技有限公司 A kind of air control voice instruction recognition method based on deep learning
CN110377691A (en) * 2019-07-23 2019-10-25 上海应用技术大学 Method, apparatus, equipment and the storage medium of text classification
CN111444723A (en) * 2020-03-06 2020-07-24 深圳追一科技有限公司 Information extraction model training method and device, computer equipment and storage medium
CN111507104A (en) * 2020-03-19 2020-08-07 北京百度网讯科技有限公司 Method and device for establishing label labeling model, electronic equipment and readable storage medium
JP2021149916A (en) * 2020-03-19 2021-09-27 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Method for establishing label labeling model, device, electronic equipment, program, and readable storage medium
KR20210118360A (en) * 2020-03-19 2021-09-30 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method, apparatus, electronic device, program and readable storage medium for creating a label marking model
JP7098853B2 (en) 2020-03-19 2022-07-12 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Methods for establishing label labeling models, devices, electronics, programs and readable storage media
US11531813B2 (en) 2020-03-19 2022-12-20 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, electronic device and readable storage medium for creating a label marking model
KR102645185B1 (en) * 2020-03-19 2024-03-06 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method, apparatus, electronic device, program and readable storage medium for creating a label marking model
CN112183086A (en) * 2020-09-23 2021-01-05 北京先声智能科技有限公司 English pronunciation continuous reading mark model based on sense group labeling

Similar Documents

Publication Publication Date Title
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN107992597B (en) Text structuring method for power grid fault case
CN110222349B (en) Method and computer for deep dynamic context word expression
CN109766523A (en) Part-of-speech tagging method and labeling system
WO2018028077A1 (en) Deep learning based method and device for chinese semantics analysis
CN110083831A (en) A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
CN110555084B (en) Remote supervision relation classification method based on PCNN and multi-layer attention
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN107797987B (en) Bi-LSTM-CNN-based mixed corpus named entity identification method
CN112199945A (en) Text error correction method and device
CN112818118B (en) Reverse translation-based Chinese humor classification model construction method
CN113268974B (en) Method, device and equipment for marking pronunciations of polyphones and storage medium
CN110134950B (en) Automatic text proofreading method combining words
CN110826334A (en) Chinese named entity recognition model based on reinforcement learning and training method thereof
CN113220876B (en) Multi-label classification method and system for English text
CN107977353A (en) A kind of mixing language material name entity recognition method based on LSTM-CNN
CN109492223A (en) A kind of Chinese missing pronoun complementing method based on ANN Reasoning
Yang et al. Recurrent neural network-based language models with variation in net topology, language, and granularity
CN107797988A (en) A kind of mixing language material name entity recognition method based on Bi LSTM
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN107992468A (en) A kind of mixing language material name entity recognition method based on LSTM
Chan et al. Applying and optimizing NLP model with CARU
CN114357166B (en) Text classification method based on deep learning
CN114386425B (en) Big data system establishing method for processing natural language text content
CN115659981A (en) Named entity recognition method based on neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20190517)