CN110489750A - Burmese word segmentation and part-of-speech tagging method and device based on bidirectional LSTM-CRF - Google Patents


Info

Publication number
CN110489750A
CN110489750A
Authority
CN
China
Prior art keywords
burmese
participle
speech
sentence
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910739718.5A
Other languages
Chinese (zh)
Inventor
毛存礼
满志博
余正涛
高盛祥
王振晗
王红斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201910739718.5A
Publication of CN110489750A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a Burmese word segmentation and part-of-speech tagging method and device based on a bidirectional LSTM-CRF, and belongs to the field of natural language processing. The invention uses BERT to generate word vector representations based on syllable segmentation; pre-trains a Burmese word segmentation model with a bidirectional LSTM-CRF to obtain an optimal segmentation model; pre-trains a Burmese part-of-speech tagging model with a bidirectional LSTM-CRF to obtain an optimal tagging model; and, by calling the trained segmentation or tagging model, predicts word boundaries and part-of-speech tags for an input vectorized Burmese sentence. A Burmese word segmentation and part-of-speech tagging device based on the bidirectional LSTM-CRF is built by modularizing the above steps. The invention performs word segmentation and part-of-speech tagging of Burmese sentences jointly, and alleviates the segmentation and tagging inaccuracy caused by the scarcity of Burmese data.

Description

Burmese word segmentation and part-of-speech tagging method and device based on bidirectional LSTM-CRF
Technical field
The present invention relates to a Burmese word segmentation and part-of-speech tagging method and device based on a bidirectional LSTM-CRF, and belongs to the field of natural language processing.
Background technique
Burmese word segmentation and part-of-speech tagging are foundational tasks in Burmese natural language research. Traditional approaches to both tasks are mostly based on statistical learning, which requires a large amount of annotated Burmese corpus; manual annotation is time-consuming and laborious. Deep learning methods currently handle the two tasks more effectively, but representing Burmese with word vectors alone is insufficient: the smallest linguistic unit of Burmese is the syllable, and both segmentation and tagging benefit from incorporating syllable features. In addition, most prior work treats segmentation and tagging as separate pipelines, so tagging accuracy depends heavily on segmentation accuracy, and segmentation errors propagate directly into the tagging stage. This application therefore proposes a bidirectional LSTM-CRF deep learning method that jointly trains word segmentation and part-of-speech tagging for the low-resource Burmese language.
Summary of the invention
The present invention provides a Burmese word segmentation and part-of-speech tagging method based on a bidirectional LSTM-CRF, which addresses the segmentation inaccuracy of Burmese caused by the scarcity of training corpora and, by training segmentation and tagging jointly, solves the problem of low part-of-speech tagging accuracy for Burmese.
The technical scheme of the invention is a Burmese word segmentation and part-of-speech tagging method based on a bidirectional LSTM-CRF, whose specific steps are as follows:
Step 1, Burmese preprocessing and word vector representation: using BERT together with the syllable information in the context of each Burmese word, generate word vector representations based on syllable segmentation;
Step 2, encode the vectorized Burmese with a bidirectional LSTM, extract segmentation features, use a CRF to assign segmentation tags over the whole sentence, train the Burmese segmentation model and keep the optimal model; with the pre-trained optimal segmentation model, predict on an input Burmese sentence and obtain the globally optimal segmentation output;
Step 3, encode the segmented Burmese with a bidirectional LSTM, extract part-of-speech tagging features, use a CRF to assign part-of-speech tags over the whole sentence, train the Burmese tagging model and keep the optimal model; with the pre-trained optimal tagging model, predict on an input Burmese sentence and obtain the globally optimal part-of-speech tagging output.
Step 4, by calling the trained segmentation or tagging model, predict word boundaries and part-of-speech tags for the input vectorized Burmese sentence.
As a preferred embodiment of the invention, the specific steps of Step 1 are as follows:
Step 1.1, obtain pre-segmented Burmese sentences from the Asian Language Treebank website;
Step 1.2, split the words of the pre-segmented Burmese sentences into syllables with a Burmese syllable segmentation tool;
Step 1.3, use BERT to pre-train syllable-based word vector representations of Burmese.
The specific steps of Step 1.1 are as follows: obtain 20,106 pre-segmented Burmese sentences from the Asian Language Treebank website (http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/).
The specific steps of Step 1.2 are as follows:
The smallest unit of Burmese is the syllable. For example, the Burmese sentence whose Chinese meaning is "My dog is very cute." is represented by the sequence of Burmese syllables obtained after syllable segmentation.
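As an illustration of the kind of syllable segmentation Step 1.2 relies on, the sketch below implements a deliberately simplified break rule for Burmese script: a new syllable starts at a base consonant unless that consonant is killed by a following asat (U+103A) or stacked via U+1039. This is an assumption for illustration only, not the patent's actual segmentation tool, which handles many more cases.

```python
def split_syllables(text: str) -> list[str]:
    """Split Burmese text into syllables using a simplified break rule."""
    consonants = {chr(c) for c in range(0x1000, 0x1022)}  # base consonants
    syllables: list[str] = []
    current = ""
    for i, ch in enumerate(text):
        killed = i + 1 < len(text) and text[i + 1] == "\u103A"  # asat follows
        stacked = i > 0 and text[i - 1] == "\u1039"             # stacked pair
        if ch in consonants and current and not killed and not stacked:
            syllables.append(current)  # close the previous syllable
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables
```

On the word မြန်မာ ("Myanmar"), this rule yields the two syllables မြန် and မာ.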
As a preferred embodiment of the invention, the specific steps of Step 1.3 are as follows:
Step 1.3.1, deduplicate the 20,106 Burmese sentences and cut them into 70,002 Burmese words; by calling the multilingual BERT model, directly generate 768-dimensional, context-aware Burmese word vector representations;
Step 1.3.2, generate vector representations for the Burmese words again after syllable segmentation;
Step 1.3.3, finally, concatenate the two representations to obtain word vector representations that carry Burmese syllable information.
As a preferred embodiment of the invention, the specific steps of Step 2 are as follows:
Step 2.1, use a bidirectional LSTM to extract the segmentation features of Burmese generated from the syllable vectors, obtaining a complete bidirectional LSTM representation;
Step 2.2, use a CRF to assign segmentation tags over the whole sentence, train the Burmese segmentation model and keep the optimal model; with the pre-trained optimal segmentation model, predict on an input Burmese sentence and obtain the globally optimal segmentation output.
As a preferred embodiment of the invention, the specific steps of Step 2.1 are as follows:
Step 2.1.1, taking the sentence as the unit, vectorize a Burmese sentence containing i words, denoted X = (x_1, x_2, x_3, x_4, ...);
Step 2.1.2, concatenate, position by position, the hidden state sequence (h_1^→, ..., h_n^→) output by the forward LSTM and the sequence (h_1^←, ..., h_n^←) output by the backward LSTM, obtaining the complete hidden state sequence h_t = [h_t^→; h_t^←] ∈ R^{n×m}, i.e. the complete bidirectional LSTM representation of the Burmese segmentation sequence.
As a preferred embodiment of the invention, the specific steps of Step 2.2 are as follows:
Step 2.2.1, A_{ij} denotes the transition score from tag i to tag j; A_{y_i,y_{i+1}} scores the possibility of moving from the tag tagy_i to the tag tagy_{i+1}. For a tag sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length, the segmentation model's score for tagging sentence X with y, i.e. the segmentation score predicted from the n words, is
s(X, y)_seg = Σ_{i=0}^{n} A_{y_i,y_{i+1}} + Σ_{i=1}^{n} P_{i,y_i};
Step 2.2.2, the segmentation probability after Softmax normalization is
P(y | X)_seg = exp(s(X, y)) / Σ_{y′∈Y_x} exp(s(X, y′)),
where taking the logarithm of the denominator, summed over all candidate tag sequences y′, is the log-normalization step of the segmentation model;
Step 2.2.3, the Burmese segmentation model is trained by maximizing the log-likelihood; the segmentation loss for a training sample (x, y_x) is
log p(y_x | x)_seg = s(x, y_x) − log Σ_{y′∈Y_x} exp(s(x, y′)),
where x denotes the Burmese words of the sentence and Y_x denotes the set of all candidate tag sequences for x;
Step 2.2.4, at decoding time the segmentation model uses the dynamic-programming Viterbi algorithm to solve for the optimal segmentation prediction
y*_seg = argmax_{y′∈Y_x} s(x, y′),
i.e. the candidate sequence with the maximum score.
As a preferred embodiment of the invention, the specific steps of Step 3 are as follows:
Step 3.1, use a bidirectional LSTM to extract the part-of-speech tagging features of Burmese generated from the syllable vectors, obtaining a complete bidirectional LSTM representation;
Step 3.2, use a CRF to assign part-of-speech tags over the whole sentence, train the Burmese tagging model and keep the optimal model; with the pre-trained optimal tagging model, predict on an input Burmese sentence and obtain the globally optimal part-of-speech tagging output.
As a preferred embodiment of the invention, the specific steps of Step 3.1 are as follows:
Step 3.1.1, taking the sentence as the unit, vectorize a pre-segmented Burmese sentence containing i words, denoted X_seg = (x_1, x_2, x_3, x_4, ...);
Step 3.1.2, concatenate, position by position, the hidden state sequence output by the forward LSTM and the sequence output by the backward LSTM, obtaining the complete hidden state sequence h_t = [h_t^→; h_t^←] ∈ R^{n×m}, i.e. the complete bidirectional LSTM representation of the Burmese part-of-speech tag sequence.
As a preferred embodiment of the invention, the specific steps of Step 3.2 are as follows:
Step 3.2.1, A_{ij} denotes the transition score from tag i to tag j; A_{y_i,y_{i+1}} scores the possibility of moving from the tag tagy_i to the tag tagy_{i+1}. For a tag sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length, the tagging model's score for tagging sentence X with y, i.e. the part-of-speech score predicted from the n words, is
s(X, y)_pos = Σ_{i=0}^{n} A_{y_i,y_{i+1}} + Σ_{i=1}^{n} P_{i,y_i};
Step 3.2.2, the part-of-speech tagging probability after Softmax normalization is
P(y | X)_pos = exp(s(X, y)) / Σ_{y′∈Y_x} exp(s(X, y′)),
where taking the logarithm of the denominator, summed over all candidate tag sequences y′, is the log-normalization step of the tagging model;
Step 3.2.3, the Burmese tagging model is trained by maximizing the log-likelihood; the tagging loss for a training sample (x, y_x) is
log p(y_x | x)_pos = s(x, y_x) − log Σ_{y′∈Y_x} exp(s(x, y′)),
where x denotes the Burmese words of the sentence and Y_x denotes the set of all candidate tag sequences for x;
Step 3.2.4, at decoding time the tagging model uses the dynamic-programming Viterbi algorithm to solve for the optimal part-of-speech prediction
y*_pos = argmax_{y′∈Y_x} s(x, y′).
A Burmese word segmentation and part-of-speech tagging device based on a bidirectional LSTM-CRF, comprising the following modules:
a Burmese preprocessing module, for generating word vector representations based on syllable-feature segmentation, using BERT together with the syllable information in the context of each Burmese word;
a Burmese word segmentation module, for encoding the vectorized Burmese with a bidirectional LSTM, extracting segmentation features, assigning segmentation tags over the whole sentence with a CRF, training the Burmese segmentation model and keeping the optimal model; with the pre-trained optimal segmentation model, predicting on an input Burmese sentence to output the globally optimal segmentation result;
a Burmese part-of-speech tagging module, for encoding the output of the segmentation module with a bidirectional LSTM, extracting tagging features, assigning part-of-speech tags over the whole sentence with a CRF, training the Burmese tagging model and keeping the optimal model; with the pre-trained optimal tagging model, predicting on an input Burmese sentence to output the globally optimal tagging result;
a sentence segmentation and tagging prediction module, for calling the trained segmentation or tagging model to predict word boundaries and part-of-speech tags for the input vectorized Burmese sentence.
The beneficial effects of the present invention are:
1. The invention combines the BERT model with Burmese syllable features to generate word vector representations that carry syllable information, strengthening the representational power of Burmese word vectors and thereby improving segmentation and tagging performance;
2. The invention trains word segmentation and part-of-speech tagging jointly, reinforcing the word representations and improving the performance of both tasks;
3. The invention first uses BERT together with the syllable information in each Burmese word's context to generate syllable-based word vector representations; it then encodes them with a bidirectional LSTM, extracting segmentation and tagging features separately, so that the two training tasks can be combined while the extracted features do not interfere with each other; finally, a CRF is used to obtain the globally optimal segmentation and tagging outputs.
Brief description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is the flow diagram of the method of the invention;
Fig. 3 is the structural block diagram of the Burmese word segmentation and part-of-speech tagging device based on the bidirectional LSTM-CRF.
Specific embodiment
Embodiment 1: as shown in Figs. 1-3, a Burmese word segmentation and part-of-speech tagging method based on a bidirectional LSTM-CRF, whose specific steps are as follows:
Step 1, Burmese preprocessing and word vector representation: using BERT together with the syllable information in the context of each Burmese word, generate word vector representations based on syllable segmentation;
Step 1.1, obtain 20,106 pre-segmented Burmese sentences from the Asian Language Treebank website (http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/). For example,
Table 1: Format of the retrieved Burmese sentences
Step 1.2, split the words of the pre-segmented Burmese sentences into syllables with a Burmese syllable segmentation tool;
The smallest unit of Burmese is the syllable. For example, the Burmese sentence whose Chinese meaning is "My dog is very cute." is represented by the sequence of Burmese syllables obtained after segmentation, as shown in Table 2:
Table 2: Representation of Burmese words after syllable segmentation
Step 1.3, use BERT to pre-train syllable-based word vector representations of Burmese.
Step 1.3.1, deduplicate the 20,106 Burmese sentences and cut them into 70,002 Burmese words; by calling the multilingual BERT model, directly generate 768-dimensional, context-aware Burmese word vector representations;
Step 1.3.2, generate vector representations for the Burmese words again after syllable segmentation;
Step 1.3.3, finally, concatenate the two representations to obtain word vector representations that carry Burmese syllable information.
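Read literally, Steps 1.3.1 to 1.3.3 produce a word-level BERT vector and syllable-level vectors and splice them together. The numpy sketch below shows one plausible reading of that splice; mean-pooling the syllable vectors before concatenation, and the random stand-ins for BERT outputs, are illustrative assumptions not stated in the patent.

```python
import numpy as np

def syllable_aware_vector(word_vec: np.ndarray, syllable_vecs: np.ndarray) -> np.ndarray:
    """Concatenate a 768-d word vector with the mean of its syllable vectors.

    Mean-pooling over syllables is an illustrative assumption; the patent only
    states that the two representations are spliced together.
    """
    pooled = syllable_vecs.mean(axis=0)        # (768,) pooled syllable vector
    return np.concatenate([word_vec, pooled])  # (1536,) syllable-aware vector

# Toy usage with random stand-ins for BERT outputs:
rng = np.random.default_rng(0)
word_vec = rng.normal(size=768)
syllable_vecs = rng.normal(size=(3, 768))      # a word with 3 syllables
vec = syllable_aware_vector(word_vec, syllable_vecs)
```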
Step 2, encode the vectorized Burmese with a bidirectional LSTM, extract segmentation features, use a CRF to assign segmentation tags over the whole sentence, train the Burmese segmentation model and keep the optimal model; with the pre-trained optimal segmentation model, predict on an input Burmese sentence and obtain the globally optimal segmentation output;
As a preferred embodiment of the invention, the specific steps of Step 2 are as follows:
Step 2.1, use a bidirectional LSTM to extract the segmentation features of Burmese generated from the syllable vectors, obtaining a complete bidirectional LSTM representation;
Step 2.2, use a CRF to assign segmentation tags over the whole sentence, train the Burmese segmentation model and keep the optimal model; with the pre-trained optimal segmentation model, predict on an input Burmese sentence and obtain the globally optimal segmentation output.
As a preferred embodiment of the invention, the specific steps of Step 2.1 are as follows:
Step 2.1.1, taking the sentence as the unit, vectorize a Burmese sentence containing i words, denoted X = (x_1, x_2, x_3, x_4, ...);
Step 2.1.2, using the pre-trained word vectors, map each Burmese word x_i ∈ R^d of the sentence into the corresponding matrix, where d is the word vector dimension. Before the next layer, a dropout layer is applied to alleviate overfitting.
The vector sequence (x_1, x_2, x_3, x_4, ...) of the sentence's words is taken as the input to the bidirectional LSTM over M time steps; the hidden state sequence (h_1^→, h_2^→, ..., h_n^→) output by the forward LSTM and the sequence (h_1^←, h_2^←, ..., h_n^←) output by the backward LSTM are concatenated position by position into a complete hidden state sequence:
h_t = [h_t^→; h_t^←] ∈ R^{n×m}
This formula yields the complete bidirectional LSTM representation of the Burmese segmentation sequence.
After dropout, a linear layer maps the hidden state vector from m dimensions to k dimensions, where k is the number of tags in the tag set; the automatically extracted segmentation features are denoted:
P = (p_1, p_2, ..., p_n) ∈ R^{n×k}
Each component p_{ij} of p_i ∈ R^k can be regarded as the score for classifying x_i into the j-th tag; P_{i,y_i} is the possibility that the i-th word is predicted to carry the tag tagy_i.
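The encoding just described, concatenating forward and backward hidden states into h_t = [h_t^→; h_t^←] and projecting from m to k tag scores, can be sketched with a minimal numpy LSTM. The toy sizes, the random weights, and the shortcut of sharing weights between the two directions are assumptions for illustration, not the patent's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(X, Wx, Wh, b, reverse=False):
    """Run one LSTM direction over X of shape (n, d); gates packed as [i, f, g, o]."""
    n = X.shape[0]
    h_dim = Wh.shape[0]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    out = np.zeros((n, h_dim))
    steps = range(n - 1, -1, -1) if reverse else range(n)
    for t in steps:
        z = X[t] @ Wx + h @ Wh + b                    # (4*h_dim,) pre-activations
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
        h = sigmoid(o) * np.tanh(c)                   # hidden state
        out[t] = h
    return out

def bilstm_emissions(X, params, W_out):
    """Concatenate h_t = [forward; backward] and project m -> k tag scores."""
    H = np.concatenate([lstm_pass(X, *params),
                        lstm_pass(X, *params, reverse=True)],  # shared toy weights
                       axis=1)                                 # (n, m), m = 2*h_dim
    return H @ W_out                                           # emission matrix P, (n, k)

# Toy usage: a 5-word sentence, 8-dim inputs, 6-dim hidden states, 4 tags.
rng = np.random.default_rng(1)
n, d, h_dim, k = 5, 8, 6, 4
X = rng.normal(size=(n, d))
params = (rng.normal(size=(d, 4 * h_dim)),      # Wx
          rng.normal(size=(h_dim, 4 * h_dim)),  # Wh
          np.zeros(4 * h_dim))                  # b
W_out = rng.normal(size=(2 * h_dim, k))
P = bilstm_emissions(X, params, W_out)          # scores p_ij for word i, tag j
```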
As a preferred embodiment of the invention, the specific steps of Step 2.2 are as follows:
Step 2.2.1, the parameter of the CRF layer is a (k+2) × (k+2) matrix A, the two extra tags being the added start and stop tags; A_{ij} denotes the transition score from tag i to tag j, and A_{y_i,y_{i+1}} scores the possibility of moving from the tag tagy_i to the tag tagy_{i+1}. For a tag sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length, the segmentation model's score for tagging sentence X with y, i.e. the segmentation score predicted from the n words, is
s(X, y)_seg = Σ_{i=0}^{n} A_{y_i,y_{i+1}} + Σ_{i=1}^{n} P_{i,y_i};
Step 2.2.2, the segmentation probability after Softmax normalization is
P(y | X)_seg = exp(s(X, y)) / Σ_{y′∈Y_x} exp(s(X, y′)),
where taking the logarithm of the denominator, summed over all candidate tag sequences y′, is the log-normalization step of the segmentation model;
Step 2.2.3, the Burmese segmentation model is trained by maximizing the log-likelihood; the segmentation loss for a training sample (x, y_x) is
log p(y_x | x)_seg = s(x, y_x) − log Σ_{y′∈Y_x} exp(s(x, y′)),
where x denotes the Burmese words of the sentence and Y_x denotes the set of all candidate tag sequences for x;
Step 2.2.4, at decoding time the segmentation model uses the dynamic-programming Viterbi algorithm to solve for the optimal segmentation prediction
y*_seg = argmax_{y′∈Y_x} s(x, y′),
i.e. the candidate sequence with the maximum score.
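Step 2.2.4's decoding step, a dynamic-programming Viterbi search over the emission scores P and transition scores A, can be sketched as follows. Folding the start/stop behavior of the patent's (k+2) × (k+2) matrix into separate start and stop vectors is an implementation choice made here for clarity.

```python
import numpy as np

def viterbi_decode(P, A, start, stop):
    """Viterbi decoding for a linear-chain CRF.

    P: (n, k) emission scores from the BiLSTM; A: (k, k) tag-transition scores;
    start/stop: (k,) scores for beginning/ending with each tag.
    Returns the highest-scoring tag sequence as a list of tag indices.
    """
    n, k = P.shape
    score = start + P[0]                            # best score ending in each tag at t=0
    back = np.zeros((n, k), dtype=int)              # backpointers
    for t in range(1, n):
        cand = score[:, None] + A + P[t][None, :]   # candidate scores (prev_tag, tag)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    score = score + stop
    best = [int(score.argmax())]
    for t in range(n - 1, 0, -1):                   # follow backpointers
        best.append(int(back[t, best[-1]]))
    return best[::-1]

# Toy decode: 3 positions, 2 tags, no transition preferences.
P = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
path = viterbi_decode(P, np.zeros((2, 2)), np.zeros(2), np.zeros(2))  # [0, 1, 0]
```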
Step 3, encode the segmented Burmese with a bidirectional LSTM, extract part-of-speech tagging features, use a CRF to assign part-of-speech tags over the whole sentence, train the Burmese tagging model and keep the optimal model; with the pre-trained optimal tagging model, predict on an input Burmese sentence and obtain the globally optimal part-of-speech tagging output.
As a preferred embodiment of the invention, the specific steps of Step 3 are as follows:
Step 3.1, use a bidirectional LSTM to extract the part-of-speech tagging features of Burmese generated from the syllable vectors, obtaining a complete bidirectional LSTM representation;
Step 3.2, use a CRF to assign part-of-speech tags over the whole sentence, train the Burmese tagging model and keep the optimal model; with the pre-trained optimal tagging model, predict on an input Burmese sentence and obtain the globally optimal part-of-speech tagging output.
As a preferred embodiment of the invention, the specific steps of Step 3.1 are as follows:
Step 3.1.1, taking the sentence as the unit, vectorize a pre-segmented Burmese sentence containing i words, denoted X_seg = (x_1, x_2, x_3, x_4, ...);
Step 3.1.2, using the pre-trained word vectors, map each Burmese word x_i ∈ R^d of the sentence into the corresponding matrix, where d is the word vector dimension. Before the next layer, a dropout layer is applied to alleviate overfitting.
The vector sequence X_seg = (x_1, x_2, x_3, x_4, ...) of the pre-segmented sentence is taken as the input to the bidirectional LSTM over M time steps; the hidden state sequence output by the forward LSTM and the sequence output by the backward LSTM are concatenated position by position into a complete hidden state sequence:
h_t = [h_t^→; h_t^←] ∈ R^{n×m}
This formula yields the complete bidirectional LSTM representation of the Burmese part-of-speech tag sequence.
After dropout, a linear layer maps the hidden state vector from m dimensions to k dimensions, where k is the number of tags in the tag set; the automatically extracted part-of-speech tagging features are denoted:
P = (p_1, p_2, ..., p_n) ∈ R^{n×k}
Each component p_{ij} of p_i ∈ R^k can be regarded as the score for classifying x_i into the j-th tag; P_{i,y_i} is the possibility that the i-th word is predicted to carry the tag tagy_i.
As a preferred embodiment of the invention, the specific steps of Step 3.2 are as follows:
Step 3.2.1, the parameter of the CRF layer is a (k+2) × (k+2) matrix A, the two extra tags being the added start and stop tags; A_{ij} denotes the transition score from tag i to tag j, and A_{y_i,y_{i+1}} scores the possibility of moving from the tag tagy_i to the tag tagy_{i+1}. For a tag sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length, the tagging model's score for tagging sentence X with y, i.e. the part-of-speech score predicted from the n words, is
s(X, y)_pos = Σ_{i=0}^{n} A_{y_i,y_{i+1}} + Σ_{i=1}^{n} P_{i,y_i};
Step 3.2.2, the part-of-speech tagging probability after Softmax normalization is
P(y | X)_pos = exp(s(X, y)) / Σ_{y′∈Y_x} exp(s(X, y′)),
where taking the logarithm of the denominator, summed over all candidate tag sequences y′, is the log-normalization step of the tagging model;
Step 3.2.3, the Burmese tagging model is trained by maximizing the log-likelihood; the tagging loss for a training sample (x, y_x) is
log p(y_x | x)_pos = s(x, y_x) − log Σ_{y′∈Y_x} exp(s(x, y′)),
where x denotes the Burmese words of the sentence and Y_x denotes the set of all candidate tag sequences for x;
Step 3.2.4, at decoding time the tagging model uses the dynamic-programming Viterbi algorithm to solve for the optimal part-of-speech prediction
y*_pos = argmax_{y′∈Y_x} s(x, y′),
i.e. the candidate sequence with the maximum score.
Step 4, by calling the trained segmentation or tagging model, predict word boundaries and part-of-speech tags for the input vectorized Burmese sentence.
To illustrate the effect of the invention, the following comparative experiments were carried out. The experimental data comes from the Burmese data set of the Asian Language Treebank (a low-resource language treebank), containing 20,011 pre-segmented Burmese sentences with part-of-speech annotation; after removing duplicate sentences, cutting produces 75,498 Burmese words. The training and test sets are split in a 9:1 ratio.
The experiments strictly follow the standard evaluation metrics of precision (Precision), recall (Recall) and F1 value (F1-measure), computed as:
Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 · Precision · Recall / (Precision + Recall),
where TP is a true positive, FP a false positive, FN a false negative, and TN a true negative.
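The three evaluation metrics reduce to a few lines; the sketch below computes them directly from the TP/FP/FN counts defined above.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1 from true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy counts: 8 correct predictions, 2 spurious, 2 missed.
p, r, f1 = prf1(tp=8, fp=2, fn=2)  # p = r = 0.8
```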
To verify the effect of the proposed bidirectional LSTM-CRF model, the following comparative experiments were designed and analyzed.
The model is compared with a traditional CRF and with a CNN-based model; the specific experimental results are shown in Table 3. With all other variables held constant, the baseline models use their originally published hyperparameter settings; the P, R and F1 values measured on the test data are given in Table 3.
Table 3: Comparison of different neural network models
Comparing the Burmese segmentation and tagging results of the different models leads to the conclusion that the syllable-based BERT + bidirectional LSTM-CRF outperforms the other three approaches in both segmentation and tagging.
To verify whether the choice of word vectors affects the experimental results, the following comparison was designed:
In Experiment 2, Burmese word vectors are generated with word2vec, and the effect of varying the dimension of the generated vectors is compared. With all other variables held constant and the original hyperparameter settings retained, the P, R and F1 values measured on the test data are given in Table 4.
Table 4: Comparison of word vector dimensions and word vector generation methods
The comparative analysis shows that the syllable-based Burmese word vectors generated with BERT outperform those generated with word2vec, and that a larger dimension is not always better when generating word vectors with word2vec.
Dropout layers are provided in hyper parameter setting, the main purpose of Droupout layers of setting is can be relatively effective The generation for alleviating over-fitting, achievees the effect that regularization to a certain extent.Compare examination by the way that different size is arranged in experiment 3 Test effect.In the case where ensuring its dependent variable all unanimous circumstances, comparative experiments parameter is based on the former hyper parameter setting provided, survey P, R, F1 of the test data of experiment are specifically as shown in table 5.
Table 5: Comparison of different Dropout rates
Setting the Dropout rate prevents over-fitting, which is similar to discarding a fraction of the neurons, but Experiment 3 shows that a larger rate is not always better. Specifically: Dropout rate 0.4 > Dropout rate 0.8 > Dropout rate 0.6. From these experimental results, a Dropout rate of 0.4 is the most suitable choice.
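The mechanism being tuned here can be sketched as inverted dropout in NumPy; this is an illustration of what a Dropout rate means, not the training code of the present method:

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    # Rescaling by 1/(1 - rate) keeps the expected activation unchanged.
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
h = np.ones((4, 5))                  # stand-in hidden activations
out = dropout(h, rate=0.4, rng=rng)  # survivors equal 1/(1 - 0.4); the rest are zero
```

At inference time dropout is disabled; thanks to the inverted scaling, no extra correction is needed then.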
The minimal unit of Burmese is the syllable, followed by the character; the effect of Burmese word segmentation based on character splitting is compared with that based on syllable splitting. With all other variables held constant, the comparison experiments use the hyper-parameter settings originally provided; P, R, and F1 on the test data are shown in Table 6.
Table 6: Comparison of character-based and syllable-based segmentation
The comparison shows that Burmese word vectors generated from syllable-based splitting are better than word vectors generated at the character level. Incorporating Burmese syllable features better integrates the linguistic characteristics of Burmese itself and its contextual information, yielding better segmentation and part-of-speech tagging performance.
The size of the experimental data set also affects the final segmentation performance; Experiment 5 compares the results obtained by varying the size of the data set. With all other variables held constant, the comparison experiments use the hyper-parameter settings originally provided; P, R, and F1 on the test data are shown in Table 7.
Table 7: Comparison of different data-set sizes
When training a model with deep learning methods, the size of the data set affects the experimental results; analysis of the Experiment 5 comparison shows that the best performance is obtained on the annotated Burmese data set of size 20,000.
In accordance with the above design, the present invention also provides a Burmese word segmentation and part-of-speech tagging device based on bidirectional LSTM-CRF. As shown in Figure 3, the device comprises the following modules:
a Burmese preprocessing module, configured to use BERT combined with the syllable information in the Burmese word context to generate syllable-feature-based word-vector representations;
a Burmese word segmentation module, configured to encode the word-vector representations of Burmese with a bidirectional LSTM to extract Burmese segmentation features, label word segmentation over all Burmese words with a CRF, train the Burmese segmentation model, and obtain the optimal Burmese segmentation model; and, using the pre-trained optimal segmentation model, predict on the input Burmese sentence to obtain the globally optimal segmentation result as output;
a Burmese part-of-speech tagging module, configured to encode the Burmese segmented by the word segmentation module with a bidirectional LSTM to extract Burmese part-of-speech tagging features, assign part-of-speech labels to all Burmese words with a CRF, train the Burmese part-of-speech tagging model, and obtain the optimal Burmese part-of-speech tagging model; and, using the pre-trained optimal part-of-speech tagging model, predict on the input Burmese sentence to obtain the globally optimal part-of-speech tagging result as output;
a Burmese sentence segmentation and part-of-speech tagging prediction module, configured to perform segmentation and part-of-speech label prediction on the vectorized input Burmese sentence by calling the trained segmentation or part-of-speech tagging model.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; various changes may also be made within the knowledge of a person skilled in the art without departing from the concept of the present invention.

Claims (10)

1. A Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF, characterized in that:
the specific steps of the Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF are as follows:
Step 1: Burmese preprocessing and word-vector characterization: using BERT combined with the syllable information in the Burmese word context, generate syllable-splitting-based word-vector representations;
Step 2: encode the word-vector representations of Burmese with a bidirectional LSTM to extract Burmese segmentation features; label word segmentation over all Burmese words with a CRF; train the Burmese segmentation model and obtain the optimal Burmese segmentation model; using the pre-trained optimal segmentation model, predict on the input Burmese sentence to obtain the globally optimal segmentation result as output;
Step 3: encode the Burmese segmented above with a bidirectional LSTM to extract Burmese part-of-speech tagging features; assign part-of-speech labels to all Burmese words with a CRF; train the Burmese part-of-speech tagging model and obtain the optimal Burmese part-of-speech tagging model; using the pre-trained optimal part-of-speech tagging model, predict on the input Burmese sentence to obtain the globally optimal part-of-speech tagging result as output;
Step 4: by calling the trained segmentation or part-of-speech tagging model, perform segmentation and part-of-speech label prediction on the vectorized input Burmese sentence.
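The four steps of claim 1 compose into a two-stage pipeline: segment first, then tag the resulting words. A schematic sketch, in which the stand-in "models" (space splitting, a constant noun label) are illustrative assumptions only, not the trained models of the invention:

```python
def segment(sentence, seg_model):
    """Stage 1: predict word boundaries with the trained segmentation model."""
    return seg_model(sentence)

def tag(words, pos_model):
    """Stage 2: predict a part-of-speech label for each segmented word."""
    return [(w, pos_model(w)) for w in words]

# Stand-in "models": split on spaces, label every word as a noun ("n").
seg_model = lambda s: s.split()
pos_model = lambda w: "n"

result = tag(segment("a b c", seg_model), pos_model)
# result == [('a', 'n'), ('b', 'n'), ('c', 'n')]
```

In the actual method both stages are BiLSTM-CRF models; only the chaining of their interfaces is illustrated here.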
2. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 1, characterized in that the specific steps of Step 1 are as follows:
Step 1.1: obtain Burmese sentences already segmented into words from the Asian Language Treebank website;
Step 1.2: split the words of the segmented Burmese sentences into syllables using a Burmese syllable splitting tool;
Step 1.3: pre-train syllable-splitting-based Burmese word-vector representations with BERT.
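Step 1.2 presupposes a Burmese syllable splitting tool. A deliberately simplified sketch of the usual rule (production tools handle many more cases): break before each Burmese consonant U+1000–U+1021 unless it is stacked (preceded by the virama U+1039) or devowelised (followed by the asat U+103A):

```python
import re

# Break before a consonant that is neither stacked (after U+1039)
# nor killed by a following asat (U+103A).
_BREAK = re.compile(r'(?<!\u1039)([\u1000-\u1021])(?!\u103A)')

def syllables(text):
    """Very rough Burmese syllable splitter; illustrative only."""
    marked = _BREAK.sub(r'|\1', text)
    return [s for s in marked.split('|') if s]

syllables("မြန်မာ")   # "Myanmar" -> two syllables: ['မြန်', 'မာ']
```

This two-condition rule already splits simple words correctly, but digits, punctuation, independent vowels, and several diacritic combinations would need additional patterns in a real tool.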
3. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 2, characterized in that the specific steps of Step 1.3 are as follows:
Step 1.3.1: deduplicate the 20,106 collected Burmese sentences and cut them into 70,002 Burmese words; by calling the multilingual BERT model, directly generate 768-dimensional Burmese word-vector representations carrying contextual information;
Step 1.3.2: generate word-vector representations again from the Burmese words after syllable splitting;
Step 1.3.3: finally, concatenate the two to obtain word-vector characterizations carrying Burmese syllable information.
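The concatenation of Step 1.3.3 can be sketched in NumPy; the 768-dimensional vector matches Step 1.3.1, while the 128-dimensional size of the syllable-derived vector is an assumption made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
word_vec = rng.standard_normal(768)      # stand-in for the BERT word vector
syllable_vec = rng.standard_normal(128)  # stand-in; this size is an assumption

# Splicing the two yields one representation carrying both kinds of information.
combined = np.concatenate([word_vec, syllable_vec])   # 896-dimensional
```

The downstream BiLSTM then consumes the combined vector as the per-word input feature.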
4. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 1, characterized in that the specific steps of Step 2 are as follows:
Step 2.1: extract the Burmese segmentation features generated from syllable vectors with a bidirectional LSTM to obtain a complete bidirectional LSTM representation;
Step 2.2: label word segmentation over all Burmese words with a CRF; train the Burmese segmentation model and obtain the optimal Burmese segmentation model; using the pre-trained optimal segmentation model, predict on the input Burmese sentence to obtain the globally optimal segmentation result as output.
5. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 4, characterized in that the specific steps of Step 2.1 are as follows:
Step 2.1.1: taking the sentence as the unit, represent a Burmese sentence containing $i$ words as a vector sequence, denoted $X = (x_1, x_2, x_3, x_4, \dots, x_i)$;
Step 2.1.2: splice, position by position, the hidden state sequence $\overrightarrow{h_t}$ output by the forward LSTM and the hidden state sequence $\overleftarrow{h_t}$ output by the backward LSTM to obtain the complete hidden state sequence $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}] \in \mathbb{R}^{n \times m}$, giving a complete bidirectional LSTM representation of the Burmese segmentation sequence.
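The position-by-position splice of Step 2.1.2 can be sketched with stand-in arrays for the two hidden-state sequences (n positions, m/2 units per direction; random values replace actual LSTM outputs):

```python
import numpy as np

n, half = 6, 4                           # sentence length, units per direction
rng = np.random.default_rng(1)
h_fwd = rng.standard_normal((n, half))   # stand-in forward LSTM hidden states
h_bwd = rng.standard_normal((n, half))   # stand-in backward LSTM hidden states

# Splice position by position: h in R^{n x m} with m = 2 * half.
h = np.concatenate([h_fwd, h_bwd], axis=1)
```

Each row of `h` then carries both left-to-right and right-to-left context for one position, which is what the CRF layer scores.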
6. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 4, characterized in that the specific steps of Step 2.2 are as follows:
Step 2.2.1: let $A_{ij}$ denote the transition score from the $i$-th label to the $j$-th label, i.e. the score for the label $tagy_i$ at position $y_i$ transferring to the label $tagy_{i+1}$ at position $y_{i+1}$; for a label sequence $y = (y_1, y_2, \dots, y_n)$ whose length equals the sentence length, the score assigned by the segmentation model to sentence $X$ carrying labels $y$ is
$$s(X, y)_{\text{seg}} = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i},$$
where $P_{i, y_i}$ denotes the segmentation score predicted for the $i$-th of the $n$ words;
Step 2.2.2: normalize with Softmax to obtain the segmentation probability
$$P(y \mid X)_{\text{seg}} = \frac{\exp\big(s(X, y)\big)}{\sum_{\tilde{y} \in Y_X} \exp\big(s(X, \tilde{y})\big)},$$
where the denominator sums over every candidate label sequence $\tilde{y}$ of the Burmese sentence, the corresponding log-sum-exp being taken over each Burmese word in the segmentation model;
Step 2.2.3: during training, the Burmese segmentation model maximizes the log-likelihood; for a training sample $(x, y_x)$ the segmentation log-likelihood is
$$\log p(y_x \mid x)_{\text{seg}} = s(x, y_x) - \log \sum_{\tilde{y} \in Y_x} \exp\big(s(x, \tilde{y})\big),$$
where $x$ denotes the Burmese sentence, $y_x$ its gold label sequence, and $Y_x$ the set of all candidate label sequences for $x$;
Step 2.2.4: in the prediction stage, the Burmese segmentation model decodes with the dynamic-programming Viterbi algorithm to solve for the optimal segmentation prediction
$$y^{*}_{\text{seg}} = \arg\max_{\tilde{y} \in Y_x} s(x, \tilde{y}),$$
i.e. the label sequence with the maximum score.
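Steps 2.2.1 and 2.2.4 describe a generic linear-chain CRF score and its Viterbi decoding, which can be sketched as follows; the emission matrix `P` and transition matrix `A` are random stand-ins rather than trained parameters:

```python
import numpy as np

def score(P, A, y):
    """s(X, y): emission scores P[i, y_i] plus transition scores A[y_i, y_{i+1}]."""
    s = P[np.arange(len(y)), y].sum()
    s += sum(A[y[i], y[i + 1]] for i in range(len(y) - 1))
    return s

def viterbi(P, A):
    """Dynamic-programming decode of the highest-scoring label sequence."""
    n, k = P.shape
    dp = P[0].copy()                      # best score ending in each label
    back = np.zeros((n, k), dtype=int)    # backpointers
    for i in range(1, n):
        cand = dp[:, None] + A            # cand[prev, cur]
        back[i] = cand.argmax(axis=0)
        dp = cand.max(axis=0) + P[i]
    y = [int(dp.argmax())]
    for i in range(n - 1, 0, -1):
        y.append(int(back[i][y[-1]]))
    return y[::-1]

rng = np.random.default_rng(0)
P = rng.standard_normal((5, 3))           # 5 positions, 3 labels
A = rng.standard_normal((3, 3))           # label-transition scores
y_star = viterbi(P, A)                    # argmax over all label sequences
```

Viterbi runs in O(n·k²) time versus O(kⁿ) for exhaustive enumeration of label sequences, which is what makes the globally optimal decode tractable.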
7. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 1, characterized in that the specific steps of Step 3 are as follows:
Step 3.1: extract the Burmese part-of-speech tagging features generated from syllable vectors with a bidirectional LSTM to obtain a complete bidirectional LSTM representation;
Step 3.2: assign part-of-speech labels to all Burmese words with a CRF; train the Burmese part-of-speech tagging model and obtain the optimal Burmese part-of-speech tagging model; using the pre-trained optimal part-of-speech tagging model, predict on the input Burmese sentence to obtain the globally optimal part-of-speech tagging result as output.
8. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 7, characterized in that the specific steps of Step 3.1 are as follows:
Step 3.1.1: taking the sentence as the unit, represent a word-segmented Burmese sentence containing $i$ words as a vector sequence, denoted $X_{\text{seg}} = (x_1, x_2, x_3, x_4, \dots, x_i)$;
Step 3.1.2: splice, position by position, the hidden state sequence $\overrightarrow{h_t}$ output by the forward LSTM and the hidden state sequence $\overleftarrow{h_t}$ output by the backward LSTM to obtain the complete hidden state sequence $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}] \in \mathbb{R}^{n \times m}$, giving a complete bidirectional LSTM representation of the Burmese part-of-speech tag sequence.
9. The Burmese word segmentation and part-of-speech tagging method based on bidirectional LSTM-CRF according to claim 7, characterized in that the specific steps of Step 3.2 are as follows:
Step 3.2.1: let $A_{ij}$ denote the transition score from the $i$-th label to the $j$-th label, i.e. the score for the label $tagy_i$ at position $y_i$ transferring to the label $tagy_{i+1}$ at position $y_{i+1}$; for a label sequence $y = (y_1, y_2, \dots, y_n)$ whose length equals the sentence length, the score assigned by the part-of-speech tagging model to sentence $X$ carrying labels $y$ is
$$s(X, y)_{\text{pos}} = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i},$$
where $P_{i, y_i}$ denotes the part-of-speech tagging score predicted for the $i$-th of the $n$ words;
Step 3.2.2: normalize with Softmax to obtain the part-of-speech tagging probability
$$P(y \mid X)_{\text{pos}} = \frac{\exp\big(s(X, y)\big)}{\sum_{\tilde{y} \in Y_X} \exp\big(s(X, \tilde{y})\big)},$$
where the denominator sums over every candidate label sequence $\tilde{y}$ of the Burmese sentence, the corresponding log-sum-exp being taken over each Burmese word in the part-of-speech tagging model;
Step 3.2.3: during training, the Burmese part-of-speech tagging model maximizes the log-likelihood; for a training sample $(x, y_x)$ the part-of-speech tagging log-likelihood is
$$\log p(y_x \mid x)_{\text{pos}} = s(x, y_x) - \log \sum_{\tilde{y} \in Y_x} \exp\big(s(x, \tilde{y})\big),$$
where $x$ denotes the Burmese sentence, $y_x$ its gold label sequence, and $Y_x$ the set of all candidate label sequences for $x$;
Step 3.2.4: in the prediction stage, the Burmese part-of-speech tagging model decodes with the dynamic-programming Viterbi algorithm to solve for the optimal part-of-speech tagging prediction
$$y^{*}_{\text{pos}} = \arg\max_{\tilde{y} \in Y_x} s(x, \tilde{y}),$$
i.e. the label sequence with the maximum score.
10. A Burmese word segmentation and part-of-speech tagging device based on bidirectional LSTM-CRF, characterized by comprising the following modules:
a Burmese preprocessing module, configured to use BERT combined with the syllable information in the Burmese word context to generate syllable-feature-based word-vector representations;
a Burmese word segmentation module, configured to encode the word-vector representations of Burmese with a bidirectional LSTM to extract Burmese segmentation features, label word segmentation over all Burmese words with a CRF, train the Burmese segmentation model, and obtain the optimal Burmese segmentation model; and, using the pre-trained optimal segmentation model, predict on the input Burmese sentence to obtain the globally optimal segmentation result as output;
a Burmese part-of-speech tagging module, configured to encode the Burmese segmented by the word segmentation module with a bidirectional LSTM to extract Burmese part-of-speech tagging features, assign part-of-speech labels to all Burmese words with a CRF, train the Burmese part-of-speech tagging model, and obtain the optimal Burmese part-of-speech tagging model; and, using the pre-trained optimal part-of-speech tagging model, predict on the input Burmese sentence to obtain the globally optimal part-of-speech tagging result as output;
a Burmese sentence segmentation and part-of-speech tagging prediction module, configured to perform segmentation and part-of-speech label prediction on the vectorized input Burmese sentence by calling the trained segmentation or part-of-speech tagging model.
CN201910739718.5A 2019-08-12 2019-08-12 Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF Pending CN110489750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910739718.5A CN110489750A (en) 2019-08-12 2019-08-12 Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF


Publications (1)

Publication Number Publication Date
CN110489750A (en) 2019-11-22

Family

ID=68550763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910739718.5A Pending CN110489750A (en) 2019-08-12 2019-08-12 Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF

Country Status (1)

Country Link
CN (1) CN110489750A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016030730A1 (en) * 2014-08-29 2016-03-03 Yandex Europe Ag Method for text processing
CN108763212A (en) * 2018-05-23 2018-11-06 北京神州泰岳软件股份有限公司 A kind of address information extraction method and device
CN109117472A (en) * 2018-11-12 2019-01-01 新疆大学 A kind of Uighur name entity recognition method based on deep learning
CN109558569A (en) * 2018-12-14 2019-04-02 昆明理工大学 A kind of Laotian part-of-speech tagging method based on BiLSTM+CRF model
CN109960782A (en) * 2018-12-27 2019-07-02 同济大学 A kind of Tibetan language segmenting method and device based on deep neural network
CN110032648A (en) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 A kind of case history structuring analytic method based on medical domain entity


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PAN HUASHAN et al.: "Khmer word segmentation and part-of-speech tagging method based on cascaded conditional random fields", Journal of Chinese Information Processing (《中文信息学报》) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046946A (en) * 2019-12-10 2020-04-21 昆明理工大学 Burma language image text recognition method based on CRNN
CN111046946B (en) * 2019-12-10 2021-03-02 昆明理工大学 Burma language image text recognition method based on CRNN
CN110781254A (en) * 2020-01-02 2020-02-11 四川大学 Automatic case knowledge graph construction method, system, equipment and medium
US11514321B1 (en) * 2020-06-12 2022-11-29 Amazon Technologies, Inc. Artificial intelligence system using unsupervised transfer learning for intra-cluster analysis
CN111797356A (en) * 2020-07-06 2020-10-20 上海冰鉴信息科技有限公司 Webpage table information extraction method and device
CN111797356B (en) * 2020-07-06 2023-08-08 上海冰鉴信息科技有限公司 Webpage form information extraction method and device
WO2021143206A1 (en) * 2020-07-16 2021-07-22 平安科技(深圳)有限公司 Single-statement natural language processing method and apparatus, computer device, and readable storage medium
CN112149417A (en) * 2020-09-16 2020-12-29 北京小米松果电子有限公司 Part-of-speech tagging method and device, storage medium and electronic equipment
CN112287688B (en) * 2020-09-17 2022-02-11 昆明理工大学 English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features
CN112287688A (en) * 2020-09-17 2021-01-29 昆明理工大学 English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features
CN112183086A (en) * 2020-09-23 2021-01-05 北京先声智能科技有限公司 English pronunciation continuous reading mark model based on sense group labeling
CN112364654A (en) * 2020-11-11 2021-02-12 安徽工业大学 Education-field-oriented entity and relation combined extraction method
CN112634878A (en) * 2020-12-15 2021-04-09 深港产学研基地(北京大学香港科技大学深圳研修院) Speech recognition post-processing method and system and related equipment
CN112634878B (en) * 2020-12-15 2024-05-17 深港产学研基地(北京大学香港科技大学深圳研修院) Speech recognition post-processing method and system and related equipment
CN113239692A (en) * 2021-05-12 2021-08-10 同方知网数字出版技术股份有限公司 Ancient Chinese-based word segmentation method
CN113112007B (en) * 2021-06-11 2021-10-15 平安科技(深圳)有限公司 Method, device and equipment for selecting sequence length in neural network and storage medium
CN113112007A (en) * 2021-06-11 2021-07-13 平安科技(深圳)有限公司 Method, device and equipment for selecting sequence length in neural network and storage medium
CN113901210A (en) * 2021-09-15 2022-01-07 昆明理工大学 Method for marking verbosity of Thai and Burma characters by using local multi-head attention to mechanism fused word-syllable pair


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191122)