CN108829684A - Mongolian-Chinese neural machine translation method based on a transfer learning strategy - Google Patents

Mongolian-Chinese neural machine translation method based on a transfer learning strategy Download PDF

Info

Publication number
CN108829684A
CN108829684A
Authority
CN
China
Prior art keywords
chinese
machine translation
translation
model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810428618.6A
Other languages
Chinese (zh)
Inventor
苏依拉
赵亚平
牛向华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN201810428618.6A
Publication of CN108829684A
Current legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

The present invention addresses the currently low translation quality and poor translation performance of Mongolian-Chinese machine translation. Mongolian is a low-resource language, and collecting a large Mongolian-Chinese bilingual parallel corpus is extremely difficult; the transfer learning strategy of the present invention can effectively solve this problem. A transfer learning strategy solves problems in a different but related domain using existing knowledge. First, a neural machine translation model is trained on a large-scale English-Chinese parallel corpus. Second, the parameter weights of this trained translation model are migrated into a Mongolian-Chinese neural machine translation framework, and the Mongolian-Chinese neural machine translation model is trained on the available Mongolian-Chinese parallel corpus. Finally, the translations produced by the transfer-learning-based neural machine translation system and by statistical machine translation are compared and evaluated on BLEU score and translation fluency. Using a controlled-variable methodology, the results show that the transfer learning strategy effectively improves Mongolian-Chinese machine translation performance.

Description

Mongolian-Chinese neural machine translation method based on a transfer learning strategy
Technical field
The invention belongs to the field of neural machine translation technology and in particular relates to a Mongolian-Chinese neural machine translation method based on a transfer learning strategy.
Background art
Machine translation refers to the process of automatically converting one natural language into another natural language with the same meaning using a machine (computer). In recent years, with the growth of international exchange, machine translation, as an important means of breaking down language barriers, has played an increasingly large role in people's production and daily life. Neural machine translation, as a data-driven machine translation method, depends heavily on the scale and quality of the parallel corpus. Because the number of neural network parameters is huge, only when the training corpus reaches a certain scale can neural machine translation significantly surpass statistical machine translation in translation quality. However, the Mongolian-Chinese parallel corpus resources currently available for experiments are extremely limited, and building a large Mongolian-Chinese bilingual parallel corpus requires a great deal of manpower and material resources.
Research on Mongolian machine translation started late, and the grammatical complexity of Mongolian itself has made progress on Mongolian-Chinese machine translation relatively slow; the scarcity of Mongolian-Chinese parallel corpus data is a major obstacle to Mongolian-Chinese machine translation research that cannot be ignored. The core idea of transfer learning is to store the knowledge obtained by training on a source task and apply it to a new (different but related) task. A transfer learning strategy allows a network trained on abundant labeled data to migrate its knowledge into a model for which labeled data is scarce.
At present, some neural machine translation methods have been proposed for low-resource languages suffering from parallel corpus scarcity. Because the Mongolian-Chinese parallel corpus is deficient and Mongolian grammar is complex, translation quality remains unsatisfactory and the translation process still suffers from severe data sparsity. A transfer learning strategy applies knowledge already learned to a related task, reducing the amount of training data required by the target task and offering a step toward general artificial intelligence. Compared with training a neural network from scratch, transfer learning uses the parameter weights of an already trained network structure as pre-training, which accelerates translation model training and improves final translation quality.
Summary of the invention
In order to overcome the above shortcomings of the prior art, the present invention starts from the goals of alleviating the data sparsity of Mongolian-Chinese machine translation and improving Mongolian-Chinese translation quality, and proposes a simple and effective transfer learning strategy for low-resource languages. At present, apart from Chinese and English, which possess large bilingual parallel corpus resources, most other languages face the general problem of parallel corpus scarcity. The present invention trains network parameter weights on a large English-Chinese parallel corpus, migrates them into a Mongolian-Chinese neural machine translation model, and then trains on the Mongolian-Chinese parallel corpus to obtain a Mongolian-Chinese neural translation model, thereby addressing the shortage of Mongolian-Chinese parallel corpora and reaching the goal of improving Mongolian-Chinese machine translation performance.
To achieve the goals above, the technical solution adopted by the present invention is that:
A Mongolian-Chinese neural machine translation method based on a transfer learning strategy: first, an English-Chinese neural machine translation model is trained on a large-scale English-Chinese parallel corpus; second, the learned network parameter weights are migrated into a Mongolian-Chinese neural machine translation model; then, the Mongolian-Chinese neural machine translation model is trained on the available Mongolian-Chinese parallel corpus, yielding a Mongolian-Chinese neural machine translation model based on the transfer learning strategy; finally, the trained model is used to perform Mongolian-Chinese neural machine translation.
The specific steps can be described as follows:
01: Split the Chinese and English corpora into datasets and perform data preprocessing; dataset splitting means dividing the data into a training set, a validation set and a test set, and data preprocessing includes Chinese word segmentation and English preprocessing;
02: Build the RNN (recurrent neural network) neural machine translation model architecture, comprising an encoder and a decoder;
03: Train the English-Chinese neural machine translation model on the large-scale English-Chinese parallel corpus, adjusting and optimizing the network parameters with stochastic gradient descent (SGD) during training;
04: Migrate the network parameter weights of the trained English-Chinese neural machine translation model into the Mongolian-Chinese neural machine translation model, initializing the parameters of the Mongolian-Chinese network in place of random initialization;
05: Train the Mongolian-Chinese neural machine translation model on the available Mongolian-Chinese parallel corpus;
06: Evaluate the translations of the test set using the BLEU score.
Before model training, data preprocessing is preferably performed on the English-Chinese parallel corpus and the Mongolian-Chinese parallel corpus resources. Data preprocessing is the preparatory work to be done before training a neural machine translation model on bilingual parallel corpora.
The data preprocessing uses the open-source software of the Stanford University natural language processing laboratory as tools, including:
1) word segmentation of the Chinese corpus with the segmentation tool stanford-segmenter;
2) preprocessing of the English corpus with the English preprocessing tool stanford-ner.
The preprocessing is based on the conditional random field (CRF) model, a conditional probability model whose main source is the maximum entropy model: given the input nodes, it is an undirected graphical model of the conditional probability of the output nodes. The CRF model is defined as an undirected graph G = (V, E), where V is the node set, the set of random variables Y = {Y_i | 1 ≤ i ≤ m}, one labeling unit for each of the m tokens of an input sentence, and E = {Y_{i-1}, Y_i | 1 ≤ i ≤ m} is the set of undirected edges, a linear chain made up of m-1 edges.
Given a sequence a to be labeled, the conditional probability of the corresponding label sequence b is:

$$P(b \mid a) = \frac{1}{Z(a)} \exp\Big( \sum_{i=1}^{m} \sum_{k} \lambda_k f_k(b_{i-1}, b_i, a, i) + \sum_{i=1}^{m} \sum_{k} \lambda'_k f'_k(b_i, a, i) \Big)$$

where i is the index into the sequence, Z(a) is the normalization function, λ_k and λ'_k are the parameters of the model, k ranges over the features defined on each edge and on the corresponding node, and f_k and f'_k are binary feature functions.
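To make the formula concrete, the following minimal Python sketch scores a label sequence under a toy linear-chain CRF and normalizes by brute force over all label sequences; the feature functions, weights and label set are illustrative assumptions, not the features used by the Stanford tools.

```python
import math
from itertools import product

# Toy linear-chain CRF: score(b|a) = weighted sum of edge and node features.
# Feature functions and weights are illustrative, not those of stanford-ner.
def edge_feat(prev_label, label, seq, i):          # f_k(b_{i-1}, b_i, a, i)
    return 1.0 if prev_label == label else 0.0

def node_feat(label, seq, i):                      # f'_k(b_i, a, i)
    return 1.0 if seq[i].istitle() == (label == "B") else 0.0

def score(labels, seq, lam_edge=0.5, lam_node=1.2):
    s = sum(lam_edge * edge_feat(labels[i - 1], labels[i], seq, i)
            for i in range(1, len(seq)))
    s += sum(lam_node * node_feat(labels[i], seq, i) for i in range(len(seq)))
    return s

def conditional_prob(labels, seq, label_set=("B", "I")):
    # Z(a): brute-force normalization over all label sequences (toy sizes only).
    z = sum(math.exp(score(list(b), seq))
            for b in product(label_set, repeat=len(seq)))
    return math.exp(score(labels, seq)) / z

seq = ["Hohhot", "is", "a", "city"]
print(conditional_prob(["B", "I", "I", "I"], seq))
```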
The neural machine translation model formula is:

$$P(y_n \mid y_{<n}, x; \theta) = \frac{\exp f(V_{y_n}, C_s, C_t; \theta)}{\sum_{y \in D} \exp f(V_y, C_s, C_t; \theta)}$$

where θ is the parameter set of the model, f is a nonlinear function, y_n is the current target-language word, x is the source-language sentence, y_{<n} is the target-language prefix generated so far, V_y is the target-language word embedding, D is the target-language vocabulary, C_s is the source-language context vector, and C_t is the target-language context vector.
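As a concrete reading of this formula, the NumPy sketch below scores each word of a toy target vocabulary against source and target context vectors and normalizes with a softmax over D; the scoring function f, its parameters and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["我们", "工作", "完成", "</s>"]        # toy target vocabulary D
V = rng.normal(size=(len(vocab), 8))            # target word embeddings V_y
W = rng.normal(size=(8, 16))                    # parameters theta of f

def f(V_y, C_s, C_t):
    # Illustrative nonlinear scoring function f(V_y, C_s, C_t; theta).
    return np.tanh(V_y @ W) @ np.concatenate([C_s, C_t])

C_s, C_t = rng.normal(size=8), rng.normal(size=8)
scores = np.array([f(V[i], C_s, C_t) for i in range(len(vocab))])
probs = np.exp(scores) / np.exp(scores).sum()   # softmax over the vocabulary D
print(dict(zip(vocab, probs.round(3))))
```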
The network type of the neural machine translation model is a recurrent neural network (RNN). In the RNN forward propagation algorithm, for any sequence index t, the hidden state h^{(t)} is obtained from the input x^{(t)} and the previous hidden state h^{(t-1)}:

h^{(t)} = σ(U x^{(t)} + W h^{(t-1)} + b)

where σ is the activation function of the recurrent network (generally tanh) and b is the bias of the linear relation. The output of the model at sequence index t is o^{(t)} = V h^{(t)} + d, and the final prediction at sequence index t is

$$\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$

where d is the bias of the output nodes and U, V and W are the parameter matrices shared across the recurrent network.
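The forward recurrence can be written directly in NumPy; the dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 6, 5
U = rng.normal(size=(n_hid, n_in))    # input-to-hidden
W = rng.normal(size=(n_hid, n_hid))   # hidden-to-hidden (shared across t)
V = rng.normal(size=(n_out, n_hid))   # hidden-to-output
b, d = np.zeros(n_hid), np.zeros(n_out)

def rnn_step(x_t, h_prev):
    h_t = np.tanh(U @ x_t + W @ h_prev + b)     # h(t) = sigma(Ux(t) + Wh(t-1) + b)
    o_t = V @ h_t + d                           # o(t) = Vh(t) + d
    y_t = np.exp(o_t) / np.exp(o_t).sum()       # prediction: softmax(o(t))
    return h_t, y_t

h = np.zeros(n_hid)
for x in rng.normal(size=(3, n_in)):            # a length-3 input sequence
    h, y = rnn_step(x, h)
print(y)
```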
In the model training, the encoder and decoder are trained jointly, with the model formula:

$$\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log P(y_n \mid x_n; \theta)$$

where θ is the parameter set of the model, P is the conditional probability function, (x_n, y_n) denotes a bilingual training sentence pair, and N is the number of training samples; the samples are trained with the maximum likelihood estimation algorithm.
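A hedged PyTorch-style sketch of maximum-likelihood training with SGD as in step 03: the "model" here is a placeholder linear scorer over a toy vocabulary rather than the full encoder-decoder, and all sizes and data are synthetic.

```python
import torch
import torch.nn as nn

# Minimal sketch of maximum-likelihood training with stochastic gradient descent.
vocab_size, ctx_dim, N = 50, 16, 128
model = nn.Linear(ctx_dim, vocab_size)             # stands in for p(y|x; theta)
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD, as in step 03
loss_fn = nn.CrossEntropyLoss()                    # = -log P(y_n | x_n; theta)

x = torch.randn(N, ctx_dim)                        # stand-in source contexts x_n
y = torch.randint(0, vocab_size, (N,))             # stand-in target words y_n

for epoch in range(5):
    opt.zero_grad()
    nll = loss_fn(model(x), y)                     # mean negative log-likelihood
    nll.backward()
    opt.step()                                     # theta <- theta - lr * gradient
    print(epoch, nll.item())
```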
The network parameter weights learned by training the neural network on a bilingual parallel corpus are the parameter matrices connecting the nodes of the network. Using the learned weights to initialize the parameters of the Mongolian-Chinese network, instead of random initialization, realizes the migration of the trained network parameter weights into the Mongolian-Chinese neural machine translation model.
When the Mongolian-Chinese neural machine translation model is trained on the Mongolian-Chinese parallel corpus, the parameter settings of the English-Chinese and Mongolian-Chinese translation models, including vocabulary size, word embedding size and hidden layer size, must be consistent.
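In a modern framework, this weight migration and the consistency requirement might look like the following PyTorch sketch; the stand-in model builder is a hypothetical placeholder for the RNN encoder-decoder of step 02, not an API defined by this document.

```python
import torch
import torch.nn as nn

# Sketch of the weight migration: both models share one architecture (same
# vocabulary, embedding and hidden sizes, as the text requires), so the
# Mongolian-Chinese model can be initialized from the English-Chinese weights.
def build_nmt_model(vocab=1000, emb=64, hid=128):
    # Hypothetical stand-in for the RNN encoder-decoder of step 02.
    return nn.Sequential(nn.Embedding(vocab, emb),
                         nn.GRU(emb, hid, batch_first=True))

en_zh_model = build_nmt_model()     # imagine this was trained on En-Zh data
mo_zh_model = build_nmt_model()     # identical hyperparameters: shapes match
mo_zh_model.load_state_dict(en_zh_model.state_dict())  # replaces random init
# ... then continue training mo_zh_model on the Mongolian-Chinese corpus (step 05).
```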
Further, the translations of the Mongolian-Chinese neural machine translation prototype system and the statistical machine translation translations are compared and evaluated on their BLEU scores, achieving the aim of finally improving Mongolian-Chinese machine translation performance.
The BLEU score is a tool for assessing machine translation quality; a higher score indicates better machine translation model performance. The BLEU formula is:

$$\mathrm{BLEU} = BP \cdot \exp\Big( \sum_{n=1}^{M} w_n \log p_n \Big)$$

where w_n = 1/M, M is the maximum n-gram order over the candidate and reference translations (with an upper limit of 4), and p_n is the n-gram precision; BP is the brevity penalty for candidates shorter than the reference:

$$BP = e^{\min(1 - r/h,\, 0)}$$

where h is the number of words in the candidate translation and r is the reference translation length closest to h.
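The complete computation fits in a few lines of Python; this is a single-reference toy version with the example tokens chosen for illustration (real evaluations use established BLEU tools and corpus-level statistics).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, M=4):
    log_p = 0.0
    for n in range(1, M + 1):                 # clipped n-gram precisions p_n
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        p_n = max(overlap / total, 1e-9)      # avoid log(0) in this toy version
        log_p += (1.0 / M) * math.log(p_n)    # w_n = 1/M
    h, r = len(candidate), len(reference)     # candidate and reference lengths
    bp = math.exp(min(1 - r / h, 0))          # brevity penalty BP
    return bp * math.exp(log_p)

print(bleu("这 项 工作 的 完成 需要 很 长 时间".split(),
           "完成 这 项 工作 我们 需要 很 长 时间".split()))
```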
The core idea of the transfer learning strategy is to store the knowledge obtained by training on a source task/domain and apply it to a new (different but related) task/domain. The present invention is a Mongolian-Chinese neural machine translation method based on a transfer learning strategy, and the research method belongs to model-based transfer learning. Model-based transfer learning assumes that the source domain and the target domain share model parameters; the specific migration method is to apply a model learned on the source domain to the target domain and then learn a new model from the target domain data.
Compared with existing Mongolian-Chinese machine translation methods, the invention first trains a translation model on a large-scale English-Chinese bilingual parallel corpus, ensuring high quality and wide coverage of the English-Chinese data; second, exploiting the correlation of machine translation across language pairs, it migrates the network parameters learned by the English-Chinese translation model into the Mongolian-Chinese machine translation model; finally, it trains Mongolian-Chinese neural machine translation on the available Mongolian-Chinese parallel corpus. The proposed transfer learning strategy is simple and feasible to implement and effectively alleviates the data sparsity problem of machine translation for low-resource languages.
Description of the drawings
Fig. 1 is a comparison diagram of conventional machine learning and transfer learning.
Fig. 2 is a flow chart of the neural machine translation prototype system based on the transfer learning strategy.
Specific embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.
The Mongolian-Chinese neural machine translation method based on the transfer learning strategy of the present invention is realized as follows:
1. Data preprocessing of the corpora
Data preprocessing includes Chinese word segmentation and English data preprocessing. The open-source segmentation tool stanford-segmenter of the Stanford University natural language processing laboratory is used to segment the Chinese corpus, and the English preprocessing tool stanford-ner is used to preprocess the English corpus. The basic working principle of these tools is the conditional random field (CRF), a conditional probability model whose main source is the maximum entropy model: given the input nodes, it is an undirected graphical model of the conditional probability of the output nodes. The CRF model is defined as an undirected graph G = (V, E), where V is the node set, the set of random variables Y = {Y_i | 1 ≤ i ≤ m}, one labeling unit for each of the m tokens of an input sentence, and E = {Y_{i-1}, Y_i | 1 ≤ i ≤ m} is the edge set, a linear chain made up of m-1 edges.
Given a sequence a to be labeled, the conditional probability of the corresponding label sequence b is:

$$P(b \mid a) = \frac{1}{Z(a)} \exp\Big( \sum_{i=1}^{m} \sum_{k} \lambda_k f_k(b_{i-1}, b_i, a, i) + \sum_{i=1}^{m} \sum_{k} \lambda'_k f'_k(b_i, a, i) \Big)$$

where i is the index into the sequence, Z(a) is the normalization function, λ_k and λ'_k are the parameters of the model, k ranges over the features defined on each edge and on the corresponding node, and f_k and f'_k are binary feature functions.
2. Statistical machine translation and neural machine translation modeling
A. Statistical machine translation model: the key problem of statistical machine translation is to learn a translation model automatically from a bilingual corpus by statistical methods and then, based on this translation model, to select the highest-scoring target sentence from the translation candidate set of a source sentence as the best translation. In the noisy channel model, the target-language sentence T is treated as the input of a noisy channel; after the channel encodes it, the corresponding output sequence is the source-language sentence S. The goal of statistical machine translation is to recover the corresponding target language T from the source language S by decoding, a process also called decoding or translation. The statistical machine translation model formula is:

$$\hat{T} = \arg\max_{T} \Pr(T \mid S) = \arg\max_{T} \Pr(S \mid T)\,\Pr(T) \qquad (2)$$

where Pr(T) is the language model of the target language and Pr(S|T) is the bilingual translation model; this formula is known as the fundamental equation of statistical machine translation.
B. Neural machine translation model: neural machine translation is a machine translation method that uses a neural network to learn the mapping between natural languages directly. The nonlinear mapping of neural machine translation (NMT) differs from the linear statistical machine translation (SMT) model: neural machine translation describes the bilingual semantic equivalence through the state vectors that connect the encoder and the decoder. Deep-learning-based neural machine translation has now surpassed the traditional statistical machine translation method and become the new mainstream technology. The key problem in realizing the natural-language mapping (i.e., machine translation) with a neural network is conditional probability modeling; the neural machine translation model formula is:

$$P(y_n \mid y_{<n}, x; \theta) = \frac{\exp f(V_{y_n}, C_s, C_t; \theta)}{\sum_{y \in D} \exp f(V_y, C_s, C_t; \theta)}$$

where θ is the parameter set of the model, f is a nonlinear function, y_n is the current target-language word, x is the source-language sentence, y_{<n} is the target-language prefix generated so far, V_y is the target-language word embedding, D is the target-language vocabulary, C_s is the source-language context vector, and C_t is the target-language context vector.
C. The machine translation quality evaluation metric, the BLEU score, is a tool for assessing machine translation quality; a higher score indicates better machine translation model performance. The BLEU formula is:

$$\mathrm{BLEU} = BP \cdot \exp\Big( \sum_{n=1}^{M} w_n \log p_n \Big)$$

where w_n = 1/M, M is the maximum n-gram order over the candidate and reference translations (with an upper limit of 4), and p_n is the n-gram precision; BP is the brevity penalty for candidates shorter than the reference:

$$BP = e^{\min(1 - r/h,\, 0)} \qquad (7)$$

where h is the number of words in the candidate translation and r is the reference translation length closest to h.
3. The recurrent neural network (RNN) encoder-decoder architecture
Recurrent neural networks are better than traditional feed-forward networks at capturing relationships across context and are therefore commonly used in natural language processing tasks. To predict the next word of a sentence one generally needs the words that came before it, because the words of a sentence are not independent of one another. In a recurrent neural network the current output depends on the current input and on the preceding outputs, so an RNN is a neural network with a memory function. The encoder-decoder model (Encoder-Decoder) is one of the neural machine translation architectures: the encoder reads the source sentence, its main task being to encode the source sentence into a real-valued vector of fixed dimension that represents the source-language semantic information; the decoder reads this vector and then generates the corresponding target-language word sequence one word at a time, until an end-of-sentence mark indicates the end of the translation process.
A. The encoder reads the network input x = (x_1, x_2, ..., x_l) and encodes it into hidden states h = (h_1, h_2, ..., h_l); the encoder is generally realized with a recurrent neural network (RNN), with the update formulas:

h_i = f(x_i, h_{i-1})   (5)

c = q({h_1, ..., h_l})   (6)

where c is the source sentence representation and f and q are nonlinear functions.
B. Given the source sentence representation c and the previously generated output sequence {y_1, y_2, ..., y_{t-1}}, the decoder generates the corresponding target-language word y_t step by step; the model formula is:

$$p(y) = \prod_{t=1}^{T} p(y_t \mid \{y_1, \ldots, y_{t-1}\}, c)$$

where y = (y_1, y_2, ..., y_T). The decoder likewise usually uses a recurrent neural network, of the form:

p(y_t | {y_1, ..., y_{t-1}}, c) = g(y_{t-1}, s_t, c)   (8)

where g is a nonlinear function used to compute the probability of y_t, and s_t is the hidden state of the decoder.
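A minimal PyTorch sketch of formulas (5)-(8): a GRU encoder compresses the source into a final state c, and a GRU decoder conditioned on that state produces a distribution over the next target word. All sizes and token ids are illustrative.

```python
import torch
import torch.nn as nn

# Minimal RNN encoder-decoder; sizes are illustrative, not the patent's settings.
class Encoder(nn.Module):
    def __init__(self, vocab, emb=32, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)  # h_i = f(x_i, h_{i-1})

    def forward(self, src):
        _, h = self.rnn(self.emb(src))
        return h                     # c = q({h_1..h_l}): here, the final state

class Decoder(nn.Module):
    def __init__(self, vocab, emb=32, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)               # g(y_{t-1}, s_t, c)

    def forward(self, prev_tokens, state):
        s, state = self.rnn(self.emb(prev_tokens), state)
        return self.out(s), state    # logits defining p(y_t | y_<t, c)

src = torch.randint(0, 1000, (2, 7))                   # batch of source sentences
logits, _ = Decoder(1000)(torch.randint(0, 1000, (2, 5)), Encoder(1000)(src))
print(logits.shape)                                    # (2, 5, 1000)
```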
4. Neural network forward propagation and translation model training
A. In the forward propagation algorithm during recurrent network training, for any sequence index t, the hidden state h^{(t)} is obtained from the input x^{(t)} and the previous hidden state h^{(t-1)}:

h^{(t)} = σ(U x^{(t)} + W h^{(t-1)} + b)

where σ is the activation function of the recurrent network (generally tanh) and b is the bias of the linear relation. The output of the model at sequence index t is o^{(t)} = V h^{(t)} + d, and the final prediction at sequence index t is

$$\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$

where d is the bias of the output nodes and U, V and W are the parameter matrices shared across the recurrent network.
B. Given a parallel corpus, the most common training method for neural machine translation is maximum likelihood estimation. In the present invention the network is trained jointly over encoder and decoder, with the training formula:

$$\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log P(y_n \mid x_n; \theta)$$

where θ is the parameter set of the model, P is the conditional probability function, (x_n, y_n) denotes a bilingual training sentence pair, and N is the number of training samples; the samples are trained with the maximum likelihood estimation algorithm.
5. Attention mechanism
The early translation quality of neural machine translation was not ideal and did not surpass statistics-based machine translation. With the proposal of the end-to-end encoder-decoder framework for machine translation and the introduction of the attention mechanism into the neural machine translation framework, the performance of neural machine translation improved markedly, and this combination gradually became the main architecture of neural machine translation. A plain neural translation model represents the source sentence as a single real-valued vector of fixed dimension, which has shortcomings: a fixed-size vector cannot fully express the semantic information of the source sentence. When the attention mechanism is added to the neural machine translation model, the generation of each target-language word dynamically attends to the source-language words relevant to that word, which enhances the expressive power of the neural machine translation model and significantly improves translation quality in the related experiments. With the attention mechanism, formula (8) is redefined as:

p(y_t | {y_1, ..., y_{t-1}}, x) = g(y_{t-1}, s_t, c_t)   (9)

where s_t is the hidden state of the recurrent network at time t, obtained by:

s_t = f(s_{t-1}, y_{t-1}, c_t)   (10)

g and f are nonlinear functions, and the context vector (Context Vector) c_t depends on the source encoding sequence (h_1, h_2, ..., h_l), where h_i contains the context information of the i-th input word. c_t is computed as:

$$c_t = \sum_{j=1}^{l} a_{tj} h_j$$

a_{tj} is the weight of h_j, computed as:

$$a_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{l} \exp(e_{tk})}$$

where e_{tj} = a(s_{t-1}, h_j) is the alignment model, which measures how well the output at time t matches the j-th source-language word. Compared with plain neural machine translation, this method fuses more source-side information during decoding and can significantly improve translation quality.
6. Transfer learning strategy
A traditional machine learning (Machine Learning) model learns on the basis of sufficiently large training data and then uses the learned model to classify and predict documents. The goal of transfer learning (Transfer Learning) is to use knowledge acquired in one domain or task to help learn related tasks. Fig. 1 compares the difference between traditional machine learning and transfer learning.
The idea of transfer learning is to store the knowledge obtained by training on a source task and apply it to a related task: take a model pre-trained on a large dataset, use its structure and weights directly, and apply it to the target problem, i.e., "migrate" the pre-trained model into the target problem. How the pre-trained model is used is determined by the similarity and the size of the data in the source and target domains.
Table 1. How to use the pre-trained model in four cases

Relationship between source and target domains | Model training method
Dataset small, similarity high | use the pre-trained model as a feature extractor
Dataset small, similarity low | freeze the weights of the first k layers of the pre-trained model, retrain the later layers
Dataset large, similarity low | initialize with the pre-trained weights, train on the new dataset
Dataset large, similarity high | initialize with the pre-trained weights, train (fine-tune) on the new dataset
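For the second row of Table 1, freezing the first k layers and retraining the rest might look like the following PyTorch sketch; the stand-in model and the use of children() order as "depth" are illustrative assumptions.

```python
import torch.nn as nn

# Sketch of "freeze the first k layers, retrain the rest" from Table 1.
# `model` is any pre-trained network; children() order stands in for depth.
def freeze_first_k(model: nn.Module, k: int):
    for i, layer in enumerate(model.children()):
        if i < k:
            for p in layer.parameters():
                p.requires_grad = False    # frozen: keeps pre-trained weights

model = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 8), nn.Linear(8, 2))
freeze_first_k(model, k=2)
print([p.requires_grad for p in model.parameters()])
```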
With reference to Fig. 2, the specific implementation steps of the neural machine translation prototype system based on the transfer learning strategy of the present invention are described as follows (a pipeline sketch follows the list):
01: Split the Chinese and English corpora into datasets and perform data preprocessing; dataset splitting means dividing the data into a training set, a validation set and a test set, and data preprocessing includes Chinese word segmentation and English preprocessing;
02: Build the RNN (recurrent neural network) neural machine translation model architecture, comprising an encoder and a decoder;
03: Train the English-Chinese neural machine translation model on the large-scale English-Chinese parallel corpus, adjusting and optimizing the network parameters with stochastic gradient descent (SGD) during training;
04: Migrate the network parameter weights of the trained English-Chinese neural machine translation model into the Mongolian-Chinese neural machine translation model, initializing the parameters of the Mongolian-Chinese network in place of random initialization;
05: Train the Mongolian-Chinese neural machine translation model on the available Mongolian-Chinese parallel corpus;
06: Evaluate the translations of the test set using the BLEU score.
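Putting steps 01-06 together, the driver below sketches the whole pipeline; every helper name here (load_corpus, split_dataset, build_rnn_encoder_decoder, train_sgd, corpus_bleu) is hypothetical pseudocode naming, not a real library API.

```python
# End-to-end sketch of steps 01-06. All helpers are hypothetical placeholders.
def run_transfer_pipeline():
    train_en, dev_en, test_en = split_dataset(load_corpus("en-zh"))  # step 01
    train_mo, dev_mo, test_mo = split_dataset(load_corpus("mn-zh"))
    en_zh = build_rnn_encoder_decoder()                              # step 02
    train_sgd(en_zh, train_en, dev_en)                               # step 03
    mo_zh = build_rnn_encoder_decoder()                              # same hyperparameters
    mo_zh.load_state_dict(en_zh.state_dict())                        # step 04: transfer
    train_sgd(mo_zh, train_mo, dev_mo)                               # step 05
    print("BLEU:", corpus_bleu(mo_zh, test_mo))                      # step 06
```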
To make the Mongolian-Chinese translation flow of the invention clearer, the translation of one Mongolian sentence into Chinese is described below in further detail.
For the given Mongolian sentence, the translation process is as follows:
01: The encoder compresses the Mongolian sentence into a real-valued vector of fixed dimension, the vector representing the semantic information of the source sentence;
02: The decoder inversely decodes this vector into the corresponding target-language sentence; as the decoder generates each target-language word, the attention mechanism dynamically finds the source-language context relevant to the current word, so that, for example, when the Chinese word for "work" is generated, the corresponding Mongolian word is the most relevant;
03: The translation is evaluated on its BLEU score;
04: The complete Chinese translation is obtained: "Completing this work took us a long time."
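The decoding loop in steps 01-02 above could be realized greedily as in this sketch, which reuses the Encoder/Decoder classes from the earlier encoder-decoder snippet; the bos/eos token ids are placeholders.

```python
import torch

# Greedy decoding sketch: encode the source once, then emit one target token
# at a time until the end-of-sentence id. Encoder/Decoder are the classes
# sketched earlier; all token ids are placeholders.
def greedy_decode(encoder, decoder, src, bos_id=1, eos_id=2, max_len=20):
    state = encoder(src)                     # step 01: compress the source sentence
    token = torch.tensor([[bos_id]])
    output = []
    for _ in range(max_len):                 # step 02: one target word per iteration
        logits, state = decoder(token, state)
        token = logits[:, -1].argmax(dim=-1, keepdim=True)  # most probable word
        if token.item() == eos_id:
            break                            # end-of-translation mark
        output.append(token.item())
    return output

print(greedy_decode(Encoder(1000), Decoder(1000), torch.randint(0, 1000, (1, 7))))
```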

Claims (10)

1. A Mongolian-Chinese neural machine translation method based on a transfer learning strategy, characterized in that: first, an English-Chinese neural machine translation model is trained on a large-scale English-Chinese parallel corpus; second, the network parameter weights learned in training are migrated into a Mongolian-Chinese neural machine translation model; then, the Mongolian-Chinese neural machine translation model is trained on the available Mongolian-Chinese parallel corpus, obtaining a Mongolian-Chinese neural machine translation model based on the transfer learning strategy; finally, the trained Mongolian-Chinese neural machine translation model is used to perform Mongolian-Chinese neural machine translation.
2. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that before model training, data preprocessing is performed on the English-Chinese parallel corpus and the Mongolian-Chinese parallel corpus resources.
3. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 2, characterized in that the data preprocessing uses the open-source software of the Stanford University natural language processing laboratory as tools, including:
1) word segmentation of the Chinese corpus with the segmentation tool stanford-segmenter;
2) preprocessing of the English corpus with the English preprocessing tool stanford-ner;
the preprocessing is based on the conditional random field (CRF) model; the CRF model is defined as an undirected graph G = (V, E), where V is the node set, the set of random variables Y = {Y_i | 1 ≤ i ≤ m}, one labeling unit for each of the m tokens of an input sentence, and E = {Y_{i-1}, Y_i | 1 ≤ i ≤ m} is the set of undirected edges, a linear chain made up of m-1 edges;
given a sequence a to be labeled, the conditional probability of the corresponding label sequence b is:

$$P(b \mid a) = \frac{1}{Z(a)} \exp\Big( \sum_{i=1}^{m} \sum_{k} \lambda_k f_k(b_{i-1}, b_i, a, i) + \sum_{i=1}^{m} \sum_{k} \lambda'_k f'_k(b_i, a, i) \Big)$$

where i is the index into the sequence, Z(a) is the normalization function, λ_k and λ'_k are the parameters of the model, k ranges over the features defined on each edge and on the corresponding node, and f_k and f'_k are binary feature functions.
4. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that the neural machine translation model formula is:

$$P(y_n \mid y_{<n}, x; \theta) = \frac{\exp f(V_{y_n}, C_s, C_t; \theta)}{\sum_{y \in D} \exp f(V_y, C_s, C_t; \theta)}$$

where θ is the parameter set of the model, f is a nonlinear function, y_n is the current target-language word, x is the source-language sentence, y_{<n} is the target-language prefix generated so far, V_y is the target-language word embedding, D is the target-language vocabulary, C_s is the source-language context vector, and C_t is the target-language context vector.
5. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that the network type of the neural machine translation model is a recurrent neural network (RNN); in the RNN forward propagation algorithm, for any sequence index t, the hidden state h^{(t)} is obtained from the input x^{(t)} and the previous hidden state h^{(t-1)}:

h^{(t)} = σ(U x^{(t)} + W h^{(t-1)} + b)

where σ is the activation function of the recurrent network (generally tanh) and b is the bias of the linear relation; the output of the model at sequence index t is o^{(t)} = V h^{(t)} + d, and the final prediction at sequence index t is

$$\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$

where d is the bias of the output nodes and U, V and W are the parameter matrices shared across the recurrent network.
6. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that in the model training, the encoder and decoder are trained jointly, with the model formula:

$$\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log P(y_n \mid x_n; \theta)$$

where θ is the parameter set of the model, P is the conditional probability function, (x_n, y_n) denotes a bilingual training sentence pair, and N is the number of training samples; the samples are trained with the maximum likelihood estimation algorithm.
7. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that the network parameter weights learned by training the neural network on a bilingual parallel corpus are the parameter matrices connecting the nodes of the neural network; the learned network parameter weights are used to initialize the parameters of the Mongolian-Chinese neural network instead of random initialization, realizing the migration of the trained network parameter weights into the Mongolian-Chinese neural machine translation model.
8. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that when the Mongolian-Chinese neural machine translation model is trained on the Mongolian-Chinese parallel corpus, the parameter settings of the English-Chinese and Mongolian-Chinese translation models, including vocabulary size, word embedding size and hidden layer size, must be consistent.
9. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 1, characterized in that the translations of the Mongolian-Chinese neural machine translation prototype system are compared and evaluated against statistical machine translation translations on their BLEU scores, achieving the aim of finally improving Mongolian-Chinese machine translation performance.
10. The Mongolian-Chinese neural machine translation method based on a transfer learning strategy according to claim 9, characterized in that the BLEU score is a tool for assessing machine translation quality, a higher score indicating better machine translation model performance; the BLEU formula is:

$$\mathrm{BLEU} = BP \cdot \exp\Big( \sum_{n=1}^{M} w_n \log p_n \Big)$$

where w_n = 1/M, M is the maximum n-gram order over the candidate and reference translations (with an upper limit of 4), and p_n is the n-gram precision; BP is the brevity penalty for candidates shorter than the reference:

$$BP = e^{\min(1 - r/h,\, 0)}$$

where h is the number of words in the candidate translation and r is the reference translation length closest to h.
CN201810428618.6A 2018-05-07 2018-05-07 Mongolian-Chinese neural machine translation method based on a transfer learning strategy Pending CN108829684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810428618.6A CN108829684A (en) 2018-05-07 2018-05-07 Mongolian-Chinese neural machine translation method based on a transfer learning strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810428618.6A CN108829684A (en) 2018-05-07 2018-05-07 Mongolian-Chinese neural machine translation method based on a transfer learning strategy

Publications (1)

Publication Number Publication Date
CN108829684A true CN108829684A (en) 2018-11-16

Family

ID=64148400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810428618.6A Pending CN108829684A (en) 2018-05-07 Mongolian-Chinese neural machine translation method based on a transfer learning strategy

Country Status (1)

Country Link
CN (1) CN108829684A (en)

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
李亚超 et al., "Research on Tibetan-Chinese Neural Network Machine Translation", Journal of Chinese Information Processing *
杜建, "Mongolian-Chinese Neural Network Machine Translation Incorporating Statistical Machine Translation Features", China Master's Theses Full-text Database, Information Science and Technology *
苏依拉 et al., "Machine Translation of Mongolian-Chinese Natural Language Based on Statistical Analysis", Journal of Beijing University of Technology *
赵伟, "Application of Conditional Random Fields to Mongolian Word Segmentation", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408831B (en) * 2018-10-11 2020-02-21 成都信息工程大学 Remote supervision method for traditional Chinese medicine fine-grained syndrome name segmentation
CN109408831A (en) Remote supervision method for traditional Chinese medicine fine-grained syndrome name segmentation
CN109670180A (en) Method and device for vectorizing the individual translation characteristics of a translator
CN109670180B (en) * 2018-12-21 2020-05-08 语联网(武汉)信息技术有限公司 Method and device for translating individual characteristics of vectorized translator
CN109740169A (en) Traditional Chinese medicine ancient book translation method based on a dictionary and a seq2seq pre-training mechanism
CN109740169B (en) * 2019-01-09 2020-10-13 北京邮电大学 Traditional Chinese medicine ancient book translation method based on dictionary and seq2seq pre-training mechanism
CN110083842A (en) * 2019-03-27 2019-08-02 华为技术有限公司 Translation quality detection method, device, machine translation system and storage medium
CN110083842B (en) * 2019-03-27 2023-10-03 华为技术有限公司 Translation quality detection method, device, machine translation system and storage medium
CN110059802A (en) Method, apparatus and computing device for training learning models
US11514368B2 (en) 2019-03-29 2022-11-29 Advanced New Technologies Co., Ltd. Methods, apparatuses, and computing devices for trainings of learning models
CN110110061B (en) * 2019-04-26 2023-04-18 同济大学 Low-resource language entity extraction method based on bilingual word vectors
CN110110061A (en) * 2019-04-26 2019-08-09 同济大学 Low-resource languages entity abstracting method based on bilingual term vector
CN110210536A (en) Physical damage diagnosis method and device for optical interconnection systems
CN110188331A (en) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 Model training method, conversational system evaluation method, device, equipment and storage medium
CN110188331B (en) * 2019-06-03 2023-05-26 腾讯科技(深圳)有限公司 Model training method, dialogue system evaluation method, device, equipment and storage medium
CN110245364A (en) Zero-parallel-corpus multi-modal neural machine translation method
CN110245364B (en) * 2019-06-24 2022-10-28 中国科学技术大学 Zero-parallel corpus multi-modal neural machine translation method
CN110334362B (en) * 2019-07-12 2023-04-07 北京百奥知信息科技有限公司 Method for solving and generating untranslated words based on medical neural machine translation
CN110334362A (en) Method for resolving untranslated word generation in medical neural machine translation
CN110472252B (en) * 2019-08-15 2022-12-13 昆明理工大学 Method for translating Hanyue neural machine based on transfer learning
CN110472252A (en) Chinese-Vietnamese neural machine translation method based on transfer learning
CN110472253A (en) Sentence-level machine translation quality estimation model training method based on mixed granularity
CN110472253B (en) * 2019-08-15 2022-10-25 哈尔滨工业大学 Sentence-level machine translation quality estimation model training method based on mixed granularity
CN110688862A (en) * 2019-08-29 2020-01-14 内蒙古工业大学 Mongolian-Chinese inter-translation method based on transfer learning
CN110674646A (en) * 2019-09-06 2020-01-10 内蒙古工业大学 Mongolian Chinese machine translation system based on byte pair encoding technology
CN110674648A (en) * 2019-09-29 2020-01-10 厦门大学 Neural network machine translation model based on iterative bidirectional migration
CN110674648B (en) * 2019-09-29 2021-04-27 厦门大学 Neural network machine translation model based on iterative bidirectional migration
CN110457719B (en) * 2019-10-08 2020-01-07 北京金山数字娱乐科技有限公司 Translation model result reordering method and device
CN110457719A (en) Method and device for reordering translation model results
WO2021109679A1 (en) * 2019-12-06 2021-06-10 中兴通讯股份有限公司 Method for constructing machine translation model, translation apparatus and computer readable storage medium
CN111046677B (en) * 2019-12-09 2021-07-20 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for obtaining translation model
CN111046677A (en) * 2019-12-09 2020-04-21 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for obtaining translation model
CN111008533A (en) * 2019-12-09 2020-04-14 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for obtaining translation model
CN111144140B (en) * 2019-12-23 2023-07-04 语联网(武汉)信息技术有限公司 Zhongtai bilingual corpus generation method and device based on zero-order learning
CN111144140A (en) Chinese-Thai bilingual corpus generation method and device based on zero-shot learning
CN112101047A (en) * 2020-08-07 2020-12-18 江苏金陵科技集团有限公司 Machine translation method for matching language-oriented precise terms
CN112016604A (en) * 2020-08-19 2020-12-01 华东师范大学 Zero-resource machine translation method applying visual information
WO2022116819A1 (en) * 2020-12-04 2022-06-09 北京有竹居网络技术有限公司 Model training method and apparatus, machine translation method and apparatus, and device and storage medium
CN112633018B (en) * 2020-12-28 2022-04-15 内蒙古工业大学 Mongolian Chinese neural machine translation method based on data enhancement
CN112633018A (en) * 2020-12-28 2021-04-09 内蒙古工业大学 Mongolian Chinese neural machine translation method based on data enhancement
CN112989848A (en) * 2021-03-29 2021-06-18 华南理工大学 Training method for neural machine translation model of field adaptive medical literature
CN113033218B (en) * 2021-04-16 2023-08-15 沈阳雅译网络技术有限公司 Machine translation quality evaluation method based on neural network structure search
CN113033218A (en) * 2021-04-16 2021-06-25 沈阳雅译网络技术有限公司 Machine translation quality evaluation method based on neural network structure search
CN113239708A (en) * 2021-04-28 2021-08-10 华为技术有限公司 Model training method, translation method and translation device
CN113642341A (en) Deep adversarial generation method for addressing the scarcity of medical text data
CN113408302A (en) * 2021-06-30 2021-09-17 澳门大学 Method, device, equipment and storage medium for evaluating machine translation result
CN113822078A (en) * 2021-08-20 2021-12-21 北京中科凡语科技有限公司 XLM-R model fused machine translation model training method
CN113822078B (en) * 2021-08-20 2023-09-08 北京中科凡语科技有限公司 Training method of machine translation model fused with XLM-R model
CN113657128B (en) * 2021-08-25 2023-04-07 四川大学 Learning translation system and storage medium based on importance measurement and low resource migration
CN113657128A (en) * 2021-08-25 2021-11-16 四川大学 Learning translation system and storage medium based on importance measurement and low resource migration
CN115270826A (en) * 2022-09-30 2022-11-01 北京澜舟科技有限公司 Multilingual translation model construction method, translation method and computer storage medium

Similar Documents

Publication Publication Date Title
CN108829684A (en) Mongolian-Chinese neural machine translation method based on a transfer learning strategy
CN110334361B (en) Neural machine translation method for Chinese language
CN106650813B (en) Image understanding method based on a deep residual network and LSTM
CN110348016B (en) Text abstract generation method based on sentence correlation attention mechanism
CN106126507B (en) Deep neural translation method and system based on character encoding
CN109376242B (en) Text classification method based on recurrent neural network variants and convolutional neural networks
CN109359294B (en) Ancient Chinese translation method based on neural machine translation
CN109117483B (en) Training method and device of neural network machine translation model
CN109635124A (en) Distantly supervised relation extraction method combining background knowledge
CN107967318A (en) Automatic scoring method and system for Chinese short-text subjective questions using an LSTM neural network
CN110688862A (en) Mongolian-Chinese inter-translation method based on transfer learning
CN109359291A (en) Named entity recognition method
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN110134946B (en) Machine reading comprehension method for complex data
CN108897740A (en) Mongolian-Chinese machine translation method based on adversarial neural networks
CN109977234A (en) Knowledge graph completion method based on subject keyword filtering
CN110619127B (en) Mongolian-Chinese machine translation method based on a neural Turing machine
CN113468895B (en) Non-autoregressive neural machine translation method based on decoder input enhancement
CN108932232A (en) Mongolian-Chinese inter-translation method based on an LSTM neural network
CN110909736A (en) Image description method based on a long short-term memory model and an object detection algorithm
CN108765383A (en) Video description method based on deep transfer learning
CN110162789A (en) Vocabulary representation method and device based on Chinese pinyin
CN110837736B (en) Named entity recognition method of Chinese medical record based on word structure
CN110442880B (en) Translation method, device and storage medium for machine translation
CN111414770B (en) Semi-supervised Mongolian neural machine translation method based on collaborative training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181116