CN107038159B - A neural network machine translation method based on unsupervised domain adaptation - Google Patents
A neural network machine translation method based on unsupervised domain adaptation
- Publication number: CN107038159B
- Application number: CN201710139214.0A
- Authority: CN (China)
- Prior art keywords: translation, word, sentence, field, model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention provides a neural network machine translation method based on unsupervised domain adaptation, comprising: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input for training a Softmax classifier and a translation module; and, according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, the target-side decoders generating the target sentence together with its corresponding domain. The present invention overcomes the shortage of domain-annotated data in the prior art, saves time and cost, achieves efficient and accurate adaptation between translation domains, and has good practicality, applicability and scalability.
Description
Technical field
The present invention relates to the fields of machine learning and machine translation technology, and more particularly to a neural network machine translation method based on unsupervised domain adaptation.
Background technique
As international exchange deepens, the demand for language translation grows daily. The world contains many languages, the Internet has become the most convenient platform for obtaining information, and users' demand for online translation is increasingly urgent. Each language has its own characteristics and flexible forms, so automatic language processing, including machine translation between languages, has become a problem to be solved. At the same time, providing users with high-quality translation services is itself a difficult problem: the Internet covers many languages, each language carries a great deal of ambiguity, and languages are constantly changing, all of which places higher demands on translation services.
In the prior art, the commonly used technologies for automatic machine translation are based on neural networks and on statistics: the former is NMT (Neural Machine Translation), the latter SMT (Statistical Machine Translation).
However, for the above prior art to work well, large-scale, high-quality parallel corpora must be collected in order to obtain a reliable translation model. Unfortunately, high-quality parallel corpora usually exist only between a few language pairs, and are often limited to certain specific domains, such as official documents and news. With the rise of the Internet, the circulation of international information has become unprecedentedly convenient, and the demand for high-quality machine translation has become ever more urgent; at the same time, the Internet also brings new opportunities to machine translation, since its large amount of text makes it possible to acquire parallel corpora covering many languages and domains. Corpora obtained from the web rarely belong to a single domain: news corpora are easy to obtain, but corpora in domains such as government, film, trade, education, sport, literature and art, and especially medicine, are very difficult to obtain. When the training corpus belongs to the same domain as the development set (used for tuning the model trained on the training corpus), and the test corpus also belongs to that domain, the translation results (in-domain) are very good; otherwise (out-of-domain) they are very poor. Therefore, there is an urgent need for a translation method that achieves equally good translation quality across different domains, so that, for example, when the training and development sets are news corpora and the test set is a legal corpus (out-of-domain), the drop in translation quality caused by the domain mismatch is avoided as far as possible.
However, the data weighting method in the prior art, which assigns weights to sentences according to their similarity to the in-domain corpus, cannot avoid the serious problem of requiring annotated corpora, and needs to cut the original training corpus into several small components, which increases the number of model parameters and entails other complex operations, degrading the performance of neural network machine translation, lowering translation efficiency, and failing to achieve accurate adaptation between domains.
Summary of the invention
In order to overcome the above problems, or at least partially solve them, the present invention provides a machine translation method.
According to one aspect of the present invention, a machine translation method is provided, comprising:
Step 1: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input for training a Softmax classifier and a translation module;
Step 2: according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, the target-side decoders generating the target sentence and its corresponding domain.
The present application proposes a machine translation method that can effectively use parallel sentence pairs without annotation, i.e., without domain information. Compared with similar work in traditional SMT, hidden topics are used to represent the component parts, rather than cutting the original training data into several component datasets. Most importantly, fusing small-component corpora into a global model is a great challenge for NMT, because of its lack of interpretability. Finally, an approximate parameter-sharing scheme that reduces the number of model parameters, together with a tractable decoding algorithm, is proposed. Compared with similar work in NMT, which relies on predefined, annotated topic information, the present invention treats the topic as a hidden variable and retains the advantage of leaving the translation model unchanged. Clearly, the method can easily be applied to other neural network models in natural language processing. The neural network machine translation method based on unsupervised domain adaptation proposed in the present invention solves the serious prior-art problem of depending on annotated corpora, avoids the complex operations (such as an increased number of model parameters) caused by cutting the original training corpus into several small components, improves the performance of neural network machine translation, improves translation efficiency, and achieves more accurate adaptation between domains.
Detailed description of the invention
Fig. 1 is an overall flow diagram of a machine translation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the domain classification structure of the Softmax classifier model in a machine translation method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of hybrid decoding mode 1 (SUM) of the classification module and the translation module in a machine translation method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of hybrid decoding mode 2 (MAX) of the classification module and the translation module in a machine translation method according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention, not to limit its scope.
The present invention proposes a neural network machine translation method based on unsupervised domain adaptation.
Transfer learning in NMT or SMT can be divided into two forms. One is DA (domain adaptation): the NMT model itself is concise, makes the fewest prior assumptions, inherently performs better than SMT at domain adaptation, and can more easily use knowledge from different domains; for example, knowledge of the news domain can effectively help translation in other domains. The other is migration: using large-scale monolingual raw knowledge to improve machine translation.
In recent years, many domain adaptation methods have been proposed, in both NMT and SMT; DA in SMT in particular has been studied in depth. All these methods can be summarized into the following five kinds: (1) self-training methods, which make use of monolingual in-domain corpora; (2) data selection methods, which retrieve data similar to the in-domain corpus; (3) data weighting methods, which assign weights to sentences according to their similarity to the in-domain corpus; (4) context-based methods, which distinguish translations between different domains according to local or global context; (5) topic-based methods, which make full use of topic models to achieve adaptation between domains.
The method proposed by the present invention is similar to the third kind, but the main difference is that hidden topics are used to represent the component parts, rather than cutting the original training data into several component datasets. Most importantly, fusing small-component corpora into a global model is a great challenge for NMT, because of its lack of interpretability. Finally, an approximate parameter-sharing scheme that reduces the number of model parameters, together with a tractable decoding algorithm, is proposed.
Others have done similar work in NMT; for example, the natural language processing group of Stanford University adapted an existing model to a new domain and obtained a clear improvement. That method first trains on a large-scale out-of-domain corpus (as defined above, a domain different from the test set; for example, the training and development sets are news while the test set is government, film or trade corpora), and then continues training on new small-scale in-domain data. Their method is concerned with the process of transferring information from out-of-domain to in-domain, whereas the present invention is concerned with developing a mixture model that can handle data heterogeneity.
In addition, other work directly embeds topic information into the neural network; specifically, it provides NMT with topic information obtained by LDA, or with corpus categories manually annotated at the decoding stage. Some scholars further fuse topic information into the encoder and decoder stages. Some new work also proposes a method of controlling the domain and expanding the word-vector layer with domain information. In contrast to their predefined, annotated topic information, the difference between the technical solution of the present invention and these works is that the topic is treated as a hidden variable, keeping the advantage of leaving the translation model unchanged. Clearly, the method can easily be applied to other neural network models in natural language processing.
In conclusion it is badly in need of a kind of method of new domain-adaptive in machine translation (either NMT or SMT),
Solve the serious problems for be unableing to do without mark corpus in the prior art, avoid original several small cities of training corpus cutting point and
Cause to increase the complex operations such as model parameter number, improve the performance of neural network machine translation, improve translation efficiency and obtains
It obtains adaptive between more accurate field.
Neural network machine interpretation method based on unsupervised domain-adaptive is intended to be instructed in the parallel sentence pairs of no mark
Practice to generate and translates effect also preferable translation model outside a field on test set.
The method of the present invention is in fact a mixture model, because it comprises a classification model (the Softmax classifier) and the translation model of neural network machine translation. The mixture-model idea of previous work is first to decompose the training corpus according to domain (news, film, government, trade, etc.) into several different component parts {<X_c, Y_c>}, c = 1, ..., C, and then to train a translation model P(y|x; θ_c) on each small-scale corpus <X_c, Y_c>. The models of these component parts are combined into a single global model:

P(y|x) = Σ_{c=1}^{C} λ_c P(y|x; θ_c),

where λ_c is the mixture parameter, i.e., the parameter of the c-th component part, obtained from a distance measure between the target-side text of the translation and the corpus of each small component part, by methods such as tf-idf, LSA, perplexity or EM (in the present invention, TF-IDF and cosine similarity are called to compute the similarity). These mixture parameters are predicted by text similarity, without intervening in the learning process of the whole mixture model. Although this way of mixture modeling works well in SMT, it is less simple to adapt it to NMT. On the basis of P(y|x) = Σ_c λ_c P(y|x; θ_c), SMT can fuse the individually trained models into a global model very simply, but how to mix them in NMT is somewhat unclear. The main difficulty is that the neural network translation models of the different component parts (different domains, different sources) are each trained on a small-scale corpus, and, for efficiency reasons, each selects the several thousand most frequent words as its vocabulary at the preprocessing stage before training; the vocabularies of the models trained on the different domain datasets therefore differ markedly.

If the models of the small-scale component parts are not fused into a global model, the parameter space grows. Moreover, the search algorithm must sum the weighted sentence-level translation probabilities of all the small-scale models: the maximum target-side probability ŷ = argmax_y Σ_c λ_c P(y|x; θ_c) cannot, unlike P(y|x; θ) = Π_j P(y_j|y_<j, x; θ), be decomposed to the word level. It is therefore difficult to fuse the predictions of those small-scale component neural network machine translation models during decoding, so a neural network machine translation system that is genuinely a mixture model needs to be developed.
Fig. 1 shows an overall flow diagram of a machine translation method in a specific embodiment of the present invention. Overall, it comprises:
Step 1: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input for training a Softmax classifier and a translation module;
Step 2: according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, the target-side decoders generating the target sentence and its corresponding domain.
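As an illustration of the two steps above, the following toy Python sketch routes a sentence vector through a Softmax domain classifier and selects the matching decoder. All names, dimensions and the placeholder decoders here are hypothetical; a real system would use trained GRU encoders and NMT decoders rather than these stand-ins.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def classify_domain(sentence_vec, weights):
    """Step 1 (sketch): score the sentence vector against each of T domain
    rows of a weight matrix; return (domain probabilities, argmax domain)."""
    scores = [sum(w * v for w, v in zip(row, sentence_vec)) for row in weights]
    probs = softmax(scores)
    return probs, max(range(len(probs)), key=probs.__getitem__)

def translate(sentence_vec, decoders):
    """Step 2 (sketch): route the sentence to the decoder matching the
    predicted domain; each 'decoder' here is just a placeholder callable."""
    probs, t = classify_domain(sentence_vec, [d["w"] for d in decoders])
    return decoders[t]["decode"](sentence_vec), t

# Toy example with T = 2 domains (e.g. news vs. law), 3-dim sentence vector.
decoders = [
    {"w": [1.0, 0.0, 0.0], "decode": lambda v: "news-translation"},
    {"w": [0.0, 1.0, 0.0], "decode": lambda v: "law-translation"},
]
out, domain = translate([0.1, 0.9, 0.0], decoders)
```

Because the second weight row scores the vector higher, the second (law) decoder is selected in this toy run.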
In another specific embodiment of the present invention, the machine translation method further comprises, before step 1:
Step 0: constructing a training corpus dataset and preprocessing the training corpus in the dataset; training the Softmax classifier model and the translation model on the training corpus to obtain the classification and translation model parameters, respectively.
In another specific embodiment of the present invention, the machine translation method further comprises, between step 0 and step 1: based on the preprocessed training corpus dataset, obtaining the input of the encoder stage of the translation module and of the Softmax classifier using a GRU.
In another specific embodiment of the present invention, generating the number of translation-network decoders according to the number of domains produced by the Softmax classifier in step 2 further comprises:
S21: the Softmax classifier model classifies into t domain classes;
S22: in the decoder stage of the translation module, t decoders are generated according to the output of the classifier module.
In another specific embodiment of the present invention, the machine translation method further comprises, before step 1: obtaining the vector representations of the last word and the first word of the source sentence in the bilingual training sample using a bidirectional GRU neural network; alternatively, these vector representations may be obtained using a CNN or an LSTM neural network.
In another specific embodiment of the present invention, constructing the training corpus dataset in step 0 further comprises: collecting bilingual sentence pairs and selecting the training, development and test sets; the bilingual sentence pairs are sentence pairs without domain annotation.
In another of the invention specific embodiment, a kind of machine translation method, in the data set in the step 0
Training corpus carries out pretreatment:
Sentence in the data set in source language text and target language text is cut into word and unified converted magnitude
It writes.
In another specific embodiment of the present invention, step 2 further comprises the formula

P(y|x) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

where the first term on the right side of the equation is the classification module of the whole model, trained with parameters γ to predict t, and the second term, with model parameters θ_t, predicts y; t ∈ {1, ..., T} is an integer indicating the topic of the source sentence x, T is the predefined number of topics, and P(t|x; γ) is the topic distribution that the model predicts for x.
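The mixture formula above can be exercised with a few lines of Python. This is a minimal sketch: the probability values are invented for illustration and do not come from the patent.

```python
def mixture_prob(topic_probs, per_topic_trans_probs):
    """P(y|x) = sum_t P(t|x; gamma) * P(y|x, t; theta_t).
    topic_probs: output of the Softmax classifier, one entry per domain t.
    per_topic_trans_probs: P(y|x, t) from each domain-specific decoder."""
    assert abs(sum(topic_probs) - 1.0) < 1e-9  # classifier output is a distribution
    return sum(pt * py for pt, py in zip(topic_probs, per_topic_trans_probs))

# Two domains: the classifier is 70% sure the sentence is news.
p = mixture_prob([0.7, 0.3], [0.5, 0.1])  # 0.7*0.5 + 0.3*0.1 = 0.38
```

The classifier probabilities thus weight each domain decoder's opinion, rather than hard-selecting a single domain.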
In another specific embodiment of the present invention, the number of domains produced by the Softmax classifier can be configured according to the input.
In another specific embodiment of the present invention, step 2 further comprises: in the decoder stage of the translation module, t decoders are generated according to the output of the classifier module; the initial states of the probabilities of the t domains produced by the Softmax classifier module and of the decoders generated in the decoding stage of the translation module are entirely random.
In another specific embodiment of the present invention, the machine translation method specifically comprises the following steps:
A. preparing a large-scale parallel training corpus of sentence pairs without annotation and without topic information;
B. using a word-vector model to obtain the word-vector representation of each word contained in the source sentence;
C. using a GRU to obtain the input of the encoder stage of the translation module and of the Softmax classifier;
D. processing the obtained input with a bidirectional GRU, i.e., obtaining the vector representation of the sentence from the word-vector representation of the last word (forward) and the word-vector representation of the first word (backward), and then passing it to the classifier module and the translation module;
E. according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, and decoding to generate the target language and the corresponding domain.
Step A further comprises: constructing a dataset and preprocessing it, and training a generative model on the training corpus in the training set to obtain the model parameters.
Constructing the dataset comprises collecting bilingual sentence pairs and selecting the training, development and test sets.
The preprocessing comprises cutting the sentences in the source-language and target-language texts of the dataset into words (for consistency, the open-source Chinese word segmentation tool developed by the natural language processing group of Stanford University is called for segmentation), uniformly normalizing the case (the truecasing script shipped with MOSES may be used, or a case-conversion script may be written by hand), and tokenization (all English corpora need to be tokenized; likewise, for consistency, the tokenizer.perl shipped with MOSES may be used).
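As a rough stand-in for the tokenization and case-normalization steps just described (the actual pipeline calls the Stanford segmenter and the MOSES scripts), a minimal Python sketch on an English sentence might look as follows; the regex rule is a simplifying assumption, not the MOSES tokenizer's behavior.

```python
import re

def preprocess(sentence):
    """Toy tokenize + case-normalize: split off punctuation as separate
    tokens, then lowercase everything (a crude substitute for truecasing)."""
    toks = re.findall(r"\w+|[^\w\s]", sentence, re.UNICODE)
    return [t.lower() for t in toks]

toks = preprocess("The Court ruled, quickly.")
```

A real pipeline would instead shell out to tokenizer.perl and a trained truecasing model so that source and target sides are processed consistently.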
Specifically, the model parameters include the translation probabilities between the source language and the target language.
The parallel sentence pairs in step A refer to a source sentence x = x_1, ..., x_i, ..., x_I and a target side y = y_1, ..., y_j, ..., y_J.
Further, step B is realized through the preprocessing step of an RNN language model. An RNN language model consists of three parts: a look-up layer, a hidden layer and an output layer. Each word contained in the input sentence is converted by the look-up layer into its corresponding word-vector representation:

x_t = look-up(s_t),

where x_t is the word-vector representation of s_t, s_t is the input at each time step t, and look-up denotes the look-up layer.
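The look-up layer can be sketched as a simple table lookup: each word indexes a row of an embedding matrix. The vocabulary, embedding values and the `<unk>` convention below are illustrative assumptions, not the patent's actual data.

```python
def lookup(word, vocab, emb_matrix):
    """x_t = look-up(s_t): map the input word at time step t to its row
    in the embedding matrix (unknown words map to index 0, <unk>)."""
    idx = vocab.get(word, 0)
    return emb_matrix[idx]

vocab = {"<unk>": 0, "court": 1, "news": 2}
emb = [[0.0, 0.0], [0.3, -0.1], [0.5, 0.2]]  # 2-dim toy embeddings
x_t = lookup("news", vocab, emb)
```

In training, the embedding matrix is a learned parameter rather than the fixed toy values shown here.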
Further, step C is realized by executing the following steps:
C1. The whole network is composed of two parts, a classification module and a translation module: the translation module refers to the neural network machine translation module, and the classification module merely invokes the Softmax classifier here as an independent classifier.
C2. For the source x = x_1, ..., x_i, ..., x_I and target side y = y_1, ..., y_j, ..., y_J of the parallel sentence pairs obtained in step A, neural network machine translation usually factorizes the sentence-level translation probability into word-level probabilities:

P(y|x; θ) = Π_{j=1}^{J} P(y_j | y_<j, x; θ),

where θ is the set of model parameters and y_<j is the partial translation. Given the training set {<x^(s), y^(s)>}, s = 1, ..., S, the standard training objective is to maximize the log-likelihood of the training corpus:

θ̂ = argmax_θ Σ_{s=1}^{S} log P(y^(s) | x^(s); θ).

The translation decision rule is to translate an unseen (i.e., not in the training data) source sentence x with the model parameters θ̂ obtained from the formula above; that is, the best target-side probability

ŷ = argmax_y P(y|x; θ̂)

is computed, with these probabilities factorized into word-level translations.
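The word-level factorization above means that a sentence's log-probability is just the sum of its word log-probabilities, as this small sketch shows; the conditional probabilities are invented for illustration.

```python
import math

def sentence_log_prob(word_probs):
    """log P(y|x; theta) = sum_j log P(y_j | y_<j, x; theta):
    the sentence-level translation probability factorizes into
    word-level conditionals, so training maximizes this sum."""
    return sum(math.log(p) for p in word_probs)

# Three target words with conditional probabilities 0.5, 0.25, 0.8.
ll = sentence_log_prob([0.5, 0.25, 0.8])
```

Working in log space avoids numerical underflow when sentences are long, which is why the training objective is stated as a log-likelihood.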
Preferably, the method further comprises, after step C:
D. After the input of the Softmax classifier and the translation module is obtained through step C, further processing is needed: what the encoder stage (encoding) of these two modules requires is a vector representation of the sentence, and the representation of the whole source sentence is obtained using a bidirectional GRU. The GRU is a unit of the RNN network, and as explained in step B, an RNN language model consists of a look-up layer, a hidden layer and an output layer. The word vector of each word has been obtained through the RNN in step B; the results are then sent to the input of the encoder, i.e., the information prepared for the hidden layer of the encoder stage. When computing the current hidden state, the hidden layer uses both the word vector of each word, obtained from the output of the look-up layer, and the preceding hidden states, so the word vectors are mapped to context vectors:

h_t = f(x_t, h_{t-1}),

where f is an abstract function that computes the current new hidden state given the input x_t and the historical state h_{t-1}. The initial state h_0 is usually set to 0. A common choice of f is as follows, where σ is a nonlinear function (e.g., softmax or tanh):

h_t = σ(W_xh x_t + W_hh h_{t-1}).

The softmax here refers to the softmax called when the encoder part of the translation module computes the hidden state; it is different from the softmax called in the classifier module of the whole model proposed in the present invention. The classification module is in fact an independent domain classifier, whereas the encoder of the translation module computes and uses hidden states; their roles differ, and in fact computing the current new hidden state need not call softmax at all, since other nonlinear activation functions such as tanh can also be used.
Therefore, the forward state of the bidirectional RNN (BiRNN) is computed according to the GRU formulas below, where E ∈ R^{m×V} is the word-embedding matrix and the W, U are weight matrices; m and n are respectively the word-vector dimension and the number of hidden units, σ is the logistic sigmoid function, and ⊙ denotes element-wise multiplication. The backward state is computed in the same way as the forward state. The word-embedding matrix E is shared between the forward and backward directions, but the weight matrices are not. After the forward and backward states are merged, h_t = [h→_t; h←_t] is obtained.
To illustrate further, a single GRU in the bidirectional GRU is composed of an update gate and a reset gate, as follows:

u_t = σ(W_u x_t + U_u h_{t-1} + b_u),
r_t = σ(W_r x_t + U_r h_{t-1} + b_r),
h̃_t = tanh(W x_t + U(r_t ⊙ h_{t-1}) + b),
h_t = (1 - u_t) ⊙ h_{t-1} + u_t ⊙ h̃_t,

where u_t is the update gate, r_t is the reset gate, h̃_t is the candidate activation, and h_t is a linear interpolation between the previous hidden state h_{t-1} and the candidate activation h̃_t. Intuitively, the update gate selects whether the hidden state is updated by the new state, and the reset gate determines whether the previous hidden state is ignored.
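The gate equations above can be sketched as a single GRU step in Python. The scalar case is chosen only for readability, and the parameter values are arbitrary; a real encoder uses matrices and learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step (scalar case for clarity):
      u_t = sigmoid(W_u x_t + U_u h_{t-1} + b_u)   # update gate
      r_t = sigmoid(W_r x_t + U_r h_{t-1} + b_r)   # reset gate
      c_t = tanh(W x_t + U (r_t * h_{t-1}) + b)    # candidate activation
      h_t = (1 - u_t) * h_{t-1} + u_t * c_t        # linear interpolation"""
    u = sigmoid(p["Wu"] * x_t + p["Uu"] * h_prev + p["bu"])
    r = sigmoid(p["Wr"] * x_t + p["Ur"] * h_prev + p["br"])
    c = math.tanh(p["W"] * x_t + p["U"] * (r * h_prev) + p["b"])
    return (1.0 - u) * h_prev + u * c

params = {"Wu": 1.0, "Uu": 0.0, "bu": 0.0,
          "Wr": 1.0, "Ur": 0.0, "br": 0.0,
          "W": 1.0, "U": 1.0, "b": 0.0}
h1 = gru_cell(0.5, 0.0, params)  # first step from h_0 = 0
```

Running the cell over a sentence forward and backward, with separate weights per direction, yields the bidirectional states that are concatenated above.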
Further, the method comprises the following step after step D:
E. The Softmax classifier module mainly concerns the probabilities of the domains it classifies, and the translation module concerns the generation of the target-side sentence. Intuitively, the domain information produced by the classification module, i.e., the different probabilities of the t domains, is added to the translation module, so topic information is added to the translation probability by way of a hidden variable:

P(y|x) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t).
The decoder network of the translation module also has its own hidden state, but this hidden state is not quite the same as that of the encoder network. The detailed computation is as follows:

s_i = (1 - z_i) ⊙ s_{i-1} + z_i ⊙ s̃_i,

where:

s̃_i = tanh(W E y_{i-1} + U(r_i ⊙ s_{i-1}) + C c_i),
z_i = σ(W_z E y_{i-1} + U_z s_{i-1} + C_z c_i),
r_i = σ(W_r E y_{i-1} + U_r s_{i-1} + C_r c_i).

E is the word-embedding matrix of the words contained in the target-language sentence; W, W_z, W_r, U, U_z, U_r and C, C_z, C_r are weight matrices; m and n are respectively the word-vector dimension and the number of hidden units; σ is the logistic sigmoid function. The initial hidden state s_0 is computed as:

s_0 = tanh(W_s h←_1),

where h←_1 is the backward state of the first source word and W_s is a weight matrix.
The context vector c_i is recomputed by the model at each time step:

c_i = Σ_{j=1}^{I} α_ij h_j, with α_ij = exp(e_ij) / Σ_k exp(e_ik) and e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j),

where h_j is the j-th annotation (hidden state) in the source sentence, W_a and U_a are weight matrices, and v_a is a weight vector.
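The per-step recomputation of the context vector can be sketched as follows. Scalar encoder states and the additive scoring form are simplifying assumptions for illustration; in practice the states are vectors and the weights are learned.

```python
import math

def context_vector(s_prev, enc_states, p):
    """c_i = sum_j alpha_ij * h_j, with alpha_ij = softmax_j(e_ij) and
    e_ij = v_a * tanh(W_a s_{i-1} + U_a h_j)  (scalar case for clarity)."""
    e = [p["va"] * math.tanh(p["Wa"] * s_prev + p["Ua"] * h) for h in enc_states]
    m = max(e)                                 # stable softmax over scores
    w = [math.exp(x - m) for x in e]
    z = sum(w)
    alpha = [x / z for x in w]
    c = sum(a * h for a, h in zip(alpha, enc_states))
    return c, alpha

# Two source positions; the second annotation scores higher and gets more weight.
c, alpha = context_vector(0.0, [0.2, 0.9], {"va": 1.0, "Wa": 1.0, "Ua": 1.0})
```

Because the weights α_ij depend on the previous decoder state s_{i-1}, the decoder attends to different source positions at each output step.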
In another specific embodiment of the present invention, Fig. 2 is a schematic diagram of the domain classification structure of the Softmax classifier model in a machine translation method according to an embodiment of the present invention.
Unlike the mixture model mentioned for SMT, whose mixture parameters are obtained by text similarity, in the present invention the mixture weights invoked by the topic submodel are optimized together with the translation submodels. The mixture model of the present invention extends the standard NMT by adding a hidden variable:

P(y|x) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

where t ∈ {1, ..., T} is an integer indicating the topic of the source sentence x, T is the predefined number of topics, and P(t|x; γ) is the topic distribution the model predicts for x, i.e., the module in Fig. 2. The translation submodules for topic t are the neural network machine translation modules on the right in Figs. 3 and 4, respectively.
In order to solve the aforementioned word-level factorization problem, the mixture model is approximated by assuming that the word-level translations are mutually independent:
P(y | x) ≈ Π_{j=1}^{J} Σ_{t=1}^{T} P(t | x; γ) · P(y_j | y_{<j}, t, x; θ_t).
This approximation shows that the mixture model permits training at the word level, which also makes the search algorithm below effective. Although the approximation violates the independence assumption in NMT, it brings significant improvements in actual applications.
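The word-level mixture approximation can be illustrated with a toy example. All probabilities below are invented for illustration and are not taken from the patent; `mixture_word_prob` and `sentence_logprob` are hypothetical helper names:

```python
import math

# Toy word-level mixture: T = 2 topic-specific next-word distributions over a
# tiny vocabulary, mixed by the topic posterior P(t|x) (values illustrative).
p_topic = [0.7, 0.3]                       # P(t | x; gamma)
p_word_given_topic = [                     # P(y_j | y_<j, t, x; theta_t)
    {"bank": 0.6, "river": 0.4},           # decoder for topic 1
    {"bank": 0.1, "river": 0.9},           # decoder for topic 2
]

def mixture_word_prob(word):
    """P(y_j | y_<j, x) = sum_t P(t|x) * P(y_j | y_<j, t, x)."""
    return sum(pt * pw[word] for pt, pw in zip(p_topic, p_word_given_topic))

def sentence_logprob(words):
    """Word-level factorized approximation: sum of per-position log mixtures."""
    return sum(math.log(mixture_word_prob(w)) for w in words)

# "bank": 0.7*0.6 + 0.3*0.1 = 0.45
assert abs(mixture_word_prob("bank") - 0.45) < 1e-12
```

Each target position mixes the T decoder predictions independently, which is exactly what allows the objective and the search to work word by word.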
For sentence classification, the topic sub-model P(t | x; γ) mentioned in Fig. 2 could exploit many network architectures, e.g. a CNN or a Recursive Auto-Encoder. The present invention uses a simple Softmax classifier over the representation learned by the encoder. Given a source sentence x containing I words, a bi-RNN with GRU units computes the forward states h→_i and the backward states h←_i. The forward state of the last word (computed by the forward RNN) and the backward state of the first word (computed by the backward RNN) are then concatenated as [h→_I; h←_1] and fed to the Softmax classifier as its input (prior work computes the sentence representation at the encoder stage with an end-of-sentence symbol appended). This strategy has the following advantages:
With GRU units, the RNN captures long-distance dependencies: h←_1 summarizes the source sentence in the backward direction, and h→_I summarizes it in the forward direction.
The input of the Softmax layer, [h→_I; h←_1], has a fixed dimension, independent of the length of the source sentence.
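The fixed-size classifier input [h→_I; h←_1] and the Softmax over T domains can be sketched as follows; the dimensions, the weight matrix `Wc` and all values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 4                        # hidden units per direction / number of domains

# Bi-RNN annotations of a 5-word sentence: one forward and one backward
# state per word (values illustrative).
h_fwd = rng.normal(size=(5, n))    # h→_1 ... h→_I
h_bwd = rng.normal(size=(5, n))    # h←_1 ... h←_I

# Fixed-size classifier input [h→_I ; h←_1], independent of sentence length.
x_cls = np.concatenate([h_fwd[-1], h_bwd[0]])

Wc = rng.normal(size=(T, 2 * n))   # softmax layer weights (illustrative)
logits = Wc @ x_cls
p_domain = np.exp(logits - logits.max())
p_domain /= p_domain.sum()         # P(t | x; gamma) over the T domains

assert x_cls.shape == (2 * n,)
assert abs(p_domain.sum() - 1.0) < 1e-12 and np.all(p_domain > 0)
```

However long the source sentence is, `x_cls` always has dimension 2n, so the classifier weights never need to change with sentence length.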
The topic sub-model described in Fig. 2 and the translation sub-models (the translation modules on the right of Figs. 3 and 4) share one and the same encoder; this design markedly reduces the parameter space of the mixture model.
The translation sub-models P(y | t, x; θ_t) follow the standard attention-based encoder-decoder model of prior work. To reduce the parameter space, all neural translation sub-models share the same encoder with the topic sub-model; in other words, the mixture model has one encoder and T decoders.
In another specific embodiment of the invention, with reference to Fig. 3, the process of steps A-D has been explained above and is therefore not repeated; the implementation of step E is introduced directly. The hybrid decoding mode 1 (SUM) of the classification module and translation module proposed by this embodiment comprises the following concrete steps:
For the training set D = {(x^(s), y^(s))}_{s=1}^{S} mentioned in step A, the training objective is to find the model parameters that maximize the log-likelihood of the training corpus:
(γ̂, θ̂) = argmax_{γ, θ} Σ_{s=1}^{S} log P(y^(s) | x^(s); γ, θ),
wherein the standard mini-batch (a small batch of parallel training sentence pairs) stochastic gradient descent algorithm is used to estimate the parameters of the topic and translation sub-models.
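Since the corpus log-likelihood is a plain sum over sentence pairs, it decomposes exactly over any partition into mini-batches, which is what makes mini-batch SGD applicable. A sketch with illustrative stand-in probabilities for P(y^(s) | x^(s); γ, θ); the helper name `minibatches` is hypothetical:

```python
import math
import random

# Illustrative per-sentence probabilities P(y^(s) | x^(s)); not real model output.
corpus_probs = [0.2, 0.05, 0.4, 0.1, 0.3, 0.25]

def log_likelihood(probs):
    """Sum of log-probabilities over a set of sentence pairs."""
    return sum(math.log(p) for p in probs)

def minibatches(data, batch_size, seed=0):
    """Shuffle the sentence pairs and split them into mini-batches for SGD."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    return [[data[i] for i in idx[k:k + batch_size]]
            for k in range(0, len(idx), batch_size)]

batches = minibatches(corpus_probs, batch_size=2)
# Every pair lands in exactly one batch, so the objective decomposes exactly.
assert sum(len(b) for b in batches) == len(corpus_probs)
assert abs(sum(log_likelihood(b) for b in batches)
           - log_likelihood(corpus_probs)) < 1e-9
```

In real training, each mini-batch's gradient of its partial log-likelihood is used to update γ and θ jointly.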
In Figs. 3 and 4, given the learned model parameters γ̂ and θ̂, the translation decision rule for an unseen source sentence x is computed as
ŷ = argmax_y P(y | x; γ̂, θ̂),
which, using the maximum probability, can be factorized at the word level to compute the translation, similar to standard prior work:
y_j = argmax_{y_j} Σ_{t=1}^{T} P(t | x; γ̂) · P(y_j | y_{<j}, t, x; θ̂_t).
Besides the decoding process of Fig. 3, the present invention also proposes a new decoding process, as shown in the following embodiment.
In another specific embodiment of the invention, building on the previous embodiment and Fig. 3, the present invention proposes a second decoding process, i.e. another implementation of step E. The decoding method proposed in step E, as mentioned in the embodiment above, mainly multiplies each of the T domain probabilities output by the Softmax classifier with the corresponding one of the T decoder outputs of the translation module, and then sums all the factors to obtain the final translation result. The difference here is that the probabilities of the T domains are no longer all multiplied with the T decoder outputs; instead, only the maximum probability is needed, and it is multiplied with the result of the decoder T_j corresponding to the index t_j of that maximum probability. That is, the SUM operation is replaced by MAX, which is carried out in the following two small steps:
The formula given in embodiment two,
y_j = argmax_{y_j} Σ_{t=1}^{T} P(t | x; γ̂) · P(y_j | y_{<j}, t, x; θ̂_t),
is regarded as the SUM decoding process, while
t̂ = argmax_t P(t | x; γ̂),  y_j = argmax_{y_j} P(y_j | y_{<j}, t̂, x; θ̂_{t̂}),
is regarded as the MAX decoding process.
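The contrast between the SUM and MAX decoding steps can be sketched for a single target position. The domain probabilities and per-domain word distributions below are invented for illustration, and are deliberately chosen so the two rules disagree:

```python
# SUM vs MAX decoding for one target position, T = 2 illustrative domains.
p_topic = [0.55, 0.45]            # P(t | x) from the Softmax classifier
p_word = [
    {"a": 0.30, "b": 0.70},       # decoder for domain 1
    {"a": 0.80, "b": 0.20},       # decoder for domain 2
]
vocab = ["a", "b"]

# SUM: mix all T decoder outputs by the domain probabilities, then argmax.
sum_scores = {w: sum(pt * pw[w] for pt, pw in zip(p_topic, p_word))
              for w in vocab}
y_sum = max(vocab, key=sum_scores.get)

# MAX: pick the single most probable domain and use only its decoder.
t_hat = max(range(len(p_topic)), key=p_topic.__getitem__)
y_max = max(vocab, key=p_word[t_hat].get)

assert y_sum == "a"   # 0.55*0.30 + 0.45*0.80 = 0.525 > 0.475
assert y_max == "b"   # domain 1 wins (0.55); its decoder prefers "b"
```

As the example shows, SUM can select a word favored by a lower-probability domain's strong preference, whereas MAX commits entirely to the single most probable domain.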
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A machine translation method, characterized by comprising:
Step 1: representing the last word and the first word of the source sentence in a bilingual training sample as vectors, which serve as the input of the Softmax classifier and of the translation module for training;
Step 2: generating the number of translation-network decoders according to the number of domains produced by the Softmax classifier, and generating the target side and the corresponding domain based on the target-side decoder.
2. The method according to claim 1, characterized in that, before step 1, the method further comprises:
Step 0: constructing a training corpus data set and pre-processing the training corpus in the data set; training on the training corpus with the Softmax classifier model and the translation model to obtain the classification and translation model parameters, respectively.
3. The method according to claim 2, characterized in that, between step 0 and step 1, the method further comprises: based on the pre-processed training corpus data set, obtaining the input of the encoder stage of the translation module and of the Softmax classifier using a GRU.
4. The method according to claim 2, characterized in that generating the number of translation-network decoders according to the number of domains produced by the Softmax classifier in step 2 further comprises:
S21: dividing the Softmax classifier model into T domain classes;
S22: generating T decoders in the decoder stage of the translation module according to the input of the classifier module.
5. The method according to claim 1, characterized in that, before step 1, the method further comprises: obtaining the vector representations of the last word and the first word of the source sentence in the bilingual training sample with a bidirectional GRU neural network; a CNN or an LSTM neural network may also be used to obtain the vector representations of the last word and the first word of the source sentence in the bilingual training sample.
6. The method according to claim 2, characterized in that constructing the training corpus data set in step 0 further comprises:
collecting bilingual sentence pairs; selecting a training set, a development set and a test set; the bilingual sentence pairs are sentence pairs without domain annotation.
7. The method according to claim 6, characterized in that pre-processing the training corpus in the data set in step 0 further comprises:
segmenting the sentences in the source-language and target-language texts into words and uniformly converting them to upper case or lower case.
8. The method according to claim 2, characterized in that step 2 further comprises:
P(y | x; γ, θ) = Σ_{t=1}^{T} P(t | x; γ) · P(y | t, x; θ_t),
wherein x = x_1, …, x_i, …, x_I is the source sentence and y = y_1, …, y_j, …, y_J is the target sentence; the first factor on the right-hand side of the equation is the categorization module of the whole model, which learns γ through training to predict t; the second factor, with model parameters θ_t, predicts y; t ∈ {1, …, T} is an integer indicating the topic of the source sentence x, and T is the predefined number of topics.
9. The method according to claim 1, characterized in that the number of domains generated by the Softmax classifier can be configured according to the input.
10. The method according to claim 1, characterized in that step 2 further comprises:
the initial states of the several decoders generated in the decoding stage of the translation module, corresponding to the probabilities of the T domains generated by the Softmax classifier module, are entirely random.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710139214.0A CN107038159B (en) | 2017-03-09 | 2017-03-09 | A kind of neural network machine interpretation method based on unsupervised domain-adaptive |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107038159A CN107038159A (en) | 2017-08-11 |
CN107038159B true CN107038159B (en) | 2019-07-12 |
Family
ID=59534308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710139214.0A Active CN107038159B (en) | 2017-03-09 | 2017-03-09 | A kind of neural network machine interpretation method based on unsupervised domain-adaptive |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107038159B (en) |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107632981B (en) * | 2017-09-06 | 2020-11-03 | 沈阳雅译网络技术有限公司 | Neural machine translation method introducing source language chunk information coding |
CN107729326B (en) * | 2017-09-25 | 2020-12-25 | 沈阳航空航天大学 | Multi-BiRNN coding-based neural machine translation method |
CN107832845A (en) | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | A kind of information processing method and Related product |
CN107729329B (en) * | 2017-11-08 | 2021-03-26 | 苏州大学 | Neural machine translation method and device based on word vector connection technology |
CN107886940B (en) * | 2017-11-10 | 2021-10-08 | 科大讯飞股份有限公司 | Voice translation processing method and device |
RU2692049C1 (en) | 2017-12-29 | 2019-06-19 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for translating source sentence in first language by target sentence in second language |
CN111401084B (en) * | 2018-02-08 | 2022-12-23 | 腾讯科技(深圳)有限公司 | Method and device for machine translation and computer readable storage medium |
CN108460028B (en) * | 2018-04-12 | 2021-08-03 | 苏州大学 | Domain adaptation method for integrating sentence weight into neural machine translation |
EP3732633A1 (en) * | 2018-05-18 | 2020-11-04 | Google LLC | Universal transformers |
CN108763504B (en) * | 2018-05-30 | 2020-07-24 | 浙江大学 | Dialog reply generation method and system based on reinforced double-channel sequence learning |
CN110633801B (en) * | 2018-05-30 | 2024-09-24 | 北京三星通信技术研究有限公司 | Optimization processing method and device for deep learning model and storage medium |
CN109117483B (en) * | 2018-07-27 | 2020-05-19 | 清华大学 | Training method and device of neural network machine translation model |
US11373049B2 (en) * | 2018-08-30 | 2022-06-28 | Google Llc | Cross-lingual classification using multilingual neural machine translation |
US12094456B2 (en) | 2018-09-13 | 2024-09-17 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and system |
CN109190131B (en) * | 2018-09-18 | 2023-04-14 | 北京工业大学 | Neural machine translation-based English word and case joint prediction method thereof |
CN109697292B (en) * | 2018-12-17 | 2023-04-21 | 北京百度网讯科技有限公司 | Machine translation method, device, electronic equipment and medium |
CN109697232B (en) * | 2018-12-28 | 2020-12-11 | 四川新网银行股份有限公司 | Chinese text emotion analysis method based on deep learning |
CN109726404B (en) * | 2018-12-29 | 2023-11-10 | 安徽省泰岳祥升软件有限公司 | Training data enhancement method, device and medium of end-to-end model |
CN109933808B (en) * | 2019-01-31 | 2022-11-22 | 沈阳雅译网络技术有限公司 | Neural machine translation method based on dynamic configuration decoding |
CN111783435B (en) * | 2019-03-18 | 2024-06-25 | 株式会社理光 | Shared vocabulary selection method, device and storage medium |
CN110309516B (en) * | 2019-05-30 | 2020-11-24 | 清华大学 | Training method and device of machine translation model and electronic equipment |
CN110472727B (en) * | 2019-07-25 | 2021-05-11 | 昆明理工大学 | Neural machine translation method based on re-reading and feedback mechanism |
CN110457710B (en) * | 2019-08-19 | 2022-08-02 | 电子科技大学 | Method and method for establishing machine reading understanding network model based on dynamic routing mechanism, storage medium and terminal |
CN110674648B (en) * | 2019-09-29 | 2021-04-27 | 厦门大学 | Neural network machine translation model based on iterative bidirectional migration |
CN111178085B (en) * | 2019-12-12 | 2020-11-24 | 科大讯飞(苏州)科技有限公司 | Text translator training method, and professional field text semantic parsing method and device |
CN112052692B (en) * | 2020-08-12 | 2021-08-31 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning |
CN111931854B (en) * | 2020-08-12 | 2021-03-23 | 北京建筑大学 | Method for improving portability of image recognition model |
CN112163372B (en) * | 2020-09-21 | 2022-05-13 | 上海玫克生储能科技有限公司 | SOC estimation method of power battery |
CN112966530B (en) * | 2021-04-08 | 2022-07-22 | 中译语通科技股份有限公司 | Self-adaptive method, system, medium and computer equipment in machine translation field |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126505A (en) * | 2016-06-20 | 2016-11-16 | 清华大学 | Parallel phrase learning method and device |
CN106202068A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | The machine translation method of semantic vector based on multi-lingual parallel corpora |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126505A (en) * | 2016-06-20 | 2016-11-16 | 清华大学 | Parallel phrase learning method and device |
CN106202068A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | The machine translation method of semantic vector based on multi-lingual parallel corpora |
Also Published As
Publication number | Publication date |
---|---|
CN107038159A (en) | 2017-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||