CN109271643A - Training method for a translation model, translation method, and devices - Google Patents
Training method for a translation model, translation method, and devices
- Publication number: CN109271643A
- Application number: CN201810896694.XA
- Authority
- CN
- China
- Prior art keywords
- hidden state
- rnn
- training
- translation model
- time step
- Prior art date
- 2018-08-08
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
Embodiments of the invention provide a training method for a translation model, a translation method, and corresponding devices. The training method includes: extracting training corpora; preprocessing the training corpora to obtain preprocessed text; performing word segmentation on the preprocessed text to obtain segmented text information; encoding the segmented text information in both the forward and backward directions with a bidirectional RNN encoder and determining the hidden state of the bidirectional RNN encoder at each time step; and decoding, with a unidirectional RNN decoder, the hidden state and semantic vector of each time step of the bidirectional RNN encoder to build the translation model. This avoids compressing the semantic vectors of all time steps into one fixed-length vector, which dilutes or overwrites contextual detail and forces every decoder time step to reference the same fixed-length vector, degrading translation accuracy. Instead, the decoder references a different semantic vector at each time step, improving the accuracy with which the translation model translates source sentences.
Description
Technical field
The present invention relates to the field of translation technology, and in particular to a training method for a translation model, a translation method, and a corresponding training device and translation device.
Background technique
Currently, the conversion of source statement to object statement is actually a kind of conversion of sequence to sequence (seq2seq), in order to
It realizes the conversion of sequence to sequence, generallys use the realization of coding-decoded model (Encoder-Decoder) frame.
List entries, is exactly embedded as the vector of theorem in Euclid space by coding;Decoding, the vector exactly encoded are converted to
Output sequence, the process coded and decoded in the prior art can be realized by neural network model RNN.
In existing coding and decoding frame, list entries is compressed into the vector of a fixed length by encoder, then from fixed length
Vector decoding generates output sequence, and wherein the fixed length vector includes each of source statement detailed information, when sentence source statement is long
It is detailed information meeting fixed, that the content that source statement first inputs carries since fixed length vector includes information content when spending long
It by the detailed information dilution of the content of rear input or is capped, the longer especially source statement length the more serious, this is allowed for
The general details information of source statement list entries can not be obtained in decoder decoding, cause the decoded accuracy of decoder by
Negative effect, reduces the accuracy that traditional code-decoded model converts source statement.
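To make the framework concrete, the following is a minimal sketch of such an encoder-decoder in Python with PyTorch; the layer sizes, the use of GRUs, and all names are illustrative assumptions rather than details taken from the prior art discussed here. Note how the encoder's final hidden state is the single fixed-length vector criticized above.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the whole source is squeezed into one vector."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=600, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder's last hidden state is the fixed-length vector.
        _, fixed_len_vec = self.encoder(self.src_emb(src_ids))
        # Every decoder time step starts from that same vector.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), fixed_len_vec)
        return self.out(dec_states)  # logits over the target vocabulary
```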
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a training method for a translation model, a translation method, and a corresponding training device and translation device, so as to solve the problem of low translation accuracy in existing encoder-decoder translation models.
To solve the above problems, the invention discloses a training method for a translation model, comprising:
randomly extracting a preset number of training corpora from a preset parallel corpus;
preprocessing the training corpora to obtain preprocessed text;
performing word segmentation on the preprocessed text to obtain segmented text information;
encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
Optionally, the step of encoding the segmented text information in the forward and backward directions with the bidirectional RNN encoder and determining the hidden state of the bidirectional RNN encoder at each time step includes:
the forward RNN encodes the segmented text information in the forward direction to obtain a forward word-vector feature sequence X_F = (X_1, X_2, ..., X_T), and generates a forward hidden state Fh_i at each time step i, the forward hidden states over all time steps being (Fh_1, Fh_2, ..., Fh_T), i = 1, 2, ..., T; F denotes the forward hidden-state parameters of the translation model;
the backward RNN encodes the segmented text information in the backward direction to obtain a backward word-vector feature sequence X_B = (X_T, X_{T-1}, ..., X_2, X_1), and generates a backward hidden state Bh_i at each time step i, the backward hidden states over all time steps being (Bh_1, Bh_2, ..., Bh_T), i = 1, 2, ..., T; B denotes the backward hidden-state parameters of the translation model;
the hidden state h_i of the bidirectional RNN encoder at each time step is determined from the forward hidden state Fh_i and the backward hidden state Bh_i, where h_i = [Fh_i, Bh_i].
Optionally, the step of decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder and building the translation model includes:
decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model.
Optionally, the step of decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model includes:
obtaining the decoded state S_{i-1} of the unidirectional RNN decoder at time step i-1 and the corresponding label Y_{i-1};
obtaining the hidden state h_i and the semantic vector C_i of the bidirectional RNN encoder at the current time step i;
determining the decoded state S_i of the unidirectional RNN decoder at the current time step i from the decoded state S_{i-1}, the label Y_{i-1}, the hidden state h_i, and the semantic vector C_i;
where S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i), and P(·) denotes the decoded-state function.
Optionally, the semantic vector C_i is a weighted sum of the hidden states h = [h_1, h_2, ..., h_T] of the bidirectional RNN encoder.
Optionally, the step of building the translation model further includes:
extracting, from the parallel corpus, the training target corpus aligned with the training corpus;
calculating, from the decoded-state function, the probability that each training corpus predicts the training target corpus;
calculating a loss rate from a preset loss function and the probability;
calculating a gradient from the loss rate;
judging whether the gradient satisfies a preset iteration condition;
if so, ending the translation model training;
if not, performing gradient descent on the model parameters of the translation model using the gradient and a preset learning rate, and returning to the step of extracting, from the parallel corpus, the training target corpus aligned with the training corpus.
To solve the above problems, an embodiment of the invention also discloses a translation method, comprising:
obtaining a sentence to be translated;
inputting the sentence to be translated into a pre-trained translation model and extracting the target sentence;
wherein the translation model is trained in the following manner:
randomly extracting a preset number of training corpora from a preset parallel corpus;
preprocessing the training corpora to obtain preprocessed text;
performing word segmentation on the preprocessed text to obtain segmented text information;
encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
An embodiment of the invention also discloses a training device for a translation model, comprising:
a training corpus extraction module for randomly extracting a preset number of training corpora from a preset parallel corpus;
a preprocessing module for preprocessing the training corpora to obtain preprocessed text;
a word segmentation module for performing word segmentation on the preprocessed text to obtain segmented text information;
a modeling module for encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
Optionally, the modeling module includes:
a forward encoding submodule for the forward RNN to encode the segmented text information in the forward direction to obtain a forward word-vector feature sequence X_F = (X_1, X_2, ..., X_T) and to generate a forward hidden state Fh_i at each time step i, the forward hidden states over all time steps being (Fh_1, Fh_2, ..., Fh_T), i = 1, 2, ..., T; F denotes the forward hidden-state parameters of the translation model;
a backward encoding submodule for the backward RNN to encode the segmented text information in the backward direction to obtain a backward word-vector feature sequence X_B = (X_T, X_{T-1}, ..., X_2, X_1) and to generate a backward hidden state Bh_i at each time step i, the backward hidden states over all time steps being (Bh_1, Bh_2, ..., Bh_T), i = 1, 2, ..., T; B denotes the backward hidden-state parameters of the translation model;
a bidirectional-RNN-encoder hidden-state determination submodule for determining the hidden state h_i of the bidirectional RNN encoder at each time step from the forward hidden state Fh_i and the backward hidden state Bh_i, where h_i = [Fh_i, Bh_i].
Optionally, the modeling module includes:
a decoding submodule for decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model.
Optionally, the decoding submodule includes:
a previous-time-step state acquisition submodule for obtaining the decoded state S_{i-1} of the unidirectional RNN decoder at time step i-1 and the corresponding label Y_{i-1};
a current-time-step hidden-state and semantic-vector acquisition submodule for obtaining the hidden state h_i and the semantic vector C_i of the bidirectional RNN encoder at the current time step i;
a decoded-state determination submodule for determining the decoded state S_i of the unidirectional RNN decoder at the current time step i from the decoded state S_{i-1}, the label Y_{i-1}, the hidden state h_i, and the semantic vector C_i;
where S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i), and P(·) denotes the decoded-state function.
Optionally, the semantic vector C_i is a weighted sum of the hidden states h = [h_1, h_2, ..., h_T] of the bidirectional RNN encoder.
Optionally, the training device further includes:
a testing corpus extraction module for randomly extracting testing corpora from the parallel corpus, the testing corpora including test source corpora and test target corpora;
a probability calculation module for calculating, from the decoded-state function, the probability that each test source corpus predicts the test target corpus;
a loss rate calculation module for calculating a loss rate from a preset loss function and the probability;
a gradient calculation module for calculating a gradient from the loss rate;
an iteration condition judgment module for judging whether the gradient satisfies a preset iteration condition;
a training ending module for ending the translation model training if it does;
a parameter adjustment module for performing gradient descent on the model parameters of the translation model using the gradient and a preset learning rate if it does not, and returning to the testing corpus extraction module.
An embodiment of the invention also discloses a translation device, comprising:
a to-be-translated sentence acquisition module for obtaining a sentence to be translated;
a target sentence extraction module for inputting the sentence to be translated into a pre-trained translation model and extracting the target sentence;
wherein the translation model is trained with the following modules:
a training corpus extraction module for randomly extracting a preset number of training corpora from a preset parallel corpus;
a preprocessing module for preprocessing the training corpora to obtain preprocessed text;
a word segmentation module for performing word segmentation on the preprocessed text to obtain segmented text information;
a modeling module for encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
Compared with the background art, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, the segmented text information is encoded in both the forward and backward directions by a bidirectional RNN encoder, the hidden state of the bidirectional RNN encoder at each time step is determined, and a unidirectional RNN decoder decodes the hidden state and semantic vector of each time step of the bidirectional RNN encoder to build the translation model. Because the bidirectional RNN encoder encodes in both directions and a hidden state and semantic vector are determined for each time step, the hidden states and semantic vectors of all time steps are not compressed into a single fixed-length vector. This avoids the problems that compressing all information into one fixed-length vector dilutes or overwrites information and that every decoder time step must reference the same fixed-length vector, both of which lower translation accuracy, and thereby improves the accuracy with which the translation model translates source sentences.
Description of the drawings
Fig. 1 is a flowchart of the steps of an embodiment of a translation model training method of the invention;
Fig. 2 is a flowchart of the steps of an embodiment of a translation method of the invention;
Fig. 3 is a structural block diagram of an embodiment of a translation model training device of the invention;
Fig. 4 is a block diagram of a translation device of the invention.
Specific embodiments
In order to make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flowchart of the steps of an embodiment of a translation model training method of the embodiment of the present invention is shown, which may specifically include the following steps:
Step 101, randomly extract a preset number of training corpora from a preset parallel corpus.
A parallel corpus, also known as a translation corpus, is a corpus composed of original texts and their translations, used for the training and testing of machine translation models. It may, for example, be composed of original texts and translations in Chinese and Uighur, Chinese and English, Chinese and Japanese, or Japanese and English.
In the embodiment of the present invention, a preset number of training corpora can be extracted at random from the parallel corpus. For example, 1000 Chinese-Uighur sentence pairs can be extracted from a parallel corpus composed of Chinese and Uighur as training corpora, with Uighur defined as the translation and Chinese as the original text.
Step 102, preprocess the training corpora to obtain preprocessed text.
In embodiments of the present invention, the training corpora can be subjected to preprocessing such as regularization, error correction, and digit normalization.
Step 103, perform word segmentation on the preprocessed text to obtain segmented text information.
After the training corpora are preprocessed, word segmentation can be performed to obtain their segmented text information. For example, performing word segmentation on a preprocessed source sentence yields that sentence's segmented text information: if the preprocessed source sentence is the Chinese sentence glossed as "I did not eat lunch today", character-level segmentation yields the tokens (each character glossed in English): "I", "the present", "day", "in", "noon", "no", "eating", "meal", "", ".".
Step 104, encode the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determine the hidden state of the bidirectional RNN encoder at each time step, and decode the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
In a preferred embodiment of the invention, the translation model can be built by encoding and decoding the segmented text information. Specifically, step 104 may include the following sub-steps:
Sub-step S11: the forward RNN encodes the segmented text information in the forward direction to obtain a forward word-vector feature sequence X_F = (X_1, X_2, ..., X_T), and generates a forward hidden state Fh_i at each time step i, the forward hidden states over all time steps being (Fh_1, Fh_2, ..., Fh_T), i = 1, 2, ..., T; F denotes the forward hidden-state parameters of the translation model.
In practical applications, a dictionary can be preset in which each word corresponds to a code and the code of each word is unique. The code of each word in the segmented text information can be looked up in the dictionary, and the codes are then arranged in order to form the forward word-vector feature sequence.
For example, suppose the dictionary codes the characters as follows: "I": 102, "the present": 38, "day": 5, "in": 138, "noon": 321, "no": 8, "eating": 29, "meal": 290, "": 202, ".": 0. Then the segmented text information "I", "the present", "day", "in", "noon", "no", "eating", "meal", "", "." yields the forward word-vector feature sequence [102, 38, 5, 138, 321, 8, 29, 290, 202, 0].
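As a minimal sketch of this lookup in Python, assuming a hypothetical dictionary whose codes simply mirror the example above (the glossed token names and the unknown-token fallback are illustrative assumptions):

```python
# Hypothetical token-to-code dictionary mirroring the example above.
vocab = {"I": 102, "the present": 38, "day": 5, "in": 138, "noon": 321,
         "no": 8, "eating": 29, "meal": 290, "": 202, ".": 0}

def encode_tokens(tokens, unk=1):
    """Look up each segmented token's unique code, preserving order."""
    return [vocab.get(tok, unk) for tok in tokens]

tokens = ["I", "the present", "day", "in", "noon",
          "no", "eating", "meal", "", "."]
print(encode_tokens(tokens))  # [102, 38, 5, 138, 321, 8, 29, 290, 202, 0]
```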
Meanwhile positive term vector characteristic sequence is inputted in positive RNN, by RNN according to the hidden state parameter F of preset forward direction
The hidden state Fhi of forward direction for calculating each time step i obtains the hidden state Fh of forward direction of all time stepsi=(Fh1, Fh2...,
Fht), i=l, 2 ..., T, wherein T is time step.
Sub-step S12: the backward RNN encodes the segmented text information in the backward direction to obtain a backward word-vector feature sequence X_B = (X_T, X_{T-1}, ..., X_2, X_1), and generates a backward hidden state Bh_i at each time step i, the backward hidden states over all time steps being (Bh_1, Bh_2, ..., Bh_T), i = 1, 2, ..., T; B denotes the backward hidden-state parameters of the translation model.
For example, for the segmented text information of sub-step S11 ("I", "the present", "day", "in", "noon", "no", "eating", "meal", "", "."), backward RNN encoding yields the backward word-vector feature sequence [0, 202, 290, 29, 8, 321, 138, 5, 38, 102].
Meanwhile reversed term vector characteristic sequence is inputted in reversed RNN, by RNN according to preset reversed hidden state parameter B
The hidden state Bhi of back for calculating each time step i obtains the reversed hidden state Bh of all time stepsi=(Bh1, Bh2...,
Bht), i=l, 2 ..., T, wherein T is time step.
Sub-step S13: determine the hidden state h_i of the bidirectional RNN encoder at each time step from the forward hidden state Fh_i and the backward hidden state Bh_i, where h_i = [Fh_i, Bh_i].
In practical applications, the unidirectional RNN decoder can only use a single hidden state at each time step, so the hidden state of the bidirectional RNN encoder at each time step must be determined. Specifically, the hidden state h_i = [Fh_i, Bh_i] of the bidirectional RNN encoder at each time step i can be determined jointly from the forward hidden state Fh_i and the backward hidden state Bh_i of that time step, for example by summation.
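A minimal sketch of this bidirectional encoding in PyTorch follows; the GRU choice and the dimensions are assumptions, and the per-time-step combination is shown as the concatenation [Fh_i, Bh_i]:

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, vocab_size = 600, 512, 100_000
embedding = nn.Embedding(vocab_size, emb_dim)

# bidirectional=True runs a forward RNN over (X_1..X_T) and a backward RNN
# over (X_T..X_1) with separate parameter sets (F and B in the text above).
encoder = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

src_ids = torch.tensor([[102, 38, 5, 138, 321, 8, 29, 290, 202, 0]])
h, _ = encoder(embedding(src_ids))
# h[:, i, :hid_dim] is Fh_i and h[:, i, hid_dim:] is Bh_i, so each time
# step's output is already the combined hidden state h_i = [Fh_i, Bh_i].
print(h.shape)  # torch.Size([1, 10, 1024])
```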
Sub-step S14: decode the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model.
In the embodiment of the present invention, sub-step S14 may include the following sub-steps:
Sub-step S141: obtain the decoded state S_{i-1} of the unidirectional RNN decoder at time step i-1 and the corresponding label Y_{i-1};
Sub-step S142: obtain the hidden state h_i and the semantic vector C_i of the bidirectional RNN encoder at the current time step i;
Sub-step S143: determine the decoded state S_i of the unidirectional RNN decoder at the current time step i from the decoded state S_{i-1}, the label Y_{i-1}, the hidden state h_i, and the semantic vector C_i;
where S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i), and P(·) denotes the decoded-state function.
In the embodiment of the present invention, the semantic vector C_i represents the most suitable contextual information for the unidirectional RNN decoder to select when outputting each token of the predicted target sentence. Specifically, the semantic vector C_i can be a weighted sum of the hidden states h = [h_1, h_2, ..., h_T] of the bidirectional RNN encoder.
Specifically, this can be realized by the following equations (1)-(3):

C_i = Σ_{j=1}^{T} α_ij · h_j    (1)

α_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)    (2)

e_ik = g(S_{i-1}, h_k)    (3)

where g(·) is an RNN neural network and i, j, k are time-step indices, with i = 1, 2, ..., T, j = 1, 2, ..., T, and k = 1, 2, ..., T.
e_ik scores the probability that output Y_i is translated from input X_k; α_ij is the weight, computed from these scores, of the j-th source word for the i-th output target word. Multiplying each weight α_ij by the corresponding hidden state h_j and summing the products yields the semantic vector C_i. Each semantic vector C_i thus expresses the weight given to the hidden state h_j of each input X_j when producing output Y_i, i.e., it determines which inputs X_j are most important to output Y_i.
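A compact sketch of equations (1)-(3) in PyTorch, assuming a dot-product form for the score function g(·) (the text leaves g as a generic network, so this choice is illustrative):

```python
import torch
import torch.nn.functional as F

def semantic_vector(s_prev, h):
    """Compute C_i and the weights alpha_ij for one decoder time step.

    s_prev: [batch, dim]     decoder state S_{i-1}
    h:      [batch, T, dim]  encoder hidden states h_1..h_T
    """
    e = torch.bmm(h, s_prev.unsqueeze(2)).squeeze(2)  # (3) e_ik = g(S_{i-1}, h_k)
    alpha = F.softmax(e, dim=1)                       # (2) normalized weights
    c = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)   # (1) C_i = sum_j alpha_ij h_j
    return c, alpha
```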
Since the unidirectional RNN decoder, unlike the bidirectional RNN encoder, decodes in a single direction, when decoding it references not only the hidden state h_i of each time step of the bidirectional RNN encoder but also the semantic vector C_i of each time step. The state S_i of the decoder at time step i is jointly determined by the decoder's state S_{i-1} at time step i-1, the corresponding label Y_{i-1}, and the hidden state h_i and semantic vector C_i of the bidirectional RNN encoder aligned with the current time step, so that each time step of the unidirectional RNN decoder can be decoded with reference to a different semantic vector. This avoids the problems that compressing all information into one fixed-length vector dilutes or overwrites contextual information and that every decoder time step references the same fixed-length vector, both of which lower translation accuracy; the decoder instead references a different semantic vector at each time step, which improves the accuracy with which the translation model translates source sentences.
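One hypothetical realization of the update S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i) feeds the embedding of the previous label together with h_i and C_i into a GRU cell; the cell type and the concatenation are assumptions, not prescribed by the text:

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """One unidirectional decoding step: S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i)."""
    def __init__(self, emb_dim, enc_dim, dec_dim, tgt_vocab):
        super().__init__()
        self.emb = nn.Embedding(tgt_vocab, emb_dim)
        # P(.) realized as a GRU cell over the concatenation
        # [emb(Y_{i-1}); h_i; C_i], where h_i and C_i each have size enc_dim.
        self.cell = nn.GRUCell(emb_dim + 2 * enc_dim, dec_dim)
        self.out = nn.Linear(dec_dim, tgt_vocab)

    def forward(self, s_prev, y_prev, h_i, c_i):
        x = torch.cat([self.emb(y_prev), h_i, c_i], dim=-1)
        s_i = self.cell(x, s_prev)   # the new decoded state S_i
        return s_i, self.out(s_i)    # logits used to predict Y_i
```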
A preferred embodiment of the invention may further include the following steps:
Step 105, extract from the parallel corpus the training target corpus aligned with the training corpus.
In the embodiment of the invention, training corpora and training target corpora are paired in the parallel corpus, so the training target corpus aligned with the training corpus can be extracted from the parallel corpus.
Step 106, calculate, from the decoded-state function, the probability that each training corpus predicts the training target corpus.
Before training, the model parameters, learning rate, and iteration count of the translation model are initialized with configured initial values. The randomly extracted training corpora are then input into the translation model and candidate target corpora are extracted. There may be multiple candidate target corpora, among which is the training target corpus. Each candidate target corpus has a score, which can be, for example, the probability that the candidate target corpus is the training target corpus aligned with the training corpus.
In a concrete implementation, the probability that a candidate target corpus is the training target corpus can be calculated by means of multiple regression.
Step 107, calculate a loss rate from a preset loss function and the probability.
During training, the score of the training target corpus may not match the actually calculated score, i.e., the prediction deviates, so the translation model parameters need to be adjusted according to a loss rate, which can be calculated from the preset loss function and the probability.
Step 108, calculate a gradient from the loss rate.
After the loss rate is obtained, a gradient can be calculated with which to adjust the model parameters. In practical applications, the gradient can be computed from the loss rate by taking partial derivatives.
Step 109, judge whether the gradient satisfies a preset iteration condition; if so, execute step 110; if not, execute step 111.
Step 110, end the translation model training.
Step 111, perform gradient descent on the model parameters of the translation model using the gradient and a preset learning rate, the model parameters including the forward hidden-state parameters and the backward hidden-state parameters, and return to the step of extracting from the parallel corpus the training target corpus aligned with the training corpus.
If the calculated gradient does not satisfy the preset iteration condition, for example if the difference between consecutive gradients is greater than or equal to a preset difference threshold, or the iteration count has not been reached, the model parameters of the translation model are updated, for example its forward and backward hidden-state parameters, and the next iteration proceeds with the updated model parameters and the preset learning rate. Conversely, if the gradient satisfies the preset iteration condition, for example if the difference between consecutive gradients is less than or equal to the preset difference threshold, or the iteration count has been reached, training ends and the model parameters are output.
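A schematic training loop under these rules might look as follows; the cross-entropy loss, the Adam optimizer, and the convergence test on consecutive gradient norms are assumptions consistent with, but not dictated by, the text:

```python
import torch

def train(model, batches, lr=1e-3, max_iters=1_000_000, eps=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # adaptive learning rate
    loss_fn = torch.nn.CrossEntropyLoss()              # MLE-style loss
    prev_norm = None
    for it, (src, tgt) in enumerate(batches):
        logits = model(src, tgt[:, :-1])
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       tgt[:, 1:].reshape(-1))          # loss rate
        opt.zero_grad()
        loss.backward()                                 # gradient from the loss
        norm = sum(p.grad.norm().item() for p in model.parameters()
                   if p.grad is not None)
        # Iteration condition: consecutive gradients close enough, or cap hit.
        if (prev_norm is not None and abs(norm - prev_norm) <= eps) \
                or it >= max_iters:
            break                                       # end training, keep parameters
        prev_norm = norm
        opt.step()                                      # gradient descent update
```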
During training, gradient descent can be performed with SGD (stochastic gradient descent), Adadelta, or Adam (Adaptive Moment Estimation), and the loss rate can be calculated with loss functions such as MLE (Maximum Likelihood Estimation), MRT (Minimum Risk Training), or SST (Semi-supervised Training). The embodiment of the present invention places no restriction on the gradient descent method or the loss function used.
The construction of a translation model from a bilingual Uighur-Chinese parallel corpus of 4 million sentence pairs is illustrated below. The detailed process is as follows:
(1) Corpus data preparation: first, 1000 pairs are extracted at random from the 4 million pairs as the test set; then 1000 pairs are extracted from the remaining pairs as the validation set; the remaining pairs serve as the training corpus.
(2) System construction: the modeling framework is built on a provisioned server, and the RNN is deployed.
(3) Translation model training: the encoding vocabulary is set to 100,000 words, and the word-vector dimension is set to 600 through the RNN parameters; the Adam optimizer is selected to adapt the learning rate, the iteration cap is set to 1,000,000, and model training begins once initialization is complete.
(4) Translation model verification: the trained translation model is tested on the 1000-pair test set, finally obtaining a BLEU (bilingual evaluation understudy) value. BLEU is an auxiliary tool for evaluating the quality of bilingual intertranslation: it evaluates the similarity between a machine translation and a reference translation. If the BLEU value is within a preset range, the adjustment of the translation model parameters ends; otherwise the translation model parameters are adjusted until the BLEU value is within the preset range.
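For step (4), a BLEU value can be computed with, for example, NLTK's corpus_bleu; the toolkit and the toy sentences are illustrative, as the text does not prescribe a particular implementation:

```python
from nltk.translate.bleu_score import corpus_bleu

# references: for each test sentence, a list of reference token lists;
# hypotheses: the model's output tokens for each test sentence.
references = [[["i", "did", "not", "eat", "lunch", "today"]]]
hypotheses = [["i", "did", "not", "eat", "lunch"]]

bleu = corpus_bleu(references, hypotheses)
print(f"BLEU = {bleu:.3f}")  # stop tuning once this falls in the preset range
```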
In the embodiment of the present invention, the bidirectional RNN encoder encodes in both the forward and backward directions, and the hidden state and semantic vector of each time step are determined. This avoids compressing the hidden states and semantic vectors of all time steps into a single fixed-length vector, and thus avoids the problems that compressing all information into one fixed-length vector dilutes or overwrites information and that every decoder time step references the same fixed-length vector, both of which lower translation accuracy; the accuracy with which the translation model translates source sentences is thereby improved.
Referring to Fig. 2, a flowchart of the steps of an embodiment of a translation method of the embodiment of the present invention is shown, which may specifically include the following steps:
Step 201, obtain a sentence to be translated.
In the embodiment of the present invention, the sentence to be translated may be text information directly input by a user, such as a sentence to be translated entered on a PC, a mobile terminal, or another device, or a sentence obtained by converting into text information a voice signal collected by a voice capture device. For example, the sentence to be translated may be Chinese and the target sentence Uighur.
Step 202, input the sentence to be translated into a pre-trained translation model and extract the target sentence.
In the embodiment of the present invention, a translation model can be pre-established, and the sentence to be translated can be translated into the target sentence by the translation model; for example, after Chinese is input into the translation model, it can be translated into Uighur.
In the embodiment of the present invention, the translation model is trained through the following steps:
Sub-step S21, randomly extract a preset number of training corpora from a preset parallel corpus;
Sub-step S22, preprocess the training corpora to obtain preprocessed text;
Sub-step S23, perform word segmentation on the preprocessed text to obtain segmented text information;
Sub-step S24, encode the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determine the hidden state of the bidirectional RNN encoder at each time step, and decode the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
For the training process of the translation model, reference can be made to the corresponding steps of the translation model training method, which are not detailed again here.
In the embodiment of the present invention, encoding with the bidirectional RNN encoder proceeds in both the forward and backward directions and determines the hidden state and semantic vector of each time step, and the unidirectional RNN decoder decodes the hidden state and semantic vector of each time step of the bidirectional RNN encoder to build the translation model. This avoids compressing the hidden states and semantic vectors of all time steps into a single fixed-length vector, and thus avoids the problems that compressing all information into one fixed-length vector dilutes or overwrites contextual information and that every decoder time step references the same fixed-length vector, both of which lower translation accuracy; the accuracy with which the translation model translates source sentences is thereby improved.
It should be noted that, for simplicity of description, the method embodiments are stated as series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 3, a structural block diagram of an embodiment of a translation model training device of the embodiment of the present invention is shown, which may specifically include the following modules:
a training corpus extraction module 301 for randomly extracting a preset number of training corpora from a preset parallel corpus;
a preprocessing module 302 for preprocessing the training corpora to obtain preprocessed text;
a word segmentation module 303 for performing word segmentation on the preprocessed text to obtain segmented text information;
a modeling module 304 for encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
Optionally, the modeling module 304 includes:
a forward encoding submodule for the forward RNN to encode the segmented text information in the forward direction to obtain a forward word-vector feature sequence X_F = (X_1, X_2, ..., X_T) and to generate a forward hidden state Fh_i at each time step i, the forward hidden states over all time steps being (Fh_1, Fh_2, ..., Fh_T), i = 1, 2, ..., T; F denotes the forward hidden-state parameters of the translation model;
a backward encoding submodule for the backward RNN to encode the segmented text information in the backward direction to obtain a backward word-vector feature sequence X_B = (X_T, X_{T-1}, ..., X_2, X_1) and to generate a backward hidden state Bh_i at each time step i, the backward hidden states over all time steps being (Bh_1, Bh_2, ..., Bh_T), i = 1, 2, ..., T; B denotes the backward hidden-state parameters of the translation model;
a bidirectional-RNN-encoder hidden-state determination submodule for determining the hidden state h_i of the bidirectional RNN encoder at each time step from the forward hidden state Fh_i and the backward hidden state Bh_i, where h_i = [Fh_i, Bh_i].
Optionally, the modeling module 304 includes:
a decoding submodule for decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model.
Optionally, the decoding submodule includes:
a previous-time-step state acquisition submodule for obtaining the decoded state S_{i-1} of the unidirectional RNN decoder at time step i-1 and the corresponding label Y_{i-1};
a current-time-step hidden-state and semantic-vector acquisition submodule for obtaining the hidden state h_i and the semantic vector C_i of the bidirectional RNN encoder at the current time step i;
a decoded-state determination submodule for determining the decoded state S_i of the unidirectional RNN decoder at the current time step i from the decoded state S_{i-1}, the label Y_{i-1}, the hidden state h_i, and the semantic vector C_i;
where S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i), and P(·) denotes the decoded-state function.
Optionally, the semantic vector C_i is a weighted sum of the hidden states h = [h_1, h_2, ..., h_T] of the bidirectional RNN encoder.
Optionally, the training device further includes:
a testing corpus extraction module for randomly extracting testing corpora from the parallel corpus, the testing corpora including test source corpora and test target corpora;
a probability calculation module for calculating, from the decoded-state function, the probability that each test source corpus predicts the test target corpus;
a loss rate calculation module for calculating a loss rate from a preset loss function and the probability;
a gradient calculation module for calculating a gradient from the loss rate;
an iteration condition judgment module for judging whether the gradient satisfies a preset iteration condition;
a training ending module for ending the translation model training if it does;
a parameter adjustment module for performing gradient descent on the model parameters of the translation model using the gradient and a preset learning rate if it does not, and returning to the testing corpus extraction module.
Referring to Fig. 4, a structural block diagram of an embodiment of a translation device of the embodiment of the present invention is shown, which may specifically include the following modules:
a to-be-translated sentence acquisition module 401 for obtaining a sentence to be translated;
a target sentence extraction module 402 for inputting the sentence to be translated into a pre-trained translation model and extracting the target sentence;
wherein the translation model is trained with the following modules:
a training corpus extraction module for randomly extracting a preset number of training corpora from a preset parallel corpus;
a preprocessing module for preprocessing the training corpora to obtain preprocessed text;
a word segmentation module for performing word segmentation on the preprocessed text to obtain segmented text information;
a modeling module for encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
As the device embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant points, refer to the corresponding parts of the description of the method embodiments.
Those skilled in the art will readily conceive of other embodiments of the invention after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A training method for a translation model, characterized by comprising:
randomly extracting a preset number of training corpora from a preset parallel corpus;
preprocessing the training corpora to obtain preprocessed text;
performing word segmentation on the preprocessed text to obtain segmented text information;
encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
2. The training method of claim 1, characterized in that the step of encoding the segmented text information in the forward and backward directions with the bidirectional RNN encoder and determining the hidden state of the bidirectional RNN encoder at each time step includes:
the forward RNN encodes the segmented text information in the forward direction to obtain a forward word-vector feature sequence X_F = (X_1, X_2, ..., X_T), and generates a forward hidden state Fh_i at each time step i, the forward hidden states over all time steps being (Fh_1, Fh_2, ..., Fh_T), i = 1, 2, ..., T; F denotes the forward hidden-state parameters of the translation model;
the backward RNN encodes the segmented text information in the backward direction to obtain a backward word-vector feature sequence X_B = (X_T, X_{T-1}, ..., X_2, X_1), and generates a backward hidden state Bh_i at each time step i, the backward hidden states over all time steps being (Bh_1, Bh_2, ..., Bh_T), i = 1, 2, ..., T; B denotes the backward hidden-state parameters of the translation model;
the hidden state h_i of the bidirectional RNN encoder at each time step is determined from the forward hidden state Fh_i and the backward hidden state Bh_i, where h_i = [Fh_i, Bh_i].
3. The training method of claim 2, characterized in that the step of decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder and building the translation model includes:
decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model.
4. The training method of claim 3, characterized in that the step of decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with the unidirectional RNN decoder to obtain the decoded-state function of the translation model includes:
obtaining the decoded state S_{i-1} of the unidirectional RNN decoder at time step i-1 and the corresponding label Y_{i-1};
obtaining the hidden state h_i and the semantic vector C_i of the bidirectional RNN encoder at the current time step i;
determining the decoded state S_i of the unidirectional RNN decoder at the current time step i from the decoded state S_{i-1}, the label Y_{i-1}, the hidden state h_i, and the semantic vector C_i;
where S_i = P(S_{i-1}, Y_{i-1}, h_i, C_i), and P(·) denotes the decoded-state function.
5. The training method of claim 4, characterized in that the semantic vector C_i is a weighted sum of the hidden states h = [h_1, h_2, ..., h_T] of the bidirectional RNN encoder.
6. The training method of claim 3, characterized in that the step of building the translation model further includes:
extracting, from the parallel corpus, the training target corpus aligned with the training corpus;
calculating, from the decoded-state function, the probability that each training corpus predicts the training target corpus;
calculating a loss rate from a preset loss function and the probability;
calculating a gradient from the loss rate;
judging whether the gradient satisfies a preset iteration condition;
if so, ending the translation model training;
if not, performing gradient descent on the model parameters of the translation model using the gradient and a preset learning rate, and returning to the step of extracting, from the parallel corpus, the training target corpus aligned with the training corpus.
7. A translation method, characterized by comprising:
obtaining a sentence to be translated;
inputting the sentence to be translated into a pre-trained translation model and extracting the target sentence;
wherein the translation model is trained in the following manner:
randomly extracting a preset number of training corpora from a preset parallel corpus;
preprocessing the training corpora to obtain preprocessed text;
performing word segmentation on the preprocessed text to obtain segmented text information;
encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
8. A training device for a translation model, characterized by comprising:
a training corpus extraction module for randomly extracting a preset number of training corpora from a preset parallel corpus;
a preprocessing module for preprocessing the training corpora to obtain preprocessed text;
a word segmentation module for performing word segmentation on the preprocessed text to obtain segmented text information;
a modeling module for encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
9. The training device of claim 8, characterized in that the modeling module includes:
a forward encoding submodule for the forward RNN to encode the segmented text information in the forward direction to obtain a forward word-vector feature sequence X_F = (X_1, X_2, ..., X_T) and to generate a forward hidden state Fh_i at each time step i, the forward hidden states over all time steps being (Fh_1, Fh_2, ..., Fh_T), i = 1, 2, ..., T; F denotes the forward hidden-state parameters of the translation model;
a backward encoding submodule for the backward RNN to encode the segmented text information in the backward direction to obtain a backward word-vector feature sequence X_B = (X_T, X_{T-1}, ..., X_2, X_1) and to generate a backward hidden state Bh_i at each time step i, the backward hidden states over all time steps being (Bh_1, Bh_2, ..., Bh_T), i = 1, 2, ..., T; B denotes the backward hidden-state parameters of the translation model;
a bidirectional-RNN-encoder hidden-state determination submodule for determining the hidden state h_i of the bidirectional RNN encoder at each time step from the forward hidden state Fh_i and the backward hidden state Bh_i, where h_i = [Fh_i, Bh_i].
10. A translation device, characterized by comprising:
a to-be-translated sentence acquisition module for obtaining a sentence to be translated;
a target sentence extraction module for inputting the sentence to be translated into a pre-trained translation model and extracting the target sentence;
wherein the translation model is trained with the following modules:
a training corpus extraction module for randomly extracting a preset number of training corpora from a preset parallel corpus;
a preprocessing module for preprocessing the training corpora to obtain preprocessed text;
a word segmentation module for performing word segmentation on the preprocessed text to obtain segmented text information;
a modeling module for encoding the segmented text information in the forward and backward directions with a bidirectional RNN encoder, determining the hidden state of the bidirectional RNN encoder at each time step, and decoding the hidden state and semantic vector of each time step of the bidirectional RNN encoder with a unidirectional RNN decoder to build the translation model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810896694.XA CN109271643A (en) | 2018-08-08 | 2018-08-08 | Training method for a translation model, translation method, and devices
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810896694.XA CN109271643A (en) | 2018-08-08 | 2018-08-08 | Training method for a translation model, translation method, and devices
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271643A true CN109271643A (en) | 2019-01-25 |
Family
ID=65153188
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810896694.XA Pending CN109271643A (en) | 2018-08-08 | 2018-08-08 | Training method for a translation model, translation method, and devices
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271643A (en) |
Application Events
- 2018-08-08: CN application CN201810896694.XA filed; published as CN109271643A (en); legal status: Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126507A (en) * | 2016-06-22 | 2016-11-16 | 哈尔滨工业大学深圳研究生院 | A character-encoding-based deep neural translation method and system |
CN107464559A (en) * | 2017-07-11 | 2017-12-12 | 中国科学院自动化研究所 | Joint prediction model construction method and system based on Chinese prosodic structure and stress |
CN107729329A (en) * | 2017-11-08 | 2018-02-23 | 苏州大学 | A neural machine translation method and device based on a word-vector interconnection technique |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858044A (en) * | 2019-02-01 | 2019-06-07 | 成都金山互动娱乐科技有限公司 | Language processing method and device, the training method of language processing system and device |
CN109858044B (en) * | 2019-02-01 | 2023-04-18 | 成都金山互动娱乐科技有限公司 | Language processing method and device, and training method and device of language processing system |
CN109933662A (en) * | 2019-02-15 | 2019-06-25 | 北京奇艺世纪科技有限公司 | Model training method, information generating method, device, electronic equipment and computer-readable medium |
CN109902313A (en) * | 2019-03-01 | 2019-06-18 | 北京金山数字娱乐科技有限公司 | A kind of interpretation method and device, the training method of translation model and device |
CN109902313B (en) * | 2019-03-01 | 2023-04-07 | 北京金山数字娱乐科技有限公司 | Translation method and device, and translation model training method and device |
CN110263349B (en) * | 2019-03-08 | 2024-09-13 | 腾讯科技(深圳)有限公司 | Corpus evaluation model training method and device, storage medium and computer equipment |
CN110263350A (en) * | 2019-03-08 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Model training method, device, computer readable storage medium and computer equipment |
CN110263349A (en) * | 2019-03-08 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Corpus assessment models training method, device, storage medium and computer equipment |
CN110263350B (en) * | 2019-03-08 | 2024-05-31 | 腾讯科技(深圳)有限公司 | Model training method, device, computer readable storage medium and computer equipment |
CN109931506A (en) * | 2019-03-14 | 2019-06-25 | 三川智慧科技股份有限公司 | Pipeline leakage detection method and device |
CN109871946A (en) * | 2019-03-15 | 2019-06-11 | 北京金山数字娱乐科技有限公司 | A kind of application method and device, training method and device of neural network model |
CN110188353A (en) * | 2019-05-28 | 2019-08-30 | 百度在线网络技术(北京)有限公司 | Text error correction method and device |
CN110188353B (en) * | 2019-05-28 | 2021-02-05 | 百度在线网络技术(北京)有限公司 | Text error correction method and device |
CN110210026B (en) * | 2019-05-29 | 2023-05-26 | 北京百度网讯科技有限公司 | Speech translation method, device, computer equipment and storage medium |
CN110210026A (en) * | 2019-05-29 | 2019-09-06 | 北京百度网讯科技有限公司 | Voice translation method, device, computer equipment and storage medium |
CN110619357A (en) * | 2019-08-29 | 2019-12-27 | 北京搜狗科技发展有限公司 | Picture processing method and device and electronic equipment |
CN110619357B (en) * | 2019-08-29 | 2022-03-04 | 北京搜狗科技发展有限公司 | Picture processing method and device and electronic equipment |
CN110795947A (en) * | 2019-08-30 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Sentence translation method and device, storage medium and electronic device |
CN110879940A (en) * | 2019-11-21 | 2020-03-13 | 哈尔滨理工大学 | Machine translation method and system based on deep neural network |
CN110879940B (en) * | 2019-11-21 | 2022-07-12 | 哈尔滨理工大学 | Machine translation method and system based on deep neural network |
CN112926342A (en) * | 2019-12-06 | 2021-06-08 | 中兴通讯股份有限公司 | Method for constructing machine translation model, translation device and computer readable storage medium |
CN111027681A (en) * | 2019-12-09 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Time sequence data processing model training method, data processing device and storage medium |
CN111027681B (en) * | 2019-12-09 | 2023-06-27 | 腾讯科技(深圳)有限公司 | Time sequence data processing model training method, data processing method, device and storage medium |
CN113468856A (en) * | 2020-03-31 | 2021-10-01 | 阿里巴巴集团控股有限公司 | Variant text generation method, variant text translation model training method, variant text classification device and variant text translation model training device |
CN111597829A (en) * | 2020-05-19 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Translation method and device, storage medium and electronic equipment |
CN111597829B (en) * | 2020-05-19 | 2021-08-27 | 腾讯科技(深圳)有限公司 | Translation method and device, storage medium and electronic equipment |
CN111680528A (en) * | 2020-06-09 | 2020-09-18 | 合肥讯飞数码科技有限公司 | Translation model compression method, device, equipment and storage medium |
CN111680528B (en) * | 2020-06-09 | 2023-06-30 | 合肥讯飞数码科技有限公司 | Translation model compression method, device, equipment and storage medium |
CN114333830A (en) * | 2020-09-30 | 2022-04-12 | 中兴通讯股份有限公司 | Simultaneous interpretation model training method, simultaneous interpretation method, device and storage medium |
CN112287656A (en) * | 2020-10-12 | 2021-01-29 | 四川语言桥信息技术有限公司 | Text comparison method, device, equipment and storage medium |
CN112287656B (en) * | 2020-10-12 | 2024-05-28 | 四川语言桥信息技术有限公司 | Text comparison method, device, equipment and storage medium |
TWI765437B (en) * | 2020-11-30 | 2022-05-21 | 中華電信股份有限公司 | System, method and computer-readable medium for translating Chinese text into Taiwanese or Taiwanese pinyin |
CN112597778A (en) * | 2020-12-14 | 2021-04-02 | 华为技术有限公司 | Training method of translation model, translation method and translation equipment |
CN113836192A (en) * | 2021-08-13 | 2021-12-24 | 深译信息科技(横琴)有限公司 | Parallel corpus mining method and device, computer equipment and storage medium |
CN114997185A (en) * | 2021-10-27 | 2022-09-02 | 荣耀终端有限公司 | Translation method, medium, program product, and electronic device |
WO2023115770A1 (en) * | 2021-12-23 | 2023-06-29 | 科大讯飞股份有限公司 | Translation method and related device therefor |
CN114254657A (en) * | 2021-12-23 | 2022-03-29 | 科大讯飞股份有限公司 | Translation method and related equipment thereof |
CN116611458A (en) * | 2023-05-31 | 2023-08-18 | 本源量子计算科技(合肥)股份有限公司 | Text translation method and device, medium and electronic device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271643A (en) | A kind of training method of translation model, interpretation method and device | |
CN110334361B (en) | Neural machine translation method for Chinese language | |
CN110929030B (en) | Joint training method for text summarization and sentiment classification | |
CN108829684A (en) | A Mongolian-Chinese neural machine translation method based on a transfer learning strategy | |
CN110309514B (en) | Semantic recognition method and device | |
CN111178094B (en) | Pre-training-based low-resource neural machine translation training method | |
CN109492227A (en) | A machine reading comprehension method based on a multi-head attention mechanism and dynamic iteration | |
WO2020107878A1 (en) | Method and apparatus for generating text summary, computer device and storage medium | |
CN111078866B (en) | Chinese text summary generation method based on a sequence-to-sequence model | |
CN109284506A (en) | A kind of user comment sentiment analysis system and method based on attention convolutional neural networks | |
CN110428820B (en) | Chinese and English mixed speech recognition method and device | |
CN110472252B (en) | Chinese-Vietnamese neural machine translation method based on transfer learning | |
CN109359297B (en) | Relationship extraction method and system | |
CN113283244B (en) | Pre-training model-based bidding data named entity identification method | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN111143563A (en) | Text classification method based on integration of BERT, LSTM and CNN | |
CN108319666A (en) | An electric power service evaluation method based on multi-modal public opinion analysis | |
CN110069790A (en) | A machine translation system and method based on back-translation of the translated text | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN114757182A (en) | BERT short text sentiment analysis method for improving training mode | |
CN110688862A (en) | Mongolian-Chinese inter-translation method based on transfer learning | |
CN113657115B (en) | Multi-modal Mongolian sentiment analysis method based on irony recognition and fine-grained feature fusion | |
CN106683667A (en) | Automatic prosody extraction method and system and their application in natural language processing | |
CN110427616A (en) | A kind of text emotion analysis method based on deep learning | |
CN110162789A (en) | A word representation method and device based on Chinese pinyin | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190125 |