CN107038159B - A neural network machine translation method based on unsupervised domain adaptation - Google Patents
A neural network machine translation method based on unsupervised domain adaptation
- Publication number: CN107038159B
- Application number: CN201710139214.0A
- Authority: CN (China)
- Prior art keywords: translation, word, sentence, field, model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention provides a neural network machine translation method based on unsupervised domain adaptation, comprising: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input for training a Softmax classifier and a translation module; and, according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, the target-side decoders generating the target sentence together with its corresponding domain. The present invention overcomes the shortage of domain-annotated data in the prior art, saves time and cost, achieves efficient and accurate adaptation between translation domains, and has good practicality, applicability and scalability.
Description
Technical field
The present invention relates to the fields of machine learning and machine translation technology, and more particularly to a neural network machine translation method based on unsupervised domain adaptation.
Background technique
As international exchange deepens, the demand for language translation grows daily. The world contains many languages, the Internet has become the most convenient platform for obtaining information, and users' demand for online translation is increasingly urgent. Each language has its own characteristics and flexible forms, so automatic language processing, including machine translation between languages, has become a problem to be solved. At the same time, providing users with high-quality translation services is itself a difficult problem: the Internet covers many languages, each language carries a great deal of ambiguity, and languages are constantly changing, all of which places higher demands on translation services.
In the prior art, the commonly used technologies for automatic machine translation are based on neural networks and on statistics: the former is NMT (Neural Machine Translation), the latter SMT (Statistical Machine Translation).
However, for the above prior art to work well, large-scale, high-quality parallel corpora must be collected in order to obtain a reliable translation model. Unfortunately, high-quality parallel corpora usually exist only between a few language pairs, and are often limited to certain specific domains, such as official documents and news. With the rise of the Internet, the circulation of international information has become unprecedentedly convenient, and the demand for high-quality machine translation has become ever more urgent; at the same time, the Internet also brings new opportunities to machine translation, since its large amount of text makes it possible to acquire parallel corpora covering many languages and domains. Corpora obtained from the web rarely belong to a single domain: news corpora are easy to obtain, but corpora in domains such as government, film, trade, education, sport, literature and art, and especially medicine, are very difficult to obtain. When the training corpus belongs to the same domain as the development set (used for tuning the model trained on the training corpus), and the test corpus also belongs to that domain, the translation results (in-domain) are very good; otherwise (out-of-domain) they are very poor. Therefore, there is an urgent need for a translation method that achieves equally good translation quality across different domains, so that, for example, when the training and development sets are news corpora and the test set is a legal corpus (out-of-domain), the drop in translation quality caused by the domain mismatch is avoided as far as possible.
However, the data weighting method in the prior art, which assigns weights to sentences according to their similarity to the in-domain corpus, cannot avoid the serious problem of requiring annotated corpora, and needs to cut the original training corpus into several small components, which increases the number of model parameters and entails other complex operations, degrading the performance of neural network machine translation, lowering translation efficiency, and failing to achieve accurate adaptation between domains.
Summary of the invention
In order to overcome the above problems, or at least partially solve them, the present invention provides a machine translation method.
According to one aspect of the present invention, a machine translation method is provided, comprising:
Step 1: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input for training a Softmax classifier and a translation module;
Step 2: according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, the target-side decoders generating the target sentence and its corresponding domain.
The present application proposes a machine translation method that can effectively use parallel sentence pairs without annotation, i.e., without domain information. Compared with similar work in traditional SMT, hidden topics are used to represent the component parts, rather than cutting the original training data into several component datasets. Most importantly, fusing small-component corpora into a global model is a great challenge for NMT, because of its lack of interpretability. Finally, an approximate parameter-sharing scheme that reduces the number of model parameters, together with a tractable decoding algorithm, is proposed. Compared with similar work in NMT, which relies on predefined, annotated topic information, the present invention treats the topic as a hidden variable and retains the advantage of leaving the translation model unchanged. Clearly, the method can easily be applied to other neural network models in natural language processing. The neural network machine translation method based on unsupervised domain adaptation proposed in the present invention solves the serious prior-art problem of depending on annotated corpora, avoids the complex operations (such as an increased number of model parameters) caused by cutting the original training corpus into several small components, improves the performance of neural network machine translation, improves translation efficiency, and achieves more accurate adaptation between domains.
Detailed description of the invention
Fig. 1 is an overall flow diagram of a machine translation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the domain classification structure of the Softmax classifier model in a machine translation method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of hybrid decoding mode 1 (SUM) of the classification module and the translation module in a machine translation method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of hybrid decoding mode 2 (MAX) of the classification module and the translation module in a machine translation method according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention, not to limit its scope.
The present invention proposes a neural network machine translation method based on unsupervised domain adaptation.
Transfer learning in NMT or SMT can be divided into two forms. One is DA (domain adaptation): the NMT model itself is concise, makes the fewest prior assumptions, inherently performs better than SMT at domain adaptation, and can more easily use knowledge from different domains; for example, knowledge of the news domain can effectively help translation in other domains. The other is migration: using large-scale monolingual raw knowledge to improve machine translation.
In recent years, many domain adaptation methods have been proposed, in both NMT and SMT; DA in SMT in particular has been studied in depth. All these methods can be summarized into the following five kinds: (1) self-training methods, which make use of monolingual in-domain corpora; (2) data selection methods, which retrieve data similar to the in-domain corpus; (3) data weighting methods, which assign weights to sentences according to their similarity to the in-domain corpus; (4) context-based methods, which distinguish translations between different domains according to local or global context; (5) topic-based methods, which make full use of topic models to achieve adaptation between domains.
The method proposed by the present invention is similar to the third kind, but the main difference is that hidden topics are used to represent the component parts, rather than cutting the original training data into several component datasets. Most importantly, fusing small-component corpora into a global model is a great challenge for NMT, because of its lack of interpretability. Finally, an approximate parameter-sharing scheme that reduces the number of model parameters, together with a tractable decoding algorithm, is proposed.
Others have done similar work in NMT; for example, the natural language processing group of Stanford University adapted an existing model to a new domain and obtained a clear improvement. That method first trains on a large-scale out-of-domain corpus (as defined above, a domain different from the test set; for example, the training and development sets are news while the test set is government, film or trade corpora), and then continues training on new small-scale in-domain data. Their method is concerned with the process of transferring information from out-of-domain to in-domain, whereas the present invention is concerned with developing a mixture model that can handle data heterogeneity.
In addition, other work directly embeds topic information into the neural network; specifically, it provides NMT with topic information obtained by LDA, or with corpus categories manually annotated at the decoding stage. Some scholars further fuse topic information into the encoder and decoder stages. Some new work also proposes a method of controlling the domain and expanding the word-vector layer with domain information. In contrast to their predefined, annotated topic information, the difference between the technical solution of the present invention and these works is that the topic is treated as a hidden variable, keeping the advantage of leaving the translation model unchanged. Clearly, the method can easily be applied to other neural network models in natural language processing.
In conclusion it is badly in need of a kind of method of new domain-adaptive in machine translation (either NMT or SMT),
Solve the serious problems for be unableing to do without mark corpus in the prior art, avoid original several small cities of training corpus cutting point and
Cause to increase the complex operations such as model parameter number, improve the performance of neural network machine translation, improve translation efficiency and obtains
It obtains adaptive between more accurate field.
Neural network machine interpretation method based on unsupervised domain-adaptive is intended to be instructed in the parallel sentence pairs of no mark
Practice to generate and translates effect also preferable translation model outside a field on test set.
The method of the present invention is in fact a mixture model, because it comprises a classification model (the Softmax classifier) and the translation model of neural network machine translation. The mixture-model idea of previous work is first to decompose the training corpus according to domain (news, film, government, trade, etc.) into several different component parts {<X_c, Y_c>}, c = 1, ..., C, and then to train a translation model P(y|x; θ_c) on each small-scale corpus <X_c, Y_c>. The models of these component parts are combined into a single global model:

P(y|x) = Σ_{c=1}^{C} λ_c P(y|x; θ_c),

where λ_c is the mixture parameter, i.e., the parameter of the c-th component part, obtained from a distance measure between the target-side text of the translation and the corpus of each small component part, by methods such as tf-idf, LSA, perplexity or EM (in the present invention, TF-IDF and cosine similarity are called to compute the similarity). These mixture parameters are predicted by text similarity, without intervening in the learning process of the whole mixture model. Although this way of mixture modeling works well in SMT, it is less simple to adapt it to NMT. On the basis of P(y|x) = Σ_c λ_c P(y|x; θ_c), SMT can fuse the individually trained models into a global model very simply, but how to mix them in NMT is somewhat unclear. The main difficulty is that the neural network translation models of the different component parts (different domains, different sources) are each trained on a small-scale corpus, and, for efficiency reasons, each selects the several thousand most frequent words as its vocabulary at the preprocessing stage before training; the vocabularies of the models trained on the different domain datasets therefore differ markedly.

If the models of the small-scale component parts are not fused into a global model, the parameter space grows. Moreover, the search algorithm must sum the weighted sentence-level translation probabilities of all the small-scale models: the maximum target-side probability ŷ = argmax_y Σ_c λ_c P(y|x; θ_c) cannot, unlike P(y|x; θ) = Π_j P(y_j|y_<j, x; θ), be decomposed to the word level. It is therefore difficult to fuse the predictions of those small-scale component neural network machine translation models during decoding, so a neural network machine translation system that is genuinely a mixture model needs to be developed.
Fig. 1 shows an overall flow diagram of a machine translation method in a specific embodiment of the present invention. Overall, it comprises:
Step 1: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input for training a Softmax classifier and a translation module;
Step 2: according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, the target-side decoders generating the target sentence and its corresponding domain.
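As an illustration of the two steps above, the following toy Python sketch routes a sentence vector through a Softmax domain classifier and selects the matching decoder. All names, dimensions and the placeholder decoders here are hypothetical; a real system would use trained GRU encoders and NMT decoders rather than these stand-ins.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def classify_domain(sentence_vec, weights):
    """Step 1 (sketch): score the sentence vector against each of T domain
    rows of a weight matrix; return (domain probabilities, argmax domain)."""
    scores = [sum(w * v for w, v in zip(row, sentence_vec)) for row in weights]
    probs = softmax(scores)
    return probs, max(range(len(probs)), key=probs.__getitem__)

def translate(sentence_vec, decoders):
    """Step 2 (sketch): route the sentence to the decoder matching the
    predicted domain; each 'decoder' here is just a placeholder callable."""
    probs, t = classify_domain(sentence_vec, [d["w"] for d in decoders])
    return decoders[t]["decode"](sentence_vec), t

# Toy example with T = 2 domains (e.g. news vs. law), 3-dim sentence vector.
decoders = [
    {"w": [1.0, 0.0, 0.0], "decode": lambda v: "news-translation"},
    {"w": [0.0, 1.0, 0.0], "decode": lambda v: "law-translation"},
]
out, domain = translate([0.1, 0.9, 0.0], decoders)
```

Because the second weight row scores the vector higher, the second (law) decoder is selected in this toy run.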
In another specific embodiment of the present invention, the machine translation method further comprises, before step 1:
Step 0: constructing a training corpus dataset and preprocessing the training corpus in the dataset; training the Softmax classifier model and the translation model on the training corpus to obtain the classification and translation model parameters, respectively.
In another specific embodiment of the present invention, the machine translation method further comprises, between step 0 and step 1: based on the preprocessed training corpus dataset, obtaining the input of the encoder stage of the translation module and of the Softmax classifier using a GRU.
In another specific embodiment of the present invention, generating the number of translation-network decoders according to the number of domains produced by the Softmax classifier in step 2 further comprises:
S21: the Softmax classifier model classifies into t domain classes;
S22: in the decoder stage of the translation module, t decoders are generated according to the output of the classifier module.
In another specific embodiment of the present invention, the machine translation method further comprises, before step 1: obtaining the vector representations of the last word and the first word of the source sentence in the bilingual training sample using a bidirectional GRU neural network; alternatively, these vector representations may be obtained using a CNN or an LSTM neural network.
In another specific embodiment of the present invention, constructing the training corpus dataset in step 0 further comprises: collecting bilingual sentence pairs and selecting the training, development and test sets; the bilingual sentence pairs are sentence pairs without domain annotation.
In another of the invention specific embodiment, a kind of machine translation method, in the data set in the step 0
Training corpus carries out pretreatment:
Sentence in the data set in source language text and target language text is cut into word and unified converted magnitude
It writes.
In another specific embodiment of the present invention, step 2 further comprises the formula

P(y|x) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

where the first term on the right side of the equation is the classification module of the whole model, trained with parameters γ to predict t, and the second term, with model parameters θ_t, predicts y; t ∈ {1, ..., T} is an integer indicating the topic of the source sentence x, T is the predefined number of topics, and P(t|x; γ) is the topic distribution that the model predicts for x.
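The mixture formula above can be exercised with a few lines of Python. This is a minimal sketch: the probability values are invented for illustration and do not come from the patent.

```python
def mixture_prob(topic_probs, per_topic_trans_probs):
    """P(y|x) = sum_t P(t|x; gamma) * P(y|x, t; theta_t).
    topic_probs: output of the Softmax classifier, one entry per domain t.
    per_topic_trans_probs: P(y|x, t) from each domain-specific decoder."""
    assert abs(sum(topic_probs) - 1.0) < 1e-9  # classifier output is a distribution
    return sum(pt * py for pt, py in zip(topic_probs, per_topic_trans_probs))

# Two domains: the classifier is 70% sure the sentence is news.
p = mixture_prob([0.7, 0.3], [0.5, 0.1])  # 0.7*0.5 + 0.3*0.1 = 0.38
```

The classifier probabilities thus weight each domain decoder's opinion, rather than hard-selecting a single domain.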
In another specific embodiment of the present invention, the number of domains produced by the Softmax classifier can be configured according to the input.
In another specific embodiment of the present invention, step 2 further comprises: in the decoder stage of the translation module, t decoders are generated according to the output of the classifier module; the initial states of the probabilities of the t domains produced by the Softmax classifier module and of the decoders generated in the decoding stage of the translation module are entirely random.
In another specific embodiment of the present invention, the machine translation method specifically comprises the following steps:
A. preparing a large-scale parallel training corpus of sentence pairs without annotation and without topic information;
B. using a word-vector model to obtain the word-vector representation of each word contained in the source sentence;
C. using a GRU to obtain the input of the encoder stage of the translation module and of the Softmax classifier;
D. processing the obtained input with a bidirectional GRU, i.e., obtaining the vector representation of the sentence from the word-vector representation of the last word (forward) and the word-vector representation of the first word (backward), and then passing it to the classifier module and the translation module;
E. according to the number of domains produced by the Softmax classifier, generating the same number of translation-network decoders, and decoding to generate the target language and the corresponding domain.
Step A further comprises: constructing a dataset and preprocessing it, and training a generative model on the training corpus in the training set to obtain the model parameters.
Constructing the dataset comprises collecting bilingual sentence pairs and selecting the training, development and test sets.
The preprocessing comprises cutting the sentences in the source-language and target-language texts of the dataset into words (for consistency, the open-source Chinese word segmentation tool developed by the natural language processing group of Stanford University is called for segmentation), uniformly normalizing the case (the truecasing script shipped with MOSES may be used, or a case-conversion script may be written by hand), and tokenization (all English corpora need to be tokenized; likewise, for consistency, the tokenizer.perl shipped with MOSES may be used).
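As a rough stand-in for the tokenization and case-normalization steps just described (the actual pipeline calls the Stanford segmenter and the MOSES scripts), a minimal Python sketch on an English sentence might look as follows; the regex rule is a simplifying assumption, not the MOSES tokenizer's behavior.

```python
import re

def preprocess(sentence):
    """Toy tokenize + case-normalize: split off punctuation as separate
    tokens, then lowercase everything (a crude substitute for truecasing)."""
    toks = re.findall(r"\w+|[^\w\s]", sentence, re.UNICODE)
    return [t.lower() for t in toks]

toks = preprocess("The Court ruled, quickly.")
```

A real pipeline would instead shell out to tokenizer.perl and a trained truecasing model so that source and target sides are processed consistently.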
Specifically, the model parameters include the translation probabilities between the source language and the target language.
The parallel sentence pairs in step A refer to a source sentence x = x_1, ..., x_i, ..., x_I and a target side y = y_1, ..., y_j, ..., y_J.
Further, step B is realized through the preprocessing step of an RNN language model. An RNN language model consists of three parts: a look-up layer, a hidden layer and an output layer. Each word contained in the input sentence is converted by the look-up layer into its corresponding word-vector representation:

x_t = look-up(s_t),

where x_t is the word-vector representation of s_t, s_t is the input at each time step t, and look-up denotes the look-up layer.
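The look-up layer can be sketched as a simple table lookup: each word indexes a row of an embedding matrix. The vocabulary, embedding values and the `<unk>` convention below are illustrative assumptions, not the patent's actual data.

```python
def lookup(word, vocab, emb_matrix):
    """x_t = look-up(s_t): map the input word at time step t to its row
    in the embedding matrix (unknown words map to index 0, <unk>)."""
    idx = vocab.get(word, 0)
    return emb_matrix[idx]

vocab = {"<unk>": 0, "court": 1, "news": 2}
emb = [[0.0, 0.0], [0.3, -0.1], [0.5, 0.2]]  # 2-dim toy embeddings
x_t = lookup("news", vocab, emb)
```

In training, the embedding matrix is a learned parameter rather than the fixed toy values shown here.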
Further, step C is realized by executing the following steps:
C1. The whole network is composed of two parts, a classification module and a translation module: the translation module refers to the neural network machine translation module, and the classification module merely invokes the Softmax classifier here as an independent classifier.
C2. For the source x = x_1, ..., x_i, ..., x_I and target side y = y_1, ..., y_j, ..., y_J of the parallel sentence pairs obtained in step A, neural network machine translation usually factorizes the sentence-level translation probability into word-level probabilities:

P(y|x; θ) = Π_{j=1}^{J} P(y_j | y_<j, x; θ),

where θ is the set of model parameters and y_<j is the partial translation. Given the training set {<x^(s), y^(s)>}, s = 1, ..., S, the standard training objective is to maximize the log-likelihood of the training corpus:

θ̂ = argmax_θ Σ_{s=1}^{S} log P(y^(s) | x^(s); θ).

The translation decision rule is to translate an unseen (i.e., not in the training data) source sentence x with the model parameters θ̂ obtained from the formula above; that is, the best target-side probability

ŷ = argmax_y P(y|x; θ̂)

is computed, with these probabilities factorized into word-level translations.
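The word-level factorization above means that a sentence's log-probability is just the sum of its word log-probabilities, as this small sketch shows; the conditional probabilities are invented for illustration.

```python
import math

def sentence_log_prob(word_probs):
    """log P(y|x; theta) = sum_j log P(y_j | y_<j, x; theta):
    the sentence-level translation probability factorizes into
    word-level conditionals, so training maximizes this sum."""
    return sum(math.log(p) for p in word_probs)

# Three target words with conditional probabilities 0.5, 0.25, 0.8.
ll = sentence_log_prob([0.5, 0.25, 0.8])
```

Working in log space avoids numerical underflow when sentences are long, which is why the training objective is stated as a log-likelihood.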
Preferably, the method further comprises, after step C:
D. After the input of the Softmax classifier and the translation module is obtained through step C, further processing is needed: what the encoder stage (encoding) of these two modules requires is a vector representation of the sentence, and the representation of the whole source sentence is obtained using a bidirectional GRU. The GRU is a unit of the RNN network, and as explained in step B, an RNN language model consists of a look-up layer, a hidden layer and an output layer. The word vector of each word has been obtained through the RNN in step B; the results are then sent to the input of the encoder, i.e., the information prepared for the hidden layer of the encoder stage. When computing the current hidden state, the hidden layer uses both the word vector of each word, obtained from the output of the look-up layer, and the preceding hidden states, so the word vectors are mapped to context vectors:

h_t = f(x_t, h_{t-1}),

where f is an abstract function that computes the current new hidden state given the input x_t and the historical state h_{t-1}. The initial state h_0 is usually set to 0. A common choice of f is as follows, where σ is a nonlinear function (e.g., softmax or tanh):

h_t = σ(W_xh x_t + W_hh h_{t-1}).

The softmax here refers to the softmax called when the encoder part of the translation module computes the hidden state; it is different from the softmax called in the classifier module of the whole model proposed in the present invention. The classification module is in fact an independent domain classifier, whereas the encoder of the translation module computes and uses hidden states; their roles differ, and in fact computing the current new hidden state need not call softmax at all, since other nonlinear activation functions such as tanh can also be used.
Therefore, the forward state of the bidirectional RNN (BiRNN) is computed according to the GRU formulas below, where E ∈ R^{m×V} is the word-embedding matrix and the W, U are weight matrices; m and n are respectively the word-vector dimension and the number of hidden units, σ is the logistic sigmoid function, and ⊙ denotes element-wise multiplication. The backward state is computed in the same way as the forward state. The word-embedding matrix E is shared between the forward and backward directions, but the weight matrices are not. After the forward and backward states are merged, h_t = [h→_t; h←_t] is obtained.
To illustrate further, a single GRU in the bidirectional GRU is composed of an update gate and a reset gate, as follows:

u_t = σ(W_u x_t + U_u h_{t-1} + b_u),
r_t = σ(W_r x_t + U_r h_{t-1} + b_r),
h̃_t = tanh(W x_t + U(r_t ⊙ h_{t-1}) + b),
h_t = (1 - u_t) ⊙ h_{t-1} + u_t ⊙ h̃_t,

where u_t is the update gate, r_t is the reset gate, h̃_t is the candidate activation, and h_t is a linear interpolation between the previous hidden state h_{t-1} and the candidate activation h̃_t. Intuitively, the update gate selects whether the hidden state is updated by the new state, and the reset gate determines whether the previous hidden state is ignored.
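The gate equations above can be sketched as a single GRU step in Python. The scalar case is chosen only for readability, and the parameter values are arbitrary; a real encoder uses matrices and learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step (scalar case for clarity):
      u_t = sigmoid(W_u x_t + U_u h_{t-1} + b_u)   # update gate
      r_t = sigmoid(W_r x_t + U_r h_{t-1} + b_r)   # reset gate
      c_t = tanh(W x_t + U (r_t * h_{t-1}) + b)    # candidate activation
      h_t = (1 - u_t) * h_{t-1} + u_t * c_t        # linear interpolation"""
    u = sigmoid(p["Wu"] * x_t + p["Uu"] * h_prev + p["bu"])
    r = sigmoid(p["Wr"] * x_t + p["Ur"] * h_prev + p["br"])
    c = math.tanh(p["W"] * x_t + p["U"] * (r * h_prev) + p["b"])
    return (1.0 - u) * h_prev + u * c

params = {"Wu": 1.0, "Uu": 0.0, "bu": 0.0,
          "Wr": 1.0, "Ur": 0.0, "br": 0.0,
          "W": 1.0, "U": 1.0, "b": 0.0}
h1 = gru_cell(0.5, 0.0, params)  # first step from h_0 = 0
```

Running the cell over a sentence forward and backward, with separate weights per direction, yields the bidirectional states that are concatenated above.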
Further, the method comprises the following step after step D:
E. The Softmax classifier module mainly concerns the probabilities of the domains it classifies, and the translation module concerns the generation of the target-side sentence. Intuitively, the domain information produced by the classification module, i.e., the different probabilities of the t domains, is added to the translation module, so topic information is added to the translation probability by way of a hidden variable:

P(y|x) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t).
The decoder network of the translation module also has its own hidden state, but this hidden state is not quite the same as that of the encoder network. The detailed computation is as follows:

s_i = (1 - z_i) ⊙ s_{i-1} + z_i ⊙ s̃_i,

where:

s̃_i = tanh(W E y_{i-1} + U(r_i ⊙ s_{i-1}) + C c_i),
z_i = σ(W_z E y_{i-1} + U_z s_{i-1} + C_z c_i),
r_i = σ(W_r E y_{i-1} + U_r s_{i-1} + C_r c_i).

E is the word-embedding matrix of the words contained in the target-language sentence; W, W_z, W_r, U, U_z, U_r and C, C_z, C_r are weight matrices; m and n are respectively the word-vector dimension and the number of hidden units; σ is the logistic sigmoid function. The initial hidden state s_0 is computed as:

s_0 = tanh(W_s h←_1),

where h←_1 is the backward state of the first source word and W_s is a weight matrix.
The context vector c_i is recomputed by the model at each time step:

c_i = Σ_{j=1}^{I} α_ij h_j, with α_ij = exp(e_ij) / Σ_k exp(e_ik) and e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j),

where h_j is the j-th annotation (hidden state) in the source sentence, W_a and U_a are weight matrices, and v_a is a weight vector.
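The per-step recomputation of the context vector can be sketched as follows. Scalar encoder states and the additive scoring form are simplifying assumptions for illustration; in practice the states are vectors and the weights are learned.

```python
import math

def context_vector(s_prev, enc_states, p):
    """c_i = sum_j alpha_ij * h_j, with alpha_ij = softmax_j(e_ij) and
    e_ij = v_a * tanh(W_a s_{i-1} + U_a h_j)  (scalar case for clarity)."""
    e = [p["va"] * math.tanh(p["Wa"] * s_prev + p["Ua"] * h) for h in enc_states]
    m = max(e)                                 # stable softmax over scores
    w = [math.exp(x - m) for x in e]
    z = sum(w)
    alpha = [x / z for x in w]
    c = sum(a * h for a, h in zip(alpha, enc_states))
    return c, alpha

# Two source positions; the second annotation scores higher and gets more weight.
c, alpha = context_vector(0.0, [0.2, 0.9], {"va": 1.0, "Wa": 1.0, "Ua": 1.0})
```

Because the weights α_ij depend on the previous decoder state s_{i-1}, the decoder attends to different source positions at each output step.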
In another specific embodiment of the present invention, Fig. 2 is a schematic diagram of the domain classification structure of the Softmax classifier model in a machine translation method according to an embodiment of the present invention.
Unlike the mixture model mentioned for SMT, whose mixture parameters are obtained by text similarity, in the present invention the mixture weights invoked by the topic submodel are optimized together with the translation submodels. The mixture model of the present invention extends the standard NMT by adding a hidden variable:

P(y|x) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

where t ∈ {1, ..., T} is an integer indicating the topic of the source sentence x, T is the predefined number of topics, and P(t|x; γ) is the topic distribution the model predicts for x, i.e., the module in Fig. 2. The translation submodules for topic t are the neural network machine translation modules on the right in Figs. 3 and 4, respectively.
In order to solve the aforementioned word-level factorization problem, the mixture model is approximated by assuming that the word-level translations are mutually independent:
P(y | x) ≈ Π_{j=1}^{J} Σ_{t=1}^{T} P(t | x; γ) · P(y_j | y_{<j}, t, x; θ_t).
This approximation shows that the mixture model permits training at the word level, which also makes the search algorithm below effective. Although the approximation violates the independence assumption in NMT, it brings significant improvements in actual applications.
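The word-level mixture approximation can be illustrated with a toy example. All probabilities below are invented for illustration and are not taken from the patent; `mixture_word_prob` and `sentence_logprob` are hypothetical helper names:

```python
import math

# Toy word-level mixture: T = 2 topic-specific next-word distributions over a
# tiny vocabulary, mixed by the topic posterior P(t|x) (values illustrative).
p_topic = [0.7, 0.3]                       # P(t | x; gamma)
p_word_given_topic = [                     # P(y_j | y_<j, t, x; theta_t)
    {"bank": 0.6, "river": 0.4},           # decoder for topic 1
    {"bank": 0.1, "river": 0.9},           # decoder for topic 2
]

def mixture_word_prob(word):
    """P(y_j | y_<j, x) = sum_t P(t|x) * P(y_j | y_<j, t, x)."""
    return sum(pt * pw[word] for pt, pw in zip(p_topic, p_word_given_topic))

def sentence_logprob(words):
    """Word-level factorized approximation: sum of per-position log mixtures."""
    return sum(math.log(mixture_word_prob(w)) for w in words)

# "bank": 0.7*0.6 + 0.3*0.1 = 0.45
assert abs(mixture_word_prob("bank") - 0.45) < 1e-12
```

Each target position mixes the T decoder predictions independently, which is exactly what allows the objective and the search to work word by word.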
For sentence classification, the topic sub-model P(t | x; γ) mentioned in Fig. 2 could exploit many network architectures, e.g. a CNN or a Recursive Auto-Encoder. The present invention uses a simple Softmax classifier over the representation learned by the encoder. Given a source sentence x containing I words, a bi-RNN with GRU units computes the forward states h→_i and the backward states h←_i. The forward state of the last word (computed by the forward RNN) and the backward state of the first word (computed by the backward RNN) are then concatenated as [h→_I; h←_1] and fed to the Softmax classifier as its input (prior work computes the sentence representation at the encoder stage with an end-of-sentence symbol appended). This strategy has the following advantages:
With GRU units, the RNN captures long-distance dependencies: h←_1 summarizes the source sentence in the backward direction, and h→_I summarizes it in the forward direction.
The input of the Softmax layer, [h→_I; h←_1], has a fixed dimension, independent of the length of the source sentence.
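The fixed-size classifier input [h→_I; h←_1] and the Softmax over T domains can be sketched as follows; the dimensions, the weight matrix `Wc` and all values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 4                        # hidden units per direction / number of domains

# Bi-RNN annotations of a 5-word sentence: one forward and one backward
# state per word (values illustrative).
h_fwd = rng.normal(size=(5, n))    # h→_1 ... h→_I
h_bwd = rng.normal(size=(5, n))    # h←_1 ... h←_I

# Fixed-size classifier input [h→_I ; h←_1], independent of sentence length.
x_cls = np.concatenate([h_fwd[-1], h_bwd[0]])

Wc = rng.normal(size=(T, 2 * n))   # softmax layer weights (illustrative)
logits = Wc @ x_cls
p_domain = np.exp(logits - logits.max())
p_domain /= p_domain.sum()         # P(t | x; gamma) over the T domains

assert x_cls.shape == (2 * n,)
assert abs(p_domain.sum() - 1.0) < 1e-12 and np.all(p_domain > 0)
```

However long the source sentence is, `x_cls` always has dimension 2n, so the classifier weights never need to change with sentence length.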
The topic sub-model described in Fig. 2 and the translation sub-models (the translation modules on the right of Figs. 3 and 4) share one and the same encoder; this design markedly reduces the parameter space of the mixture model.
The translation sub-models P(y | t, x; θ_t) follow the standard attention-based encoder-decoder model of prior work. To reduce the parameter space, all neural translation sub-models share the same encoder with the topic sub-model; in other words, the mixture model has one encoder and T decoders.
In another specific embodiment of the invention, with reference to Fig. 3, the process of steps A-D has been explained above and is therefore not repeated; the implementation of step E is introduced directly. The hybrid decoding mode 1 (SUM) of the classification module and translation module proposed by this embodiment comprises the following concrete steps:
For the training set D = {(x^(s), y^(s))}_{s=1}^{S} mentioned in step A, the training objective is to find the model parameters that maximize the log-likelihood of the training corpus:
(γ̂, θ̂) = argmax_{γ, θ} Σ_{s=1}^{S} log P(y^(s) | x^(s); γ, θ),
wherein the standard mini-batch (a small batch of parallel training sentence pairs) stochastic gradient descent algorithm is used to estimate the parameters of the topic and translation sub-models.
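Since the corpus log-likelihood is a plain sum over sentence pairs, it decomposes exactly over any partition into mini-batches, which is what makes mini-batch SGD applicable. A sketch with illustrative stand-in probabilities for P(y^(s) | x^(s); γ, θ); the helper name `minibatches` is hypothetical:

```python
import math
import random

# Illustrative per-sentence probabilities P(y^(s) | x^(s)); not real model output.
corpus_probs = [0.2, 0.05, 0.4, 0.1, 0.3, 0.25]

def log_likelihood(probs):
    """Sum of log-probabilities over a set of sentence pairs."""
    return sum(math.log(p) for p in probs)

def minibatches(data, batch_size, seed=0):
    """Shuffle the sentence pairs and split them into mini-batches for SGD."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    return [[data[i] for i in idx[k:k + batch_size]]
            for k in range(0, len(idx), batch_size)]

batches = minibatches(corpus_probs, batch_size=2)
# Every pair lands in exactly one batch, so the objective decomposes exactly.
assert sum(len(b) for b in batches) == len(corpus_probs)
assert abs(sum(log_likelihood(b) for b in batches)
           - log_likelihood(corpus_probs)) < 1e-9
```

In real training, each mini-batch's gradient of its partial log-likelihood is used to update γ and θ jointly.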
In Figs. 3 and 4, given the learned model parameters γ̂ and θ̂, the translation decision rule for an unseen source sentence x is computed as
ŷ = argmax_y P(y | x; γ̂, θ̂),
which, using the maximum probability, can be factorized at the word level to compute the translation, similar to standard prior work:
y_j = argmax_{y_j} Σ_{t=1}^{T} P(t | x; γ̂) · P(y_j | y_{<j}, t, x; θ̂_t).
Besides the decoding process of Fig. 3, the present invention also proposes a new decoding process, as shown in the following embodiment.
In another specific embodiment of the invention, building on the previous embodiment and Fig. 3, the present invention proposes a second decoding process, i.e. another implementation of step E. The decoding method proposed in step E, as mentioned in the embodiment above, mainly multiplies each of the T domain probabilities output by the Softmax classifier with the corresponding one of the T decoder outputs of the translation module, and then sums all the factors to obtain the final translation result. The difference here is that the probabilities of the T domains are no longer all multiplied with the T decoder outputs; instead, only the maximum probability is needed, and it is multiplied with the result of the decoder T_j corresponding to the index t_j of that maximum probability. That is, the SUM operation is replaced by MAX, which is carried out in the following two small steps:
The formula given in embodiment two,
y_j = argmax_{y_j} Σ_{t=1}^{T} P(t | x; γ̂) · P(y_j | y_{<j}, t, x; θ̂_t),
is regarded as the SUM decoding process, while
t̂ = argmax_t P(t | x; γ̂),  y_j = argmax_{y_j} P(y_j | y_{<j}, t̂, x; θ̂_{t̂}),
is regarded as the MAX decoding process.
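The contrast between the SUM and MAX decoding steps can be sketched for a single target position. The domain probabilities and per-domain word distributions below are invented for illustration, and are deliberately chosen so the two rules disagree:

```python
# SUM vs MAX decoding for one target position, T = 2 illustrative domains.
p_topic = [0.55, 0.45]            # P(t | x) from the Softmax classifier
p_word = [
    {"a": 0.30, "b": 0.70},       # decoder for domain 1
    {"a": 0.80, "b": 0.20},       # decoder for domain 2
]
vocab = ["a", "b"]

# SUM: mix all T decoder outputs by the domain probabilities, then argmax.
sum_scores = {w: sum(pt * pw[w] for pt, pw in zip(p_topic, p_word))
              for w in vocab}
y_sum = max(vocab, key=sum_scores.get)

# MAX: pick the single most probable domain and use only its decoder.
t_hat = max(range(len(p_topic)), key=p_topic.__getitem__)
y_max = max(vocab, key=p_word[t_hat].get)

assert y_sum == "a"   # 0.55*0.30 + 0.45*0.80 = 0.525 > 0.475
assert y_max == "b"   # domain 1 wins (0.55); its decoder prefers "b"
```

As the example shows, SUM can select a word favored by a lower-probability domain's strong preference, whereas MAX commits entirely to the single most probable domain.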
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A machine translation method, characterized by comprising:
Step 1: representing the last word and the first word of the source sentence in a bilingual training sample as vectors, which serve as the input of the Softmax classifier and of the translation module for training;
Step 2: generating the number of translation-network decoders according to the number of domains produced by the Softmax classifier, and generating the target side and the corresponding domain based on the target-side decoder.
2. The method according to claim 1, characterized in that, before step 1, the method further comprises:
Step 0: constructing a training corpus data set and pre-processing the training corpus in the data set; training on the training corpus with the Softmax classifier model and the translation model to obtain the classification and translation model parameters, respectively.
3. The method according to claim 2, characterized in that, between step 0 and step 1, the method further comprises: based on the pre-processed training corpus data set, obtaining the input of the encoder stage of the translation module and of the Softmax classifier using a GRU.
4. The method according to claim 2, characterized in that generating the number of translation-network decoders according to the number of domains produced by the Softmax classifier in step 2 further comprises:
S21: dividing the Softmax classifier model into T domain classes;
S22: generating T decoders in the decoder stage of the translation module according to the input of the classifier module.
5. The method according to claim 1, characterized in that, before step 1, the method further comprises: obtaining the vector representations of the last word and the first word of the source sentence in the bilingual training sample with a bidirectional GRU neural network; a CNN or an LSTM neural network may also be used to obtain the vector representations of the last word and the first word of the source sentence in the bilingual training sample.
6. The method according to claim 2, characterized in that constructing the training corpus data set in step 0 further comprises:
collecting bilingual sentence pairs; selecting a training set, a development set and a test set; the bilingual sentence pairs are sentence pairs without domain annotation.
7. The method according to claim 6, characterized in that pre-processing the training corpus in the data set in step 0 further comprises:
segmenting the sentences in the source-language and target-language texts into words and uniformly converting them to upper case or lower case.
8. The method according to claim 2, characterized in that step 2 further comprises:
P(y | x; γ, θ) = Σ_{t=1}^{T} P(t | x; γ) · P(y | t, x; θ_t),
wherein x = x_1, …, x_i, …, x_I is the source sentence and y = y_1, …, y_j, …, y_J is the target sentence; the first factor on the right-hand side of the equation is the categorization module of the whole model, which learns γ through training to predict t; the second factor, with model parameters θ_t, predicts y; t ∈ {1, …, T} is an integer indicating the topic of the source sentence x, and T is the predefined number of topics.
9. The method according to claim 1, characterized in that the number of domains generated by the Softmax classifier can be configured according to the input.
10. The method according to claim 1, characterized in that step 2 further comprises:
the initial states of the several decoders generated in the decoding stage of the translation module, corresponding to the probabilities of the T domains generated by the Softmax classifier module, are entirely random.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710139214.0A CN107038159B (en) | 2017-03-09 | 2017-03-09 | A kind of neural network machine interpretation method based on unsupervised domain-adaptive |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107038159A CN107038159A (en) | 2017-08-11 |
CN107038159B true CN107038159B (en) | 2019-07-12 |
Family
ID=59534308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710139214.0A Active CN107038159B (en) | 2017-03-09 | 2017-03-09 | A kind of neural network machine interpretation method based on unsupervised domain-adaptive |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107038159B (en) |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107632981B (en) * | 2017-09-06 | 2020-11-03 | 沈阳雅译网络技术有限公司 | Neural machine translation method introducing source language chunk information coding |
CN107729326B (en) * | 2017-09-25 | 2020-12-25 | 沈阳航空航天大学 | Multi-BiRNN coding-based neural machine translation method |
CN107832845A (en) | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | A kind of information processing method and Related product |
CN107729329B (en) * | 2017-11-08 | 2021-03-26 | 苏州大学 | Neural machine translation method and device based on word vector connection technology |
CN107886940B (en) * | 2017-11-10 | 2021-10-08 | 科大讯飞股份有限公司 | Voice translation processing method and device |
RU2692049C1 (en) | 2017-12-29 | 2019-06-19 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for translating source sentence in first language by target sentence in second language |
CN111401084B (en) * | 2018-02-08 | 2022-12-23 | 腾讯科技(深圳)有限公司 | Method and device for machine translation and computer readable storage medium |
CN108460028B (en) * | 2018-04-12 | 2021-08-03 | 苏州大学 | Domain adaptation method for integrating sentence weight into neural machine translation |
EP3732633A1 (en) * | 2018-05-18 | 2020-11-04 | Google LLC | Universal transformers |
CN108763504B (en) * | 2018-05-30 | 2020-07-24 | 浙江大学 | Dialog reply generation method and system based on reinforced double-channel sequence learning |
CN110633801B (en) * | 2018-05-30 | 2024-09-24 | 北京三星通信技术研究有限公司 | Optimization processing method and device for deep learning model and storage medium |
CN109117483B (en) * | 2018-07-27 | 2020-05-19 | 清华大学 | Training method and device of neural network machine translation model |
US11373049B2 (en) * | 2018-08-30 | 2022-06-28 | Google Llc | Cross-lingual classification using multilingual neural machine translation |
US12094456B2 (en) | 2018-09-13 | 2024-09-17 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and system |
CN109190131B (en) * | 2018-09-18 | 2023-04-14 | 北京工业大学 | Neural machine translation-based English word and case joint prediction method thereof |
CN109697292B (en) * | 2018-12-17 | 2023-04-21 | 北京百度网讯科技有限公司 | Machine translation method, device, electronic equipment and medium |
CN109697232B (en) * | 2018-12-28 | 2020-12-11 | 四川新网银行股份有限公司 | Chinese text emotion analysis method based on deep learning |
CN109726404B (en) * | 2018-12-29 | 2023-11-10 | 安徽省泰岳祥升软件有限公司 | Training data enhancement method, device and medium of end-to-end model |
CN109933808B (en) * | 2019-01-31 | 2022-11-22 | 沈阳雅译网络技术有限公司 | Neural machine translation method based on dynamic configuration decoding |
CN111783435B (en) * | 2019-03-18 | 2024-06-25 | 株式会社理光 | Shared vocabulary selection method, device and storage medium |
CN110309516B (en) * | 2019-05-30 | 2020-11-24 | 清华大学 | Training method and device of machine translation model and electronic equipment |
CN110472727B (en) * | 2019-07-25 | 2021-05-11 | 昆明理工大学 | Neural machine translation method based on re-reading and feedback mechanism |
CN110457710B (en) * | 2019-08-19 | 2022-08-02 | 电子科技大学 | Method and method for establishing machine reading understanding network model based on dynamic routing mechanism, storage medium and terminal |
CN110674648B (en) * | 2019-09-29 | 2021-04-27 | 厦门大学 | Neural network machine translation model based on iterative bidirectional migration |
CN111178085B (en) * | 2019-12-12 | 2020-11-24 | 科大讯飞(苏州)科技有限公司 | Text translator training method, and professional field text semantic parsing method and device |
CN112052692B (en) * | 2020-08-12 | 2021-08-31 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning |
CN111931854B (en) * | 2020-08-12 | 2021-03-23 | 北京建筑大学 | Method for improving portability of image recognition model |
CN112163372B (en) * | 2020-09-21 | 2022-05-13 | 上海玫克生储能科技有限公司 | SOC estimation method of power battery |
CN112966530B (en) * | 2021-04-08 | 2022-07-22 | 中译语通科技股份有限公司 | Self-adaptive method, system, medium and computer equipment in machine translation field |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126505A (en) * | 2016-06-20 | 2016-11-16 | 清华大学 | Parallel phrase learning method and device |
CN106202068A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | The machine translation method of semantic vector based on multi-lingual parallel corpora |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126505A (en) * | 2016-06-20 | 2016-11-16 | 清华大学 | Parallel phrase learning method and device |
CN106202068A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | The machine translation method of semantic vector based on multi-lingual parallel corpora |
Also Published As
Publication number | Publication date |
---|---|
CN107038159A (en) | 2017-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||