CN107038159B - A neural network machine translation method based on unsupervised domain adaptation - Google Patents

A neural network machine translation method based on unsupervised domain adaptation

Info

Publication number
CN107038159B
CN107038159B (application CN201710139214.0A)
Authority
CN
China
Prior art keywords
translation
word
sentence
field
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710139214.0A
Other languages
Chinese (zh)
Other versions
CN107038159A (en)
Inventor
Mieradilijiang Maimaiti
Yang Liu
Huanbo Luan
Maosong Sun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201710139214.0A
Publication of CN107038159A
Application granted
Publication of CN107038159B
Status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a neural network machine translation method based on unsupervised domain adaptation, comprising: taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input of a Softmax classifier and a translation module for training; generating, according to the number of domains generated by the Softmax classifier, the same number of translation-network decoders; and generating, with the target-side decoders, the target side and the corresponding domain. The present invention overcomes the prior art's shortage of annotated domain data, saves time and cost, can efficiently and accurately achieve adaptation between translation domains, and has good practicality, a good scope of application, and scalability.

Description

A neural network machine translation method based on unsupervised domain adaptation
Technical field
The present invention relates to the fields of machine learning and machine translation technology, and more particularly to a neural network machine translation method based on unsupervised domain adaptation.
Background technique
At present, as international exchange gradually deepens, people's demand for language translation grows by the day. The world contains many categories of language, the internet has become the most convenient platform for obtaining information today, and users' demand for online translation is increasingly urgent. Each language has its own characteristics and flexible forms, so automatic language processing, including machine translation between languages, has become a problem to be solved. At the same time, how to provide users with high-quality translation services is also a problem that is difficult to solve: the internet covers many categories of language, each language carries a great deal of ambiguity, and languages are changing at every moment, all of which places higher demands on translation services.
In the prior art, to realize automatic machine translation, the commonly used technologies are neural-network-based and statistics-based methods; the former is NMT (Neural Machine Translation), the latter SMT (Statistical Machine Translation).
However, for the above prior art to work well, large-scale high-quality parallel corpora need to be collected to obtain a reliable translation model. In reality, high-quality parallel corpora usually exist only between a small number of languages, and are often limited to certain specific domains, such as official documents and news. With the rise of the internet, the circulation of international information has become unprecedentedly convenient, and people's demand for high-quality machine translation has become even more urgent. At the same time, the internet also brings new opportunities to machine translation: the massive corpora on the internet make it possible to acquire parallel corpora covering many languages and domains. However, among the corpora obtained from the web, single-domain corpora are rare; for example, news corpora are easy to obtain, while corpora of particular domains such as government, film, trade, education, sports, literature and art, and especially medicine, are very difficult to obtain. When the training corpus belongs to the same domain as the development set (used to tune the model trained on the training corpus), and the test corpus also belongs to the same domain, the (in-domain) translation results are very good; otherwise (out-of-domain corpus) they are very poor. Therefore, a translation method is badly needed that can likewise achieve good translation quality across different domains, for example when the training and development sets are news corpora and the test set is a legal corpus (out of domain), so as to guarantee as far as possible that translation quality does not decline because of domain mismatch.
Moreover, the data-weighting method in the prior art assigns weights to sentences according to their similarity to the in-domain corpus. The above prior art cannot do without annotated corpora, and cutting the original training corpus into several small components leads to complex operations such as an increased number of model parameters, so that the performance of neural machine translation decreases, translation efficiency is low, and adaptation between domains cannot be obtained accurately.
Summary of the invention
The present invention, in order to overcome the above problems or at least partially solve them, provides a machine translation method.
According to an aspect of the present invention, a machine translation method is provided, characterized by comprising:
Step 1, the vector representations of the last word and the first word of the source sentence in a bilingual training sample are taken as the input of a Softmax classifier and a translation module for training;
Step 2, according to the number of domains generated by the Softmax classifier, the same number of translation-network decoders is generated, and the target-side decoders generate the target side and the corresponding domain.
The present application proposes a machine translation method that can effectively use parallel sentence pairs without annotation, i.e., without topic information. Compared with similar work in traditional SMT, the component parts are represented with hidden topics, rather than cutting the original training data into several component datasets. Most importantly, fusing the corpora of small components into a global model is a great challenge for NMT, the reason being its lack of interpretability. Finally, an approximation is proposed that shares parameters, reduces the number of model parameters, and yields a tractable decoding algorithm. Compared with similar work in NMT, which uses predefined annotated topic information, the present invention treats the topic as a hidden variable and keeps the advantage of leaving the translation model unchanged. Obviously, it can easily be applied to other neural network models in natural language processing. The neural network machine translation method based on unsupervised domain adaptation proposed in the present invention solves the prior art's serious dependence on annotated corpora, avoids the complex operations (such as an increased number of model parameters) caused by cutting the original training corpus into several small components, improves the performance of neural machine translation, improves translation efficiency, and achieves more accurate adaptation between domains.
Detailed description of the invention
Fig. 1 is an overall flow diagram of a machine translation method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the domain classification of the Softmax classifier model in a machine translation method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of hybrid decoding mode 1 (SUM) of the classification module and the translation module in a machine translation method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of hybrid decoding mode 2 (MAX) of the classification module and the translation module in a machine translation method according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are used to illustrate the present invention and are not intended to limit its scope.
The present invention proposes a neural network machine translation method based on unsupervised domain adaptation.
Transfer learning within NMT or SMT can be divided into two forms. One is DA (domain adaptation): the NMT model itself is concise and makes the fewest prior assumptions, so its performance on domain adaptation is inherently better than SMT's, and it can more easily exploit knowledge from different domains; for example, knowledge from news can effectively help the translation. The other is migration: how to use large-scale monolingual knowledge to improve machine translation.
In recent years many domain-adaptation methods have been proposed, whether in NMT or in SMT; moreover, DA in SMT has been studied in depth. All the methods can be summarized into the following five kinds: (1) self-training methods, which exploit monolingual in-domain corpora; (2) data-selection methods, which retrieve data similar to the in-domain corpus; (3) data-weighting methods, which assign weights to sentences according to their similarity to the in-domain corpus; (4) context-based methods, which distinguish the translations of different domains according to local or global context; (5) topic-based methods, which make full use of topic models to achieve adaptation across domains.
The method proposed by the present invention is similar to the third kind, but the main difference is that hidden topics are used to represent the component parts, rather than cutting the original training data into several component datasets. Most importantly, fusing the corpora of small components into a global model is a great challenge for NMT, the reason being its lack of interpretability. Finally, an approximation is proposed that shares parameters and reduces the number of model parameters, together with a decoding algorithm that is easy to handle.
Others have likewise done similar work in NMT; for example, the natural language processing group of Stanford University adapted an already existing model to a new domain and obtained clear improvements. That method first trains on a large-scale out-of-domain corpus (as defined above, a domain different from the test set, e.g., news training and development sets with a government, film, or trade test set), and then continues training on the new small-scale in-domain data (explained above). Their method is concerned with the process of transferring information from out of domain into the domain, whereas the present invention is concerned with developing a mixture model that can handle data heterogeneity.
In addition, other work embeds topic information directly into the neural network; specifically, it provides NMT with topic information obtained by LDA, or corpora whose classes were labeled manually, at the decoding stage. Some scholars further fuse the topic information into the encoder and decoder stages. Besides this, some new work also proposes a method for controlling the domain and extends the word-vector layer with domain information. In contrast to their predefined annotated topic information, the difference between the technical solution of the present invention and these works is that the topic is treated as a hidden variable, which keeps the advantage of not changing the translation model. Obviously, it can easily be applied to other neural network models in natural language processing.
In conclusion, machine translation (whether NMT or SMT) urgently needs a new domain-adaptation method that solves the prior art's serious dependence on annotated corpora, avoids the complex operations (such as an increased number of model parameters) caused by cutting the original training corpus into several small components, improves the performance of neural machine translation, improves translation efficiency, and achieves more accurate adaptation between domains.
The neural network machine translation method based on unsupervised domain adaptation aims to train on parallel sentence pairs without annotation, so as to generate a translation model whose translation quality on an out-of-domain test set is also good.
The method of the present invention is in fact a mixture model, since it comprises the classification model of the Softmax classifier and the translation model of neural machine translation. The mixture-model idea of previous work is to first decompose the training corpus by domain (news, film, government, trade, etc.) into several different component parts, and then train a translation model P(y|x; θ_c) on each small-scale corpus ⟨X_c, Y_c⟩. The models of these component parts are combined into a single global model:

P(y|x) = Σ_{c=1}^{C} λ_c P(y|x; θ_c),

where λ_c is the mixing parameter, i.e., the parameter of the c-th component part, obtained from the distance between the target-side text to be translated and the corpus of each small component part, using methods such as tf-idf, LSA, perplexity, or EM (the present invention calls TF-IDF and cosine similarity to compute the similarity). These mixing parameters are predicted by text similarity and do not interfere in the learning process of the whole mixture model. Although this way of mixture modeling works well in SMT, adapting it to NMT is less simple. On the basis of the formula above, it is very simple in SMT to fuse each individually trained model into a global model; how to mix them in NMT appears somewhat unclear. The main difficulty is that neural translation models trained on the small-scale corpora of different component parts (different domains, different sources) show obvious differences in the vocabularies of the tens of thousands of highest-frequency words chosen for efficiency reasons; that is, before the translation models are trained on each domain's dataset, tens of thousands of high-frequency words are selected at the preprocessing stage for each small-scale component corpus, and at that point the models of the different domains end up with clearly different vocabularies.
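For illustration only, a minimal Python sketch of this SMT-style weighted combination follows; the component probabilities and similarity scores are hypothetical stand-ins, not the patented implementation.

```python
import numpy as np

# Sketch of the SMT-style mixture P(y|x) = sum_c lambda_c * P(y|x; theta_c).
# All numbers below are illustrative stand-ins.

def mixture_probability(component_probs, mixing_weights):
    """Combine per-component sentence probabilities P(y|x; theta_c)
    with fixed mixing weights lambda_c, which are predicted from text
    similarity rather than learned with the translation models."""
    return float(np.dot(np.asarray(mixing_weights), np.asarray(component_probs)))

# Mixing weights from hypothetical TF-IDF cosine similarities, normalized to 1.
similarities = np.array([0.7, 0.2, 0.1])
lambdas = similarities / similarities.sum()

# P(y|x; theta_c) for one candidate translation under three component models.
p_y_given_x = [0.012, 0.003, 0.020]
print(mixture_probability(p_y_given_x, lambdas))  # weighted sum over components
```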
If the models of the small-scale component parts are not fused into a global model, the parameter space grows. Beyond this, the search algorithm must sum the weighted sentence-level translation probabilities of all the small-scale models:

ŷ = argmax_y Σ_{c=1}^{C} λ_c P(y|x; θ_c).

This maximization of the target-side probability cannot, unlike

ŷ_j = argmax_{y_j} P(y_j | x, ŷ_{<j}; θ),

be decomposed to the word level. It is thus difficult to fuse the predictions of those small-scale component neural machine translation models during decoding, so a neural machine translation system that is truly a mixture model needs to be developed.
Fig. 1 shows the overall flow of a machine translation method in a specific embodiment of the present invention. Overall, it comprises:
Step 1, the vector representations of the last word and the first word of the source sentence in a bilingual training sample are taken as the input of a Softmax classifier and a translation module for training;
Step 2, according to the number of domains generated by the Softmax classifier, the same number of translation-network decoders is generated, and the target-side decoders generate the target side and the corresponding domain.
In another specific embodiment of the present invention, a machine translation method further includes, before step 1:
Step 0, constructing a training corpus dataset and preprocessing the training corpora in the dataset; training the training corpora with the Softmax classifier model and the translation model to obtain the classification and translation model parameters respectively.
In another specific embodiment of the present invention, a machine translation method further includes, between step 0 and step 1: based on the preprocessed training corpus dataset, obtaining the encoder stage of the translation module and the input of the Softmax classifier using GRUs.
In another specific embodiment of the present invention, in a machine translation method, generating the number of translation-network decoders according to the number of domains generated by the Softmax classifier in step 2 further comprises:
S21, the Softmax classifier model classifies into t domain classes;
S22, at the decoder stage of the translation module, t decoders are generated according to the output of the classifier module.
In another specific embodiment of the present invention, a machine translation method further includes, before step 1: obtaining the vector representations of the last word and the first word of the source sentence in the bilingual training sample using a bidirectional GRU network; a CNN network or an LSTM network can likewise be used to obtain the vector representations of the last word and the first word of the source sentence in the bilingual training sample.
In another specific embodiment of the present invention, in a machine translation method, constructing the training corpus dataset in step 0 further comprises:
collecting bilingual sentence pairs; selecting a training set, a development set, and a test set; the bilingual sentence pairs are sentence pairs without domain annotation.
In another specific embodiment of the present invention, in a machine translation method, preprocessing the training corpora in the dataset in step 0 comprises:
segmenting the sentences of the source-language and target-language texts in the dataset into words and unifying the letter case.
In another specific embodiment of the present invention, in a machine translation method, step 2 further comprises:

P(y|x; γ, θ) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

where the first factor on the right-hand side of the equation is the classification module of the whole model, which through training learns γ to predict t; the second factor, with the required model parameters θ_t, predicts y; t ∈ {1, …, T} is an integer indicating the topic of the source sentence x, T is the predefined number of topics, and P(t|x; γ) is the model that predicts the topic, assigning topic probabilities to x.
In another specific embodiment of the present invention, in a machine translation method, the number of domains generated by the Softmax classifier can be configured according to the input.
In another specific embodiment of the present invention, in a machine translation method, step 2 further comprises:
at the decoder stage of the translation module, t decoders are generated according to the output of the classifier module;
the initial states of the decoders generated at the decoding stage of the translation module, whose number matches the probabilities of the t domains generated by the Softmax classifier module, are entirely random.
In another specific embodiment of the present invention, a machine translation method specifically comprises the following steps:
A. preparing a large-scale parallel training corpus without annotation and without topic information;
B. using a word-vector model to obtain the word vector of each word contained in the source sentence;
C. obtaining the encoder stage of the translation module and the input of the Softmax classifier using GRUs;
D. processing the obtained input with a bidirectional GRU, that is, obtaining the vector representation of the sentence from the word-vector representation of the last word (forward) and the word-vector representation of the first word (backward), and then passing it to the classifier module and the translation module;
E. generating the number of translation-network decoders according to the number of domains generated by the Softmax classifier, and decoding to generate the target language and the corresponding domain.
Step A further comprises: constructing the dataset and preprocessing it, and training on the training corpora of the training set with a generative model to obtain the model parameters.
Constructing the dataset includes collecting bilingual sentence pairs and selecting a training set, a development set, and a test set.
The preprocessing includes segmenting the sentences of the source-language and target-language texts in the dataset into words (for consistency, the Chinese word segmentation open-source tool developed by the Stanford University natural language processing group is called), unifying the letter case (the truecaser.perl carried by MOSES, or a case-conversion script of one's own, can be used), and tokenization (the English corpus must all be tokenized; likewise, for consistency, the tokenizer.perl carried by MOSES can be used).
Specifically, the model parameters include the translation probabilities between the source language and the target language.
The parallel sentence pairs in step A consist of a source sentence x = x_1, …, x_i, …, x_I and a target side y = y_1, …, y_j, …, y_J.
Further, step B is realized through the preprocessing step of the RNN language model:
the RNN language model consists of three parts: a look-up layer, a hidden layer, and an output layer. Each word contained in the input sentence is converted by the look-up layer into its corresponding word-vector representation:

x_t = look-up(s_t),

where x_t is the word-vector representation of s_t, s_t is the input at each time step t, and look-up denotes the look-up layer.
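As an illustrative sketch of such a look-up layer (the toy vocabulary, dimension, and random initialization are assumptions made for the example):

```python
import numpy as np

# Sketch of the look-up layer: each word id indexes a row of an embedding
# matrix. Vocabulary contents and the dimension m are illustrative.
vocab = {"<unk>": 0, "我": 1, "喜欢": 2, "翻译": 3}
m = 4                                  # word-vector dimension
rng = np.random.default_rng(0)
E = rng.normal(scale=0.01, size=(len(vocab), m))   # embedding matrix

def look_up(word):
    """Return the word vector x_t for the input word s_t."""
    return E[vocab.get(word, vocab["<unk>"])]

sentence = ["我", "喜欢", "翻译"]
X = np.stack([look_up(w) for w in sentence])       # one x_t per time step
print(X.shape)                                     # (3, 4)
```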
Further, step C is realized by performing the following steps:
C1. The whole network consists of two parts, a classification module and a translation module: the translation module is the neural machine translation module, and the classification module here simply invokes a Softmax classifier as an independent classifier.
C2. For the source x = x_1, …, x_i, …, x_I and the target side y = y_1, …, y_j, …, y_J of the parallel sentence pairs obtained through step A, neural machine translation factorizes the usual sentence-level translation probability into word-level probabilities:

P(y|x; θ) = Π_{j=1}^{J} P(y_j | x, y_{<j}; θ),

where θ is the set of model parameters and y_{<j} is the partial translation. Taking {⟨x^(n), y^(n)⟩}_{n=1}^{N} as the training set, the standard training objective is to maximize the log-likelihood of the training corpus:

θ̂ = argmax_θ Σ_{n=1}^{N} log P(y^(n) | x^(n); θ).

The decision rule for translating an unseen (i.e., not trained on) source sentence x uses the model parameters θ̂ acquired by the formula above; that is, the target side with the best probability is computed as

ŷ = argmax_y P(y | x; θ̂),

and these probabilities factorize into word-level translations:

ŷ_j = argmax_{y_j} P(y_j | x, ŷ_{<j}; θ̂).
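As a sketch of this word-level factorization, the following toy code sums per-word log-probabilities into a sentence log-probability; the per-step distribution is a random stand-in for the real decoder output:

```python
import numpy as np

def step_distribution(x, y_prefix, vocab_size=5):
    """Stand-in for the decoder's softmax P(. | x, y_<j) at one step."""
    rng = np.random.default_rng(len(y_prefix))     # deterministic toy output
    logits = rng.normal(size=vocab_size)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def sentence_log_prob(x, y):
    """log P(y|x) = sum_j log P(y_j | x, y_<j)."""
    log_p = 0.0
    for j, y_j in enumerate(y):
        p_j = step_distribution(x, y[:j])
        log_p += np.log(p_j[y_j])
    return log_p

print(sentence_log_prob(x=[1, 2, 3], y=[0, 3, 1]))
```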
Preferably, the following step is included after step C:
D. After the input of the Softmax classifier and the translation module has been obtained in step C, further processing is needed: at the encoder stage of these two modules, what is needed is a vector representation of the sentence, and the representation of the whole source sentence is obtained with a bidirectional GRU. The GRU is also a unit of the RNN network, and as explained above, the RNN language model consists of a look-up layer, a hidden layer, and an output layer. In step B the word vector of each word is obtained through the RNN; the results are then sent to the input of the encoder, i.e., the information prepared for the hidden layer of the encoder stage. When computing the current hidden state, the hidden layer obtains both the word vector of each word, via the output of the look-up layer, and the several preceding hidden states, so the word vectors are also mapped into context vectors: h_t = f(x_t, h_{t−1}), where f is an abstract function that computes the current new hidden state given the input x_t and the historical state h_{t−1}. The initial state h_0 is usually set to 0, and a common choice of f is given below, where σ is a nonlinear function (e.g., softmax or tanh): h_t = σ(W_{xh} x_t + W_{hh} h_{t−1}). The softmax here refers to the one called by the encoder part of the translation module when computing hidden states; it is different from the softmax called in the classifier module of the whole model proposed by the present invention. The classification module is in fact an independent domain classifier, whereas the encoder end of the translation module computes and uses hidden states; their roles differ, and when computing the current new hidden state one need not call softmax: other nonlinear activation functions such as tanh can also be used.
Therefore, the forward states of the bidirectional BiRNN are computed according to:

h_t^fwd = (1 − z_t^fwd) ⊙ h_{t−1}^fwd + z_t^fwd ⊙ h̃_t^fwd,
h̃_t^fwd = tanh(W^fwd Ē x_t + U^fwd [r_t^fwd ⊙ h_{t−1}^fwd]),
z_t^fwd = σ(W_z^fwd Ē x_t + U_z^fwd h_{t−1}^fwd),
r_t^fwd = σ(W_r^fwd Ē x_t + U_r^fwd h_{t−1}^fwd),

where Ē is the word-vector matrix and W^fwd, W_z^fwd, W_r^fwd (of size n × m) and U^fwd, U_z^fwd, U_r^fwd (of size n × n) are weight matrices; m and n are respectively the word-vector dimension and the number of hidden units; σ is the logistic sigmoid function and ⊙ denotes elementwise multiplication. The backward states h_t^bwd are computed in the same way as the forward ones. The word-vector matrix Ē is shared between the forward and backward directions, but the weight matrices are not. After the forward and backward directions are merged, h_t = [h_t^fwd; h_t^bwd] is obtained.
To explain further, a single GRU within the bidirectional GRU consists of an update gate and a reset gate, as follows:

u_t = σ(W_u x_t + U_u h_{t−1} + b_u),
r_t = σ(W_r x_t + U_r h_{t−1} + b_r),
h̃_t = tanh(W x_t + U [r_t ⊙ h_{t−1}] + b),
h_t = (1 − u_t) ⊙ h_{t−1} + u_t ⊙ h̃_t,

where u_t is the update gate, r_t is the reset gate, h̃_t is the candidate activation, ⊙ is the elementwise multiplication operation, and h_t is a linear interpolation between the previous hidden state h_{t−1} and the candidate activation h̃_t. Intuitively, the update gate selects whether the hidden state is updated with the new state, and the reset gate decides whether the previous hidden state is ignored.
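A minimal NumPy sketch of such a GRU cell follows; the dimensions and randomly initialized weights are stand-ins:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

m, n = 4, 6                            # input dim, hidden dim (illustrative)
rng = np.random.default_rng(0)
Wu, Wr, W = (rng.normal(scale=0.1, size=(n, m)) for _ in range(3))
Uu, Ur, U = (rng.normal(scale=0.1, size=(n, n)) for _ in range(3))
bu, br, b = np.zeros(n), np.zeros(n), np.zeros(n)

def gru_step(x_t, h_prev):
    u = sigmoid(Wu @ x_t + Uu @ h_prev + bu)           # update gate u_t
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)           # reset gate r_t
    h_cand = np.tanh(W @ x_t + U @ (r * h_prev) + b)   # candidate activation
    return (1 - u) * h_prev + u * h_cand               # linear interpolation

h = np.zeros(n)                        # h_0 = 0
for x_t in rng.normal(size=(3, m)):    # three illustrative time steps
    h = gru_step(x_t, h)
print(h.shape)                         # (6,)
```

Running the same loop over the reversed word sequence with a second set of weights yields the backward states, and concatenating the two gives h_t = [h_t^fwd; h_t^bwd] as above.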
Further, the following step is included after step D:
E. The Softmax classifier module mainly concerns the probabilities of the classified domains and the generation of the target-side sentence by the translation module. Intuitively, the domain information produced by the classification module, namely the different probabilities of the t domains, is added to the translation module; that is, the topic information is added to the translation probability in the form of a hidden variable:

P(y|x; γ, θ) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t).

The decoder network of the translation module also has hidden states, but they are not quite the same as those of the encoder network; the detailed computation is as follows:

s_i = (1 − z_i) ⊙ s_{i−1} + z_i ⊙ s̃_i,

where:

s̃_i = tanh(W E y_{i−1} + U [r_i ⊙ s_{i−1}] + C c_i),
z_i = σ(W_z E y_{i−1} + U_z s_{i−1} + C_z c_i),
r_i = σ(W_r E y_{i−1} + U_r s_{i−1} + C_r c_i).

E is the word-vector matrix of the words contained in the target-language sentence; W, W_z, W_r, U, U_z, U_r and C, C_z, C_r are weights; m and n are respectively the word-vector dimension and the number of hidden units; σ is the logistic sigmoid function. The initial hidden state s_0 is computed in the following way:

s_0 = tanh(W_s h_1^bwd),

where W_s is a weight matrix. The context vector c_i is recomputed by the alignment model at each time step:

c_i = Σ_{j=1}^{I} α_{ij} h_j, with α_{ij} = exp(e_{ij}) / Σ_{k=1}^{I} exp(e_{ik}) and e_{ij} = v_a^T tanh(W_a s_{i−1} + U_a h_j),

where h_j is the j-th annotation (hidden state) of the source sentence, and W_a, U_a, and v_a are entirely weight matrices and a weight vector.
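A minimal sketch of this attention step follows; all weights and states are randomly initialized stand-ins:

```python
import numpy as np

# Given the previous decoder state s_{i-1} and the source annotations
# h_1..h_I, compute the alignment weights alpha_ij and context vector c_i.
n, n2, na, I = 6, 12, 5, 7     # hidden dim, annotation dim (2n), attention dim, source length
rng = np.random.default_rng(1)
Wa = rng.normal(scale=0.1, size=(na, n))
Ua = rng.normal(scale=0.1, size=(na, n2))
va = rng.normal(scale=0.1, size=na)

def context_vector(s_prev, H):
    """H: (I, 2n) source annotations; returns c_i of shape (2n,)."""
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h_j) for h_j in H])  # e_ij
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                                # softmax over source words
    return alpha @ H                                    # c_i = sum_j alpha_ij h_j

s_prev = rng.normal(size=n)
H = rng.normal(size=(I, n2))
print(context_vector(s_prev, H).shape)                  # (12,)
```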
In another specific embodiment of the present invention, in a machine translation method, Fig. 2 shows the domain classification structure of the Softmax classifier model according to an embodiment of the present invention.
Unlike the mixture models mentioned for SMT, whose mixing parameters are obtained by text similarity, in the present invention the mixing weights invoked by the topic submodel are optimized together with the translation submodels. The mixture model of the present invention extends standard NMT by adding a hidden variable:

P(y|x; γ, θ) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

where t ∈ {1, …, T} is an integer indicating the topic of the source sentence x, T is the predefined number of topics, and P(t|x; γ) is the model that predicts the topic, assigning topic probabilities to x, i.e., the module in Fig. 2.
The translation submodels for topic t are the neural machine translation modules on the right in Figs. 3 and 4, respectively.
To solve the aforementioned word-level factorization problem, the mixture model is approximated by assuming that the word-level translations are mutually independent:

P(y|x; γ, θ) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t)
             = Σ_{t=1}^{T} P(t|x; γ) Π_{j=1}^{J} P(y_j | x, y_{<j}, t; θ_t)
             ≈ Π_{j=1}^{J} Σ_{t=1}^{T} P(t|x; γ) P(y_j | x, y_{<j}, t; θ_t).

The third line of the above formula shows that the mixture model permits training at the word level, which also enables the search algorithm given below. Although this approximation violates the independence assumptions in NMT, it brings clear improvements in actual application.
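The following toy computation compares the exact mixture probability with this word-level approximation; all probabilities are random stand-ins:

```python
import numpy as np

# exact:  sum_t P(t|x) * prod_j P(y_j | x, y_<j, t)
# approx: prod_j sum_t P(t|x) * P(y_j | x, y_<j, t)
rng = np.random.default_rng(4)
T, J = 3, 4
p_t = rng.dirichlet(np.ones(T))                  # P(t|x)
p_word = rng.uniform(0.1, 0.9, size=(T, J))      # P(y_j | x, y_<j, t)

exact = float(p_t @ np.prod(p_word, axis=1))
approx = float(np.prod(p_t @ p_word))
print(exact, approx)   # close but not equal: exactness traded for word-level training
```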
To classify sentences, the topic submodel P(t|x; γ) mentioned in Fig. 2 could use various network architectures, for example a CNN or a Recursive Auto Encoder. The present invention uses a simple Softmax classifier that learns the representation through the encoder. Given a source sentence x containing I words, the bi-RNN with GRU units computes the forward states h_i^fwd and the backward states h_i^bwd. Then the forward state of the last word (the state of the last word is computed with the forward RNN) and the backward state of the first word (the state of the first word is computed with the backward RNN) are concatenated as [h_I^fwd; h_1^bwd] and sent to the input of the Softmax classifier (previous work computed the sentence at the encoder stage and finally appended an end-of-sentence marker). This strategy has the following several advantages:
The GRU, as a unit of the RNN, captures long-distance dependencies; that is, h_1^bwd summarizes the backward states of the source sentence, and h_I^fwd summarizes its forward states.
The input to the Softmax layer, [h_I^fwd; h_1^bwd], has a fixed dimension, independent of the length of the source sentence input.
The topic submodel described in Fig. 2 and the translation submodels (the translation submodules on the right in Figs. 3 and 4) share one and the same encoder; this design markedly reduces the parameter space of the mixture model.
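A minimal sketch of this classifier input and Softmax layer follows (dimensions, weights, and states are illustrative stand-ins):

```python
import numpy as np

# Domain classifier: concatenate the forward state of the last word and the
# backward state of the first word, then apply a Softmax layer over T domains.
n, T = 6, 3
rng = np.random.default_rng(2)
Wc = rng.normal(scale=0.1, size=(T, 2 * n))      # classifier weights (gamma)
bc = np.zeros(T)

def domain_probs(h_fwd_last, h_bwd_first):
    """P(t|x; gamma) from the fixed-size input [h_I^fwd; h_1^bwd]."""
    z = Wc @ np.concatenate([h_fwd_last, h_bwd_first]) + bc
    e = np.exp(z - z.max())
    return e / e.sum()

p_t = domain_probs(rng.normal(size=n), rng.normal(size=n))
print(p_t, p_t.sum())                            # T domain probabilities summing to 1
```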
In the translation submodels P(y | t, x; θ_t), the standard attention-based encoder-decoder model of previous work is followed. Meanwhile, to reduce the parameter space, all the neural translation submodules share the same encoder with the topic submodel; in other words, the mixture model has one encoder and T decoders.
In another specific embodiment of the present invention, in a machine translation method, with reference to Fig. 3: the flow of steps A-D has been explained above and is therefore not repeated; the implementation of step E is introduced directly. The concrete steps of hybrid decoding mode 1 (SUM) of the classification module and the translation module proposed in the embodiment of the present invention are:
For the training set {⟨x^(n), y^(n)⟩}_{n=1}^{N} mentioned in step A, the training objective is to find the model parameters that maximize the log-likelihood of the training corpus:

{γ̂, θ̂} = argmax_{γ,θ} Σ_{n=1}^{N} log P(y^(n) | x^(n); γ, θ).

The standard mini-batch (a small batch of parallel training sentence pairs) stochastic gradient descent algorithm is used to estimate the parameters of the topic and translation submodels.
In Figs. 3 and 4, given the learned model parameters γ̂ and θ̂, the translation decision rule for an untrained source sentence x is computed in the following way:

ŷ = argmax_y Π_{j=1}^{J} Σ_{t=1}^{T} P(t|x; γ̂) P(y_j | x, y_{<j}, t; θ̂_t),

which shows that computing the translation with the maximal probability can be factorized to the word level, similar to the standard work of predecessors:

ŷ_j = argmax_{y_j} Σ_{t=1}^{T} P(t|x; γ̂) P(y_j | x, ŷ_{<j}, t; θ̂_t).
Besides the decoding process of Fig. 3, the present invention also proposes a new decoding process, as shown in the following embodiment.
In another specific embodiment of the present invention, in addition to the previous example and Fig. 3, the present invention proposes a second decoding process, i.e., another implementation of step E. The decoding method proposed in step E, as mentioned in the embodiment above, mainly multiplies each of the probabilities of the t domains output by the Softmax classifier with the T decoder outputs of the translation submodule's decoder part, and then sums all the factors to obtain the final translation result. The difference here is that the probabilities of the t domains no longer have to be multiplied with all T decoder outputs of the translation module; instead, only the maximal one among the t domain probabilities is needed, multiplied with the output of the decoder corresponding to the index t̂ of that maximal probability. That is, the SUM operation is replaced by MAX, completed according to the following two small steps: the formula given in embodiment two,

ŷ_j = argmax_{y_j} Σ_{t=1}^{T} P(t|x; γ̂) P(y_j | x, ŷ_{<j}, t; θ̂_t),

is regarded as the SUM decoding process, and

t̂ = argmax_t P(t|x; γ̂),  ŷ_j = argmax_{y_j} P(y_j | x, ŷ_{<j}, t̂; θ̂_t̂)

is regarded as the MAX decoding process.
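A minimal sketch contrasting the two hybrid decoding modes at a single decoding step follows; the domain probabilities and per-domain decoder distributions are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)
T, V = 3, 5                                  # number of domains, toy vocabulary size
p_domain = np.array([0.6, 0.3, 0.1])         # P(t|x) from the classifier
p_word = rng.dirichlet(np.ones(V), size=T)   # P(y_j | x, y_<j, t), shape (T, V)

# SUM: mix all T decoder distributions, weighted by the domain probabilities.
sum_dist = p_domain @ p_word
y_sum = int(np.argmax(sum_dist))

# MAX: pick the single most probable domain and use only its decoder.
t_hat = int(np.argmax(p_domain))
y_max = int(np.argmax(p_word[t_hat]))

print("SUM picks word", y_sum, "| MAX picks word", y_max, "from decoder", t_hat)
```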
Finally, the method of the present application is only a preferred embodiment and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A machine translation method, characterized by comprising:
Step 1, taking the vector representations of the last word and the first word of the source sentence in a bilingual training sample as the input of a Softmax classifier and a translation module for training;
Step 2, generating, according to the number of domains generated by the Softmax classifier, the same number of translation-network decoders, the target-side decoders generating the target side and the corresponding domain.
2. The method according to claim 1, characterized in that, before step 1, the method further comprises:
Step 0, constructing a training corpus dataset and preprocessing the training corpora in the dataset; training the training corpora with the Softmax classifier model and the translation model to obtain the classification and translation model parameters respectively.
3. The method according to claim 2, characterized in that, between step 0 and step 1, the method further comprises: based on the preprocessed training corpus dataset, obtaining the encoder stage of the translation module and the input of the Softmax classifier using GRUs.
4. The method according to claim 2, characterized in that generating the number of translation-network decoders according to the number of domains generated by the Softmax classifier in step 2 further comprises:
S21, the Softmax classifier model classifying into t domain classes;
S22, at the decoder stage of the translation module, generating t decoders according to the output of the classifier module.
5. The method according to claim 1, characterized in that, before step 1, the method further comprises: obtaining the vector representations of the last word and the first word of the source sentence in the bilingual training sample using a bidirectional GRU network; a CNN network or an LSTM network can also be used to obtain the vector representations of the last word and the first word of the source sentence in the bilingual training sample.
6. The method according to claim 2, characterized in that constructing the training corpus dataset in step 0 further comprises:
collecting bilingual sentence pairs; selecting a training set, a development set, and a test set; the bilingual sentence pairs being sentence pairs without domain annotation.
7. The method according to claim 6, characterized in that preprocessing the training corpora in the dataset in step 0 further comprises:
segmenting the sentences of the source-language and target-language texts in the dataset into words and simultaneously converting them to upper or lower case.
8. The method according to claim 2, characterized in that step 2 further comprises:

P(y|x; γ, θ) = Σ_{t=1}^{T} P(t|x; γ) P(y|x, t; θ_t),

wherein x = x_1, …, x_i, …, x_I is the source sentence; y = y_1, …, y_j, …, y_J is the target-side sentence; the first factor on the right-hand side of the equation is the classification module of the whole model, which through training learns γ to predict t; the second factor, with the required model parameters θ_t, predicts y; t ∈ {1, …, T} is an integer indicating the topic of the source sentence x, and T is the predefined number of topics.
9. The method according to claim 1, characterized in that the number of domains generated by the Softmax classifier can be configured according to the input.
10. The method according to claim 1, characterized in that step 2 further comprises:
the initial states of the decoders generated at the decoding stage of the translation module, whose number matches the probabilities of the t domains generated by the Softmax classifier module, being entirely random.
CN201710139214.0A 2017-03-09 2017-03-09 A neural network machine translation method based on unsupervised domain adaptation Active CN107038159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710139214.0A CN107038159B (en) A neural network machine translation method based on unsupervised domain adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710139214.0A CN107038159B (en) A neural network machine translation method based on unsupervised domain adaptation

Publications (2)

Publication Number Publication Date
CN107038159A CN107038159A (en) 2017-08-11
CN107038159B true CN107038159B (en) 2019-07-12

Family

ID=59534308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710139214.0A Active CN107038159B (en) A neural network machine translation method based on unsupervised domain adaptation

Country Status (1)

Country Link
CN (1) CN107038159B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107632981B (en) * 2017-09-06 2020-11-03 沈阳雅译网络技术有限公司 Neural machine translation method introducing source language chunk information coding
CN107729326B (en) * 2017-09-25 2020-12-25 沈阳航空航天大学 Multi-BiRNN coding-based neural machine translation method
CN107832845A (en) 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN107729329B (en) * 2017-11-08 2021-03-26 苏州大学 Neural machine translation method and device based on word vector connection technology
CN107886940B (en) * 2017-11-10 2021-10-08 科大讯飞股份有限公司 Voice translation processing method and device
RU2692049C1 (en) 2017-12-29 2019-06-19 Общество С Ограниченной Ответственностью "Яндекс" Method and system for translating source sentence in first language by target sentence in second language
CN111401084B (en) * 2018-02-08 2022-12-23 腾讯科技(深圳)有限公司 Method and device for machine translation and computer readable storage medium
CN108460028B (en) * 2018-04-12 2021-08-03 苏州大学 Domain adaptation method for integrating sentence weight into neural machine translation
EP3732633A1 (en) * 2018-05-18 2020-11-04 Google LLC Universal transformers
CN108763504B (en) * 2018-05-30 2020-07-24 浙江大学 Dialog reply generation method and system based on reinforced double-channel sequence learning
CN110633801B (en) * 2018-05-30 2024-09-24 北京三星通信技术研究有限公司 Optimization processing method and device for deep learning model and storage medium
CN109117483B (en) * 2018-07-27 2020-05-19 清华大学 Training method and device of neural network machine translation model
US11373049B2 (en) * 2018-08-30 2022-06-28 Google Llc Cross-lingual classification using multilingual neural machine translation
US12094456B2 (en) 2018-09-13 2024-09-17 Shanghai Cambricon Information Technology Co., Ltd. Information processing method and system
CN109190131B (en) * 2018-09-18 2023-04-14 北京工业大学 Neural machine translation-based English word and case joint prediction method thereof
CN109697292B (en) * 2018-12-17 2023-04-21 北京百度网讯科技有限公司 Machine translation method, device, electronic equipment and medium
CN109697232B (en) * 2018-12-28 2020-12-11 四川新网银行股份有限公司 Chinese text emotion analysis method based on deep learning
CN109726404B (en) * 2018-12-29 2023-11-10 安徽省泰岳祥升软件有限公司 Training data enhancement method, device and medium of end-to-end model
CN109933808B (en) * 2019-01-31 2022-11-22 沈阳雅译网络技术有限公司 Neural machine translation method based on dynamic configuration decoding
CN111783435B (en) * 2019-03-18 2024-06-25 株式会社理光 Shared vocabulary selection method, device and storage medium
CN110309516B (en) * 2019-05-30 2020-11-24 清华大学 Training method and device of machine translation model and electronic equipment
CN110472727B (en) * 2019-07-25 2021-05-11 昆明理工大学 Neural machine translation method based on re-reading and feedback mechanism
CN110457710B (en) * 2019-08-19 2022-08-02 电子科技大学 Method and method for establishing machine reading understanding network model based on dynamic routing mechanism, storage medium and terminal
CN110674648B (en) * 2019-09-29 2021-04-27 厦门大学 Neural network machine translation model based on iterative bidirectional migration
CN111178085B (en) * 2019-12-12 2020-11-24 科大讯飞(苏州)科技有限公司 Text translator training method, and professional field text semantic parsing method and device
CN112052692B (en) * 2020-08-12 2021-08-31 内蒙古工业大学 Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning
CN111931854B (en) * 2020-08-12 2021-03-23 北京建筑大学 Method for improving portability of image recognition model
CN112163372B (en) * 2020-09-21 2022-05-13 上海玫克生储能科技有限公司 SOC estimation method of power battery
CN112966530B (en) * 2021-04-08 2022-07-22 中译语通科技股份有限公司 Self-adaptive method, system, medium and computer equipment in machine translation field

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126505A (en) * 2016-06-20 2016-11-16 清华大学 Parallel phrase learning method and device
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126505A (en) * 2016-06-20 2016-11-16 清华大学 Parallel phrase learning method and device
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora

Also Published As

Publication number Publication date
CN107038159A (en) 2017-08-11

Similar Documents

Publication Publication Date Title
CN107038159B (en) A neural network machine translation method based on unsupervised domain adaptation
CN108734276B (en) Imitation-learning dialogue generation method based on a generative adversarial network
CN112699247A (en) Knowledge representation learning framework based on multi-class cross-entropy contrastive completion coding
CN106095872A (en) Answer ranking method and device for an intelligent question-answering system
CN106547735A (en) Construction and use of context-aware dynamic word or character vectors based on deep learning
CN108830287A (en) Chinese image semantic description method based on an Inception network fused with multilayer GRUs via residual connections
CN107145483A (en) An adaptive Chinese word segmentation method based on embedded representations
CN107836000A (en) Improved artificial neural networks for language modeling and prediction
CN107145484A (en) A Chinese word segmentation method based on hidden multi-granularity local features
CN109753567A (en) A text classification method combining title and body attention mechanisms
CN107451278A (en) Chinese text categorization based on multi-hidden-layer extreme learning machines
CN109829049A (en) Method for solving video question-answering tasks using a knowledge-base progressive spatio-temporal attention network
CN106980650A (en) An emotion-enhanced word embedding learning method for Twitter sentiment classification
CN110162789A (en) A word representation method and device based on Chinese pinyin
CN115510814B (en) Document-level complex question generation method based on dual planning
CN108829823A (en) A text classification method
CN115422369B (en) Knowledge graph completion method and device based on improved TextRank
CN105975497A (en) Automatic microblog topic recommendation method and device
CN107967253A (en) A transfer-learning-based training method and segmentation method for low-resource-domain word segmenters
Zhao et al. Synchronously improving multi-user English translation ability by using AI
CN115423118A (en) Method, system, and device for fine-tuning a pre-trained language model
CN112905762A (en) Visual question answering method based on an equal-attention graph network
Ku et al. Adding learning to cellular genetic algorithms for training recurrent neural networks
CN116797850A (en) Class-incremental image classification method based on knowledge distillation and consistency regularization
CN111967265B (en) Joint learning method for Chinese word segmentation and entity recognition with automatic dataset generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant