CN113850089A - Mongolian Chinese neural machine translation method based on fusion statistical machine translation model - Google Patents


Info

Publication number
CN113850089A
Authority
CN
China
Prior art keywords
machine translation
smt
model
nmt
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111112986.8A
Other languages
Chinese (zh)
Inventor
仁庆道尔吉
庞蕊
张倩
文丽霞
刘永超
张毕力格图
李雷孝
萨和雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202111112986.8A
Publication of CN113850089A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning

Abstract

The invention discloses a Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model, which comprises the following steps: S1, an NMT classifier, inherited from standard attention-based NMT, estimates the prediction probabilities of words in the regular vocabulary; S2, an SMT classifier computes the probabilities of the SMT suggestions generated by an auxiliary SMT model; S3, the SMT suggestions are integrated into the NMT. The invention relates to the technical field of neural machine translation. The method combines a statistical machine translation model into a neural machine translation framework so as to exploit the advantages of both the statistical and neural machine translation models and achieve better translations. The SMT classifier and a gating function are jointly trained with the NMT structure in an end-to-end fashion. In addition, to better alleviate the UNK problem in the testing phase, a suitable SMT suggestion is selected to replace each target UNK word by jointly considering the attention probabilities of the NMT model and the coverage information of the SMT model.

Description

Mongolian Chinese neural machine translation method based on fusion statistical machine translation model
Technical Field
The invention relates to the technical field of neural machine translation, in particular to a Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model.
Background
Machine translation is an important application of computational semantics and natural language information processing. It is a comprehensive research subject spanning multiple disciplines such as artificial intelligence, linguistics and computational linguistics, and belongs to an internationally advanced research field. Machine translation is the process of using a computer to convert one natural language into another natural language with exactly the same semantics; Mongolian-Chinese machine translation is the technology of automatically translating Mongolian source-language text into Chinese target-language text by a computer program. Machine translation is a key technology for overcoming the language barrier in information exchange between different countries and nationalities, and is of great significance for promoting contact between nations, strengthening cultural exchange and advancing foreign trade.
Neural machine translation is a new machine translation method that has developed rapidly in recent years. For neural network machine translation, a commonly used basic network model is the recurrent neural network translation model, and the current attention-based recurrent neural network translation model achieves good translation quality, especially on long sentences. Early neural machine translation did not consider translation alignment information between the source and target sides; attention-based translation models remedy this defect. Instead of using the same vector to generate all words of the target sentence, the model takes into account the alignment weights between words in the target-language sentence and words in the source-language sentence: attention-based neural networks add attention weights between the encoder and the decoder, which are used to compute these alignment weights.
However, NMT often produces fluent but inadequate translations. First, NMT lacks a mechanism for recording whether each source-language word has been translated, leading to "over-translation" or "under-translation". Second, translations can be inaccurate: NMT tends to generate natural-looking words that do not reflect the original meaning of the source sentence. Finally, there is the unregistered-word (out-of-vocabulary) problem: NMT uses a fixed, modestly sized vocabulary to represent the most common words and replaces all other words with a single substitute token (UNK).
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model, which addresses the problems that NMT lacks a mechanism for recording whether a source-language word has been translated, that translations are inaccurate, and the unregistered-word problem.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model, which specifically comprises the following steps:
s1, estimating the prediction probability of the words on the regular word list after the NMT classifier inherits the standard attention-based NMT;
s2, calculating the probability of the SMT suggestion generated by the auxiliary SMT model by the SMT classifier;
s3, integrating the SMT recommendation into the NMT.
Preferably, the standard attention-based NMT translation process in S1 is as follows: given a source sentence $x = x_1, x_2, \ldots, x_n$, the NMT encodes it into a sequence of vectors, which is then used to generate the target sentence $y = y_1, y_2, \ldots, y_m$.
Preferably, the attention-based NMT encodes the source sentence using a bidirectional RNN consisting of a forward RNN and a backward RNN, wherein the forward RNN reads the source sentence x in order and generates the forward hidden-state sequence $(\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_n)$, and the backward RNN reads the source sentence x in reverse and generates the backward hidden-state sequence $(\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_n)$. The pair of hidden states at each position is concatenated to form the annotation of the word at that position, yielding the annotations $(h_1, h_2, \ldots, h_n)$ of the whole source sentence, where $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$.
Preferably, at decoding time step t, after the partial target sequence $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ has been output, the next word $y_t$ is generated according to the conditional probability

$$p(y_t \mid y_{<t}, x) = \mathrm{softmax}\bigl(f(s_t, y_{t-1}, c_t)\bigr),$$

where $f(\cdot)$ is a non-linear activation function and $s_t$ is the hidden state of the decoder at time step t:

$$s_t = g(s_{t-1}, y_{t-1}, c_t),$$

where $g(\cdot)$ is a non-linear activation function; gated recurrent units are used here as the activation functions of the encoder and decoder. $c_t$ is a context vector computed as the weighted sum of the source sentence annotations:

$$c_t = \sum_{j=1}^{n} \alpha_{t,j} h_j,$$

where $h_j$ is the annotation of the source word $x_j$ and its weight $\alpha_{t,j}$ is computed by the attention model.
Preferably, the statistical machine translation model in S2 is defined by a log-linear framework:

$$P(y \mid x) = \frac{\exp\bigl(\sum_{m} \lambda_m h_m(y, x)\bigr)}{\sum_{y'} \exp\bigl(\sum_{m} \lambda_m h_m(y', x)\bigr)},$$

where $h_m(y, x)$ is a feature function and $\lambda_m$ is its weight. During translation, the SMT decoder expands the partial translation $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ (referred to as a translation hypothesis in SMT) by selecting, from the bilingual phrase table, an appropriate target word or phrase translation for an untranslated part of the source sentence.
Preferably, the implementation in S2 is as follows: given the words $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ already generated by the NMT, the SMT generates suggestions for the next word and computes a suggestion score using the following formula:

$$\mathrm{score}(y_t \mid y_{<t}, x) = \sum_{m} \lambda_m h_m(y_t, x_t),$$

where $y_t$ is an SMT suggestion, $x_t$ is the corresponding source span, $h_m(y_t, x_t)$ is a feature function and $\lambda_m$ is its weight. The SMT model can generate appropriate word suggestions (partial translations) by expanding the already generated words.
Preferably, two strategies are adopted in S3 to filter out low-quality suggestions and thus ensure the quality of the SMT suggestions: (1) only the top $N_{tm}$ phrase translations are retained according to the translation score, which is computed as the weighted sum of the translation probabilities; (2) the top $N_{rec}$ suggestions with the highest SMT scores are selected, each score being computed as a weighted sum of the SMT features.
Preferably, a gating mechanism is introduced in S3 to update the word prediction probabilities of the proposed model, computed as

$$\alpha_t = g_{gate}\bigl(f_{gate}(s_t, y_{t-1}, c_t)\bigr),$$

where $f_{gate}(\cdot)$ is a non-linear function and $g_{gate}(\cdot)$ is a sigmoid function.
Preferably, the testing phase replaces UNK words directly with SMT suggestions: for each UNK word, the suggestion with the highest SMT score is selected as the final substitute. To address the UNK problem, the method can exploit rich context information on both the source and target sides, and generates more reliable suggestions by using reordering information and the SMT coverage vector.
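For illustration only, this test-time UNK replacement can be sketched as follows in Python; the suggestion data structure and token names are assumptions, not part of the claimed method:

```python
# Hypothetical sketch of test-time UNK replacement: for every UNK position in
# the NMT output, pick the SMT suggestion with the highest SMT score.
# `suggestions_per_pos` is an assumed data structure, not from the patent.

def replace_unks(output_tokens, suggestions_per_pos, unk_token="<unk>"):
    """suggestions_per_pos: {position: [(candidate_word, smt_score), ...]}"""
    result = list(output_tokens)
    for pos, token in enumerate(output_tokens):
        if token == unk_token and suggestions_per_pos.get(pos):
            # choose the suggestion with the highest SMT score
            best_word, _ = max(suggestions_per_pos[pos], key=lambda ws: ws[1])
            result[pos] = best_word
    return result

tokens = ["他", "<unk>", "了"]
cands = {1: [("回来", -0.7), ("离开", -1.3)]}
print(replace_unks(tokens, cands))   # ['他', '回来', '了']
```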
Preferably, the source language is Mongolian.
Advantageous effects
The invention provides a Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model. Compared with the prior art, it has the following beneficial effects. The method comprises the steps: S1, an NMT classifier, inherited from standard attention-based NMT, estimates the prediction probabilities of words in the regular vocabulary; S2, an SMT classifier computes the probabilities of the SMT suggestions generated by an auxiliary SMT model; S3, the SMT suggestions are integrated into the NMT. By incorporating a statistical machine translation model into a neural machine translation framework, the advantages of both the statistical and neural models are exploited to achieve better translations. The statistical machine translation model is incorporated into the training phase of the neural machine translation so that the NMT effectively learns the merged SMT suggestions. The SMT model is first trained independently on a bilingual corpus using a conventional phrase-based SMT method. In each decoding step of the training and testing phases, the SMT model provides translation suggestions based on the NMT's decoding information, including the generated partial translation and the attention history. An auxiliary classifier scores the SMT suggestions, and a gating function that reads the current decoding information linearly combines the two probability distributions, over NMT generations and SMT suggestions, so that the NMT and SMT probabilities can be dynamically weighted at different decoding steps. The SMT classifier and the gating function are jointly trained with the NMT structure in an end-to-end fashion. Furthermore, to better alleviate the UNK problem in the testing phase, a suitable SMT suggestion is selected to replace each target UNK word by jointly considering the attention probabilities of the NMT model and the coverage information of the SMT model.
Drawings
Fig. 1 is a diagram of the decoder of the NMT model that incorporates SMT suggestions.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model, which combines an SMT model into a neural machine translation (NMT) framework so as to exploit the advantages of both the statistical and neural machine translation models and achieve better translation. The method specifically comprises the following steps:
s1, estimating the prediction probability of the words on the regular word list after the NMT classifier inherits the standard attention-based NMT;
s2, calculating the probability of the SMT suggestion generated by the auxiliary SMT model by the SMT classifier;
s3, integrating the SMT recommendation into the NMT.
In the present invention, the standard attention-based NMT translation process in S1 is as follows: given a source sentence $x = x_1, x_2, \ldots, x_n$, the NMT encodes it into a sequence of vectors. After encoding, the source-language sentence is compressed into vector form. The simplest way would be to add all the word vectors together, but since every word would then carry the same weight, this treatment is unreasonable; therefore attention-based neural machine translation is used, in which the vector representation of the sentence captures all the information of the source-language sentence. After encoding, the decoder starts to generate the target sentence word by word from left to right; when generating a word it takes the history into account, generating the second word after the first, and so on until the sentence end symbol EOS (End of Sentence) is produced, at which point the sentence is complete. Attention measures the contribution of each source-language word when the corresponding target word is generated. The NMT thus encodes the source sentence into a vector sequence, which is then used to generate the target sentence $y = y_1, y_2, \ldots, y_m$.
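For illustration only, this left-to-right generation loop can be sketched as follows in Python; `encoder` and `decoder_step` stand for modules like the ones sketched in the following paragraphs, and all names are assumptions, not the patent's reference implementation:

```python
# Illustrative greedy decoding loop (hypothetical names): encode the source
# sentence once, then generate target words left to right until EOS.
import torch

def greedy_decode(encoder, decoder_step, embed_tgt, src_ids, bos_id, eos_id, max_len=100):
    annotations = encoder(src_ids)                 # vector sequence for the source
    s = torch.zeros(src_ids.size(0), decoder_step.gru.hidden_size)
    y_prev = torch.full((src_ids.size(0),), bos_id, dtype=torch.long)
    output = []
    for _ in range(max_len):
        p_t, s, _ = decoder_step(s, embed_tgt(y_prev), annotations)
        y_prev = p_t.argmax(dim=-1)                # pick the most probable word
        if (y_prev == eos_id).all():               # stop at the end-of-sentence symbol
            break
        output.append(y_prev)
    return torch.stack(output, dim=1) if output else None
```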
In the present invention, the attention-based NMT encodes the source sentence using a bidirectional RNN consisting of a forward RNN and a backward RNN, wherein the forward RNN reads the source sentence x in order and generates the forward hidden-state sequence $(\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_n)$, and the backward RNN reads the source sentence x in reverse and generates the backward hidden-state sequence $(\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_n)$. The pair of hidden states at each position is concatenated to form the annotation of the word at that position, yielding the annotations $(h_1, h_2, \ldots, h_n)$ of the whole source sentence, where $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$.
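For illustration only, such a bidirectional encoder can be sketched in PyTorch as follows; module names and dimensions are assumptions, not limitations of the invention:

```python
# Illustrative PyTorch sketch of the bidirectional encoder: forward and
# backward GRUs read the source sentence and their hidden states are
# concatenated per position into the annotations h_j.
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True runs a forward and a reverse GRU and concatenates
        # their hidden states at every source position
        self.rnn = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) token ids of the source (Mongolian) sentence
        annotations, _ = self.rnn(self.embed(src_ids))
        return annotations    # (batch, src_len, 2*hidden_dim): h_j = [->h_j ; <-h_j]
```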
In the invention, at decoding time step t, after the partial target sequence $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ has been output, the next word $y_t$ is generated according to the conditional probability

$$p(y_t \mid y_{<t}, x) = \mathrm{softmax}\bigl(f(s_t, y_{t-1}, c_t)\bigr),$$

where $f(\cdot)$ is a non-linear activation function and $s_t$ is the hidden state of the decoder at time step t:

$$s_t = g(s_{t-1}, y_{t-1}, c_t),$$

where $g(\cdot)$ is a non-linear activation function; gated recurrent units are used here as the activation functions of the encoder and decoder. $c_t$ is a context vector computed as the weighted sum of the source sentence annotations:

$$c_t = \sum_{j=1}^{n} \alpha_{t,j} h_j,$$

where $h_j$ is the annotation of the source word $x_j$ and its weight $\alpha_{t,j}$ is computed by the attention model.
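For illustration only, one decoding step with attention can be sketched as follows in PyTorch; additive attention is assumed for $\alpha_{t,j}$, and for brevity the output layer $f$ here reads only $s_t$ rather than the full $(s_t, y_{t-1}, c_t)$ of the formula above:

```python
# Illustrative PyTorch sketch of one attention-based decoding step.
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    def __init__(self, emb_dim=256, hidden_dim=512, ann_dim=1024, vocab_size=30000):
        super().__init__()
        self.score = nn.Sequential(              # attention energy e_{t,j}
            nn.Linear(hidden_dim + ann_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1))
        self.gru = nn.GRUCell(emb_dim + ann_dim, hidden_dim)  # s_t = g(s_{t-1}, y_{t-1}, c_t)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, s_prev, y_prev_emb, annotations):
        # annotations: (batch, src_len, ann_dim); s_prev: (batch, hidden_dim)
        src_len = annotations.size(1)
        s_rep = s_prev.unsqueeze(1).expand(-1, src_len, -1)
        e = self.score(torch.cat([s_rep, annotations], dim=-1)).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)                      # weights alpha_{t,j}
        c_t = (alpha.unsqueeze(-1) * annotations).sum(dim=1)  # c_t = sum_j alpha_{t,j} h_j
        s_t = self.gru(torch.cat([y_prev_emb, c_t], dim=-1), s_prev)
        p_t = torch.softmax(self.out(s_t), dim=-1)            # p(y_t | y_<t, x)
        return p_t, s_t, alpha
```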
In the present invention, the statistical machine translation model in S2 is defined by a log-linear framework:

$$P(y \mid x) = \frac{\exp\bigl(\sum_{m} \lambda_m h_m(y, x)\bigr)}{\sum_{y'} \exp\bigl(\sum_{m} \lambda_m h_m(y', x)\bigr)},$$

where $h_m(y, x)$ is a feature function and $\lambda_m$ is its weight. During translation, the SMT decoder expands the partial translation $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ (referred to as a translation hypothesis in SMT) by selecting, from the bilingual phrase table, an appropriate target word or phrase translation for an untranslated part of the source sentence.
The log-linear model mainly studies the independence and correlation among several categorical variables. In general it does not distinguish dependent from independent variables, but analyzes the influence of each categorical variable on the frequency in the cross-classified cells; the frequencies are generally assumed to follow a multinomial distribution.
In the present invention, the implementation in S2 is as follows: given the words $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ already generated by the NMT, the SMT generates suggestions for the next word and computes a suggestion score using the following formula:

$$\mathrm{score}(y_t \mid y_{<t}, x) = \sum_{m} \lambda_m h_m(y_t, x_t),$$

where $y_t$ is an SMT suggestion, $x_t$ is the corresponding source span, i.e. the words of the source-language sentence to which $y_t$ corresponds, $h_m(y_t, x_t)$ is a feature function and $\lambda_m$ is its weight. The SMT model can generate appropriate word suggestions (partial translations) by expanding the already generated words.
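For illustration only, this log-linear suggestion scoring can be sketched as follows; the feature names and weight values are placeholders standing in for those of an actual phrase-based SMT system (e.g. Moses):

```python
# Hypothetical sketch of the log-linear suggestion score
# score(y_t) = sum_m lambda_m * h_m(y_t, x_t); features and weights are
# illustrative placeholders, not values from the patent.

def suggestion_score(features, weights):
    """features: {name: h_m(y_t, x_t)}, weights: {name: lambda_m}."""
    return sum(weights[m] * h for m, h in features.items())

# Two candidate next words with their (assumed) SMT feature values
candidates = {
    "回来": {"trans_logprob": -0.4, "lm_logprob": -1.1, "reordering": -0.1},
    "离开": {"trans_logprob": -0.9, "lm_logprob": -1.0, "reordering": -0.4},
}
weights = {"trans_logprob": 0.5, "lm_logprob": 0.4, "reordering": 0.1}
scored = {w: suggestion_score(f, weights) for w, f in candidates.items()}
best = max(scored, key=scored.get)   # SMT's top suggestion for the next word
```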
In the present invention, two strategies are employed in S3 to filter out low-quality suggestions and thus ensure the quality of the SMT suggestions: (1) only the top $N_{tm}$ phrase translations are retained according to the translation score, which is computed as the weighted sum of the translation probabilities; (2) the top $N_{rec}$ suggestions with the highest SMT scores are selected, each score being computed as a weighted sum of the SMT features.
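For illustration only, the two filters can be sketched as plain top-N selections, applied here in sequence for concreteness; $N_{tm}$ and $N_{rec}$ are unspecified hyperparameters, and the triple layout of each suggestion is an assumption:

```python
# Illustrative two-stage filtering of SMT suggestions. Each suggestion is a
# (word, translation_score, smt_score) triple; both scores are assumed to be
# precomputed weighted sums as described above.

def filter_suggestions(suggestions, n_tm=5, n_rec=3):
    # Strategy 1: keep the top n_tm entries by translation score
    by_translation = sorted(suggestions, key=lambda s: s[1], reverse=True)[:n_tm]
    # Strategy 2: of those, keep the top n_rec entries by overall SMT score
    return sorted(by_translation, key=lambda s: s[2], reverse=True)[:n_rec]
```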
In the invention, a gating mechanism is introduced in S3 to update the word prediction probabilities of the proposed model, computed as

$$\alpha_t = g_{gate}\bigl(f_{gate}(s_t, y_{t-1}, c_t)\bigr),$$

where $f_{gate}(\cdot)$ is a non-linear function and $g_{gate}(\cdot)$ is a sigmoid function, also called the logistic function, which is used for hidden-layer neuron outputs; its range is (0, 1), so it maps a real number into the interval (0, 1) and can also be used for binary classification.
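For illustration only, the gate can be sketched as follows; the linear interpolation of the NMT and SMT distributions follows the description in the advantageous effects, and it is assumed that the SMT suggestion probabilities have been mapped onto the same vocabulary as the NMT distribution:

```python
# Illustrative gating between the NMT and SMT word distributions in PyTorch.
import torch
import torch.nn as nn

class SMTGate(nn.Module):
    def __init__(self, hidden_dim=512, emb_dim=256, ctx_dim=1024):
        super().__init__()
        # f_gate: a small non-linear network over the current decoding state
        self.f_gate = nn.Sequential(
            nn.Linear(hidden_dim + emb_dim + ctx_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1))

    def forward(self, s_t, y_prev_emb, c_t, p_nmt, p_smt):
        # alpha_t = sigmoid(f_gate(s_t, y_{t-1}, c_t)), one scalar per example
        alpha_t = torch.sigmoid(self.f_gate(torch.cat([s_t, y_prev_emb, c_t], dim=-1)))
        # dynamic per-step linear combination of the two distributions
        return alpha_t * p_nmt + (1.0 - alpha_t) * p_smt
```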
In the present invention, the testing phase directly replaces the unk word with SMT recommendations. For each unk word, the suggestion with the highest SMT score is selected as the final alternative, and to solve the unk problem, the method can utilize rich context information at the source end and the target end, and generate more reliable suggestions by using rearrangement information and SMT coverage vectors, and minimize a set of training data
Figure BDA0003274481750000071
Training the proposed model with the negative log-likelihood:
Figure BDA0003274481750000072
the cost function of the model, in addition to introducing additional parameters for the SMT classifier and gates, is optimized by minimizing the cost function, in particular, the model is trained using a simple pre-training strategy, which is used to first train a conventional attention-based NMT model, then initialize the parameters of the codec using the model, and randomly initialize the parameters of the SMT classifier and gates, and finally train all the parameters of the model to minimize the cost function, for two reasons: due to calculation of SMT recommendation features and SMT recommendationsThe ranking is time consuming, so the training of the proposed model may take longer, taking the model before training as an anchor point, i.e. the peak of the prior distribution in the model space, thus shortening the training time,. ② the quality of the auto-learning attention weight is the key to calculate the SMT reordering cost, the low quality attention weight obtained without training may produce unreliable SMT recommendations, thus negatively affecting the training of the proposed model.
In the present invention, the source language is Mongolian.
Those aspects not described in detail in this specification are well within the skill of those in the art.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model, characterized in that the method specifically comprises the following steps:
S1, an NMT classifier, inherited from standard attention-based NMT, estimates the prediction probabilities of words in the regular vocabulary;
S2, an SMT classifier computes the probabilities of the SMT suggestions generated by an auxiliary SMT model;
S3, the SMT suggestions are integrated into the NMT.
2. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1, characterized in that: the standard attention-based NMT translation process in S1 is: given a source sentence $x = x_1, x_2, \ldots, x_n$, the NMT encodes it into a sequence of vectors, which is then used to generate the target sentence $y = y_1, y_2, \ldots, y_m$.
3. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 2, characterized in that: the attention-based NMT encodes the source sentence using a bidirectional RNN consisting of a forward RNN and a backward RNN, wherein the forward RNN reads the source sentence x in order and generates the forward hidden-state sequence $(\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_n)$, and the backward RNN reads the source sentence x in reverse and generates the backward hidden-state sequence $(\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_n)$; the pair of hidden states at each position is concatenated to form the annotation of the word at that position, yielding the annotations $(h_1, h_2, \ldots, h_n)$ of the whole source sentence, where $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$.
4. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 2, characterized in that: at decoding time step t, after the partial target sequence $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ has been output, the next word $y_t$ is generated according to the conditional probability $p(y_t \mid y_{<t}, x) = \mathrm{softmax}(f(s_t, y_{t-1}, c_t))$, where $f(\cdot)$ is a non-linear activation function and $s_t$ is the hidden state of the decoder at time step t: $s_t = g(s_{t-1}, y_{t-1}, c_t)$, where $g(\cdot)$ is a non-linear activation function, gated recurrent units being used here as the activation functions of the encoder and decoder; $c_t$ is a context vector computed as the weighted sum of the source sentence annotations: $c_t = \sum_{j=1}^{n} \alpha_{t,j} h_j$, where $h_j$ is the annotation of the source word $x_j$ and its weight $\alpha_{t,j}$ is computed by the attention model.
5. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1, characterized in that: the statistical machine translation model in S2 is defined by a log-linear framework: $P(y \mid x) = \frac{\exp(\sum_m \lambda_m h_m(y, x))}{\sum_{y'} \exp(\sum_m \lambda_m h_m(y', x))}$, where $h_m(y, x)$ is a feature function and $\lambda_m$ is its weight; during translation, the SMT decoder expands the partial translation $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ (referred to as a translation hypothesis in SMT) by selecting, from the bilingual phrase table, an appropriate target word or phrase translation for an untranslated part of the source sentence.
6. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1 or 5, characterized in that: the implementation in S2 is: given the words $y_{<t} = y_1, y_2, \ldots, y_{t-1}$ already generated by the NMT, the SMT generates suggestions for the next word and computes a suggestion score using the formula $\mathrm{score}(y_t \mid y_{<t}, x) = \sum_m \lambda_m h_m(y_t, x_t)$, where $y_t$ is an SMT suggestion, $x_t$ is the corresponding source span, $h_m(y_t, x_t)$ is a feature function and $\lambda_m$ is its weight; the SMT model can generate appropriate word suggestions (partial translations) by expanding the already generated words.
7. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1, characterized in that: two strategies are adopted in S3 to filter out low-quality suggestions and thus ensure the quality of the SMT suggestions: (1) only the top $N_{tm}$ phrase translations are retained according to the translation score, which is computed as the weighted sum of the translation probabilities; (2) the top $N_{rec}$ suggestions with the highest SMT scores are selected, each score being computed as a weighted sum of the SMT features.
8. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1, characterized in that: a gating mechanism is introduced in S3 to update the word prediction probabilities of the proposed model, computed as $\alpha_t = g_{gate}(f_{gate}(s_t, y_{t-1}, c_t))$, where $f_{gate}(\cdot)$ is a non-linear function and $g_{gate}(\cdot)$ is a sigmoid function.
9. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1, characterized in that: the testing phase replaces UNK words directly with SMT suggestions; for each UNK word, the suggestion with the highest SMT score is selected as the final substitute; to address the UNK problem, the method can exploit rich context information on both the source and target sides, and generates more reliable suggestions by using reordering information and the SMT coverage vector.
10. The Mongolian-Chinese neural machine translation method based on a fused statistical machine translation model according to claim 1, characterized in that: the source language is Mongolian.
CN202111112986.8A 2021-09-23 2021-09-23 Mongolian Chinese neural machine translation method based on fusion statistical machine translation model Pending CN113850089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111112986.8A CN113850089A (en) 2021-09-23 2021-09-23 Mongolian Chinese neural machine translation method based on fusion statistical machine translation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111112986.8A CN113850089A (en) 2021-09-23 2021-09-23 Mongolian Chinese neural machine translation method based on fusion statistical machine translation model

Publications (1)

Publication Number Publication Date
CN113850089A true CN113850089A (en) 2021-12-28

Family

ID=78979244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111112986.8A Pending CN113850089A (en) 2021-09-23 2021-09-23 Mongolian Chinese neural machine translation method based on fusion statistical machine translation model

Country Status (1)

Country Link
CN (1) CN113850089A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205396A1 (en) * 2017-12-29 2019-07-04 Yandex Europe Ag Method and system of translating a source sentence in a first language into a target sentence in a second language
CN108829684A (en) * 2018-05-07 2018-11-16 内蒙古工业大学 A kind of illiteracy Chinese nerve machine translation method based on transfer learning strategy
CN110674646A (en) * 2019-09-06 2020-01-10 内蒙古工业大学 Mongolian Chinese machine translation system based on byte pair encoding technology
CN110728155A (en) * 2019-09-27 2020-01-24 内蒙古工业大学 Tree-to-sequence-based Mongolian Chinese machine translation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XING WANG; ZHENGDONG LU; ZHAOPENG TU; HANG LI; DEYI XIONG; MIN ZHANG: "Neural Machine Translation Advised by Statistical Machine Translation", ARXIV.ORG, 30 December 2016 (2016-12-30), pages 2-4 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination