CN107943795B - Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment - Google Patents


Info

Publication number
CN107943795B
Authority
CN
China
Prior art keywords
translation; neural machine; machine translation; improving; accuracy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711123864.2A
Other languages
Chinese (zh)
Other versions
CN107943795A (en)
Inventor
Jiajun Zhang (张家俊)
Yang Zhao (赵阳)
Chengqing Zong (宗成庆)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Boeing China Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Boeing China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science, Boeing China Co Ltd filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201711123864.2A priority Critical patent/CN107943795B/en
Publication of CN107943795A publication Critical patent/CN107943795A/en
Application granted granted Critical
Publication of CN107943795B publication Critical patent/CN107943795B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods


Abstract

The invention relates to the field of machine translation, and in particular to a method for improving the accuracy of neural machine translation, together with a translation method, a translation system and translation equipment, and aims to solve the missed-translation and repeated-translation problems of neural machine translation systems. The proposed method introduces pre-reordering, a preprocessing method common in statistical machine translation, into neural machine translation, achieving an unexpected technical effect: the problems of missed and repeated translation are greatly alleviated. In addition, a position vector is added to the attention layer of the neural machine translation model to encourage monotone translation, and a coverage vector is added, further alleviating missed and repeated translation. Compared with existing neural machine translation methods, the invention improves translation quality and reduces both missed and repeated translations.

Description

Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment
Technical Field
The invention relates to the field of machine translation, and in particular to a method for improving the accuracy of neural machine translation, a translation method, a translation system, and translation equipment.
Background
Machine translation is the conversion between different languages performed by a computer. The language being translated is usually called the source language, and the language of the translation result the target language. Machine translation is thus the process of converting a source language into a target language.
Neural machine translation is the most recent machine translation approach, and it improves translation quality markedly over the earlier statistical machine translation methods. Compared with statistical methods, it requires less engineering design and translates better. When first proposed, it achieved accuracy comparable to statistical methods on medium-sized public benchmark data sets. Since then, researchers have proposed many techniques to improve neural machine translation, which now greatly surpasses statistical methods in translation quality; many companies in industry, including Google Translate and Baidu Translate, have recently upgraded their translation systems from statistics-based methods to neural-network-based methods, and these systems have gained wide acceptance.
However, machine translation is far from completely solved. Although neural machine translation performs well, it still makes serious errors that human translators do not, the most prominent being missed translation and repeated translation. Missed translation (under-translation) means that, when the machine translates the source language, some source words that should be translated are mistakenly skipped; repeated translation (over-translation) means that some source words are incorrectly translated multiple times.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a method for improving the accuracy of neural machine translation, together with a translation method, a translation system and a translation device, which significantly reduce the probability of missed and repeated translations.
In one aspect of the invention, a method for improving the accuracy of neural machine translation is provided, in which the source language is pre-reordered before translation; the method specifically comprises the following steps:
training a pre-reordering model on bilingual training data;
using the pre-reordering model to reorder the original source language so that it approximates the word order of the target language;
and replacing the original source language with the reordered source language and training the neural machine translation model.
Preferably, after "using the pre-reordering model to reorder the original source language so that it approximates the word order of the target language" and before "replacing the original source language with the reordered source language and training the neural machine translation model", the method further includes: adding a position vector to the attention layer of the neural machine translation model, extending the attention model based on hidden states into a hybrid attention model based on hidden states and position vectors.
Preferably, after "adding a position vector to the attention layer of the neural machine translation model, extending the attention model based on hidden states into a hybrid attention model based on hidden states and position vectors" and before "replacing the original source language with the reordered source language and training the neural machine translation model", the method further includes: adding a coverage vector to the attention layer of the neural machine translation model to measure whether each source word has been translated.
Preferably, the pre-reordering model is trained using a method that automatically extracts reordering rules.
In another aspect of the present invention, a neural machine translation method is provided, which improves existing neural machine translation methods by applying the above method for improving the accuracy of neural machine translation.
In a third aspect of the present invention, a neural machine translation system is provided, based on the above neural machine translation method.
In a fourth aspect of the present invention, a storage device is provided, adapted to store a plurality of programs, said programs being adapted to be loaded and executed by a processor to implement the above method for improving the accuracy of neural machine translation.
In a fifth aspect of the present invention, a processing apparatus is provided, comprising: a processor and a memory;
the processor is adapted to execute various programs; the memory is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the method for improving the accuracy of neural machine translation described above.
The invention has the beneficial effects that:
the invention introduces a common preprocessing method in statistical machine translation, namely, pre-sequence adjustment, into the neural machine translation, and realizes unexpected technical effects, namely greatly relieving the problems of missing and repeated turns. In addition, a position vector is added in an attention layer of the neural machine translation to enhance the monotonous translation, so that the problem of missing translation is further relieved; and adding a coverage vector, and further relieving the problems of missing turning and re-turning. Compared with the existing neural machine translation method, the method has the advantages that the translation quality is improved, and the missing and repeated turns are reduced.
Drawings
FIG. 1 is a flowchart illustrating a first embodiment of a method for improving the accuracy of neural machine translation according to the present invention;
FIG. 2 is a flowchart illustrating a second embodiment of the method for improving the accuracy of neural machine translation according to the present invention;
FIG. 3 is a flowchart illustrating a third embodiment of the method for improving the accuracy of neural machine translation according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principle of the present invention and are not intended to limit its scope.
A neural machine translation system suffers from missed and repeated translations when translating a source language. After analyzing the translations output by such a system, we found that words that must be reordered during translation are the ones most easily missed or repeated; we therefore use pre-reordering, a preprocessing method commonly used in statistical machine translation, to alleviate the missed- and repeated-translation problems. We also add a position vector to the attention layer of the neural machine translation model to encourage monotone translation, and add a coverage vector to alleviate missed and repeated translations further.
We experimented on a Chinese-to-English political news translation task. The results show that, compared with the existing neural network method, our method both improves translation quality and reduces missed and repeated translations: translation quality improves by 1.65 BLEU, the number of missed translations falls by 30.4%, and the number of repeated translations falls by 15.6%. This fully demonstrates the effectiveness and superiority of using pre-reordering to mitigate missed and repeated translations in neural machine translation.
Embodiment one of the method for improving the accuracy of neural machine translation provided by the invention pre-reorders the source language before translation; as shown in FIG. 1, the method specifically includes:
in step S10, the pre-tuning model is trained using bilingual training data.
In statistical machine translation there are many ways to train a pre-reordering model; here we adopt a method that automatically extracts reordering rules, which can extract reordering rules automatically from parallel bilingual training data. A tool for extracting reordering rules can be downloaded free of charge from: https://github.com/StatNLP/otedama.
In step S20, the pre-reordering model is used to reorder the original source language so that it approximates the word order of the target language.
After the pre-reordering model is obtained, the training data and the test data are fed into it, and it outputs the reordered source language, whose word order is closer to that required by the target language. Note that the word order of the target language itself is never changed.
For example, take a Chinese source sentence meaning, roughly, "foreign words that U.S. officials firmly describe as nit-picking". In the Chinese source, the modifying clause ("that U.S. officials firmly describe as nit-picking") precedes the head phrase ("foreign words"), whereas in the English target the head phrase comes first, so translating the original source requires reordering these two parts. The trained pre-reordering model rearranges the source into the order "foreign words [that are] firmly described as nit-picking by U.S. officials", which already matches the English word order. The reordered source can then be translated monotonically, with no reordering needed during decoding.
In step S30, the neural machine translation model is trained using the reordered source language in place of the original source language.
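To make the flow of steps S10-S30 concrete, the following minimal Python sketch treats a reordering model as a function from a tokenized source sentence to a permutation of its positions. The rule representation, apply_reordering, train_nmt and the toy model are hypothetical placeholders for illustration only; the actual otedama tool learns its own rule format from parallel data.

```python
# A minimal sketch of embodiment one (steps S10-S30). The rule format,
# apply_reordering and train_nmt are hypothetical placeholders, not the
# patent's actual implementation.
from typing import Callable

# Hypothetical representation of a learned reordering model: a function
# mapping a tokenized source sentence to a permutation of its indices.
ReorderModel = Callable[[list[str]], list[int]]

def apply_reordering(tokens: list[str], model: ReorderModel) -> list[str]:
    """Reorder a source sentence toward the target language's word order."""
    return [tokens[k] for k in model(tokens)]

def train_nmt(pairs: list[tuple[list[str], list[str]]]) -> None:
    """Placeholder for ordinary NMT training on (source, target) pairs."""
    ...

# Step S10 would learn the model from bilingual data; here a toy model
# that swaps the final two tokens stands in for the learned rules.
toy_model: ReorderModel = lambda t: list(range(len(t) - 2)) + [len(t) - 1, len(t) - 2]

bitext = [(["he", "apples", "eats"], ["he", "eats", "apples"])]

# Steps S20 and S30: reorder every source sentence (targets unchanged),
# then train the NMT model on the reordered bitext.
reordered_bitext = [(apply_reordering(src, toy_model), tgt) for src, tgt in bitext]
train_nmt(reordered_bitext)
```

Note that only the source side is rewritten; the target side is left untouched, exactly as step S20 requires.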
Embodiment two, as shown in FIG. 2, adds step S21 to embodiment one:
in step S21, a position vector is added to the attention layer of the neural machine translation model, and the attention model based on the hidden state is expanded to a hybrid attention model based on the hidden state and the position vector.
The attention layer is a key component of neural machine translation: it computes, at each moment of translation, which source word the current target word should be translated from. When the system is translating a particular source word, the attention probability of that word should be high and the attention probabilities of the other words low. Currently, attention is computed from hidden states, as shown in formula (1):
e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j)    (1)

where e_{i,j} is the attention score of the j-th source word when the i-th target word is predicted; the weight matrices W_a and U_a and the scoring vector v_a are neural network parameters updated and optimized during training; z_{i-1} is the hidden state produced by the target-side recurrent neural network before the i-th target word is predicted; and h_j is the hidden state obtained by passing the word vector of the j-th source word through the source-side recurrent neural network. Traditional neural machine translation thus computes attention by measuring the similarity of source and target hidden states, and is called a hidden-state-based attention model.
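As a concrete illustration of formula (1), the following NumPy sketch scores every source position against the current decoder state and normalizes the scores with a softmax. All dimensions, parameter names and random toy inputs are illustrative assumptions, not the patent's implementation.

```python
# Hidden-state-based attention score of formula (1), as a toy sketch.
import numpy as np

rng = np.random.default_rng(0)
d, J = 4, 5                      # toy hidden size and source length

W_a = rng.normal(size=(d, d))    # target-side projection (learned)
U_a = rng.normal(size=(d, d))    # source-side projection (learned)
v_a = rng.normal(size=d)         # scoring vector (learned)

z_prev = rng.normal(size=d)      # decoder state before predicting word i
H = rng.normal(size=(J, d))      # encoder hidden states h_1 .. h_J

# e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j)
e = np.array([v_a @ np.tanh(W_a @ z_prev + U_a @ h_j) for h_j in H])

# attention weights a_{i,j} = softmax(e_{i,j}) over source positions
a = np.exp(e - e.max())
a /= a.sum()
print(a)                         # one probability per source word, sums to 1
```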
first, we randomly generate a position matrix E for each of the source and target endssAnd EtIn which EsIs a position matrix of source ends, EtIs the position matrix of the target end. Es(j) A position vector, E, representing a source end position jt(i) A position vector representing the target end position i.
Then, we change the traditional hidden-state-based attention model into a hybrid attention model based on hidden states and position vectors, as shown in formula (2):

e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j + W_t E_t(i) + W_s E_s(j))    (2)

where W_t and W_s are weight matrices updated and optimized during training, and E_t(i) and E_s(j) are likewise updated and optimized during training. In formula (2), when the hidden-state-based component of the attention model misses some source words, the position-vector-based component can compensate for it, and vice versa.
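The hybrid score of formula (2) adds only the two position terms to the same computation, as in the following sketch (continuing the toy setup above; in practice the position matrices would be trained along with the other parameters, and all names here are assumptions):

```python
# Hybrid hidden-state + position-vector attention of formula (2).
import numpy as np

rng = np.random.default_rng(1)
d, J, max_len = 4, 5, 50

W_a, U_a = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_t, W_s = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v_a = rng.normal(size=d)
E_t = rng.normal(size=(max_len, d))   # target-side position matrix (trained)
E_s = rng.normal(size=(max_len, d))   # source-side position matrix (trained)

z_prev = rng.normal(size=d)
H = rng.normal(size=(J, d))
i = 3                                 # position of the target word being predicted

# e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j + W_t E_t(i) + W_s E_s(j))
e = np.array([
    v_a @ np.tanh(W_a @ z_prev + U_a @ H[j] + W_t @ E_t[i] + W_s @ E_s[j])
    for j in range(J)
])
```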
Embodiment three, as shown in FIG. 3, adds step S22 to embodiment two:
in step S22, a coverage vector is added to the attention layer of the neural machine translation model to measure whether the source specific word has been translated.
With the pre-reordering model and the position vector, the missed- and repeated-translation problems are already much alleviated. We add a coverage vector to formula (2) to mitigate missed and repeated translations further, in the following manner:
first, we initialize a coverage vector C firstiAnd updated at each decoding instant. The coverage vector is used to measure whether a word from the source has been translated. The initial value of the coverage vector is Ci0,0, meaning that all words of the source are not translated, at every word, it is not translatedA vector of coverage at decoding instant CiEach value c ofi,jUpdating is performed, as shown in formula (3):
Figure BDA0001467977520000052
where a_{i,j} = softmax(e_{i,j}) is the normalized attention weight, and Φ_j is the fertility of the source word x_j, i.e. the number of target words that x_j corresponds to in translation, computed by formula (4):
Φ_j = N σ(U_f h_j)    (4)
where N is the maximum fertility, set here to 2; σ(·) is the sigmoid function; U_f is a model parameter; and h_j is the hidden state of the j-th source word.
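The following sketch simulates the coverage bookkeeping of formulas (3) and (4) over a few decoding steps, using random attention weights as stand-ins; U_f is taken to be a vector so that σ(U_f h_j) is a scalar, which is one assumed reading of formula (4).

```python
# Coverage update of formulas (3) and (4), as a toy simulation.
import numpy as np

rng = np.random.default_rng(2)
d, J, N = 4, 5, 2                  # N = maximum fertility, set to 2

U_f = rng.normal(size=d)           # fertility parameter (assumed vector-valued)
H = rng.normal(size=(J, d))        # encoder hidden states h_1 .. h_J

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
phi = N * sigmoid(H @ U_f)         # Phi_j = N * sigma(U_f h_j)          -- (4)

coverage = np.zeros(J)             # C_0 = 0: nothing translated yet
for step in range(3):              # a few decoding steps
    a = rng.dirichlet(np.ones(J))  # stand-in for the attention weights a_{i,j}
    coverage += a / phi            # c_{i,j} = c_{i-1,j} + a_{i,j}/Phi_j -- (3)
print(coverage)                    # values near 1: source word fully translated
```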
After the coverage vector is obtained, it is used to influence and adjust the attention model at each decoding step, as shown in formula (5):
e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j + W_t E_t(i) + W_s E_s(j) + V_a c_{i-1,j})    (5)

where V_a is a network parameter updated and optimized during training, and c_{i-1,j} is the coverage value of the j-th source word after the previous (i.e. the (i-1)-th) target word was predicted.
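Putting the pieces together, formula (5) adds a coverage term to the hybrid score of formula (2). The sketch below assumes V_a maps the scalar coverage value c_{i-1,j} into the tanh pre-activation, which is one plausible reading of the formula; names and toy values are again assumptions.

```python
# Hybrid attention with a coverage term, as in formula (5).
import numpy as np

rng = np.random.default_rng(3)
d, J, i = 4, 5, 3
W_a, U_a, W_t, W_s = (rng.normal(size=(d, d)) for _ in range(4))
v_a = rng.normal(size=d)
V_a = rng.normal(size=d)               # coverage projection (scalar -> d)
E_t, E_s = rng.normal(size=(50, d)), rng.normal(size=(50, d))
z_prev, H = rng.normal(size=d), rng.normal(size=(J, d))
cov_prev = rng.random(J)               # c_{i-1,j} from the previous step

# e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j
#                      + W_t E_t(i) + W_s E_s(j) + V_a c_{i-1,j})
e = np.array([
    v_a @ np.tanh(W_a @ z_prev + U_a @ H[j]
                  + W_t @ E_t[i] + W_s @ E_s[j] + V_a * cov_prev[j])
    for j in range(J)
])
```

Intuitively, a large c_{i-1,j} signals that source word j has already received its share of attention, so during training the model can learn to push down the score for attending to it again.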
Experimental results for these embodiments:
we performed experiments on the chinese political news translation task, and the experimental results are shown in tables 1 and 2:
TABLE 1

                 MT01    MT02    MT03    MT04    MT05    AVE
Prior art        38.99   40.69   35.20   38.60   28.48   36.39
The invention    40.42   42.23   37.63   39.94   29.97   38.04
Table 1 shows the BLEU scores of the present invention and an existing neural machine translation system on different test sets, where BLEU is an automatic evaluation metric for machine translation. The table gives values for five test sets (MT01-MT05) together with their average (AVE). The training data comprises two million parallel sentence pairs. Comparing the averages of the two methods, the automatic evaluation metric improves by 1.65 BLEU over the existing neural machine translation system, fully illustrating the effectiveness and superiority of the method.
TABLE 2

                 Missed translations    Repeated translations
Prior art        92                     32
The invention    64                     27
Table 2 shows the numbers of missed and repeated translations, over 500 test sentences, for the present invention and the existing neural machine translation system. Compared with the existing system, the invention reduces both missed and repeated translations, with an especially marked reduction in the number of missed translations.
From the experimental data we can compute: (92 - 64)/92 = 30.4% and (32 - 27)/32 = 15.6%; that is, with the disclosed method the number of missed translations falls by 30.4% and the number of repeated translations by 15.6%. The invention therefore substantially improves the output of a neural machine translation system and reduces the occurrence of missed and repeated translations.
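For reference, the quoted figures follow directly from the two tables; below is a small Python check with the values copied from Tables 1 and 2.

```python
# Reproducing the reported gains from Tables 1 and 2.
prior_bleu = [38.99, 40.69, 35.20, 38.60, 28.48]
ours_bleu  = [40.42, 42.23, 37.63, 39.94, 29.97]
gain = sum(ours_bleu) / 5 - sum(prior_bleu) / 5
print(f"BLEU gain: {gain:.2f}")      # 1.65 (= 38.04 - 36.39)

missed_prior, missed_ours = 92, 64
repeat_prior, repeat_ours = 32, 27
print(f"missed:   -{(missed_prior - missed_ours) / missed_prior:.1%}")  # -30.4%
print(f"repeated: -{(repeat_prior - repeat_ours) / repeat_prior:.1%}")  # -15.6%
```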
The method of the present invention has general applicability, since it is not designed for any two specific languages. Although it has been tested only in the Chinese-to-English direction, it is also suitable for other language pairs and directions, such as English to Chinese or Chinese to French.
The above description covers only embodiments of the present invention, but the scope of the invention is not limited thereto; any changes or substitutions that those skilled in the art can make within the technical scope of the invention are intended to fall within its scope.
An embodiment of a neural machine translation method of the present invention improves existing neural machine translation methods by applying the method for improving the accuracy of neural machine translation described above.
An embodiment of a neural machine translation system of the present invention is based on the neural machine translation method described above.
An embodiment of a storage device of the present invention is adapted to store a plurality of programs, said programs being adapted to be loaded and executed by a processor to implement the method for improving the accuracy of neural machine translation described above.
An embodiment of a processing apparatus of the present invention comprises: a processor and a memory;
the processor is adapted to execute various programs; the memory is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the method for improving the accuracy of neural machine translation described above.
Those skilled in the art will appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general terms of their functionality. Whether such functionality is implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality differently for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
The technical solutions of the present invention have now been described with reference to the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is not limited to these specific embodiments. Equivalent changes or substitutions of the relevant technical features may be made without departing from the principle of the invention, and the technical solutions after such changes or substitutions fall within the protection scope of the invention.

Claims (5)

1. A method for improving the accuracy of neural machine translation, characterized in that the source language is pre-reordered before translation; the method specifically comprises the following steps:
training a pre-reordering model on bilingual training data using a method that automatically extracts reordering rules;
using the pre-reordering model to reorder the original source language so that it approximates the word order of the target language;
adding a position vector to the attention layer of the neural machine translation model, extending the attention model based on hidden states into a hybrid attention model based on hidden states and position vectors;
adding a coverage vector to the attention layer of the neural machine translation model, the coverage vector being used to influence and adjust the hybrid attention model; and
replacing the original source language with the reordered source language and training the neural machine translation model.
2. The method of claim 1, wherein the coverage vector is used to measure whether each source word has been translated.
3. A neural machine translation method, wherein the method for improving the accuracy of neural machine translation according to any one of claims 1-2 is used to improve an existing neural machine translation method.
4. A storage device adapted to store a plurality of programs, wherein said programs are adapted to be loaded and executed by a processor to implement the method for improving the accuracy of neural machine translation according to any one of claims 1-2.
5. A processing device, comprising:
a processor adapted to execute various programs; and
a memory adapted to store a plurality of programs;
characterized in that said programs are adapted to be loaded and executed by the processor to implement the method for improving the accuracy of neural machine translation according to any one of claims 1-2.
CN201711123864.2A 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment Active CN107943795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711123864.2A CN107943795B (en) 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711123864.2A CN107943795B (en) 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment

Publications (2)

Publication Number Publication Date
CN107943795A CN107943795A (en) 2018-04-20
CN107943795B 2020-05-19

Family

ID=61932042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711123864.2A Active CN107943795B (en) 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment

Country Status (1)

Country Link
CN (1) CN107943795B (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708098A (en) * 2012-05-30 2012-10-03 中国科学院自动化研究所 Dependency coherence constraint-based automatic alignment method for bilingual words

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Chinese Syntactic Reordering for Statistical Machine Translation; Chao Wang et al.; Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL); 2007-06-30; p. 742, col. 1, penultimate paragraph *
Coverage-based Neural Machine Translation; Zhaopeng Tu et al.; Workshop track, ICLR 2016; 2016-02-15; p. 1, paras. 2-5; p. 2, paras. 2-5 *
Exploiting Source-side Monolingual Data in Neural Machine Translation; Jiajun Zhang, Chengqing Zong; Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016-11-30; p. 1538, col. 2, para. 3; p. 1539, col. 2, para. 2; p. 1544, col. 2, para. 11 *
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model; Trevor Cohn et al.; Proceedings of NAACL-HLT 2016; 2016-06-30; p. 878, col. 2, paras. 1-4 *

Also Published As

Publication number Publication date
CN107943795A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN108052512B (en) Image description generation method based on depth attention mechanism
US11403520B2 (en) Neural network machine translation method and apparatus
CN107766319B (en) Sequence conversion method and device
CN112464676A (en) Machine translation result scoring method and device
Park et al. Building a neural machine translation system using only synthetic parallel data
CN107832300A (en) Towards minimally invasive medical field text snippet generation method and device
JP2017021422A (en) Statistical translation optimization device, statistical translation system, and computer program
CN108763230B (en) Neural machine translation method using external information
CN111160014A (en) Intelligent word segmentation method
CN113822054A (en) Chinese grammar error correction method and device based on data enhancement
JP2023025126A (en) Training method and apparatus for deep learning model, text data processing method and apparatus, electronic device, storage medium, and computer program
US11694041B2 (en) Chapter-level text translation method and device
CN112287694A (en) Shared encoder-based Chinese-crossing unsupervised neural machine translation method
CN114861637A (en) Method and device for generating spelling error correction model and method and device for spelling error correction
CN113204978B (en) Machine translation enhancement training method and system
CN107943795B (en) Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment
WO2021239631A1 (en) Neural machine translation method, neural machine translation system, learning method, learning system, and programm
Shi et al. Adding Visual Information to Improve Multimodal Machine Translation for Low‐Resource Language
CN115495578B (en) Text pre-training model backdoor elimination method, system and medium based on maximum entropy loss
CN114298061B (en) Machine translation and model training quality evaluation method, electronic device and storage medium
WO2022242535A1 (en) Translation method, translation apparatus, translation device and storage medium
JP2017142746A (en) Word vector learning device, natural language processing device, program, and program
US20220171926A1 (en) Information processing method, storage medium, and information processing device
CN108932231B (en) Machine translation method and device
CN112257469B (en) Compression method of deep nerve machine translation model for small mobile equipment

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant