CN107943795B - Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment - Google Patents


Info

Publication number
CN107943795B
Authority
CN
China
Prior art keywords
translation; neural machine; machine translation; improving; accuracy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711123864.2A
Other languages
Chinese (zh)
Other versions
CN107943795A (en)
Inventor
Jiajun Zhang (张家俊)
Yang Zhao (赵阳)
Chengqing Zong (宗成庆)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Boeing China Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Boeing China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science, Boeing China Co Ltd filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201711123864.2A priority Critical patent/CN107943795B/en
Publication of CN107943795A publication Critical patent/CN107943795A/en
Application granted granted Critical
Publication of CN107943795B publication Critical patent/CN107943795B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods


Abstract

The invention relates to the field of machine translation, and in particular to a method for improving the accuracy of neural machine translation, together with a translation method, a translation system and translation equipment, and aims to solve the missed-translation and repeated-translation problems of neural machine translation systems. The proposed method introduces pre-reordering, a preprocessing method common in statistical machine translation, into neural machine translation, achieving an unexpected technical effect: the problems of missed and repeated translation are greatly alleviated. In addition, a position vector is added to the attention layer of the neural machine translation model to encourage monotone translation, and a coverage vector is added, further alleviating missed and repeated translation. Compared with existing neural machine translation methods, the invention improves translation quality and reduces both missed and repeated translations.

Description

Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment
Technical Field
The invention relates to the field of machine translation, and in particular to a method for improving the accuracy of neural machine translation, a translation method, a translation system, and translation equipment.
Background
Machine translation is the conversion between different languages performed by a computer. The language being translated is usually called the source language, and the language of the translation result the target language. Machine translation is thus the process of converting a source language into a target language.
Neural machine translation is the most recent machine translation approach, and it improves translation quality markedly over the earlier statistical machine translation methods. Compared with statistical methods, it requires less engineering design and translates better. When first proposed, it achieved accuracy comparable to statistical methods on medium-sized public benchmark data sets. Since then, researchers have proposed many techniques to improve neural machine translation, which now greatly surpasses statistical methods in translation quality; many companies in industry, including Google Translate and Baidu Translate, have recently upgraded their translation systems from statistics-based methods to neural-network-based methods, and these systems have gained wide acceptance.
However, machine translation is far from completely solved. Although neural machine translation performs well, it still makes serious errors that human translators do not, the most prominent being missed translation and repeated translation. Missed translation (under-translation) means that, when the machine translates the source language, some source words that should be translated are mistakenly skipped; repeated translation (over-translation) means that some source words are incorrectly translated multiple times.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a method for improving the accuracy of neural machine translation, together with a translation method, a translation system and a translation device, which significantly reduce the probability of missed and repeated translations.
In one aspect of the invention, a method for improving the accuracy of neural machine translation is provided, in which the source language is pre-reordered before translation; the method specifically comprises the following steps:
training a pre-reordering model on bilingual training data;
using the pre-reordering model to reorder the original source language so that it approximates the word order of the target language;
and replacing the original source language with the reordered source language and training the neural machine translation model.
Preferably, after "using the pre-reordering model to reorder the original source language so that it approximates the word order of the target language" and before "replacing the original source language with the reordered source language and training the neural machine translation model", the method further includes: adding a position vector to the attention layer of the neural machine translation model, extending the attention model based on hidden states into a hybrid attention model based on hidden states and position vectors.
Preferably, after "adding a position vector to the attention layer of the neural machine translation model, extending the attention model based on hidden states into a hybrid attention model based on hidden states and position vectors" and before "replacing the original source language with the reordered source language and training the neural machine translation model", the method further includes: adding a coverage vector to the attention layer of the neural machine translation model to measure whether each source word has been translated.
Preferably, the pre-reordering model is trained using a method that automatically extracts reordering rules.
In another aspect of the present invention, a neural machine translation method is provided, which improves existing neural machine translation methods by applying the above method for improving the accuracy of neural machine translation.
In a third aspect of the present invention, a neural machine translation system is provided, based on the above neural machine translation method.
In a fourth aspect of the present invention, a storage device is provided, adapted to store a plurality of programs, said programs being adapted to be loaded and executed by a processor to implement the above method for improving the accuracy of neural machine translation.
In a fifth aspect of the present invention, a processing apparatus is provided, comprising: a processor and a memory;
the processor is adapted to execute various programs; the memory is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the method for improving the accuracy of neural machine translation described above.
The invention has the beneficial effects that:
the invention introduces a common preprocessing method in statistical machine translation, namely, pre-sequence adjustment, into the neural machine translation, and realizes unexpected technical effects, namely greatly relieving the problems of missing and repeated turns. In addition, a position vector is added in an attention layer of the neural machine translation to enhance the monotonous translation, so that the problem of missing translation is further relieved; and adding a coverage vector, and further relieving the problems of missing turning and re-turning. Compared with the existing neural machine translation method, the method has the advantages that the translation quality is improved, and the missing and repeated turns are reduced.
Drawings
FIG. 1 is a flowchart illustrating a first embodiment of a method for improving the accuracy of neural machine translation according to the present invention;
FIG. 2 is a flowchart illustrating a second embodiment of the method for improving the accuracy of neural machine translation according to the present invention;
FIG. 3 is a flowchart illustrating a third embodiment of the method for improving the accuracy of neural machine translation according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principle of the present invention and are not intended to limit its scope.
A neural machine translation system suffers from missed and repeated translations when translating a source language. After analyzing the translations output by such a system, we found that words that must be reordered during translation are the ones most easily missed or repeated; we therefore use pre-reordering, a preprocessing method commonly used in statistical machine translation, to alleviate the missed- and repeated-translation problems. We also add a position vector to the attention layer of the neural machine translation model to encourage monotone translation, and add a coverage vector to alleviate missed and repeated translations further.
We experimented on a Chinese-to-English political news translation task. The results show that, compared with the existing neural network method, our method both improves translation quality and reduces missed and repeated translations: translation quality improves by 1.65 BLEU, the number of missed translations falls by 30.4%, and the number of repeated translations falls by 15.6%. This fully demonstrates the effectiveness and superiority of using pre-reordering to mitigate missed and repeated translations in neural machine translation.
Embodiment one of the method for improving the accuracy of neural machine translation provided by the invention pre-reorders the source language before translation; as shown in FIG. 1, the method specifically includes:
in step S10, the pre-tuning model is trained using bilingual training data.
In statistical machine translation there are many ways to train a pre-reordering model; here we adopt a method that automatically extracts reordering rules, which can extract reordering rules automatically from parallel bilingual training data. A tool for extracting reordering rules can be downloaded free of charge from: https://github.com/StatNLP/otedama.
In step S20, the pre-reordering model is used to reorder the original source language so that it approximates the word order of the target language.
After the pre-reordering model is obtained, the training data and the test data are fed into it, and it outputs the reordered source language, whose word order is closer to that required by the target language. Note that the word order of the target language itself is never changed.
For example, take a Chinese source sentence meaning, roughly, "foreign words that U.S. officials firmly describe as nit-picking". In the Chinese source, the modifying clause ("that U.S. officials firmly describe as nit-picking") precedes the head phrase ("foreign words"), whereas in the English target the head phrase comes first, so translating the original source requires reordering these two parts. The trained pre-reordering model rearranges the source into the order "foreign words [that are] firmly described as nit-picking by U.S. officials", which already matches the English word order. The reordered source can then be translated monotonically, with no reordering needed during decoding.
In step S30, the neural machine translation model is trained using the reordered source language in place of the original source language.
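To make the flow of steps S10-S30 concrete, the following minimal Python sketch treats a reordering model as a function from a tokenized source sentence to a permutation of its positions. The rule representation, apply_reordering, train_nmt and the toy model are hypothetical placeholders for illustration only; the actual otedama tool learns its own rule format from parallel data.

```python
# A minimal sketch of embodiment one (steps S10-S30). The rule format,
# apply_reordering and train_nmt are hypothetical placeholders, not the
# patent's actual implementation.
from typing import Callable

# Hypothetical representation of a learned reordering model: a function
# mapping a tokenized source sentence to a permutation of its indices.
ReorderModel = Callable[[list[str]], list[int]]

def apply_reordering(tokens: list[str], model: ReorderModel) -> list[str]:
    """Reorder a source sentence toward the target language's word order."""
    return [tokens[k] for k in model(tokens)]

def train_nmt(pairs: list[tuple[list[str], list[str]]]) -> None:
    """Placeholder for ordinary NMT training on (source, target) pairs."""
    ...

# Step S10 would learn the model from bilingual data; here a toy model
# that swaps the final two tokens stands in for the learned rules.
toy_model: ReorderModel = lambda t: list(range(len(t) - 2)) + [len(t) - 1, len(t) - 2]

bitext = [(["he", "apples", "eats"], ["he", "eats", "apples"])]

# Steps S20 and S30: reorder every source sentence (targets unchanged),
# then train the NMT model on the reordered bitext.
reordered_bitext = [(apply_reordering(src, toy_model), tgt) for src, tgt in bitext]
train_nmt(reordered_bitext)
```

Note that only the source side is rewritten; the target side is left untouched, exactly as step S20 requires.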
Embodiment two, as shown in FIG. 2, adds step S21 to embodiment one:
in step S21, a position vector is added to the attention layer of the neural machine translation model, and the attention model based on the hidden state is expanded to a hybrid attention model based on the hidden state and the position vector.
The attention layer is a key component of neural machine translation: it computes, at each moment of translation, which source word the current target word should be translated from. When the system is translating a particular source word, the attention probability of that word should be high and the attention probabilities of the other words low. Currently, attention is computed from hidden states, as shown in formula (1):
e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j)    (1)

where e_{i,j} is the attention score of the j-th source word when the i-th target word is predicted; the weight matrices W_a and U_a and the scoring vector v_a are neural network parameters updated and optimized during training; z_{i-1} is the hidden state produced by the target-side recurrent neural network before the i-th target word is predicted; and h_j is the hidden state obtained by passing the word vector of the j-th source word through the source-side recurrent neural network. Traditional neural machine translation thus computes attention by measuring the similarity of source and target hidden states, and is called a hidden-state-based attention model.
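As a concrete illustration of formula (1), the following NumPy sketch scores every source position against the current decoder state and normalizes the scores with a softmax. All dimensions, parameter names and random toy inputs are illustrative assumptions, not the patent's implementation.

```python
# Hidden-state-based attention score of formula (1), as a toy sketch.
import numpy as np

rng = np.random.default_rng(0)
d, J = 4, 5                      # toy hidden size and source length

W_a = rng.normal(size=(d, d))    # target-side projection (learned)
U_a = rng.normal(size=(d, d))    # source-side projection (learned)
v_a = rng.normal(size=d)         # scoring vector (learned)

z_prev = rng.normal(size=d)      # decoder state before predicting word i
H = rng.normal(size=(J, d))      # encoder hidden states h_1 .. h_J

# e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j)
e = np.array([v_a @ np.tanh(W_a @ z_prev + U_a @ h_j) for h_j in H])

# attention weights a_{i,j} = softmax(e_{i,j}) over source positions
a = np.exp(e - e.max())
a /= a.sum()
print(a)                         # one probability per source word, sums to 1
```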
first, we randomly generate a position matrix E for each of the source and target endssAnd EtIn which EsIs a position matrix of source ends, EtIs the position matrix of the target end. Es(j) A position vector, E, representing a source end position jt(i) A position vector representing the target end position i.
Then, we change the traditional hidden-state-based attention model into a hybrid attention model based on hidden states and position vectors, as shown in formula (2):

e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j + W_t E_t(i) + W_s E_s(j))    (2)

where W_t and W_s are weight matrices updated and optimized during training, and E_t(i) and E_s(j) are likewise updated and optimized during training. In formula (2), when the hidden-state-based component of the attention model misses some source words, the position-vector-based component can compensate for it, and vice versa.
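The hybrid score of formula (2) adds only the two position terms to the same computation, as in the following sketch (continuing the toy setup above; in practice the position matrices would be trained along with the other parameters, and all names here are assumptions):

```python
# Hybrid hidden-state + position-vector attention of formula (2).
import numpy as np

rng = np.random.default_rng(1)
d, J, max_len = 4, 5, 50

W_a, U_a = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_t, W_s = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v_a = rng.normal(size=d)
E_t = rng.normal(size=(max_len, d))   # target-side position matrix (trained)
E_s = rng.normal(size=(max_len, d))   # source-side position matrix (trained)

z_prev = rng.normal(size=d)
H = rng.normal(size=(J, d))
i = 3                                 # position of the target word being predicted

# e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j + W_t E_t(i) + W_s E_s(j))
e = np.array([
    v_a @ np.tanh(W_a @ z_prev + U_a @ H[j] + W_t @ E_t[i] + W_s @ E_s[j])
    for j in range(J)
])
```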
Embodiment three, as shown in FIG. 3, adds step S22 to embodiment two:
in step S22, a coverage vector is added to the attention layer of the neural machine translation model to measure whether the source specific word has been translated.
With the pre-reordering model and the position vector, the missed- and repeated-translation problems are already much alleviated. We add a coverage vector to formula (2) to mitigate missed and repeated translations further, in the following manner:
first, we initialize a coverage vector C firstiAnd updated at each decoding instant. The coverage vector is used to measure whether a word from the source has been translated. The initial value of the coverage vector is Ci0,0, meaning that all words of the source are not translated, at every word, it is not translatedA vector of coverage at decoding instant CiEach value c ofi,jUpdating is performed, as shown in formula (3):
Figure BDA0001467977520000052
where a_{i,j} = softmax(e_{i,j}) is the normalized attention weight, and Φ_j is the fertility of the source word x_j, i.e. the number of target words that x_j corresponds to in translation, computed by formula (4):
Φ_j = N σ(U_f h_j)    (4)
where N is the maximum fertility, set here to 2; σ(·) is the sigmoid function; U_f is a model parameter; and h_j is the hidden state of the j-th source word.
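The following sketch simulates the coverage bookkeeping of formulas (3) and (4) over a few decoding steps, using random attention weights as stand-ins; U_f is taken to be a vector so that σ(U_f h_j) is a scalar, which is one assumed reading of formula (4).

```python
# Coverage update of formulas (3) and (4), as a toy simulation.
import numpy as np

rng = np.random.default_rng(2)
d, J, N = 4, 5, 2                  # N = maximum fertility, set to 2

U_f = rng.normal(size=d)           # fertility parameter (assumed vector-valued)
H = rng.normal(size=(J, d))        # encoder hidden states h_1 .. h_J

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
phi = N * sigmoid(H @ U_f)         # Phi_j = N * sigma(U_f h_j)          -- (4)

coverage = np.zeros(J)             # C_0 = 0: nothing translated yet
for step in range(3):              # a few decoding steps
    a = rng.dirichlet(np.ones(J))  # stand-in for the attention weights a_{i,j}
    coverage += a / phi            # c_{i,j} = c_{i-1,j} + a_{i,j}/Phi_j -- (3)
print(coverage)                    # values near 1: source word fully translated
```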
After the coverage vector is obtained, it is used to influence and adjust the attention model at each decoding step, as shown in formula (5):
e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j + W_t E_t(i) + W_s E_s(j) + V_a c_{i-1,j})    (5)

where V_a is a network parameter updated and optimized during training, and c_{i-1,j} is the coverage value of the j-th source word after the previous (i.e. the (i-1)-th) target word was predicted.
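Putting the pieces together, formula (5) adds a coverage term to the hybrid score of formula (2). The sketch below assumes V_a maps the scalar coverage value c_{i-1,j} into the tanh pre-activation, which is one plausible reading of the formula; names and toy values are again assumptions.

```python
# Hybrid attention with a coverage term, as in formula (5).
import numpy as np

rng = np.random.default_rng(3)
d, J, i = 4, 5, 3
W_a, U_a, W_t, W_s = (rng.normal(size=(d, d)) for _ in range(4))
v_a = rng.normal(size=d)
V_a = rng.normal(size=d)               # coverage projection (scalar -> d)
E_t, E_s = rng.normal(size=(50, d)), rng.normal(size=(50, d))
z_prev, H = rng.normal(size=d), rng.normal(size=(J, d))
cov_prev = rng.random(J)               # c_{i-1,j} from the previous step

# e_{i,j} = v_a^T tanh(W_a z_{i-1} + U_a h_j
#                      + W_t E_t(i) + W_s E_s(j) + V_a c_{i-1,j})
e = np.array([
    v_a @ np.tanh(W_a @ z_prev + U_a @ H[j]
                  + W_t @ E_t[i] + W_s @ E_s[j] + V_a * cov_prev[j])
    for j in range(J)
])
```

Intuitively, a large c_{i-1,j} signals that source word j has already received its share of attention, so during training the model can learn to push down the score for attending to it again.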
Experimental results for these embodiments:
we performed experiments on the chinese political news translation task, and the experimental results are shown in tables 1 and 2:
TABLE 1

                 MT01    MT02    MT03    MT04    MT05    AVE
Prior art        38.99   40.69   35.20   38.60   28.48   36.39
The invention    40.42   42.23   37.63   39.94   29.97   38.04
Table 1 shows the BLEU scores of the present invention and an existing neural machine translation system on different test sets, where BLEU is an automatic evaluation metric for machine translation. The table gives values for five test sets (MT01-MT05) together with their average (AVE). The training data comprises two million parallel sentence pairs. Comparing the averages of the two methods, the automatic evaluation metric improves by 1.65 BLEU over the existing neural machine translation system, fully illustrating the effectiveness and superiority of the method.
TABLE 2

                 Missed translations    Repeated translations
Prior art        92                     32
The invention    64                     27
Table 2 shows the numbers of missed and repeated translations, over 500 test sentences, for the present invention and the existing neural machine translation system. Compared with the existing system, the invention reduces both missed and repeated translations, with an especially marked reduction in the number of missed translations.
From the experimental data we can compute: (92 - 64)/92 = 30.4% and (32 - 27)/32 = 15.6%; that is, with the disclosed method the number of missed translations falls by 30.4% and the number of repeated translations by 15.6%. The invention therefore substantially improves the output of a neural machine translation system and reduces the occurrence of missed and repeated translations.
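For reference, the quoted figures follow directly from the two tables; below is a small Python check with the values copied from Tables 1 and 2.

```python
# Reproducing the reported gains from Tables 1 and 2.
prior_bleu = [38.99, 40.69, 35.20, 38.60, 28.48]
ours_bleu  = [40.42, 42.23, 37.63, 39.94, 29.97]
gain = sum(ours_bleu) / 5 - sum(prior_bleu) / 5
print(f"BLEU gain: {gain:.2f}")      # 1.65 (= 38.04 - 36.39)

missed_prior, missed_ours = 92, 64
repeat_prior, repeat_ours = 32, 27
print(f"missed:   -{(missed_prior - missed_ours) / missed_prior:.1%}")  # -30.4%
print(f"repeated: -{(repeat_prior - repeat_ours) / repeat_prior:.1%}")  # -15.6%
```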
The method of the present invention has general applicability, since it is not designed for any two specific languages. Although it has been tested only in the Chinese-to-English direction, it is also suitable for other language pairs and directions, such as English to Chinese or Chinese to French.
The above description covers only embodiments of the present invention, but the scope of the invention is not limited thereto; any changes or substitutions that those skilled in the art can make within the technical scope of the invention are intended to fall within its scope.
An embodiment of a neural machine translation method of the present invention improves existing neural machine translation methods by applying the method for improving the accuracy of neural machine translation described above.
An embodiment of a neural machine translation system of the present invention is based on the neural machine translation method described above.
An embodiment of a storage device of the present invention is adapted to store a plurality of programs, said programs being adapted to be loaded and executed by a processor to implement the method for improving the accuracy of neural machine translation described above.
An embodiment of a processing apparatus of the present invention comprises: a processor and a memory;
the processor is adapted to execute various programs; the memory is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the method for improving the accuracy of neural machine translation described above.
Those skilled in the art will appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general terms of their functionality. Whether such functionality is implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality differently for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
The technical solutions of the present invention have now been described with reference to the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is not limited to these specific embodiments. Equivalent changes or substitutions of the relevant technical features may be made without departing from the principle of the invention, and the technical solutions after such changes or substitutions fall within the protection scope of the invention.

Claims (5)

1. A method for improving the accuracy of neural machine translation, characterized in that the source language is pre-reordered before translation; the method specifically comprises the following steps:
training a pre-reordering model on bilingual training data using a method that automatically extracts reordering rules;
using the pre-reordering model to reorder the original source language so that it approximates the word order of the target language;
adding a position vector to the attention layer of the neural machine translation model, extending the attention model based on hidden states into a hybrid attention model based on hidden states and position vectors;
adding a coverage vector to the attention layer of the neural machine translation model, the coverage vector being used to influence and adjust the hybrid attention model; and
replacing the original source language with the reordered source language and training the neural machine translation model.
2. The method of claim 1, wherein the coverage vector is used to measure whether each source word has been translated.
3. A neural machine translation method, wherein the method for improving the accuracy of neural machine translation according to any one of claims 1-2 is used to improve an existing neural machine translation method.
4. A storage device adapted to store a plurality of programs, wherein said programs are adapted to be loaded and executed by a processor to implement the method for improving the accuracy of neural machine translation according to any one of claims 1-2.
5. A processing device, comprising:
a processor adapted to execute various programs; and
a memory adapted to store a plurality of programs;
characterized in that said programs are adapted to be loaded and executed by the processor to implement the method for improving the accuracy of neural machine translation according to any one of claims 1-2.
CN201711123864.2A 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment Active CN107943795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711123864.2A CN107943795B (en) 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711123864.2A CN107943795B (en) 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment

Publications (2)

Publication Number Publication Date
CN107943795A CN107943795A (en) 2018-04-20
CN107943795B 2020-05-19

Family

ID=61932042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711123864.2A Active CN107943795B (en) 2017-11-14 2017-11-14 Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment

Country Status (1)

Country Link
CN (1) CN107943795B (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708098A (en) * 2012-05-30 2012-10-03 中国科学院自动化研究所 Dependency coherence constraint-based automatic alignment method for bilingual words

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Chinese Syntactic Reordering for Statistical Machine Translation; Chao Wang et al.; Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL); 2007-06-30; p. 742, col. 1, penultimate paragraph *
Coverage-based Neural Machine Translation; Zhaopeng Tu et al.; Workshop track, ICLR 2016; 2016-02-15; p. 1, paras. 2-5; p. 2, paras. 2-5 *
Exploiting Source-side Monolingual Data in Neural Machine Translation; Jiajun Zhang, Chengqing Zong; Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016-11-30; p. 1538, col. 2, para. 3; p. 1539, col. 2, para. 2; p. 1544, col. 2, para. 11 *
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model; Trevor Cohn et al.; Proceedings of NAACL-HLT 2016; 2016-06-30; p. 878, col. 2, paras. 1-4 *

Also Published As

Publication number Publication date
CN107943795A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN108052512B (en) Image description generation method based on depth attention mechanism
US11403520B2 (en) Neural network machine translation method and apparatus
CN107766319B (en) Sequence conversion method and device
CN112464676A (en) Machine translation result scoring method and device
Park et al. Building a neural machine translation system using only synthetic parallel data
CN107832300A (en) Towards minimally invasive medical field text snippet generation method and device
JP2017021422A (en) Statistical translation optimization device, statistical translation system, and computer program
CN108763230B (en) Neural machine translation method using external information
CN111160014A (en) Intelligent word segmentation method
CN113822054A (en) Chinese grammar error correction method and device based on data enhancement
JP2023025126A (en) Training method and apparatus for deep learning model, text data processing method and apparatus, electronic device, storage medium, and computer program
US11694041B2 (en) Chapter-level text translation method and device
CN112287694A (en) Shared encoder-based Chinese-crossing unsupervised neural machine translation method
CN114861637A (en) Method and device for generating spelling error correction model and method and device for spelling error correction
CN113204978B (en) Machine translation enhancement training method and system
CN107943795B (en) Method for improving translation accuracy of neural machine, translation method, translation system and translation equipment
WO2021239631A1 (en) Neural machine translation method, neural machine translation system, learning method, learning system, and programm
Shi et al. Adding Visual Information to Improve Multimodal Machine Translation for Low‐Resource Language
CN115495578B (en) Text pre-training model backdoor elimination method, system and medium based on maximum entropy loss
CN114298061B (en) Machine translation and model training quality evaluation method, electronic device and storage medium
WO2022242535A1 (en) Translation method, translation apparatus, translation device and storage medium
JP2017142746A (en) Word vector learning device, natural language processing device, program, and program
US20220171926A1 (en) Information processing method, storage medium, and information processing device
CN108932231B (en) Machine translation method and device
CN112257469B (en) Compression method of deep nerve machine translation model for small mobile equipment

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant