CN113095092A - Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship - Google Patents

Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship

Info

Publication number
CN113095092A
Authority
CN
China
Prior art keywords
target language
language sequence
machine translation
neural machine
modeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110416255.6A
Other languages
Chinese (zh)
Other versions
CN113095092B (en)
Inventor
黄书剑
王东琪
鲍宇
张建兵
戴新宇
陈家骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110416255.6A priority Critical patent/CN113095092B/en
Publication of CN113095092A publication Critical patent/CN113095092A/en
Application granted granted Critical
Publication of CN113095092B publication Critical patent/CN113095092B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention discloses a method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship between target words. The source-side representation is combined with the length of the target language sequence to construct the input of the decoder in the non-autoregressive neural machine translation model; a dependency syntax tree of the target language sequence is then obtained from the dependency grammar, the source-side representation, and the decoder input, and is converted into a synergistic relationship matrix; finally, the synergistic relationship matrix of the target language sequence is integrated into the decoder of the non-autoregressive neural machine translation model. By using the dependency syntax tree to model the synergistic relationship between words in the target sequence, the invention significantly improves translation quality while still taking the dependency relationship into account.

Description

Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship
Technical Field
The invention relates to the field of neural machine translation models, and in particular to a method for improving the translation quality of a non-autoregressive neural machine translation model by modeling the synergistic relationship.
Background
With the trend of economic globalization, international communication and cooperation have become more frequent. Manual translation by human translators consumes enormous manpower and financial resources and cannot meet the growing demand for translation, which has driven the development of machine translation. Machine translation, as the name implies, is the process of using computer technology to convert a source language into a semantically equivalent target language.
Thanks to the improvement of computing power and the development of deep learning research, Neural Machine Translation (NMT) models based on deep neural networks now occupy the leading position in machine translation research. The NMT model adopts an encoder-decoder framework, achieves excellent translation performance, and is widely applied. Specifically, given a source language sentence X = {x_1, x_2, ..., x_m}, where x_i denotes the i-th subword of the source sentence, i = {1, 2, ..., m}, the NMT model first encodes it with an encoder into a source-side representation E = {e_1, e_2, ..., e_m}, where e_i denotes the semantic representation corresponding to the i-th subword, and then decodes with a decoder to obtain the target language translation Y = {y_1, y_2, ..., y_n}, where y_j denotes the j-th subword of the target sentence, j = {1, 2, ..., n}. According to the way the decoder works, NMT models fall into two categories: autoregressive and non-autoregressive neural machine translation models. Their translation principles are shown in Fig. 1a and Fig. 1b respectively; in both figures the input source sentence is "I love China" and the target output is its Chinese translation.
In autoregressive neural machine translation models, classical architectures such as the Transformer (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998-6008) and RNN models (Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329) generate the target sequence word by word from left to right (Fig. 1a): for the prediction at time t, the decoder uses the outputs before time t, Y_{<t} = {y_1, y_2, ..., y_{t-1}}, where y_j denotes the j-th subword of the target sentence, j = 1, 2, ..., t-1, together with the source-side representation E produced by the encoder and an attention mechanism, to predict the target word y'_t. The Transformer achieves excellent performance on many translation data sets, but its autoregressive decoding process has the following problems: 1) Exposure bias: during training the history is taken from the reference translation, while during testing the history can only come from the model's own predictions; this inconsistency between training and testing degrades performance. 2) Low translation efficiency: the serial decoding of the autoregressive model cannot exploit the high parallelism of GPU hardware at test time, its prediction time grows with the sequence length, and translation of long sentences is slow.
Unlike autoregressive models, non-autoregressive neural machine translation models (NAT) assume that the words in the target language sequence are independent of one another, and therefore generate all target words in parallel: after the encoder produces the source-side representation E, a length predictor predicts the length n of the target language sequence, the decoder input D = {d_1, d_2, ..., d_n} is constructed from it, where d_j denotes the decoder input at the j-th position, j = 1, 2, ..., n, and the decoder then predicts all corresponding words simultaneously. By removing the dependence on history through the independence assumption, the NAT model not only achieves extremely high translation efficiency but also alleviates the exposure bias problem of autoregressive translation. However, its performance lags far behind that of autoregressive machine translation models, because without explicit dependencies between the outputs it is difficult for predictions at different positions to be coordinated into a consistent translation. Moreover, the multi-modality phenomenon of the translation task (one source language sentence may have several correct target language sentences) aggravates the problem and lowers the final translation quality.
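To make the contrast concrete, the following is a minimal, hypothetical Python sketch of the two decoding modes; the `encoder`, `decoder`, `length_predictor` and `build_decoder_inputs` callables and all tensor shapes are placeholders rather than components of the invention.

```python
import torch

def autoregressive_decode(encoder, decoder, src_tokens, bos_id, eos_id, max_len=128):
    """Generate the target left-to-right: each step conditions on the previously generated words."""
    enc_out = encoder(src_tokens)                       # source-side representation E
    ys = [bos_id]
    for _ in range(max_len):                            # O(n) sequential decoder calls
        logits = decoder(torch.tensor([ys]), enc_out)   # uses history y_<t plus attention over E
        next_tok = logits[0, -1].argmax().item()        # predict y'_t
        ys.append(next_tok)
        if next_tok == eos_id:
            break
    return ys[1:]

def non_autoregressive_decode(encoder, decoder, length_predictor, build_decoder_inputs, src_tokens):
    """Generate every target position in a single parallel decoder call (independence assumption)."""
    enc_out = encoder(src_tokens)                       # source-side representation E
    tgt_len = length_predictor(enc_out)                 # predicted target length n
    dec_in = build_decoder_inputs(enc_out, tgt_len)     # decoder input D = {d_1, ..., d_n}
    logits = decoder(dec_in, enc_out)                   # one parallel pass over all n positions
    return logits.argmax(dim=-1)[0].tolist()            # each position decoded independently
```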
Around the problem of missing dependencies in non-autoregressive neural machine translation models, there are two types of solutions: one class of schemes directly models the dependency relationship between words in the target sequence; the other class introduces hidden variables to implicitly model the missing dependencies.
The direct dependency-modeling schemes adopt a training strategy similar to the autoregressive model. Researchers proposed taking part of the words of the reference translation Y as input and training the decoder to predict the words not provided, thereby modeling the dependency between the provided words and the remaining words. Combined with iterative decoding strategies, this scheme significantly improves the performance of non-autoregressive machine translation (Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, and Lei Li. 2020. Glancing transformer for non-autoregressive neural machine translation. arXiv preprint arXiv:2008.07905; Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-Predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6112-6121). However, this scheme has the following problems: 1) the training stage uses part of the reference sequence as input, whereas the test stage must decode with, or iteratively provide, predicted words, so the exposure bias problem remains and degrades prediction performance; 2) the multi-iteration decoding algorithm reduces the efficiency of the model.
The hidden-variable-based schemes use a deep neural network to encode the dependency information of the target sequence into latent variables and then train a model over those variables; modeling the latent variables serves as an intermediate step of non-autoregressive generation. Specifically, such schemes first predict the latent variables, either autoregressively or non-autoregressively, and then predict the target sequence (Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard Hovy. 2019. FlowSeq: Non-autoregressive conditional sequence generation with generative flow. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4282-4292). However, modeling the latent variables relies on complex deep neural networks; the resulting models translate less efficiently and are usually hard to interpret.
Disclosure of Invention
The invention aims to provide a method for improving the quality of a non-autoregressive neural machine translation model by modeling the synergistic relationship, so as to solve the problem that existing non-autoregressive neural machine translation models suffer reduced translation performance because they lack explicit modeling of the dependency relationship.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the method for improving the translation quality of the non-autoregressive neural machine through modeling cooperative relationship comprises the following steps of firstly obtaining source end representation corresponding to a source language sequence, then obtaining the length of a target language sequence, and then constructing the input of a decoder in a non-autoregressive neural machine translation model by combining the source end representation with the length of the target language sequence, wherein the method further comprises the following steps:
step 1, obtaining a dependency syntax tree of a target language sequence based on source end representation and decoder input, and converting the dependency syntax tree of the target language sequence into a collaborative relationship matrix.
Step 2, integrating the cooperative relationship matrix of the target language sequence into the decoder in the non-autoregressive neural machine translation model, and decoding the decoder input with the decoder integrated with the cooperative relationship matrix to obtain the target language sequence.
In the method for improving the translation quality of non-autoregressive neural machine translation through modeling the synergistic relationship, the source-side representation is obtained by encoding the source language sequence with an encoder.
In the method for improving the translation quality of non-autoregressive neural machine translation through modeling the synergistic relationship, the length of the target language sequence is predicted by a length predictor in the non-autoregressive neural machine translation model based on the source-side representation. The length predictor first predicts the length difference between the source language sequence and the target language sequence based on the source-side representation, and then obtains the length of the target language sequence from this difference and the length of the source language sequence.
In the method for improving the translation quality of the non-autoregressive neural machine through modeling the cooperative relationship, in step 1, a cooperative relationship predictor is adopted to construct a dependency syntax tree of a target language sequence based on source end representation and decoder input, and the dependency syntax tree of the target language sequence is converted into a cooperative relationship matrix of the target language sequence.
The cooperative relationship predictor predicts the dependency syntax tree with a biaffine dependency parser model. The biaffine dependency parser model takes the source-side representation and the decoder input as input and is trained with the dependency syntax tree of the target language as the training target; the trained biaffine dependency parser model predicts the dependency syntax tree of the target language sequence, which the cooperative relationship predictor then converts into a cooperative relationship matrix. The dependency syntax tree of the target language is extracted from the reference translation of the target language.
In step 2 of the method for improving the translation quality of non-autoregressive neural machine translation through modeling the cooperative relationship, a cooperative relationship layer is constructed in the decoder of the non-autoregressive neural machine translation model. The cooperative relationship layer comprises a self-attention layer based on the cooperative relationship matrix of the target language sequence, a source-target attention layer, and a feed-forward neural network layer, whereby the cooperative relationship matrix of the target language sequence is integrated into the decoder.
Compared with the traditional autoregressive decoding scheme, a neural machine translation model that decodes non-autoregressively is extremely efficient and better suits the needs of industry, but its lower translation quality limits its application. At the root of the problem, the non-autoregressive neural machine translation model lacks explicit modeling of the dependency relationship between the words of the target language sequence, so it struggles with the multi-modality phenomenon that is common in machine translation tasks. Existing non-autoregressive studies built around modeling dependencies rely either on inefficient rounds of iteration or on complex deep networks. The invention extracts the undirected dependency relationship (namely the cooperative relationship) between words in the target language sequence from the dependency syntax tree, then models this cooperative relationship with a simple cooperative relationship predictor, and thereby improves the translation quality of the non-autoregressive neural machine translation model.
By analyzing the working mode of the non-autoregressive neural machine translation model (NAT), it can be seen that the parallel prediction of NAT is in fact the cooperative prediction of the target sequence, and that the (directed) dependency relationship chosen by existing work is not the essential requirement of NAT. The invention therefore proposes to model the cooperative relationship between words in the target language sequence and to integrate this cooperative relationship into the decoding process of NAT.
Compared with the prior art, the invention has the advantages that:
1) The invention is the first to propose modeling the cooperative relationship among the words of the target sequence; it expresses this relationship as a cooperative relationship matrix and accordingly constructs a cooperative relationship layer that is integrated into the decoder of the NAT model.
2) The invention provides a method for modeling the cooperative relationship between words in the target sequence with a dependency syntax tree, extracting the cooperative relationship matrix from the tree, and integrating it into the decoding process of NAT. The translation quality is thus significantly improved while the dependency relationship is still taken into account, demonstrating the great value of modeling the cooperative relationship in NAT.
Drawings
FIG. 1a is a schematic diagram of a prior art autoregressive neural machine translation model.
FIG. 1b is a schematic diagram of a prior art non-autoregressive neural machine translation model.
Fig. 2 is a block flow diagram of an embodiment of the invention.
FIG. 3 is a schematic diagram of a non-autoregressive neural machine translation model as applied to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a decoder integrated co-relationship layer in an embodiment of the invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
In this embodiment, a German-to-English NAT model system is given as an example: the input source language is German, the source language sequence is "Ich habe eine Katze", the desired output target language is English, and the desired output target language sequence is "I have a cat". As shown in Fig. 2, the method for improving the translation quality of the non-autoregressive neural machine translation model by modeling the cooperative relationship in this embodiment includes the following steps:
Step 1: Let the source language sequence be X = {x_1, x_2, ..., x_m}, where x_i denotes the i-th subword of the source sentence, i = 1, 2, ..., m. The invention adopts the encoder of an autoregressive Transformer model: the source language sequence X = {x_1, x_2, ..., x_m} is fed into the encoder, which encodes it into the corresponding source-side representation E = {e_1, e_2, ..., e_m}, where e_i denotes the semantic representation corresponding to the i-th subword of the source sentence, i = {1, 2, ..., m}.
Step 2: The length predictor in the NAT model predicts the length of the target language sequence from the source-side representation obtained in Step 1, and the input of the decoder in the non-autoregressive neural machine translation model is then constructed from the source-side representation E and the target length. Specifically:

First, the length predictor predicts the length difference Δ(L) between the target language sequence and the source language sequence from the source-side representation, and the length n of the target language sequence is computed from Δ(L), as shown in formula (1):

Δ(L) = argmax_{ΔL} P(ΔL | X; θ_len), with P(ΔL | X; θ_len) = softmax(MLP(mean-pooling(E))), and n = m + Δ(L). (1)

In formula (1), MLP denotes a multi-layer perceptron, θ_len denotes the parameters of the length predictor, mean-pooling denotes the average pooling operation over the source-side representation, m denotes the length of the source language sequence, and P(ΔL | X; θ_len) is the probability distribution of the length difference between the target and source language sequences given the source input X; the argmax returns the value of ΔL at which this probability is maximal.
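As an illustration only, the following is a hedged Python sketch of a length predictor in the spirit of formula (1); the layer sizes, the ΔL class range, and the tensor shapes are assumptions, not the parameters used by the invention.

```python
import torch
import torch.nn as nn

class LengthPredictor(nn.Module):
    """Predicts the target/source length difference ΔL from the mean-pooled source representation."""

    def __init__(self, d_model: int, max_delta: int = 50):
        super().__init__()
        self.max_delta = max_delta
        # MLP over the pooled source representation; 2*max_delta+1 classes for ΔL in [-max_delta, max_delta]
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, 2 * max_delta + 1),
        )

    def forward(self, enc_out: torch.Tensor, src_len: int) -> int:
        # enc_out: (m, d_model) source-side representation E; mean-pooling over the m source positions
        pooled = enc_out.mean(dim=0)
        probs = torch.softmax(self.mlp(pooled), dim=-1)      # P(ΔL | X)
        delta = int(probs.argmax().item()) - self.max_delta  # argmax over the ΔL classes
        return src_len + delta                               # n = m + Δ(L)
```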
Then, from the target sequence length n and the source-side representation E, the decoder input of the NAT model D = {d_1, d_2, ..., d_n} is constructed, where d_j denotes the decoder input corresponding to the j-th position, j = 1, 2, ..., n, as shown in formula (2):

w_ij = softmax_i( -| j/n - i/m | / τ ),   d_j = Σ_{i=1}^{m} w_ij · e_i. (2)

In formula (2), τ is a hyper-parameter controlling the sharpness of the softmax function, i denotes an index into the source language sequence, i = {1, 2, ..., m}, j denotes an index into the target language sequence, j = {1, 2, ..., n}, e_i denotes the semantic representation corresponding to the i-th subword of the source sentence, and w_ij denotes the relevance of the i-th source subword to the j-th target position.
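The following is a hedged Python sketch of this decoder-input construction; the position-proximity score inside the softmax follows the standard soft-copy formulation and is an assumption where formula (2) is not reproduced verbatim.

```python
import torch

def soft_copy_decoder_inputs(enc_out: torch.Tensor, tgt_len: int, tau: float = 0.3) -> torch.Tensor:
    """enc_out: (m, d_model) source-side representation E; returns D of shape (tgt_len, d_model)."""
    m = enc_out.size(0)
    src_pos = torch.arange(m, dtype=torch.float) / max(m - 1, 1)              # relative source positions i/m
    tgt_pos = torch.arange(tgt_len, dtype=torch.float) / max(tgt_len - 1, 1)  # relative target positions j/n
    # score of source position i for target position j: closer relative positions score higher
    scores = -(tgt_pos.unsqueeze(1) - src_pos.unsqueeze(0)).abs() / tau       # (n, m)
    w = torch.softmax(scores, dim=1)                                          # w_ij, softmax over i
    return w @ enc_out                                                        # d_j = sum_i w_ij * e_i
```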
Step 3: A dependency syntax tree of the target language sequence is obtained from the source-side representation and the decoder input and converted into a collaborative relationship matrix, as described below:
the dependency syntax tree clearly defines the grammatical dependency relationship between words in the sentence, and can significantly improve the performance of a non-autoregressive machine translation (NAT) model, so the embodiment first adopts the collaborative relationship predictor to obtain the dependency syntax tree of the target language sequence and converts the dependency syntax tree into the collaborative relationship matrix.
In the step of obtaining the dependency syntax tree, during the training phase this embodiment first uses an external dependency parsing tool (e.g., Stanza) to extract the dependency syntax tree of the reference translation. A biaffine dependency parser model is then trained to predict the dependency syntax tree of the corresponding reference translation from the decoder input and the source-side representation (Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net). The biaffine dependency parser model is trained with the decoder input D and the source-side representation E as inputs and the dependency syntax tree of the target language as the training target. In the testing phase, the dependency syntax tree of the target language sequence is predicted by the trained biaffine dependency parser model.
The present embodiment parses the reference translation of the target language using the external dependency parsing tool Stanza. Note that the processing units of the dependency parsing tool are words, while the processing units of the NMT model are subwords, so the word-level dependency syntax tree must be converted into a subword-level dependency syntax tree. Suppose a word y_j is decomposed into three subwords y_j^1, y_j^2, y_j^3; then the parent of the first subword y_j^1 is t_j (the head of y_j), and the parent of the remaining subwords {y_j^2, y_j^3} is y_j^1.
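A minimal Python sketch of this word-to-subword conversion is given below; the list-based representation of heads and the handling of the root node are assumptions.

```python
from typing import List

def word_tree_to_subword_tree(word_heads: List[int], subwords_per_word: List[int]) -> List[int]:
    """word_heads[j]: parent word index of word j (assumed -1 for the root word);
    subwords_per_word[j]: number of subwords word j was split into.
    Returns the parent subword index for every subword, in subword order."""
    # index of the first subword of each word
    first = []
    idx = 0
    for n in subwords_per_word:
        first.append(idx)
        idx += n

    subword_heads = []
    for j, n in enumerate(subwords_per_word):
        head_word = word_heads[j]
        # the first subword of word j inherits word j's head t_j (that head word's first subword);
        # for the root word we let its first subword point at itself (an assumption)
        subword_heads.append(first[head_word] if head_word >= 0 else first[j])
        # the remaining subwords of word j attach to word j's first subword
        for _ in range(n - 1):
            subword_heads.append(first[j])
    return subword_heads

# Example: four words with heads [1, -1, 3, 1], last word split into two subwords:
# word_tree_to_subword_tree([1, -1, 3, 1], [1, 1, 1, 2]) -> [1, 1, 3, 1, 3]
```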
This embodiment predicts the dependency syntax tree of the target language sequence with the biaffine dependency parser model proposed in Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. The biaffine dependency parser model takes the decoder input D and the source-side representation E as inputs, uses the dependency syntax tree of the target language as the training target, and predicts the parent node of every subword in parallel. Unlike the architecture in that work, however, this embodiment removes the minimum-spanning-tree post-processing and replaces the LSTM encoder with a Transformer encoder. Post-processing the parser output with a minimum spanning tree yields a higher-quality dependency syntax tree but incurs a large time overhead, so this module is removed. Meanwhile, because the encoding capability of the Transformer is stronger than that of the LSTM, this embodiment uses a 4-layer Transformer encoder instead of the LSTM encoder of that work to extract dependency information. Jointly training the NMT task and the dependency-tree prediction task not only produces the cooperative relationship of the target sequence but also regularizes the encoder representation, further improving translation performance.
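For illustration, the following is a hedged sketch of a biaffine arc scorer in the spirit of Dozat and Manning (2017), with the parallel per-position argmax (no minimum-spanning-tree post-processing) described above; the projection sizes and initialization are assumptions.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Scores every (dependent, head) pair with a biaffine product and predicts heads in parallel."""

    def __init__(self, d_model: int, d_arc: int = 256):
        super().__init__()
        self.dep_mlp = nn.Sequential(nn.Linear(d_model, d_arc), nn.ReLU())   # "dependent" view of each position
        self.head_mlp = nn.Sequential(nn.Linear(d_model, d_arc), nn.ReLU())  # "head" view of each position
        self.U = nn.Parameter(torch.randn(d_arc, d_arc) * 0.01)              # biaffine weight
        self.b = nn.Parameter(torch.zeros(d_arc))                            # head-only bias term

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        """states: (n, d_model) position-wise hidden states; returns (n, n) arc scores."""
        dep = self.dep_mlp(states)                        # (n, d_arc)
        head = self.head_mlp(states)                      # (n, d_arc)
        return dep @ self.U @ head.t() + head @ self.b    # score[k, j]: plausibility of j as head of k

    def predict_heads(self, states: torch.Tensor) -> torch.Tensor:
        # per-position argmax over candidate heads, without minimum-spanning-tree post-processing
        return self.forward(states).argmax(dim=-1)
```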
In the step of converting the dependency syntax tree of the target language sequence into a co-relationship matrix, let t denote the dependency syntax tree of the given reference translation, with t_i the parent-node index of the i-th node. This embodiment converts the dependency syntax tree of the target language sequence into the co-relationship matrix of the target language sequence using formula (3):

A_kj = 1 if t_k = j, or t_j = k, or k = j; otherwise A_kj = 0. (3)

In formula (3), A_kj denotes the cooperative relationship between the k-th subword and the j-th subword; k denotes an index into the target language sequence, k = {1, 2, ..., n}, and j denotes an index into the target language sequence, j = {1, 2, ..., n}.
Intuitively, the present invention considers that: (1) nodes in a parent-child relationship have a cooperative relationship; (2) every node has a cooperative relationship with itself. Fig. 3 shows the dependency syntax tree and the corresponding cooperative relationship matrix of the sentence "I have a cat".
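A minimal sketch of formula (3) as code is shown below; it assumes the parent indices are given as a tensor and that the root node's parent index points to the node itself.

```python
import torch

def corelation_matrix(heads: torch.Tensor) -> torch.Tensor:
    """heads: (n,) long tensor, heads[k] = parent index t_k of subword k (root assumed to point at itself).
    Returns the (n, n) co-relationship matrix A of formula (3)."""
    n = heads.size(0)
    A = torch.eye(n)                        # k == j: every node is related to itself
    idx = torch.arange(n)
    A[idx, heads] = 1.0                     # t_k = j: child k is related to its parent j
    A[heads, idx] = 1.0                     # t_j = k: parent is related to its child (undirected)
    return A
```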
Step 4: The cooperative relationship matrix of the target language sequence is integrated into the decoder of the non-autoregressive neural machine translation model, and the decoder decodes the decoder input to obtain the target language sequence as the translation result.
In this embodiment, in order to integrate the cooperative relationship matrix of the target language sequence into the decoding process of the NAT model, a self-attention layer based on the cooperative relationship matrix is constructed by adapting the relative-position self-attention proposed in Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 464-468. The self-attention layer based on the cooperative relationship matrix is computed as shown in formula (4):

α_kj = softmax_j( (d_k W_Q)(d_j W_K + a_kj^K)^T / sqrt(d_model) ),   h_k = Σ_j α_kj (d_j W_V + a_kj^V). (4)

In formula (4), k denotes an index into the target language sequence, k = {1, 2, ..., n}, j denotes an index into the target language sequence, j = {1, 2, ..., n}, a_kj^K and a_kj^V denote the representations of the cooperative relationship between the k-th and j-th subwords (selected according to A_kj), d_k and d_j denote the decoder inputs at the k-th and j-th positions, d_model denotes the model size, α_kj denotes the degree of association of the k-th subword with the j-th subword, h_k denotes the hidden state of the k-th subword, and W_V, W_Q and W_K are trainable parameters. The remaining N-1 layers of the decoder use the same architecture as the Transformer.
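The following is a hedged, single-head, unbatched sketch of such a co-relationship self-attention layer; the use of a two-entry embedding table for a^K and a^V and the omission of multi-head splitting and batching are simplifying assumptions.

```python
import math
import torch
import torch.nn as nn

class CoRelationSelfAttention(nn.Module):
    """Self-attention over decoder inputs D in which the co-relationship matrix A selects
    learned key/value offsets, in the style of relative-position self-attention."""

    def __init__(self, d_model: int):
        super().__init__()
        self.W_Q = nn.Linear(d_model, d_model, bias=False)
        self.W_K = nn.Linear(d_model, d_model, bias=False)
        self.W_V = nn.Linear(d_model, d_model, bias=False)
        # one learned vector per relation label: 0 = not co-related, 1 = co-related
        self.rel_k = nn.Embedding(2, d_model)   # a^K_kj
        self.rel_v = nn.Embedding(2, d_model)   # a^V_kj
        self.d_model = d_model

    def forward(self, D: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        """D: (n, d_model) decoder inputs; A: (n, n) co-relationship matrix; returns (n, d_model)."""
        q, k, v = self.W_Q(D), self.W_K(D), self.W_V(D)
        a_k = self.rel_k(A.long())               # (n, n, d_model) relation-aware key offsets
        a_v = self.rel_v(A.long())               # (n, n, d_model) relation-aware value offsets
        # e_kj = q_k · (k_j + a^K_kj) / sqrt(d_model)
        scores = (q.unsqueeze(1) * (k.unsqueeze(0) + a_k)).sum(-1) / math.sqrt(self.d_model)
        alpha = torch.softmax(scores, dim=-1)    # α_kj: association of position k with position j
        # h_k = Σ_j α_kj (v_j + a^V_kj)
        return (alpha.unsqueeze(-1) * (v.unsqueeze(0) + a_v)).sum(dim=1)
```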
During training, the present invention uses the curriculum learning strategy proposed in GLAT (Lihua Qian et al., 2020, cited above) to train the cooperative relationship predictor. Specifically, a two-stage decoding process is performed: in the first stage, the cooperative relationship predictor predicts the dependency syntax tree t' from the decoder input D and the source-side representation E, the quality of the predicted dependency syntax tree t' is measured, and the reference translation Y is mixed into the decoder input D according to this quality to obtain a new vector representation D'; in the second stage, the predictor predicts the dependency syntax tree from D' and the source-side representation E, the loss is computed, and the model is updated.
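For illustration, the following is a GLAT-style sketch of one two-stage training step applied to token predictions; in the invention the first-pass quality is measured on the predicted dependency syntax tree t', and the masking ratio, the `tgt_embedding` attribute, and the random choice of revealed positions are assumptions.

```python
import torch
import torch.nn.functional as F

def glancing_training_step(model, D, enc_out, ref_tokens, ratio: float = 0.5):
    """model: NAT decoder with an assumed `tgt_embedding` lookup; D: (n, d) decoder inputs;
    enc_out: source-side representation E; ref_tokens: (n,) reference target token ids."""
    # Stage 1: glance at the current predictions to measure their quality
    with torch.no_grad():
        pred = model(D, enc_out).argmax(dim=-1)
        n_wrong = int((pred != ref_tokens).sum().item())
        n_glance = int(ratio * n_wrong)            # more mistakes -> more reference tokens revealed

    # Mix reference-token embeddings into the decoder input at randomly chosen positions
    D_mixed = D.clone()
    if n_glance > 0:
        pos = torch.randperm(ref_tokens.size(0))[:n_glance]
        D_mixed[pos] = model.tgt_embedding(ref_tokens[pos])

    # Stage 2: predict again from the partially revealed input D' and update the model
    logits = model(D_mixed, enc_out)
    loss = F.cross_entropy(logits, ref_tokens)
    return loss
```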
The invention integrates the cooperative relationship matrix into the decoding process of the NAT model by using the cooperative relationship layer, and predicts the cooperative relationship by using the cooperative relationship predictor, thereby supplementing the cooperative relationship lacking in the NAT model and improving the performance of the NAT model.
From a technical perspective, the dependency syntax tree is used to model the word-to-word cooperative relationship in the target sequence and is integrated into the NAT decoding process, which brings a clear performance improvement.
From an application perspective, the invention achieves the current optimal performance on three widely used machine translation data sets (WMT14 En-De, WMT16 En-Ro, IWSLT De-En), demonstrating that the decoding process of NAT lacks modeling of the cooperative relationship and that using the syntax tree to model this cooperative relationship can significantly improve machine translation performance.
The embodiments described above are only preferred embodiments of the present invention and are not intended to limit its spirit and scope. Various modifications and improvements of the technical solutions of the present invention made by those skilled in the art without departing from the design concept of the present invention shall fall within the protection scope of the present invention; the claimed technical content of the present invention is fully set forth in the claims.

Claims (8)

1. A method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship, which first obtains a source-side representation corresponding to a source language sequence, then obtains the length of a target language sequence, and then combines the source-side representation with the length of the target language sequence to construct the input of the decoder in a non-autoregressive neural machine translation model, characterized in that the method further comprises the following steps: Step 1, obtaining a dependency syntax tree of the target language sequence based on the source-side representation and the decoder input, and converting the dependency syntax tree of the target language sequence into a synergistic relationship matrix; Step 2, integrating the synergistic relationship matrix of the target language sequence into the decoder of the non-autoregressive neural machine translation model, and decoding the decoder input with the decoder integrated with the synergistic relationship matrix to obtain the target language sequence.
2. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 1, characterized in that the source-side representation is obtained by encoding the source language sequence with an encoder.
3. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 1, characterized in that the length of the target language sequence is obtained by prediction based on the source-side representation through a length predictor in the non-autoregressive neural machine translation model.
4. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 3, characterized in that the length predictor first predicts the length difference between the source language sequence and the target language sequence based on the source-side representation, and then obtains the length of the target language sequence from the length difference and the length of the source language sequence.
5. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 1, characterized in that, in step 1, a synergistic relationship predictor is used to construct the dependency syntax tree of the target language sequence based on the source-side representation and the decoder input, and the synergistic relationship predictor converts the dependency syntax tree of the target language sequence into the synergistic relationship matrix of the target language sequence.
6. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 5, characterized in that the synergistic relationship predictor uses a biaffine dependency parser model; the biaffine dependency parser model takes the source-side representation and the decoder input as input and is trained with the dependency syntax tree of the target language as the training target; the dependency syntax tree of the target language sequence is predicted by the trained biaffine dependency parser model and converted into the synergistic relationship matrix.
7. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 6, characterized in that the dependency syntax tree of the target language is extracted from a reference translation of the target language.
8. The method for improving the quality of non-autoregressive neural machine translation by modeling the synergistic relationship according to claim 1, characterized in that, in step 2, a synergistic relationship layer is constructed in the decoder of the non-autoregressive neural machine translation model, the synergistic relationship layer comprising a self-attention layer based on the synergistic relationship matrix of the target language sequence, a source-target attention layer, and a feed-forward neural network layer, whereby the synergistic relationship matrix of the target language sequence is integrated into the decoder.
CN202110416255.6A 2021-04-19 2021-04-19 Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship Active CN113095092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110416255.6A CN113095092B (en) 2021-04-19 2021-04-19 Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110416255.6A CN113095092B (en) 2021-04-19 2021-04-19 Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship

Publications (2)

Publication Number Publication Date
CN113095092A true CN113095092A (en) 2021-07-09
CN113095092B CN113095092B (en) 2024-05-31

Family

ID=76678402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110416255.6A Active CN113095092B (en) 2021-04-19 2021-04-19 Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship

Country Status (1)

Country Link
CN (1) CN113095092B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887249A (en) * 2021-09-23 2022-01-04 内蒙古工业大学 A Mongolian-Chinese Neural Machine Translation Method Based on Dependency Syntax Information and Transformer Model
CN114065784A (en) * 2021-11-16 2022-02-18 北京百度网讯科技有限公司 Training method, translation method, device, electronic equipment and storage medium
CN114282552A (en) * 2021-11-16 2022-04-05 北京百度网讯科技有限公司 Training method and device of non-autoregressive translation model

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017105835U1 (en) * 2016-09-26 2018-01-02 Google Inc. Neural machine translation systems
CN108845994A (en) * 2018-06-07 2018-11-20 南京大学 Utilize the neural machine translation system of external information and the training method of translation system
CN110442878A (en) * 2019-06-19 2019-11-12 腾讯科技(深圳)有限公司 Interpretation method, the training method of Machine Translation Model, device and storage medium
CN110852116A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Non-autoregressive neural machine translation method, device, computer equipment and medium
CN111382582A (en) * 2020-01-21 2020-07-07 沈阳雅译网络技术有限公司 Neural machine translation decoding acceleration method based on non-autoregressive
CN111581988A (en) * 2020-05-09 2020-08-25 浙江大学 A training method and training system for a non-autoregressive machine translation model based on task-level curriculum learning
CN112052692A (en) * 2020-08-12 2020-12-08 内蒙古工业大学 A Mongolian-Chinese neural machine translation method based on grammar supervision and deep reinforcement learning
WO2020253669A1 (en) * 2019-06-19 2020-12-24 腾讯科技(深圳)有限公司 Translation method, apparatus and device based on machine translation model, and storage medium
CN112417901A (en) * 2020-12-03 2021-02-26 内蒙古工业大学 A non-autoregressive Mongolian-Chinese machine translation method based on look-around decoding and lexical attention
CN114611488A (en) * 2022-03-12 2022-06-10 云知声智能科技股份有限公司 Knowledge-enhanced non-autoregressive neural machine translation method and device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017105835U1 (en) * 2016-09-26 2018-01-02 Google Inc. Neural machine translation systems
CN108845994A (en) * 2018-06-07 2018-11-20 南京大学 Utilize the neural machine translation system of external information and the training method of translation system
CN110442878A (en) * 2019-06-19 2019-11-12 腾讯科技(深圳)有限公司 Interpretation method, the training method of Machine Translation Model, device and storage medium
WO2020253669A1 (en) * 2019-06-19 2020-12-24 腾讯科技(深圳)有限公司 Translation method, apparatus and device based on machine translation model, and storage medium
CN110852116A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Non-autoregressive neural machine translation method, device, computer equipment and medium
CN111382582A (en) * 2020-01-21 2020-07-07 沈阳雅译网络技术有限公司 Neural machine translation decoding acceleration method based on non-autoregressive
CN111581988A (en) * 2020-05-09 2020-08-25 浙江大学 A training method and training system for a non-autoregressive machine translation model based on task-level curriculum learning
CN112052692A (en) * 2020-08-12 2020-12-08 内蒙古工业大学 A Mongolian-Chinese neural machine translation method based on grammar supervision and deep reinforcement learning
CN112417901A (en) * 2020-12-03 2021-02-26 内蒙古工业大学 A non-autoregressive Mongolian-Chinese machine translation method based on look-around decoding and lexical attention
CN114611488A (en) * 2022-03-12 2022-06-10 云知声智能科技股份有限公司 Knowledge-enhanced non-autoregressive neural machine translation method and device

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
""Non-autoregressive Machine Translation by Modeling Syntactic Dependency Interrelation"", NAACL2022 CONFERENCE, pages 1 - 14 *
YU BAO 等: "Non-Autoregressive Translation by Learning Target Categorical Codes", ARXIV:2103.11405V [CS.CL], pages 1 - 11 *
YU BAO 等: ""Non-Autoregressive Translation by Learning Target Categorical Codes"", pages 1 - 11, Retrieved from the Internet <URL:《https://arxiv.org/abs/2103.11405》> *
ZHUOHAN LI 等: "Hint-based training for non-autoregressive machine translation", ARXIV, pages 1 - 9 *
冯洋 等: ""神经机器翻译前沿综述"", 《中文信息学报》, vol. 34, no. 07, pages 1 - 18 *
冯洋 等: "神经机器翻译前沿综述", 《中文信息学报》, vol. 34, no. 07, 15 July 2020 (2020-07-15), pages 1 - 18 *
朱相荣 等: ""基于非自回归方法的维汉神经机器翻译"", 《计算机应用》, vol. 40, no. 7, pages 1891 - 1895 *
王星: ""融合结构信息的神经机器翻译模型研究"", 《中国博士学位论文全文数据库信息科技辑》, no. 12, 15 December 2018 (2018-12-15), pages 138 - 141 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887249A (en) * 2021-09-23 2022-01-04 内蒙古工业大学 A Mongolian-Chinese Neural Machine Translation Method Based on Dependency Syntax Information and Transformer Model
CN113887249B (en) * 2021-09-23 2024-07-12 内蒙古工业大学 Mongolian neural machine translation method based on dependency syntax information and transducer model
CN114065784A (en) * 2021-11-16 2022-02-18 北京百度网讯科技有限公司 Training method, translation method, device, electronic equipment and storage medium
CN114282552A (en) * 2021-11-16 2022-04-05 北京百度网讯科技有限公司 Training method and device of non-autoregressive translation model
CN114282552B (en) * 2021-11-16 2022-11-04 北京百度网讯科技有限公司 Training method and device of non-autoregressive translation model
CN114065784B (en) * 2021-11-16 2023-03-10 北京百度网讯科技有限公司 Training method, translation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113095092B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
KR102382499B1 (en) Translation method, target information determination method, related apparatus and storage medium
CN107967262B (en) A kind of neural network illiteracy Chinese machine translation method
CN113095092A (en) Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship
CN111382582B (en) Neural machine translation decoding acceleration method based on non-autoregressive
Chitnis et al. Variable-length word encodings for neural translation models
CN113239710B (en) Multilingual machine translation method, device, electronic equipment and storage medium
Garg et al. Machine translation: a literature review
CN111160050A (en) Chapter-level neural machine translation method based on context memory network
CN110598221A (en) A Method of Improving the Quality of Mongolian-Chinese Translation Using Generative Adversarial Networks to Construct Mongolian-Chinese Parallel Corpus
CN106776548B (en) Text similarity calculation method and device
CN111597327A (en) Public opinion analysis-oriented unsupervised multi-document abstract generation method
CN113468895A (en) Non-autoregressive neural machine translation method based on decoder input enhancement
CN113901847A (en) Neural machine translation method based on source language syntax enhanced decoding
CN110298046B (en) Translation model training method, text translation method and related device
CN117852543A (en) A document-level entity relationship extraction method based on dual-granularity graph
CN112507732A (en) Unsupervised Chinese-transcendental machine translation method integrated into bilingual dictionary
CN115268868A (en) An intelligent source code conversion method based on supervised learning
CN113177113A (en) Task type dialogue model pre-training method, device, equipment and storage medium
CN110728155A (en) Tree-to-sequence-based Mongolian Chinese machine translation method
CN115985298A (en) End-to-end speech translation method based on automatic alignment, mixing and self-training of speech texts
CN111353315B (en) Deep nerve machine translation system based on random residual error algorithm
CN117725432A (en) Text semantic similarity comparison method, device, equipment and readable storage medium
Zhao et al. An efficient character-level neural machine translation
Lei Intelligent Recognition English Translation Model Based on Embedded Machine Learning and Improved GLR Algorithm
Chang et al. Improving language translation using the hidden Markov model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant