CN113095092A - Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship - Google Patents
Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship
- Publication number
- CN113095092A (application number CN202110416255.6A)
- Authority
- CN
- China
- Prior art keywords
- target language
- language sequence
- decoder
- neural machine
- syntax tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013519 translation Methods 0.000 title claims abstract description 96
- 230000001537 neural effect Effects 0.000 title claims abstract description 60
- 238000000034 method Methods 0.000 title claims abstract description 35
- 230000002195 synergetic effect Effects 0.000 title claims abstract description 18
- 239000011159 matrix material Substances 0.000 claims abstract description 28
- 238000012549 training Methods 0.000 claims description 13
- 230000009977 dual effect Effects 0.000 claims description 8
- 238000013528 artificial neural network Methods 0.000 claims description 5
- 230000001419 dependent effect Effects 0.000 claims description 5
- 230000010354 integration Effects 0.000 claims 1
- 230000014616 translation Effects 0.000 description 80
- 230000000875 corresponding effect Effects 0.000 description 11
- 230000008569 process Effects 0.000 description 11
- 238000012545 processing Methods 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 238000012360 testing method Methods 0.000 description 5
- 230000010365 information processing Effects 0.000 description 4
- 238000000605 extraction Methods 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000004836 empirical method Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000001502 supplementing effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for improving the translation quality of a non-autoregressive neural machine translation model by modeling the cooperative (synergistic) relationships between target words. The method constructs the input of the decoder in a non-autoregressive neural machine translation model by combining the source end representation with the length of the target language sequence, obtains a cooperative relationship matrix of the target language sequence from a dependency syntax tree together with the source end representation and the decoder input, and finally integrates this cooperative relationship matrix into the decoder of the non-autoregressive neural machine translation model. By modeling the cooperative relationships between words in the target sequence through the dependency syntax tree, the method takes the dependency relationships into account and significantly improves translation quality.
Description
Technical Field
The invention relates to the field of neural machine translation models, and in particular to a method for improving the translation quality of a non-autoregressive neural machine translation model by modeling cooperative (synergistic) relationships.
Background
With the trend of economic globalization, international communication and cooperation have become more frequent. Manual translation by human translators consumes enormous manpower and financial resources and cannot meet the ever-growing demand for translation, which is why machine translation has emerged. Machine translation, as the name implies, refers to the process of using computer technology to convert a source language into a semantically equivalent target language.
Thanks to improvements in computing power and advances in deep learning research, Neural Machine Translation (NMT) models based on deep neural networks now occupy the leading position in machine translation research. NMT models adopt an encoder-decoder framework, achieve excellent translation performance, and are widely applied. Specifically, given a source language sentence X = {x_1, x_2, …, x_m}, where x_i denotes the i-th subword in the source language sentence, i = {1, 2, …, m}, the NMT model first uses an encoder to encode it into a source end representation E = {e_1, e_2, …, e_m}, where e_i denotes the semantic representation corresponding to the i-th subword in the source language sentence, and then decodes through a decoder to obtain the target language translation Y = {y_1, y_2, …, y_n}, where y_j denotes the j-th subword in the target language sentence, j = {1, 2, …, n}. According to the way the decoder works, NMT models can be classified into two categories: autoregressive neural machine translation models and non-autoregressive neural machine translation models, whose translation principles are shown in FIG. 1a and FIG. 1b respectively. In both figures, the input source language sentence is "I love China", and the target language output of both models is the corresponding Chinese translation.
In autoregressive neural machine translation models, classical architectures such as the Transformer model (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998-6008) and RNN models (Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014) all generate the translation word by word from left to right (FIG. 1a): for the prediction at time t, the decoder uses the output before time t, Y_{<t} = {y_1, y_2, …, y_{t-1}}, where y_j denotes the j-th subword in the target language sentence, j = 1, 2, …, t-1, together with the source end representation E produced by the encoder and accessed through the attention mechanism, to predict the target word y'_t. The Transformer model achieves excellent performance on many translation data sets, but its autoregressive decoding process has the following problems: 1) exposure bias: the historical information during training is taken from the reference translation, while during testing it can only come from the model's own predictions, causing an inconsistency between training and testing that degrades performance; 2) low translation efficiency: the serial working mode of the autoregressive model cannot exploit the highly parallel nature of GPU hardware at test time, its prediction time is positively correlated with the sequence length, and translation is therefore slow for long sentences.
Unlike autoregressive neural machine translation models, Non-Autoregressive neural machine Translation (NAT) models assume that the words in the target language sequence are independent of one another. The NAT model generates the words of the target language sequence in parallel: after the encoder produces the source end representation E, a length predictor produces the length n of the target language sequence, from which the decoder input D = {d_1, d_2, …, d_n} is constructed, where d_j denotes the decoder input corresponding to the j-th position, j = 1, 2, …, n; the decoder then predicts all corresponding words simultaneously. By removing the dependence on historical information through the independence assumption, the NAT model not only achieves extremely high translation efficiency but also alleviates the exposure bias problem of autoregressive translation. However, its performance lags far behind that of autoregressive machine translation models, because without explicit dependencies between the outputs it is difficult for predictions at different positions of the NAT model to be coordinated into a consistent translation. Furthermore, the multi-modality phenomenon of the translation task (i.e., one source language sentence can correspond to multiple correct target language sentences) deepens this problem, resulting in lower final translation performance.
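To make the contrast concrete, the following minimal sketch (in Python/PyTorch) shows how an autoregressive decoder consumes its own history step by step, while a NAT decoder predicts all positions of D in one parallel pass; `ar_decoder` and `nat_decoder` are hypothetical stand-ins for trained model components, not components defined by the patent.

```python
import torch

def autoregressive_decode(ar_decoder, E, max_len, bos_id, eos_id):
    """Generate target tokens one at a time, each step conditioned on the history."""
    ys = [bos_id]
    for _ in range(max_len):
        logits = ar_decoder(torch.tensor([ys]), E)   # uses y_1 .. y_{t-1}
        next_tok = logits[0, -1].argmax().item()
        ys.append(next_tok)
        if next_tok == eos_id:
            break
    return ys[1:]

def non_autoregressive_decode(nat_decoder, D, E):
    """Predict all target positions in a single parallel pass."""
    logits = nat_decoder(D, E)                       # (1, n, vocab)
    return logits.argmax(dim=-1)[0].tolist()         # no dependence on history
```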
Around the problem of missing dependencies in non-autoregressive neural machine translation models, there are two types of solutions: one class of schemes directly models the dependency relationships between words in the target sequence; the other class introduces hidden variables to implicitly model the missing dependencies.
Schemes that directly model the dependency relationships adopt a training strategy similar to that of autoregressive models. Researchers have proposed taking part of the words of the reference translation Y as input and training the decoder to predict the words that are not provided, thereby modeling the dependency between the provided words and the remaining words. Combined with iterative decoding strategies, this scheme significantly improves the performance of non-autoregressive machine translation (Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, and Lei Li. 2020. Glancing transformer for non-autoregressive neural machine translation. arXiv preprint arXiv:2008.07905; Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-Predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6112-6121). However, this solution has the following problems: 1) because part of the reference sequence is used as input during training while at test time the decoder must rely on its own simultaneously decoded or previously predicted words, the exposure bias problem remains and reduces the prediction performance of the model; 2) the multi-iteration decoding algorithm reduces the efficiency of the model.
Hidden-variable-based schemes use a deep neural network to encode the dependency information of the target sequence into hidden variables, and then train the model to model these hidden variables, so that modeling the hidden variables becomes an intermediate step of non-autoregressive modeling. Specifically, hidden-variable-based schemes first predict the hidden variables either autoregressively or non-autoregressively, and then predict the target sequence (Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard Hovy. 2019. FlowSeq: Non-autoregressive conditional sequence generation with generative flow. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4282-4292). However, modeling hidden variables relies on complex deep neural networks, which makes the translation of the model less efficient and often less interpretable.
Disclosure of Invention
The invention aims to provide a method for improving the quality of a non-autoregressive neural machine translation model by modeling cooperative relationships, so as to solve the problem that existing non-autoregressive neural machine translation models lack explicit modeling of dependency relationships and therefore suffer reduced translation performance.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the method for improving the translation quality of the non-autoregressive neural machine through modeling cooperative relationship comprises the following steps of firstly obtaining source end representation corresponding to a source language sequence, then obtaining the length of a target language sequence, and then constructing the input of a decoder in a non-autoregressive neural machine translation model by combining the source end representation with the length of the target language sequence, wherein the method further comprises the following steps:
step 1, obtaining a dependency syntax tree of a target language sequence based on source end representation and decoder input, and converting the dependency syntax tree of the target language sequence into a collaborative relationship matrix.
Step 2, integrating the cooperative relationship matrix of the target language sequence into the decoder of the non-autoregressive neural machine translation model, and decoding the decoder input with the decoder integrated with the cooperative relationship matrix to obtain the target language sequence.
In the method for improving the translation quality of the non-autoregressive neural machine translation model by modeling cooperative relationships, the source end representation is obtained by an encoder encoding the source language sequence.
In the method for improving the translation quality of the non-autoregressive neural machine translation model by modeling cooperative relationships, the length of the target language sequence is predicted by a length predictor in the non-autoregressive neural machine translation model based on the source end representation. The length predictor first predicts the length difference between the source language sequence and the target language sequence based on the source end representation, and then obtains the length of the target language sequence from this length difference and the length of the source language sequence.
In step 1 of the method for improving the translation quality of the non-autoregressive neural machine translation model by modeling cooperative relationships, a cooperative relationship predictor is adopted to construct the dependency syntax tree of the target language sequence based on the source end representation and the decoder input, and to convert the dependency syntax tree of the target language sequence into the cooperative relationship matrix of the target language sequence.
The cooperative relationship predictor predicts the dependency syntax tree using a dual affine (biaffine) dependency parser model. The dual affine dependency parser model takes the source end representation and the decoder input as input and is trained with the dependency syntax tree of the target language as the training target; the trained dual affine dependency parser model predicts the dependency syntax tree of the target language sequence, and the cooperative relationship predictor then converts the dependency syntax tree of the target sequence into the cooperative relationship matrix. The dependency syntax tree of the target language is extracted from the reference translation of the target language.
In step 2 of the method for improving the translation quality of the non-autoregressive neural machine translation model by modeling cooperative relationships, a cooperative relationship layer is constructed in the decoder of the non-autoregressive neural machine translation model; the cooperative relationship layer comprises an attention layer based on the cooperative relationship matrix of the target language sequence, a source end-target end attention layer, and a feedforward neural network layer, so that the cooperative relationship matrix of the target language sequence is integrated into the decoder.
Compared with the traditional autoregressive decoding scheme, a neural machine translation model that adopts non-autoregressive decoding is extremely efficient and better suited to the needs of industry, but its application is limited by its lower translation quality. At the root, the non-autoregressive neural machine translation model lacks explicit modeling of the dependency relationships between the words of the target language sequence, so it has difficulty coping with the multi-modality phenomenon that is common in machine translation tasks. Existing non-autoregressive neural machine translation research on modeling dependencies relies either on inefficient rounds of iteration or on complex deep networks. The invention extracts the undirected dependency relationships (i.e., the cooperative relationships) between words in the target language sequence from the dependency syntax tree, then models these cooperative relationships with a simple cooperative relationship predictor, and thereby improves the translation quality of the non-autoregressive neural machine translation model.
An analysis of the working mode of the non-autoregressive neural machine translation (NAT) model shows that what the parallel prediction of NAT actually does is predict the target sequence cooperatively; the (directed) dependency relationships that existing work chooses to model are not the essential need of NAT. Therefore, the invention proposes a method that models the cooperative relationships between words in the target language sequence and integrates them into the decoding process of NAT.
Compared with the prior art, the invention has the advantages that:
1) The invention is the first to propose modeling the cooperative relationships among words in the target sequence, expressing them as a cooperative relationship matrix, and accordingly constructing a cooperative relationship layer that is integrated into the decoder of the NAT model.
2) The invention proposes modeling the cooperative relationships between words in the target sequence with a dependency syntax tree, extracting the cooperative relationship matrix from the dependency syntax tree, and integrating it into the decoding process of NAT. In this way, the translation quality is significantly improved while the dependency relationships are taken into account, demonstrating the great value of modeling cooperative relationships in NAT.
Drawings
FIG. 1a is a schematic diagram of a prior art autoregressive neural machine translation model.
FIG. 1b is a schematic diagram of a prior art non-autoregressive neural machine translation model.
Fig. 2 is a block flow diagram of an embodiment of the invention.
FIG. 3 is a schematic diagram of a non-autoregressive neural machine translation model as applied to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a decoder integrated co-relationship layer in an embodiment of the invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
In this embodiment, a German-English NAT model system is given as an example: the input source language is German, the source language sequence is "Ich habe eine Katze", the desired output target language is English, and the desired output target language sequence is "I have a cat". As shown in FIG. 2, the method for improving the translation quality of the non-autoregressive neural machine translation model by modeling cooperative relationships in this embodiment comprises the following steps:
step one, setting a source language sequence X as { X ═ X1,x2,...,xmIn which xiRepresents the ith subword in the source language sentence, i ═ 1, 2. The invention adopts an encoder of an autoregressive Transformer model to convert a source language sequence x into { x ═ x1,x2,..,xmInputting the data into an encoder, and encoding the data by the encoder to obtain a corresponding source end tableDenotes E ═ E1,e2,..,emIn which eiAnd representing a semantic representation corresponding to the ith subword in the source language sentence, wherein i is {1, 2.
Step two: through the length predictor in the NAT model, the length of the target language sequence is predicted based on the source end representation obtained in step one, and then the input of the decoder in the non-autoregressive neural machine translation model is constructed by combining the source end representation E with the length of the target language sequence. Specifically:
firstly, using a length predictor to predict the length difference delta (L) between the target language sequence and the source language sequence based on the source end representation, and calculating the length n of the target language sequence according to the length difference delta (L), as shown in formula (1):
in formula (1), MLP represents a multi-layered perceptron,is a parameter of the length predictor, mean-posing denotes the average pooling operation, and m denotes the length of the source language sequence.Indicating the difference in length of the target and source language sequences for a given source terminal xProbability distribution of (2). Δ (L) occurs below the argmax function to represent the functionIs returned a value ofCorresponding when taking the maximum value
Then, based on the target language sequence length n and the source end representation E, the input of the decoder in the NAT model, D = {d_1, d_2, …, d_n}, is constructed, where d_j denotes the decoder input corresponding to the j-th position, j = 1, 2, …, n, as shown in formula (2):

d_j = Σ_{i=1}^{m} w_ij · e_i    (2)

In formula (2), e_i denotes the semantic representation corresponding to the i-th subword in the source language sentence, and w_ij denotes the relevance of the i-th source subword to the j-th target position; the weights w_ij are normalized by a softmax function whose sharpness is controlled by the hyper-parameter τ. Here i denotes the index over the source language sequence, i = {1, 2, …, m}, and j denotes the index over the target language sequence, j = {1, 2, …, n}.
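The following sketch illustrates one plausible soft-copy construction of D. The position-distance score inside the softmax is an assumption: the text above only fixes that w_ij is a temperature-τ softmax weight measuring the relevance of source position i to target position j.

```python
import torch

def build_decoder_input(E, n, tau=0.3):
    """Soft-copy construction of the decoder input D from the source end
    representation E (shape (m, d_model)). The distance-based score is an
    assumed form, not taken verbatim from the patent."""
    m, d_model = E.shape
    i = torch.arange(m, dtype=torch.float).unsqueeze(0)   # source positions, (1, m)
    j = torch.arange(n, dtype=torch.float).unsqueeze(1)   # target positions, (n, 1)
    # relevance score: closer relative positions get higher weight (assumption)
    scores = -torch.abs(j * m / n - i) / tau              # (n, m)
    w = torch.softmax(scores, dim=-1)                     # w[j, i] = w_ij
    D = w @ E                                             # d_j = sum_i w_ij * e_i
    return D                                              # (n, d_model)
```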
Step three: obtaining the dependency syntax tree of the target language sequence based on the source end representation and the decoder input, and converting the dependency syntax tree of the target language sequence into the cooperative relationship matrix. This step is described in detail as follows:
the dependency syntax tree clearly defines the grammatical dependency relationship between words in the sentence, and can significantly improve the performance of a non-autoregressive machine translation (NAT) model, so the embodiment first adopts the collaborative relationship predictor to obtain the dependency syntax tree of the target language sequence and converts the dependency syntax tree into the collaborative relationship matrix.
Obtaining the dependency syntax tree: in the training phase, this embodiment first uses an external dependency syntax tree extraction tool (e.g., stanza) to extract the dependency syntax tree of the reference translation. A dual affine (biaffine) dependency parser model (Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net) is then trained to extract the dependency syntax tree of the corresponding reference translation from the decoder input and the source end representation. The dual affine dependency parser model is trained with the decoder input D and the source end representation E as inputs and the dependency syntax tree of the target language as the training target. In the testing stage, the dependency syntax tree of the target language sequence is obtained by prediction with the trained dual affine dependency parser model.
This embodiment parses the reference translation of the target language using the external dependency syntax tree extraction tool (stanza). Note that the processing units of the dependency syntax tree extraction tool are words, while the processing units of the NMT model are subwords, so word-level dependency syntax trees need to be converted into subword-level dependency syntax trees. Suppose a word y_j, whose parent in the word-level tree is t_j, is decomposed into three subwords y_{1j}, y_{2j}, y_{3j}; then the parent node of the first subword y_{1j} is t_j, and the parent node of the remaining subwords {y_{2j}, y_{3j}} is y_{1j}.
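The word-to-subword conversion rule can be written as a short routine; the segmentation used in the example below is hypothetical.

```python
def word_tree_to_subword_tree(word_parents, word_to_subwords):
    """Convert a word-level dependency tree into a subword-level tree, following
    the rule above: the first subword of a word inherits the word's parent (mapped
    to that parent's first subword), and the remaining subwords attach to the first
    subword. word_parents[j] is the parent word index of word j (-1 for the root);
    word_to_subwords[j] lists the subword indices of word j in order."""
    sub_parents = {}
    for j, subs in enumerate(word_to_subwords):
        head_sub = subs[0]
        pw = word_parents[j]
        # first subword points at the first subword of the parent word (or the root)
        sub_parents[head_sub] = word_to_subwords[pw][0] if pw >= 0 else -1
        for s in subs[1:]:
            sub_parents[s] = head_sub        # remaining subwords attach to the first
    n = sum(len(s) for s in word_to_subwords)
    return [sub_parents[k] for k in range(n)]

# Example (hypothetical segmentation): words ["I", "have", "a", "cat"] with "have"
# as the root; suppose "cat" is split into two subwords.
word_parents = [1, -1, 3, 1]
word_to_subwords = [[0], [1], [2], [3, 4]]
print(word_tree_to_subword_tree(word_parents, word_to_subwords))  # [1, -1, 3, 1, 3]
```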
This embodiment predicts the dependency syntax tree of the target language sequence using the dual affine dependency parser model proposed in Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. The dual affine dependency parser model takes the decoder input D and the source end representation E as inputs and the dependency syntax tree of the target language as the training target, and predicts the parent node of every subword in parallel. However, unlike the architecture in the above-mentioned literature, this embodiment removes the post-processing step that uses a minimum spanning tree and replaces the LSTM encoder with a Transformer encoder. Processing the parsing result with a minimum spanning tree can yield a higher-quality dependency syntax tree, but it brings a large time overhead, so this embodiment removes this module. Meanwhile, since the encoding capability of the Transformer encoder is stronger than that of the LSTM, this embodiment uses 4 Transformer encoder layers instead of the LSTM encoder layers in the literature to extract the dependency information. Jointly training the NMT task and the dependency syntax tree prediction task not only produces the cooperative relationships of the target sequence but also regularizes the representation of the encoder, further improving translation performance.
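A single-scorer sketch of the dual affine (biaffine) arc predictor is given below; the hidden dimensions and the way the decoder-side states H are formed from D and E are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Sketch of the biaffine arc scorer used by the co-relationship predictor:
    for every target position it scores each candidate head position, and the
    parent index t_k is taken as the argmax (no minimum-spanning-tree step)."""
    def __init__(self, d_model, d_arc=256):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(d_model, d_arc), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(d_model, d_arc), nn.ReLU())
        self.U = nn.Parameter(torch.zeros(d_arc, d_arc))   # bilinear term
        self.b = nn.Parameter(torch.zeros(d_arc))          # bias term over heads

    def forward(self, H):                  # H: (n, d_model) decoder-side hidden states
        head = self.head_mlp(H)            # candidate-head representations
        dep = self.dep_mlp(H)              # dependent representations
        # biaffine score s[k, j]: how likely position j is the parent of position k
        scores = dep @ self.U @ head.t() + head @ self.b   # (n, n)
        parents = scores.argmax(dim=-1)    # predicted parent t_k for each subword k
        return scores, parents
```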
In the step of converting the dependency syntax tree of the target language sequence into the cooperative relationship matrix, given the dependency syntax tree t of the reference translation of the target language, where t_i denotes the parent node index of the i-th node, this embodiment converts the dependency syntax tree of the target language sequence into the cooperative relationship matrix of the target language sequence using formula (3):

A_kj = 1, if t_k = j or t_j = k or k = j;  A_kj = 0, otherwise    (3)

In formula (3), A_kj denotes the cooperative relationship between the k-th subword and the j-th subword; k and j denote indices over the target language sequence, k = {1, 2, …, n}, j = {1, 2, …, n}.
Intuitively, the invention considers that: (1) nodes in a parent-child relationship have a cooperative relationship; (2) each node has a cooperative relationship with itself. As shown in FIG. 3, the dependency syntax tree and the corresponding cooperative relationship matrix of the sentence "I have a cat." are illustrated.
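Formula (3) and the two rules above translate directly into code; the example dependency tree for "I have a cat ." below is an assumed parse used only for illustration.

```python
import torch

def tree_to_co_matrix(parents):
    """Convert a dependency tree (parents[k] = index of the parent of subword k)
    into the symmetric cooperative-relationship matrix A of formula (3):
    A[k, j] = 1 if k and j are in a parent-child relation or k == j, else 0."""
    n = len(parents)
    A = torch.zeros(n, n)
    for k in range(n):
        A[k, k] = 1.0                    # every node cooperates with itself
        j = parents[k]
        if 0 <= j < n:                   # skip the root (parent index -1)
            A[k, j] = 1.0                # child -> parent
            A[j, k] = 1.0                # parent -> child (undirected relation)
    return A

# Assumed parse of "I have a cat .": "have" is the root; "I", "cat" and "."
# depend on "have", and "a" depends on "cat".
parents = [1, -1, 3, 1, 1]
print(tree_to_co_matrix(parents))
```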
Step four: integrating the cooperative relationship matrix of the target language sequence into the decoder of the non-autoregressive neural machine translation model, and decoding the decoder input with this decoder to obtain the target language sequence as the translation result.
In this embodiment, in order to integrate the cooperative relationship matrix of the target language sequence into the decoding process of the NAT model, a self-attention layer based on the cooperative relationship matrix is constructed following the relative-position self-attention mechanism proposed in Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 464-468, as shown in FIG. 4. The calculation of the self-attention layer based on the cooperative relationship matrix is given by formula (4):

α_kj = softmax_j( (d_k W_Q)(d_j W_K + a^K_kj)^T / sqrt(d_model) ),    h_k = Σ_j α_kj (d_j W_V + a^V_kj)    (4)

In formula (4), k and j denote indices over the target language sequence, k = {1, 2, …, n}, j = {1, 2, …, n}; a^K_kj and a^V_kj are representations of the cooperative relationship between the k-th word and the j-th word, d_k and d_j denote the decoder inputs at the k-th and j-th positions respectively, d_model denotes the model size, α_kj denotes the degree of association between the k-th subword and the j-th subword, h_k denotes the hidden layer state of the k-th subword, and W_V, W_Q and W_K are trainable parameters. The remaining N-1 layers of the decoder use the same architecture as the Transformer.
During training, the invention uses the curriculum learning strategy proposed in GLAT (Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, and Lei Li. 2020. Glancing transformer for non-autoregressive neural machine translation. arXiv preprint arXiv:2008.07905) to train the cooperative relationship predictor. Specifically, a two-stage decoding process is performed: in the first stage, the cooperative relationship predictor predicts the dependency syntax tree t' from the decoder input D and the source end representation E, the quality of the predicted dependency syntax tree t' is measured, and the reference translation Y is mixed with the decoder input D according to this quality to obtain a new vector representation D'; in the second stage, the predictor predicts the dependency syntax tree from D' and the source end representation E, the loss is calculated, and the model is updated.
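A schematic training step under these assumptions might look as follows; the mixing ratio and the helper methods `model.predict_tree` and `model.loss` are hypothetical and only illustrate the two-stage curriculum described above.

```python
import torch

def glancing_training_step(model, D, E, Y_ref, tree_ref, embed_ref):
    """Sketch of the two-stage curriculum step.
    Stage 1: predict the dependency tree from D and E and measure its quality.
    Stage 2: mix reference embeddings into D in proportion to the error, then
    recompute the prediction and the loss, and update the model."""
    with torch.no_grad():
        tree_pred = model.predict_tree(D, E)             # first-pass prediction
        n = tree_ref.numel()
        n_wrong = (tree_pred != tree_ref).sum().item()   # quality of the first pass
        n_glance = n_wrong // 2                          # assumed mixing ratio
        glance_pos = torch.randperm(n)[:n_glance]
    D_mix = D.clone()
    D_mix[glance_pos] = embed_ref[glance_pos]            # reveal some reference tokens
    loss = model.loss(D_mix, E, Y_ref, tree_ref)         # second pass: tree + translation loss
    loss.backward()
    return loss
```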
The invention integrates the cooperative relationship matrix into the decoding process of the NAT model through the cooperative relationship layer, and predicts the cooperative relationships with the cooperative relationship predictor, thereby supplementing the cooperative relationships that the NAT model lacks and improving the performance of the NAT model.
At the technical level, the dependency syntax tree is used to model the word-to-word cooperative relationships in the target sequence and is integrated into the NAT decoding process, which brings a significant performance improvement.
At the application level, the invention achieves the current best performance on three widely used machine translation data sets (WMT14 En-De, WMT16 En-Ro, IWSLT De-En), demonstrating that the decoding process of NAT lacks modeling of the cooperative relationships and that modeling these cooperative relationships with the syntax tree can significantly improve machine translation performance.
The embodiments described above are only preferred embodiments of the invention and are not intended to limit its spirit and scope. Various modifications and improvements of the technical solutions of the invention made by those skilled in the art without departing from the design concept of the invention shall fall within the protection scope of the invention; the claimed technical content of the invention is fully set forth in the claims.
Claims (8)
1. A method for improving the translation quality of a non-autoregressive neural machine translation model by modeling synergistic relationships, comprising: first obtaining a source end representation corresponding to a source language sequence, then obtaining the length of a target language sequence, and then constructing the input of a decoder in the non-autoregressive neural machine translation model by combining the source end representation with the length of the target language sequence, characterized by further comprising the following steps:
step 1, obtaining a dependency syntax tree of a target language sequence based on source end representation and decoder input, and converting the dependency syntax tree of the target language sequence into a collaborative relationship matrix;
Step 2, integrating the cooperative relationship matrix of the target language sequence into the decoder of the non-autoregressive neural machine translation model, and decoding the input of the decoder with the decoder integrated with the cooperative relationship matrix to obtain the target language sequence.
2. The method for improving non-autoregressive neural machine translation quality by modeling synergistic relationships of claim 1, wherein the source representation is obtained by an encoder encoding a source language sequence.
3. The method of improving non-autoregressive neural machine translation quality by modeling synergistic relationships of claim 1, wherein the length of the target language sequence is predicted by a length predictor in the non-autoregressive neural machine translation model based on the source end representation.
4. The method of improving non-autoregressive neural machine translation quality by modeling synergistic relationships of claim 3, wherein the length predictor first predicts the length difference between the source language sequence and the target language sequence based on source end representation and then obtains the length of the target language sequence from the length difference and the length of the source language sequence.
5. The method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship as claimed in claim 1, wherein in step 1, a dependency syntax tree of the target language sequence is constructed by using a synergistic relationship predictor based on source end representation and decoder input, and the dependency syntax tree of the target language sequence is converted into a synergistic relationship matrix of the target language sequence by using the synergistic relationship predictor.
6. The method of claim 5, wherein the synergistic relationship predictor employs a dual affine dependency parser model; the dual affine dependency parser model takes the source end representation and the decoder input as input and takes the dependency syntax tree of the target language as a training target for training, and the trained dual affine dependency parser model predicts the dependency syntax tree of the target language sequence, which is then converted into the synergistic relationship matrix.
7. The method for improving non-autoregressive neural machine translation quality by modeling synergistic relationships according to claim 6, wherein the dependency syntax tree of the target language is extracted from the reference translation of the target language.
8. The method according to claim 1, wherein in step 2, a cooperative relationship layer is constructed in a decoder in the non-autoregressive neural machine translation model, and the cooperative relationship layer comprises an attention layer based on a target language sequence cooperative relationship matrix, a source-target attention layer and a feedforward neural network layer, thereby realizing integration of the target language sequence cooperative relationship matrix into the decoder.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110416255.6A CN113095092B (en) | 2021-04-19 | 2021-04-19 | Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110416255.6A CN113095092B (en) | 2021-04-19 | 2021-04-19 | Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113095092A true CN113095092A (en) | 2021-07-09 |
CN113095092B CN113095092B (en) | 2024-05-31 |
Family
ID=76678402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110416255.6A Active CN113095092B (en) | 2021-04-19 | 2021-04-19 | Method for improving non-autoregressive neural machine translation quality through modeling synergistic relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113095092B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113887249A (en) * | 2021-09-23 | 2022-01-04 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on dependency syntax information and Transformer model |
CN114065784A (en) * | 2021-11-16 | 2022-02-18 | 北京百度网讯科技有限公司 | Training method, translation method, device, electronic equipment and storage medium |
CN114282552A (en) * | 2021-11-16 | 2022-04-05 | 北京百度网讯科技有限公司 | Training method and device of non-autoregressive translation model |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE202017105835U1 (en) * | 2016-09-26 | 2018-01-02 | Google Inc. | Neural machine translation systems |
CN108845994A (en) * | 2018-06-07 | 2018-11-20 | 南京大学 | Utilize the neural machine translation system of external information and the training method of translation system |
CN110442878A (en) * | 2019-06-19 | 2019-11-12 | 腾讯科技(深圳)有限公司 | Interpretation method, the training method of Machine Translation Model, device and storage medium |
CN110852116A (en) * | 2019-11-07 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Non-autoregressive neural machine translation method, device, computer equipment and medium |
CN111382582A (en) * | 2020-01-21 | 2020-07-07 | 沈阳雅译网络技术有限公司 | Neural machine translation decoding acceleration method based on non-autoregressive |
CN111581988A (en) * | 2020-05-09 | 2020-08-25 | 浙江大学 | Training method and training system of non-autoregressive machine translation model based on task level course learning |
CN112052692A (en) * | 2020-08-12 | 2020-12-08 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning |
WO2020253669A1 (en) * | 2019-06-19 | 2020-12-24 | 腾讯科技(深圳)有限公司 | Translation method, apparatus and device based on machine translation model, and storage medium |
CN112417901A (en) * | 2020-12-03 | 2021-02-26 | 内蒙古工业大学 | Non-autoregressive Mongolian machine translation method based on look-around decoding and vocabulary attention |
CN114611488A (en) * | 2022-03-12 | 2022-06-10 | 云知声智能科技股份有限公司 | Knowledge-enhanced non-autoregressive neural machine translation method and device |
-
2021
- 2021-04-19 CN CN202110416255.6A patent/CN113095092B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE202017105835U1 (en) * | 2016-09-26 | 2018-01-02 | Google Inc. | Neural machine translation systems |
CN108845994A (en) * | 2018-06-07 | 2018-11-20 | 南京大学 | Utilize the neural machine translation system of external information and the training method of translation system |
CN110442878A (en) * | 2019-06-19 | 2019-11-12 | 腾讯科技(深圳)有限公司 | Interpretation method, the training method of Machine Translation Model, device and storage medium |
WO2020253669A1 (en) * | 2019-06-19 | 2020-12-24 | 腾讯科技(深圳)有限公司 | Translation method, apparatus and device based on machine translation model, and storage medium |
CN110852116A (en) * | 2019-11-07 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Non-autoregressive neural machine translation method, device, computer equipment and medium |
CN111382582A (en) * | 2020-01-21 | 2020-07-07 | 沈阳雅译网络技术有限公司 | Neural machine translation decoding acceleration method based on non-autoregressive |
CN111581988A (en) * | 2020-05-09 | 2020-08-25 | 浙江大学 | Training method and training system of non-autoregressive machine translation model based on task level course learning |
CN112052692A (en) * | 2020-08-12 | 2020-12-08 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning |
CN112417901A (en) * | 2020-12-03 | 2021-02-26 | 内蒙古工业大学 | Non-autoregressive Mongolian machine translation method based on look-around decoding and vocabulary attention |
CN114611488A (en) * | 2022-03-12 | 2022-06-10 | 云知声智能科技股份有限公司 | Knowledge-enhanced non-autoregressive neural machine translation method and device |
Non-Patent Citations (8)
Title |
---|
""Non-autoregressive Machine Translation by Modeling Syntactic Dependency Interrelation"", NAACL2022 CONFERENCE, pages 1 - 14 * |
YU BAO et al.: "Non-Autoregressive Translation by Learning Target Categorical Codes", arXiv:2103.11405 [cs.CL], pages 1-11, Retrieved from the Internet <URL:https://arxiv.org/abs/2103.11405> *
ZHUOHAN LI et al.: "Hint-based training for non-autoregressive machine translation", arXiv, pages 1-9 *
冯洋 (FENG Yang) et al.: "A Survey of the Frontiers of Neural Machine Translation" (神经机器翻译前沿综述), Journal of Chinese Information Processing (中文信息学报), vol. 34, no. 07, 15 July 2020, pages 1-18 *
朱相荣 (ZHU Xiangrong) et al.: "Uyghur-Chinese Neural Machine Translation Based on Non-Autoregressive Methods" (基于非自回归方法的维汉神经机器翻译), Journal of Computer Applications (计算机应用), vol. 40, no. 7, pages 1891-1895 *
王星 (WANG Xing): "Research on Neural Machine Translation Models Incorporating Structural Information" (融合结构信息的神经机器翻译模型研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (中国博士学位论文全文数据库信息科技辑), no. 12, 15 December 2018, pages 138-141 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113887249A (en) * | 2021-09-23 | 2022-01-04 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on dependency syntax information and Transformer model |
CN113887249B (en) * | 2021-09-23 | 2024-07-12 | 内蒙古工业大学 | Mongolian-Chinese neural machine translation method based on dependency syntax information and Transformer model
CN114065784A (en) * | 2021-11-16 | 2022-02-18 | 北京百度网讯科技有限公司 | Training method, translation method, device, electronic equipment and storage medium |
CN114282552A (en) * | 2021-11-16 | 2022-04-05 | 北京百度网讯科技有限公司 | Training method and device of non-autoregressive translation model |
CN114282552B (en) * | 2021-11-16 | 2022-11-04 | 北京百度网讯科技有限公司 | Training method and device of non-autoregressive translation model |
CN114065784B (en) * | 2021-11-16 | 2023-03-10 | 北京百度网讯科技有限公司 | Training method, translation method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113095092B (en) | 2024-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113095092A (en) | Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship | |
CN107967262A (en) | A neural network Mongolian-Chinese machine translation method | |
Zhang et al. | Deep Neural Networks in Machine Translation: An Overview. | |
CN110059324B (en) | Neural network machine translation method and device based on dependency information supervision | |
Chitnis et al. | Variable-length word encodings for neural translation models | |
JP5128629B2 (en) | Part-of-speech tagging system, part-of-speech tagging model training apparatus and method | |
CN106776548B (en) | Text similarity calculation method and device | |
CN111597327B (en) | Public opinion analysis-oriented unsupervised multi-document abstract generation method | |
Garg et al. | Machine translation: a literature review | |
CN108491372B (en) | Chinese word segmentation method based on seq2seq model | |
CN113239710B (en) | Multilingual machine translation method, device, electronic equipment and storage medium | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN109062897A (en) | Sentence alignment method based on deep neural network | |
CN109062910A (en) | Sentence alignment method based on deep neural network | |
CN113378547B (en) | GCN-based Chinese complex sentence implicit relation analysis method and device | |
CN113065358A (en) | Text-to-semantic matching method based on multi-granularity alignment for bank consultation service | |
CN112507732A (en) | Unsupervised Chinese-Vietnamese machine translation method incorporating a bilingual dictionary | |
CN116129902A (en) | Cross-modal alignment-based voice translation method and system | |
CN112580370A (en) | Mongolian Chinese neural machine translation method fusing semantic knowledge | |
CN114168754A (en) | Relation extraction method based on syntactic dependency and fusion information | |
CN110298046B (en) | Translation model training method, text translation method and related device | |
CN112257468A (en) | Method for improving translation performance of multi-language neural machine | |
CN110728155A (en) | Tree-to-sequence-based Mongolian Chinese machine translation method | |
CN113177113A (en) | Task type dialogue model pre-training method, device, equipment and storage medium | |
CN117725432A (en) | Text semantic similarity comparison method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |