CN113378584B - Non-autoregressive neural machine translation method based on auxiliary representation fusion - Google Patents

Non-autoregressive neural machine translation method based on auxiliary representation fusion

Info

Publication number
CN113378584B
CN113378584B (application CN202110592517.4A)
Authority
CN
China
Prior art keywords
machine translation
autoregressive
model
neural machine
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110592517.4A
Other languages
Chinese (zh)
Other versions
CN113378584A (en)
Inventor
杜权
刘兴宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Yayi Network Technology Co ltd
Original Assignee
Shenyang Yayi Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Yayi Network Technology Co ltd filed Critical Shenyang Yayi Network Technology Co ltd
Priority to CN202110592517.4A priority Critical patent/CN113378584B/en
Publication of CN113378584A publication Critical patent/CN113378584A/en
Application granted granted Critical
Publication of CN113378584B publication Critical patent/CN113378584B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a non-autoregressive neural machine translation method based on auxiliary representation fusion, which comprises the following steps: constructing an autoregressive neural machine translation model; constructing a training parallel corpus and training a model whose decoder has only one layer; constructing a non-autoregressive neural machine translation model; fusing, by weighted summation, the output of the feed-forward neural network at the top layer of the autoregressive neural machine translation model decoder with the top-layer representation of the non-autoregressive neural machine translation model encoder, and using the result as the input of the non-autoregressive neural machine translation model decoder; the encoder extracts the source sentence information, and the decoder predicts the corresponding target sentence from that information; training the non-autoregressive neural machine translation model; and feeding the source sentence into the non-autoregressive neural machine translation model and decoding translation results of different lengths. The invention combines the advantages of autoregressive and non-autoregressive models and obtains a 7-9 fold speedup at the cost of only a small performance loss.

Description

Non-autoregressive neural machine translation method based on auxiliary representation fusion
Technical Field
The invention relates to a neural machine translation inference acceleration method, in particular to a non-autoregressive neural machine translation method based on auxiliary representation fusion.
Background
Machine translation is a technique of translating one natural language into another. Machine translation is a branch of natural language processing, is one of the ultimate targets of artificial intelligence, and has important scientific research value. Meanwhile, with the rapid development of internet technology, the machine translation technology plays an increasingly important role in daily life and work of people.
Machine translation technology has developed over the years from rule-based methods in the 1970s, example-based methods in the 1980s and statistics-based methods in the 1990s to today's neural network-based methods, which finally achieve good results and are widely used in people's daily lives.
The most widely used neural machine translation systems currently employ end-to-end encoder-decoder frameworks based on neural networks, among which the most powerful is the Transformer model structure based on the self-attention mechanism, which achieves optimal translation performance across multiple language pairs. The Transformer consists of an encoder and a decoder based on the self-attention mechanism. A standard Transformer encoder consists of six stacked encoding layers, and the decoder likewise comprises six decoding layers. The traditional RNN and CNN are discarded from the whole model, which consists entirely of attention mechanisms; more precisely, the Transformer consists only of attention mechanisms and feed-forward neural networks. Compared with RNNs, the Transformer abandons the restriction of strictly sequential computation and improves the parallelism of the system. At the same time, this parallel computation alleviates the difficulty of handling long-range dependencies that arises in sequential computation. Each Transformer encoding layer comprises a self-attention layer and a feed-forward neural network: the sentences output by the self-attention layer as dense vectors are fed, after feature extraction, into the feed-forward neural network. Relative to the encoder, the decoder models the mapping relationship between the source and target languages by adding an encoder-decoder attention layer between the self-attention layer and the feed-forward neural network layer.
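By way of illustration only, the following minimal PyTorch sketch shows one such encoding layer (multi-head self-attention followed by a feed-forward network, each with a residual connection and layer normalization); the dimensions, dropout rate and module names are assumptions for this example and are not specified by the invention:

    import torch
    import torch.nn as nn

    class EncoderLayer(nn.Module):
        # One Transformer encoding layer: multi-head self-attention + feed-forward network,
        # each wrapped in a residual connection and layer normalization.
        def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x, pad_mask=None):
            # Self-attention sub-layer: every position attends to every source position.
            attn_out, _ = self.self_attn(x, x, x, key_padding_mask=pad_mask)
            x = self.norm1(x + self.dropout(attn_out))
            # Position-wise feed-forward sub-layer.
            x = self.norm2(x + self.dropout(self.ffn(x)))
            return x

A decoding layer differs only in that an encoder-decoder attention sub-layer is inserted between the self-attention and the feed-forward network.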
Neural network-based machine translation systems have made significant advances in performance over the earlier statistics-based translation systems. However, because neural networks involve a large number of matrix operations, both training and decoding are more time-consuming than previous approaches. Of these two, the time consumed by decoding tends to matter more in practice. For a neural machine translation system to be practical, it must respond quickly during decoding; otherwise it is difficult for users to accept in many scenarios, even if the translation system performs better.
Most machine translation models are currently implemented with an encoder-decoder framework, in which the encoder feeds a representation of the source sentence to the decoder to generate the target sentence. The decoder typically works in an autoregressive manner, generating the target sentence from left to right word by word, so that the generation of the t-th target word depends on the t-1 target words generated previously. This autoregressive decoding mode matches the way people read and produce sentences and can effectively capture the distribution of real translations. However, each step of the decoder must run sequentially rather than in parallel, and thus autoregressive decoding prevents architectures such as the Transformer from fully exploiting at inference time the parallelism advantages they enjoy during training.
To mitigate this inference delay, a non-autoregressive translation (NAT) model has been proposed which, instead of generating left to right, uses copied source inputs to initialize the decoder inputs and generates all target words independently and simultaneously. However, while accelerating decoding, the NAT model has to handle the translation task with weak target-side information at its decoder, which reduces translation accuracy.
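The copy-based initialization of the decoder inputs can be illustrated by the following sketch, which assumes a simple uniform copy of source token embeddings onto an estimated target length; this is an illustrative simplification, not the exact procedure of any particular NAT system:

    import torch

    def uniform_copy(src_embed, tgt_len):
        # src_embed: (batch, src_len, d_model) source token embeddings.
        # Each target position is initialized with the source embedding at a
        # proportionally mapped position, so the decoder inputs all exist up front
        # and every target word can be predicted in parallel.
        batch, src_len, d_model = src_embed.shape
        pos = torch.arange(tgt_len, dtype=torch.float)            # target positions 0 .. tgt_len-1
        idx = (pos * src_len / tgt_len).long().clamp(max=src_len - 1)
        return src_embed[:, idx, :]                               # (batch, tgt_len, d_model)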
Disclosure of Invention
Aiming at the problem of reduced translation quality caused by weak target-end information in non-autoregressive machine translation models, the invention provides a non-autoregressive neural machine translation method based on auxiliary representation fusion, which enables non-autoregressive machine translation to achieve performance comparable to autoregressive machine translation while responding faster, making it better suited to practical application.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides a non-autoregressive neural machine translation method based on auxiliary representation fusion, which comprises the following steps:
1) Constructing an autoregressive neural machine translation model comprising an encoder and a decoder by adopting a Transformer model based on the autoregressive mechanism;
2) Constructing a training parallel corpus, performing word segmentation and subword segmentation preprocessing to obtain a source language sequence and a target language sequence, generating a machine translation vocabulary, and training a model whose decoder has only one layer until the model converges;
3) Removing the matrix used by the original decoder in the Transformer model to mask future information, and at the same time adding multi-head position attention between the self-attention and the encoder-decoder attention, to construct a non-autoregressive neural machine translation model;
4) A front portion of words, set according to a specified proportion of the source language sentence, is decoded using a shallow autoregressive model, and the output of the feed-forward neural network at the top layer of the autoregressive neural machine translation model decoder is fused by weighted summation with the top-layer representation of the non-autoregressive neural machine translation model encoder to serve as the input of the non-autoregressive neural machine translation model decoder;
5) Using the parallel corpus to train the non-autoregressive neural machine translation model that takes the fused representation as input: the encoder encodes the source sentence and extracts the source sentence information, and the decoder predicts the corresponding target sentence from the source sentence information; the difference between the predicted data distribution and the real data distribution is calculated and continuously reduced through back-propagation until the model converges, completing the training process of the non-autoregressive neural machine translation model;
6) Sending the source sentence input by the user into the non-autoregressive neural machine translation model, decoding translation results of different lengths, and selecting the optimal translation result through evaluation by the autoregressive neural machine translation model.
In step 3), the non-autoregressive neural machine translation model is constructed, specifically comprising the following steps:
301) Modeling the translation problem after removing the matrix used by the decoding end to mask future information:

P(Y|X) = ∏_{t=1}^{T} P(y_t | x_{1…T′})

wherein X is the source language sequence, Y is the target language sequence, T is the target language sequence length, T′ is the source language sequence length, t is a target language position, x_{1…T′} is the source language sentence, and y_t is the target word at the t-th position;
302) Adding an additional multi-head position attention module in each decoder layer, the module being:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

wherein Q is the query matrix, K is the key matrix, V is the value matrix, softmax(·) is the normalization function, Attention(·) is the attention calculation function, and d_k is the dimension of the key matrix;
303) Before decoding begins, the target length is estimated from the source length, and the estimated target length is sent to the non-autoregressive neural machine translation model so that all words are generated in parallel.
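As an illustration of step 303), the following sketch estimates the target length from the source length and lets the decoder emit all positions in a single parallel pass; the simple length offset and the encode/decode_parallel interfaces are assumptions made for this example only:

    import torch

    def nat_generate(nat_model, src_tokens, length_offset=2):
        # Estimate the target length from the source length (a simple
        # source_length + offset heuristic; a learned length predictor may be used instead).
        src_len = src_tokens.size(1)
        tgt_len = src_len + length_offset

        # A single decoder pass yields a distribution over every target position,
        # so all words are generated in parallel rather than left to right.
        enc_out = nat_model.encode(src_tokens)                  # assumed encoder interface
        logits = nat_model.decode_parallel(enc_out, tgt_len)    # assumed decoder interface, (batch, tgt_len, vocab)
        return logits.argmax(dim=-1)                            # (batch, tgt_len) token ids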
In step 4), the translation result of the autoregressive model is used to improve the input of the non-autoregressive neural machine translation model, specifically:
401) The input at the decoding end of the non-autoregressive machine translation model is as follows:

wherein θ_at and θ_nat are the parameters of the autoregressive and non-autoregressive neural machine translation models respectively, T is the target language sequence length, T′ is the source language sequence length, y_t is the target word at the t-th position, x_{1…T′} is the source language sentence, y_<t denotes the 1st to (t-1)-th target words, and z_nat is the input of the decoding end of the non-autoregressive neural machine translation model;
402) Constructing a fusion function using a weighted sum, specifically:

Fusion = λ·Decoder_at(y_{1…k}) + μ·Encoder_nat(x_{1…T′})

wherein λ and μ are hyper-parameters controlling the weights of the different representation terms, Decoder_at(·) is the output of the autoregressive neural machine translation model decoder, Encoder_nat(·) is the output of the non-autoregressive neural machine translation model encoder, y_{1…k} denotes the 1st to k-th target words, and x_{1…T′} is the source sentence;
403) Before the fusion representation calculated above is fed to the decoder, a layer normalization operation is applied to normalize the forward-pass layer inputs and the backward-pass layer gradients.
The calculation formula of z_nat in step 401) is:

z_nat = Fusion(Decoder_at(y_{1…k}), Encoder_nat(x_{1…T′}))

wherein Decoder_at(·) is the output of the decoding end of the autoregressive model, Encoder_nat(·) is the output of the encoding end of the non-autoregressive model, Fusion(·) is the auxiliary representation fusion function, y_{1…k} denotes the 1st to k-th target words, and x_{1…T′} is the source sentence.
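A minimal sketch of the fusion in formulas 402) and 403), assuming the autoregressive decoder output and the non-autoregressive encoder output have already been mapped to tensors of the same target-length shape (this alignment is an assumption of the example); λ and μ are the weighting hyper-parameters from the description:

    import torch
    import torch.nn.functional as F

    def fuse_representations(dec_at, enc_nat, lam=0.5, mu=0.5):
        # dec_at:  top-layer feed-forward output of the autoregressive decoder,
        #          assumed shape (batch, tgt_len, d_model).
        # enc_nat: top-layer representation of the non-autoregressive encoder,
        #          assumed shape (batch, tgt_len, d_model).
        fusion = lam * dec_at + mu * enc_nat            # weighted-sum fusion (formula 402)
        return F.layer_norm(fusion, fusion.shape[-1:])  # layer normalization before the NAT decoder (step 403)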
In step 5), during training of the non-autoregressive neural machine translation model, the parallel corpus is fed into the model to compute the cross-entropy loss, and the corresponding gradients are then computed to update the parameters, completing the training process.
In step 6), the source sentence input by the user is fed into the non-autoregressive neural machine translation model, and multiple translation results are obtained by specifying different target language lengths; the autoregressive neural machine translation model is then used as a scoring function over these decoded translation results, and the best overall translation is selected.
The invention has the following beneficial effects and advantages:
1. The invention provides a non-autoregressive neural machine translation method based on auxiliary representation fusion, which introduces the high-level representation encoded by the autoregressive model into the non-autoregressive model to improve the translation quality of the non-autoregressive model. By combining the advantages of autoregressive and non-autoregressive models, fast and accurate translation can be achieved.
2. The method uses the fused representation of the source language and part of the target language as input, which greatly alleviates the problem of weak target-end information in the non-autoregressive model and effectively improves its performance.
3. The method is highly extensible: by adjusting the proportion of the leading words decoded with the autoregressive neural machine translation model, it can range from a fully autoregressive to a fully non-autoregressive neural machine translation model.
Drawings
FIG. 1 is a diagram of a non-autoregressive neural machine translation model based on a fusion representation in accordance with the present invention;
FIG. 2 is a schematic diagram of the structure of the encoding layer and decoding layer in a standard Transformer according to the present invention.
Detailed Description
The invention is further elucidated below in connection with the drawings of the specification.
The invention optimizes the translation performance of a non-autoregressive neural machine translation system from the perspective of representation fusion, with the aim of achieving accurate and fast translation.
The invention provides a non-autoregressive neural machine translation method based on auxiliary representation fusion, which comprises the following steps:
1) Constructing an autoregressive neural machine translation model comprising an encoder and a decoder by adopting a Transformer model based on the autoregressive mechanism;
2) Constructing a training parallel corpus, carrying out word segmentation and subword segmentation preprocessing to obtain a source language sequence and a target language sequence, generating a machine translation vocabulary, and training a model whose decoding end has only one layer until convergence;
3) Removing the matrix used by the decoding end of the Transformer to mask future information, and at the same time adding multi-head position attention between the self-attention and the encoder-decoder attention, to construct a non-autoregressive machine translation model;
4) The leading portion of words is decoded using a shallow autoregressive model, and the output of the feed-forward neural network at the topmost layer of the decoding end of the autoregressive machine translation model is fused by weighted summation with the top-layer representation at the encoding end of the non-autoregressive machine translation model to serve as the input of the decoding end of the non-autoregressive machine translation model;
5) The parallel corpus is used to train the non-autoregressive machine translation model that takes the fused representation as input: the encoder encodes the source sentence to extract the source sentence information, and the decoder predicts the corresponding target sentence from this information. The loss between the predicted distribution and the real data distribution is then calculated and continuously reduced through back-propagation to complete the training of the model;
6) The source sentence input by the user is fed into the machine translation model, translation results of different lengths are decoded, and the optimal translation result is obtained through evaluation by the autoregressive model.
In step 1), the Transformer consists only of attention mechanisms and feed-forward neural networks, as shown in fig. 2. The Transformer is still based on an encoder-decoder framework, in which the encoder and the decoder are each formed by stacking a plurality of identical layers whose sub-layer structures differ slightly. The Transformer achieves significant performance improvements on multiple data sets of the machine translation task, reaching the best performance at the time while also training faster. The attention mechanism is an important component of neural machine translation models. In the original encoder-decoder framework, the neural network has difficulty learning the correspondence between the source end and the target end, and the translation system translates long input sentences poorly. In the self-attention mechanism, the Query (Q), Key (K) and Value (V) come from the same content. The three matrices are first transformed linearly, and then a scaled dot-product operation is performed: the dot product between the Query and the Key is computed and, to prevent the result from becoming too large, divided by the square root of the Key dimension to act as a regulator, as shown in the following formula:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

wherein Q is the query matrix, K is the key matrix, V is the value matrix, softmax(·) is the normalization function, Attention(·) is the attention calculation function, and d_k is the dimension of the key matrix.
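The formula can be transcribed directly into the following sketch (a plain implementation for illustration, not an optimized one):

    import math
    import torch

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (batch, heads, seq_len, d_k)
        d_k = K.size(-1)
        # Dot products between queries and keys, scaled by sqrt(d_k) so the
        # scores stay in a range where the softmax gradients remain stable.
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
        weights = torch.softmax(scores, dim=-1)   # normalize over the key positions
        return torch.matmul(weights, V)           # weighted sum of the values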
In step 2), the most time-consuming part of the autoregressive model at decoding time is the decoder. Since there is no reference translation in the decoding stage, the autoregressive neural machine translation model predicts the current target word using the already generated sequence, which causes serious decoding delay. Using a lightweight autoregressive neural machine translation model greatly improves decoding speed.
In step 3), the non-autoregressive neural machine translation model is constructed, specifically comprising the following steps:
301) Modeling the translation problem after removing the matrix used by the decoding end to mask future information:

P(Y|X) = ∏_{t=1}^{T} P(y_t | x_{1…T′})

wherein X is the source language sequence, Y is the target language sequence, T is the target language sequence length, T′ is the source language sequence length, t is a target language position, x_{1…T′} is the source language sentence, and y_t is the target word at the t-th position;
302) Adding an additional multi-head position attention module in each decoder layer, the module being:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

wherein Q is the query matrix, K is the key matrix, V is the value matrix, softmax(·) is the normalization function, Attention(·) is the attention calculation function, and d_k is the dimension of the model's hidden layer;
303) Before decoding begins, the target length is estimated from the source length, and the estimated target length is sent to the non-autoregressive neural machine translation model so that all words are generated in parallel.
In step 4), the translation result of the autoregressive neural machine translation model is used to reconstruct the input of the non-autoregressive neural machine translation model, specifically:
401) The first k words are decoded using a shallow autoregressive model, and the output of the feed-forward neural network at the topmost layer of the decoding end of the autoregressive machine translation model is fused by weighted summation with the top-layer representation of the encoding end of the non-autoregressive machine translation model to serve as the input of the decoding end of the non-autoregressive machine translation model:

wherein θ_at and θ_nat are the parameters of the autoregressive and non-autoregressive neural machine translation models respectively, T is the target language sequence length, T′ is the source language sequence length, y_t is the target word at the t-th position, x_{1…T′} is the source language sentence, y_<t denotes the 1st to (t-1)-th target words, and z_nat is the input of the decoding end of the non-autoregressive neural machine translation model; the calculation formula of z_nat is:

z_nat = Fusion(Decoder_at(y_{1…k}), Encoder_nat(x_{1…T′}))

wherein Decoder_at(·) is the output of the decoding end of the autoregressive model, Encoder_nat(·) is the output of the encoding end of the non-autoregressive model, Fusion(·) is the auxiliary representation fusion function, y_{1…k} denotes the 1st to k-th target words, and x_{1…T′} is the source sentence.
402) Constructing a fusion function using a weighted sum, specifically:

Fusion = λ·Decoder_at(y_{1…k}) + μ·Encoder_nat(x_{1…T′})

wherein λ and μ are hyper-parameters controlling the weights of the different representation terms, Decoder_at(·) is the output of the autoregressive neural machine translation model decoder, Encoder_nat(·) is the output of the non-autoregressive neural machine translation model encoder, y_{1…k} denotes the 1st to k-th target words, and x_{1…T′} is the source sentence;
403) Before the fusion representation calculated above is fed to the decoder, a layer normalization operation is applied to normalize the forward-pass layer inputs and the backward-pass layer gradients.
In step 5), during training of the non-autoregressive neural machine translation model, the parallel corpus is fed into the model to compute the cross-entropy loss, and the corresponding gradients are then computed to update the parameters, completing the training process.
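One such training step could look like the following sketch, assuming a NAT model that returns per-position logits in one forward pass; the model call signature, optimizer and padding index are assumptions for this example:

    import torch
    import torch.nn.functional as F

    def train_step(nat_model, optimizer, src_tokens, tgt_tokens, pad_idx=0):
        # Forward pass: the model predicts a distribution over the vocabulary
        # for every target position in parallel.
        logits = nat_model(src_tokens, tgt_len=tgt_tokens.size(1))   # (batch, tgt_len, vocab), assumed interface

        # Cross-entropy between the predicted and reference distributions, ignoring padding.
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tgt_tokens.reshape(-1),
                               ignore_index=pad_idx)

        # Back-propagate and update the parameters.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()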
In step 6), the source sentence input by the user is fed into the model, and multiple translation results are obtained by specifying different target language lengths; an autoregressive model is then used as a scoring function over the decoded translation results, and the best overall translation is selected. Since all translation candidates can be computed and scored completely independently, with sufficient parallelism this process takes only about twice the time of computing a single translation.
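A sketch of this length-parallel decoding and rescoring is given below; it reuses the assumed encode/decode_parallel interface from the earlier sketch and additionally assumes an autoregressive scorer that returns the log-probability of a candidate translation:

    def translate_with_rescoring(nat_model, at_scorer, src_tokens, base_len, num_candidates=9):
        # Decode several candidate translations of different target lengths.
        candidates = []
        for delta in range(-(num_candidates // 2), num_candidates // 2 + 1):
            tgt_len = max(1, base_len + delta)
            enc_out = nat_model.encode(src_tokens)                  # assumed interface
            logits = nat_model.decode_parallel(enc_out, tgt_len)    # assumed interface
            candidates.append(logits.argmax(dim=-1))

        # Score every candidate with the autoregressive model and keep the one
        # with the highest log-probability; the candidates are independent, so
        # these scoring passes can run fully in parallel.
        scores = [at_scorer(src_tokens, cand) for cand in candidates]   # assumed scorer interface
        best = max(range(len(candidates)), key=lambda i: scores[i])
        return candidates[best]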
The invention verifies the effectiveness of the proposed method on two commonly used data sets, the IWSLT14 German-English spoken-language data set and the WMT14 German-English data set, whose training sets contain 160,000 and 4.5 million parallel sentence pairs respectively. The processed bilingual training corpus is obtained through byte pair encoding (BPE) subword segmentation. However, since the non-autoregressive model has difficulty fitting the multimodal distribution in the real data, this embodiment addresses the problem with sentence-level knowledge distillation: sentences generated by an autoregressive neural machine translation model with the same parameter configuration are used as training samples and provided to the non-autoregressive neural machine translation model for learning.
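The sentence-level knowledge distillation step can be sketched as follows: the trained autoregressive teacher re-translates the source side of the training corpus, and the resulting pairs replace the original references for training the non-autoregressive model (the teacher's translate interface is an assumption of this example):

    def build_distilled_corpus(at_teacher, src_sentences):
        # Replace the original references with the autoregressive teacher's outputs,
        # whose distribution is smoother and less multimodal and therefore easier
        # for the non-autoregressive student to fit.
        distilled_pairs = []
        for src in src_sentences:
            hyp = at_teacher.translate(src)        # assumed teacher interface
            distilled_pairs.append((src, hyp))
        return distilled_pairs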
As shown in fig. 1, in this embodiment the first two words "We total" of the source language sentence are first fed into the encoder of the autoregressive neural machine translation model; the encoder's multi-head attention extracts the source language sentence information by computing the correlation coefficients between words and passing the result to the feed-forward neural network. The decoder of the autoregressive neural machine translation model then receives this information and passes it in turn through the multi-head self-attention layer, the multi-head encoder-decoder attention layer and the feed-forward neural network layer, followed by a linear transformation, to obtain the translation result "complete". The non-autoregressive neural machine translation model then fuses this translation result with its own encoder information to form the input of its decoder. Finally, the decoder uses the extracted source language sentence information and the decoder input, passing them in turn through the multi-head self-attention layer, the multi-head position attention layer, the multi-head encoder-decoder attention layer and the feed-forward neural network layer, and after a final linear transformation obtains the complete target language sentence.
The invention uses BLEU, a bilingual evaluation metric commonly used in machine translation tasks, as the evaluation standard. Experimental results show that, using the auxiliary representation fusion method as the input of the non-autoregressive model, 9 candidate translations of different lengths are decoded simultaneously and then evaluated with the autoregressive model; on the IWSLT14 German-English data set a 9.4-fold speedup is obtained at the cost of 14 percent of performance, and on the WMT14 German-English data set a 7.9-fold speedup is obtained with only an 8.5 percent loss of performance.
The invention optimizes the translation performance of the non-autoregressive neural machine translation system from the perspective of representation fusion, with the aim of achieving accurate and fast translation. By introducing the high-level representation encoded by the shallow autoregressive model into the non-autoregressive model, the translation quality of the non-autoregressive model is improved while an efficient inference speed is maintained. Using the fused representation of the source language and part of the target language as input greatly alleviates the problem of weak target-end information in the non-autoregressive model and effectively enhances the model's performance.

Claims (6)

1. A non-autoregressive neural machine translation method based on auxiliary representation fusion, characterized by comprising the following steps:
1) Constructing an autoregressive neural machine translation model comprising an encoder and a decoder by adopting a Transformer model based on the autoregressive mechanism;
2) Constructing a training parallel corpus, performing word segmentation and subword segmentation preprocessing to obtain a source language sequence and a target language sequence, generating a machine translation vocabulary, and training a model whose decoder has only one layer until the model converges;
3) Removing the matrix used by the original decoder in the Transformer model to mask future information, and at the same time adding multi-head position attention between the self-attention and the encoder-decoder attention, to construct a non-autoregressive neural machine translation model;
4) A front portion of words, set according to a specified proportion of the source language sentence, is decoded using a shallow autoregressive model, and the output of the feed-forward neural network at the top layer of the autoregressive neural machine translation model decoder is fused by weighted summation with the top-layer representation of the non-autoregressive neural machine translation model encoder to serve as the input of the non-autoregressive neural machine translation model decoder;
5) Using the parallel corpus to train the non-autoregressive neural machine translation model that takes the fused representation as input: the encoder encodes the source sentence and extracts the source sentence information, and the decoder predicts the corresponding target sentence from the source sentence information; the difference between the predicted data distribution and the real data distribution is calculated and continuously reduced through back-propagation until the model converges, completing the training process of the non-autoregressive neural machine translation model;
6) Sending the source sentence input by the user into the non-autoregressive neural machine translation model, decoding translation results of different lengths, and selecting the optimal translation result through evaluation by the autoregressive neural machine translation model.
2. The non-autoregressive neural machine translation method based on auxiliary representation fusion of claim 1, wherein in step 3), the non-autoregressive neural machine translation model is constructed through the following steps:
301) Modeling the translation problem after removing the matrix used by the decoding end to mask future information:

P(Y|X) = ∏_{t=1}^{T} P(y_t | x_{1…T′})

wherein X is the source language sequence, Y is the target language sequence, T is the target language sequence length, T′ is the source language sequence length, t is a target language position, x_{1…T′} is the source language sentence, and y_t is the target word at the t-th position;
302) Adding an additional multi-head position attention module in each decoder layer, the module being:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

wherein Q is the query matrix, K is the key matrix, V is the value matrix, softmax(·) is the normalization function, Attention(·) is the attention calculation function, and d_k is the dimension of the key matrix;
303) Before decoding begins, the target length is estimated from the source length, and the estimated target length is sent to the non-autoregressive neural machine translation model so that all words are generated in parallel.
3. The non-autoregressive neural machine translation method based on auxiliary representation fusion of claim 1, wherein: in step 4), the translation result of the autoregressive model is used to improve the input of the non-autoregressive neural machine translation model, specifically:
401) The input at the decoding end of the non-autoregressive machine translation model is as follows:

wherein θ_at and θ_nat are the parameters of the autoregressive and non-autoregressive neural machine translation models respectively, T is the target language sequence length, T′ is the source language sequence length, y_t is the target word at the t-th position, x_{1…T′} is the source language sentence, y_<t denotes the 1st to (t-1)-th target words, and z_nat is the input of the decoding end of the non-autoregressive neural machine translation model;
402) Constructing a fusion function using a weighted sum, specifically:

Fusion = λ·Decoder_at(y_{1…k}) + μ·Encoder_nat(x_{1…T′})

wherein λ and μ are hyper-parameters controlling the weights of the different representation terms, Decoder_at(·) is the output of the autoregressive neural machine translation model decoder, Encoder_nat(·) is the output of the non-autoregressive neural machine translation model encoder, y_{1…k} denotes the 1st to k-th target words, and x_{1…T′} is the source sentence;
403) Before the fusion representation calculated above is fed to the decoder, a layer normalization operation is applied to normalize the forward-pass layer inputs and the backward-pass layer gradients.
4. The non-autoregressive neural machine translation method based on auxiliary representation fusion of claim 3, wherein the calculation formula of z_nat in step 401) is:

z_nat = Fusion(Decoder_at(y_{1…k}), Encoder_nat(x_{1…T′}))

wherein Decoder_at(·) is the output of the decoding end of the autoregressive model, Encoder_nat(·) is the output of the encoding end of the non-autoregressive model, Fusion(·) is the auxiliary representation fusion function, y_{1…k} denotes the 1st to k-th target words, and x_{1…T′} is the source sentence.
5. The non-autoregressive neural machine translation method based on auxiliary representation fusion of claim 1, wherein in step 5), during training of the non-autoregressive neural machine translation model, the parallel corpus is fed into the model to compute the cross-entropy loss, and the corresponding gradients are then computed to update the parameters, completing the training process.
6. The non-autoregressive neural machine translation method based on auxiliary representation fusion of claim 1, wherein in step 6), the source sentence input by the user is fed into the non-autoregressive neural machine translation model, and multiple translation results are obtained by specifying different target language lengths; the autoregressive neural machine translation model is then used as a scoring function over these decoded translation results, and the best overall translation is selected.
CN202110592517.4A 2021-05-28 2021-05-28 Non-autoregressive neural machine translation method based on auxiliary representation fusion Active CN113378584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110592517.4A CN113378584B (en) 2021-05-28 2021-05-28 Non-autoregressive neural machine translation method based on auxiliary representation fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110592517.4A CN113378584B (en) 2021-05-28 2021-05-28 Non-autoregressive neural machine translation method based on auxiliary representation fusion

Publications (2)

Publication Number Publication Date
CN113378584A CN113378584A (en) 2021-09-10
CN113378584B (en) 2023-09-05

Family

ID=77574788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110592517.4A Active CN113378584B (en) 2021-05-28 2021-05-28 Non-autoregressive neural machine translation method based on auxiliary representation fusion

Country Status (1)

Country Link
CN (1) CN113378584B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065784B (en) * 2021-11-16 2023-03-10 北京百度网讯科技有限公司 Training method, translation method, device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160050A (en) * 2019-12-20 2020-05-15 沈阳雅译网络技术有限公司 Chapter-level neural machine translation method based on context memory network
CN112052692A (en) * 2020-08-12 2020-12-08 内蒙古工业大学 Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200044201A (en) * 2018-10-10 2020-04-29 한국전자통신연구원 Neural machine translation model learning method and apparatus for improving translation performance

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160050A (en) * 2019-12-20 2020-05-15 沈阳雅译网络技术有限公司 Chapter-level neural machine translation method based on context memory network
CN112052692A (en) * 2020-08-12 2020-12-08 内蒙古工业大学 Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Frontiers in Neural Machine Translation; Feng Yang; Shao Chenze; Journal of Chinese Information Processing (No. 07); full text *

Also Published As

Publication number Publication date
CN113378584A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113468895B (en) Non-autoregressive neural machine translation method based on decoder input enhancement
CN110598221B Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network
CN111160050A (en) Chapter-level neural machine translation method based on context memory network
CN108897740A Mongolian-Chinese machine translation method based on adversarial neural networks
CN111178093B (en) Neural machine translation system training acceleration method based on stacking algorithm
CN109299479B (en) Method for integrating translation memory into neural machine translation through gating mechanism
CN114787914A (en) System and method for streaming end-to-end speech recognition with asynchronous decoder
CN111597778A (en) Method and system for automatically optimizing machine translation based on self-supervision
CN112257465B (en) Multi-mode machine translation data enhancement method based on image description generation
Sen et al. Neural machine translation of low-resource languages using SMT phrase pair injection
CN111178087B (en) Neural machine translation decoding acceleration method based on discrete type attention mechanism
CN110781690A (en) Fusion and compression method of multi-source neural machine translation model
CN112417901A (en) Non-autoregressive Mongolian machine translation method based on look-around decoding and vocabulary attention
CN113378584B (en) Non-autoregressive neural machine translation method based on auxiliary representation fusion
CN114691858B (en) Improved UNILM digest generation method
CN113657125B (en) Mongolian non-autoregressive machine translation method based on knowledge graph
Chen et al. Research on neural machine translation model
Han et al. A coordinated representation learning enhanced multimodal machine translation approach with multi-attention
CN117218503A (en) Cross-Han language news text summarization method integrating image information
Shi et al. Adding Visual Information to Improve Multimodal Machine Translation for Low-Resource Language
CN116663578A (en) Neural machine translation method based on strategy gradient method improvement
CN116663577A (en) Cross-modal characterization alignment-based english end-to-end speech translation method
CN115659172A (en) Generation type text summarization method based on key information mask and copy
CN112257463B (en) Compression method of neural machine translation model for Chinese-English inter-translation
Wang et al. Multimodal object classification using bidirectional gated recurrent unit networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant