CN114611488A - Knowledge-enhanced non-autoregressive neural machine translation method and device - Google Patents

Knowledge-enhanced non-autoregressive neural machine translation method and device Download PDF

Info

Publication number
CN114611488A
Authority
CN
China
Prior art keywords
word
source language
language
input
submodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210243650.3A
Other languages
Chinese (zh)
Inventor
王亦宁
刘升平
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202210243650.3A priority Critical patent/CN114611488A/en
Publication of CN114611488A publication Critical patent/CN114611488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A knowledge-enhanced non-autoregressive neural machine translation method and device. The method performs data preprocessing and word vector encoding on bilingual parallel language pairs; inputs the word vector representation of the source language into an encoder network, which encodes the source language text to obtain an encoded representation of the input word sequence; uses a word alignment model to construct the correspondence between the source language and the target language and to build a fertility model (the number of target language tokens generated for each source language token); constructs the input and output encoded representations of the decoder model; and establishes dependencies among target language words through a conditional random field model, decoding in sequence to generate the final translation result. The invention decodes with a conditional random field at the decoding end: the pre-trained language model carries rich contextual information and the conditional random field builds contextual dependencies, alleviating the repeated translations, omitted translations, and inconsistencies between earlier and later output to which non-autoregressive translation is prone, and yielding higher-quality translation results.

Description

Knowledge-enhanced non-autoregressive neural machine translation method and device
Technical Field
The invention belongs to the technical field of machine translation, and particularly relates to a knowledge-enhanced non-autoregressive neural machine translation method and device.
Background
Conventional neural machine translation uses an autoregressive decoding mode in which the target language is generated by decoding sequentially from left to right; because of this characteristic, words at different positions cannot be generated in parallel during decoding. Non-autoregressive translation abandons the temporal ordering of the target-side generation process, so that decoding does not depend on previously generated translations. This yields high inference speed: all target language words can be generated simultaneously during decoding, which greatly accelerates the decoding speed of the model.
Current non-autoregressive translation methods generate the target language words at all positions simultaneously. Although this greatly improves decoding speed, it abandons the dependencies among words, which easily produces translations that are inconsistent between earlier and later parts, omit content, or repeat the same content many times; the resulting translation quality is poor and cannot meet normal high-quality translation requirements.
Disclosure of Invention
Therefore, the invention provides a knowledge-enhanced non-autoregressive neural machine translation method and device, solving the problems that in non-autoregressive neural machine translation the generation of target language text cannot depend on contextual information and is prone to repeated translations, omitted translations, and inconsistency between earlier and later output.
In order to achieve the above purpose, the invention provides the following technical scheme: a knowledge-enhanced non-autoregressive neural machine translation method, comprising the steps of:
(1) carrying out data preprocessing and word vector coding on bilingual parallel language pairs;
(2) inputting the word vector representation of the source language into an encoder network, which encodes the source language text to obtain an encoded representation of the input word sequence;
(3) using a word alignment model to construct the correspondence between the source language and the target language and to build a fertility model;
(4) constructing the input and output encoded representations of the decoder model;
(5) establishing dependencies among target language words through a conditional random field model, and decoding in sequence to generate the final translation result.
As a preferred embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (1) comprises:
(11) performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
(12) predefining a sub-word sequence representing a source language, and obtaining word vector coding representation of the source language by using a pre-training model;
(13) acquiring a position vector code of a source language input sequence;
(14) and adding the word vector code and the position vector code to obtain the input code representation of the source language.
As a preferable embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (2) includes:
(21) acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
(22) using Transformer layers based on the self-attention mechanism to obtain the topmost encoded representation of each word through the encoder network.
As a preferable scheme of the knowledge-enhanced non-autoregressive neural machine translation method, the step (3) comprises the following steps:
(31) predefining word sequences representing different target languages, and constructing a corresponding relation between a source language word sequence and a target language by using a word alignment model;
(32) according to the correspondence between the source language word sequence and the target language, taking the number of target language tokens corresponding to each source language token as the fertility sequence;
(33) computing a softmax over the topmost encoded representation of each word obtained in step (22) to obtain the probability distribution of the fertility;
(34) selecting the output corresponding to the maximum probability as a generation result of the encoder network;
the step (33) includes:
(331) carrying out one-layer linear transformation on the output hidden state;
(332) passing the result of the one-layer linear transformation through a softmax to output the fertility probability distribution.
As a preferable embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (4) includes:
(41) constructing the input of the decoder end according to the fertility result obtained in step (33);
(42) obtaining a decoder input encoded representation;
(43) an output encoded representation of the decoder is obtained.
As a preferable embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (5) includes:
(51) performing one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
(52) passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
(53) selecting the word with the maximum probability as the translation result at that time step.
The invention also provides a knowledge-enhanced non-autoregressive neural machine translation device, comprising:
the first processing module is used for carrying out data preprocessing and word vector encoding on the bilingual parallel language pair;
the second processing module is used for inputting the word vector representation of the source language into the encoder network, and the encoder network encodes the source language document information to obtain the encoded representation of the input word sequence information;
the third processing module is used for constructing the correspondence between a source language and a target language by using the word alignment model and building a fertility model;
a fourth processing module for constructing input and output encoded representations of the decoder model;
and the fifth processing module is used for establishing the dependence between the target language vocabularies through the conditional random field model and sequentially decoding to generate a final translation result.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the first processing module includes:
the sub-word segmentation submodule is used for performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
the first obtaining submodule is used for predefining a sub-word sequence representing a source language and obtaining word vector coding representation of the source language by using a pre-training model;
the second obtaining submodule is used for obtaining the position vector code of the source language input sequence;
and the input code representation submodule is used for adding the word vector code and the position vector code to obtain the input code representation of the source language.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the second processing module includes:
the word sequence matrix submodule is used for acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
and the top-level coding representation submodule is used for obtaining the top-level coding representation of each word passing through the coder network by using a Transformer layer based on a self-attention mechanism.
As a preferable aspect of the knowledge-enhanced non-autoregressive neural machine translation apparatus, the third processing module includes:
the corresponding relation construction submodule is used for predefining word sequences representing different target languages and constructing the corresponding relation between the source language word sequence and the target language by using a word alignment model;
the fertility sequence submodule is used for taking the number of target language tokens corresponding to each source language token as the fertility sequence according to the correspondence between the source language word sequence and the target language;
the fertility probability distribution submodule is used for computing a softmax over the topmost encoded representation of each obtained word from the encoder network to obtain the fertility probability distribution;
a generation result submodule for selecting the output corresponding to the maximum probability as the generation result of the encoder network;
in the fertility probability distribution submodule, a one-layer linear transformation is applied to the output hidden state, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the fourth processing module includes:
the input construction submodule is used for constructing the input of the decoder end according to the fertility result obtained by the fertility probability distribution submodule;
an input encoded representation sub-module for obtaining a decoder input encoded representation;
an output encoded representation sub-module for obtaining an output encoded representation of the decoder.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the fifth processing module includes:
the hidden state linear transformation submodule is used for carrying out one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
the output probability distribution submodule is used for passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
and the translation result submodule is used for selecting the word corresponding to the maximum probability as the translation result at the appointed moment.
The invention has the following advantages: data preprocessing and word vector encoding are performed on bilingual parallel language pairs; the word vector representation of the source language is input into an encoder network, which encodes the source language text to obtain an encoded representation of the input word sequence; a word alignment model is used to construct the correspondence between the source language and the target language and to build a fertility model; the input and output encoded representations of the decoder model are constructed; and dependencies among target language words are established through a conditional random field model, with decoding proceeding in sequence to generate the final translation result. The method exploits pre-trained language model knowledge and decodes with a conditional random field at the decoding end: the pre-trained language model carries rich contextual information and the conditional random field builds contextual dependencies, alleviating the repeated translations, omitted translations, and inconsistencies between earlier and later output to which non-autoregressive translation is prone, and yielding higher-quality translation results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, and sizes shown in this specification are used only to match the content disclosed in the specification so that those skilled in the art can understand and read the invention; they do not limit the conditions under which the invention can be implemented and have no technical significance in themselves. Any structural modification, change of proportion, or adjustment of size that does not affect the functions and purposes of the invention should still fall within the scope of the invention.
FIG. 1 is a flow chart of a knowledge-enhanced non-autoregressive neural machine translation method provided in example 1 of the present invention;
fig. 2 is a schematic diagram of a knowledge-enhanced non-autoregressive neural machine translation apparatus provided in embodiment 2 of the present invention.
Detailed Description
The present invention is described below in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It should be understood that the described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, embodiment 1 of the present invention provides a knowledge-enhanced non-autoregressive neural machine translation method, including the following steps:
s1, carrying out data preprocessing and word vector encoding on the bilingual parallel language pairs;
s2, inputting the word vector representation of the source language into an encoder network, wherein the encoder network encodes the source language document information to obtain the encoded representation of the input word sequence information;
s3, establishing a corresponding relation between a source language and a target language by using a word alignment model, and establishing a reproduction rate model;
s4, constructing input and output coding representation of the decoder model;
and S5, establishing the dependence among the target language vocabularies through the conditional random field model, and sequentially decoding to generate a final translation result.
In this embodiment, step S1 includes:
s11, carrying out sub-word segmentation on sentences in all training corpora by using a BPE algorithm;
s12, predefining a sub-word sequence representing the source language, and obtaining word vector coding representation of the source language by using a pre-training model;
s13, acquiring position vector codes of the source language input sequence;
and S14, adding the word vector code and the position vector code to obtain the input code expression of the source language.
Specifically, in step S11, in order to reduce the influence of out-of-vocabulary words on translation performance, the BPE algorithm is first used to perform subword segmentation on the sentences in all the training corpora, so that the input units of the encoder network and the output units of the decoder network are both subword sequences.
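For illustration, the following is a minimal sketch of how learned BPE merge operations could be applied to a single word. The merge table shown is hypothetical; in practice the merges are learned from the training corpus with a BPE toolkit and shared between the encoder and decoder vocabularies.

```python
# Minimal sketch of applying BPE merges to one word; the merge table below is
# a hypothetical stand-in for merges learned from the training corpus.
from typing import List, Tuple

def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Greedily apply merge operations, in learned order, to one word."""
    # Start from characters, attaching an end-of-word marker to the last one.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    for left, right in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]   # fuse the adjacent pair
            else:
                i += 1
    return symbols

merges = [("l", "o"), ("lo", "w"), ("e", "r</w>")]   # hypothetical learned merges
print(apply_bpe("lower", merges))                    # ['low', 'er</w>']
```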
In step S12, a source language subword sequence x = [x_1, ..., x_n] is first predefined, and a pre-trained model is used to obtain the word vector encoded representation TE_x of the source language:
TE_x = BERT_emb(x) = [v_1, ..., v_n]
In step S13, the position vector encoding PE_x of the source language input sequence is obtained.
In step S14, the word vector encoding TE_x and the position vector encoding PE_x are added to obtain the input encoding vector representation E_x of the source language: E_x = TE_x + PE_x.
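As a concrete illustration of steps S12-S14, the sketch below builds E_x = TE_x + PE_x with PyTorch, using sinusoidal position encodings. The randomly initialised embedding table is only a stand-in for the pre-trained model's word embeddings, and the dimensions are illustrative.

```python
# Minimal sketch of E_x = TE_x + PE_x; the nn.Embedding table stands in for
# the pre-trained language model's word embedding lookup.
import math
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard Transformer sinusoidal position encoding PE_x."""
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

vocab_size, d_model = 32000, 512
word_emb = nn.Embedding(vocab_size, d_model)             # stand-in for the pre-trained TE lookup

token_ids = torch.tensor([[5, 182, 94, 7]])              # one source sentence of 4 subwords
te_x = word_emb(token_ids)                               # TE_x: (1, 4, 512)
pe_x = sinusoidal_positions(token_ids.size(1), d_model)  # PE_x: (4, 512)
e_x = te_x + pe_x                                        # E_x = TE_x + PE_x, broadcast over batch
print(e_x.shape)                                         # torch.Size([1, 4, 512])
```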
In this embodiment, step S2 includes:
s21, acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
s22, obtaining the topmost coded representation of each word passing through the coder network by using a Transformer layer based on a self-attention mechanism.
Specifically, in step S21, E_x = [v_1, ..., v_n] denotes the input word sequence matrix after word vector preprocessing, where v_i is the vector of the i-th subword.
In step S22, using Transformer layers based on the self-attention mechanism, the encoded representation of each word passing through the encoder is computed by the following formula:
V_n = SelfAttn(E_x, E_x, E_x)
where E_x denotes the input encoding vector and V_n denotes the output of the self-attention mechanism. Using the encoder, the topmost encoded representation O_n of each word can be obtained.
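A minimal sketch of step S22, assuming a standard PyTorch Transformer encoder stack; the number of layers, heads, and dimensions are illustrative and are not fixed by the description.

```python
# Minimal sketch of stacking self-attention Transformer encoder layers over E_x
# to obtain the topmost encoded representation O_n of each subword.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                   dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

e_x = torch.randn(1, 4, d_model)   # E_x from the embedding step (batch, length, d_model)
o_n = encoder(e_x)                 # O_n: topmost encoded representation of each subword
print(o_n.shape)                   # torch.Size([1, 4, 512])
```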
In this embodiment, step S3 includes:
s31, predefining word sequences representing different target languages, and constructing a corresponding relation between a source language word sequence and a target language by using a word alignment model;
s32, according to the corresponding relation between the source language word sequence and the target language, taking the token number of the target language corresponding to the source language as a reproduction rate sequence;
s33, calculating softmax for each word obtained in the step S22 through the topmost coding expression of the coder network, and obtaining the probability distribution of the reproduction rate;
and S34, selecting the output corresponding to the maximum probability as the generation result of the encoder network.
Specifically, in step S31, y = [y_1, ..., y_n] is defined to represent the corresponding target language word sequence, and a word alignment model is used to construct the correspondence Map between the source language word sequence x and the target language y.
In step S32, through the correspondence Map, the number of target language tokens corresponding to each source language token is taken as the fertility sequence F:
F = [f_1, ..., f_n]
where each f_i is restricted to a natural number between 0 and 50.
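The fertility sequence of step S32 can be sketched as follows, assuming the word aligner returns alignment links as (source index, target index) pairs; the link set shown is hypothetical.

```python
# Minimal sketch of turning word-alignment links into a fertility sequence F,
# where f_i counts the target tokens aligned to source token x_i, clipped to 0..50.
from collections import Counter
from typing import List, Tuple

def fertility_sequence(src_len: int, alignment: List[Tuple[int, int]]) -> List[int]:
    """alignment is a list of (source_index, target_index) links from a word aligner."""
    counts = Counter(src for src, _ in alignment)
    return [min(counts.get(i, 0), 50) for i in range(src_len)]

# Hypothetical alignment for a 4-token source sentence: token 0 aligns to two
# target tokens, token 2 to none, and the rest to one each.
links = [(0, 0), (0, 1), (1, 2), (3, 3)]
print(fertility_sequence(4, links))   # [2, 1, 0, 1]
```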
In step S33, a softmax is computed over the topmost encoder representation O_n obtained in step S22 to obtain the probability distribution of the fertility. The output hidden state O_n first undergoes a one-layer linear transformation, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution:
softmax(W · O_n + b)
where W and b are trainable parameters of the model and the output dimension of W is 51, one class for each fertility value from 0 to 50.
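A minimal sketch of steps S33-S34, assuming the fertility classifier is a single linear layer over the topmost encoder states followed by a softmax over the 51 fertility classes; tensor sizes are illustrative.

```python
# Minimal sketch of the fertility prediction head: softmax(W . O_n + b) per
# source token, followed by the argmax selection of step S34.
import torch
import torch.nn as nn

d_model, num_fertilities = 512, 51
fertility_head = nn.Linear(d_model, num_fertilities)    # W (512 x 51) and b

o_n = torch.randn(1, 4, d_model)                        # topmost encoder output O_n
probs = torch.softmax(fertility_head(o_n), dim=-1)      # fertility distribution per token
predicted_fertility = probs.argmax(dim=-1)              # step S34: most probable fertility
print(predicted_fertility.shape)                        # torch.Size([1, 4])
```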
In this embodiment, step S4 includes:
s41, constructing the input of the decoder end according to the multiplication rate result obtained in the step S33;
s42, obtaining a decoder input coding representation;
s43, obtaining the output coding representation of the decoder.
Specifically, in step S41, the input of the decoder end is constructed according to the fertility result obtained in step S33: each source language token x_i is copied according to its fertility f_i. For example, if the source language token x_i has fertility f_i = 3, then y_i = [x_i, x_i, x_i].
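The decoder-input construction of step S41 amounts to copying each source subword according to its predicted fertility, as in the following sketch; the sample tokens and fertilities are illustrative.

```python
# Minimal sketch of building the decoder input by copying each source subword
# x_i exactly f_i times, matching the example y_i = [x_i, x_i, x_i] for f_i = 3.
from typing import List

def copy_by_fertility(source: List[str], fertility: List[int]) -> List[str]:
    decoder_input = []
    for token, f in zip(source, fertility):
        decoder_input.extend([token] * f)   # a fertility of 0 drops the token entirely
    return decoder_input

source = ["ich", "liebe", "dich"]           # illustrative source subwords
fertility = [1, 2, 1]                       # illustrative predicted fertilities
print(copy_by_fertility(source, fertility))   # ['ich', 'liebe', 'liebe', 'dich']
```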
In step S42, similar to steps S12-S14, the decoder input encoded representation is obtained; in step S43, similar to steps S21-S22, the output encoded representation of the decoder is obtained.
In this embodiment, step S5 includes:
s51, performing one-layer linear transformation on the hidden state output by the top layer of the decoder network;
s52, outputting the output probability distribution of each moment by the result obtained by the linear transformation of the layer through a CRF linear chain;
and S53, selecting the word corresponding to the maximum probability as the translation result at the specified time.
Specifically, in step S51, the hidden state output by the topmost layer of the decoder network is passed through a one-layer linear transformation.
In step S52, the result of the linear transformation is passed through a linear-chain CRF to output the probability distribution Prob_{y|x} at each time step:
Prob_{y|x} = (1 / Z(x)) · exp( Σ_i s(y_i) + Σ_i T(y_{i-1}, y_i) )
where s denotes the score of the predicted target word y_i, T denotes the transition score between adjacent words, and Z(x) denotes the normalization factor.
In step S53, the word corresponding to the maximum probability max(Prob_{y|x}) is selected as the translation result at time i. Following these steps, decoding proceeds in sequence to generate the final translation result y = [y_1, ..., y_n].
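One standard way to realise the sequential CRF decoding described in steps S52-S53 is Viterbi search over per-position emission scores s and a transition matrix T, as sketched below with NumPy. The vocabulary size and the random scores are placeholders; in the method above, the emission scores would come from the linear transformation of the decoder's topmost hidden states.

```python
# Minimal sketch of linear-chain CRF decoding: Viterbi search over emission
# scores s (length x vocab) and transition scores T (vocab x vocab).
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Return the indices of the highest-scoring word sequence."""
    length, vocab = emissions.shape
    score = emissions[0].copy()                   # best score ending in each word at position 0
    backptr = np.zeros((length, vocab), dtype=int)
    for t in range(1, length):
        # score of moving from every previous word to every current word
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow back-pointers from the best final word
    best = [int(score.argmax())]
    for t in range(length - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 8))       # 5 decoding positions, placeholder vocabulary of 8 words
transitions = rng.normal(size=(8, 8))
print(viterbi_decode(emissions, transitions))   # indices of the decoded word sequence
```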
In conclusion, the invention uses the BPE algorithm to perform subword segmentation on the sentences in all the training corpora; predefines a subword sequence representing the source language and obtains the word vector encoded representation of the source language using a pre-trained model; obtains the position vector encoding of the source language input sequence; adds the word vector encoding and the position vector encoding to obtain the input encoded representation of the source language; uses a word alignment model to construct the correspondence between the source language and the target language and to build a fertility model, predefining the target language word sequences and constructing the correspondence between the source language word sequence and the target language with the word alignment model; takes the number of target language tokens corresponding to each source language token as the fertility sequence according to this correspondence; computes a softmax over the topmost encoded representation of each word from the encoder network to obtain the fertility probability distribution; selects the output with the maximum probability as the generation result of the encoder network; constructs the input and output encoded representations of the decoder model; applies a one-layer linear transformation to the hidden state output by the topmost layer of the decoder network; passes the result of the linear transformation through a linear-chain CRF to output the probability distribution at each time step; and selects the word with the maximum probability as the translation result at each time step. The encoder uses a pre-trained model to encode the input text into semantic vectors carrying contextual information, and the decoder adds a conditional random field at the top of the decoding end to establish contextual and temporal dependencies. The method exploits pre-trained language model knowledge and decodes with a conditional random field at the decoding end: the pre-trained language model carries rich contextual information and the conditional random field builds contextual dependencies, alleviating the repeated translations, omitted translations, and inconsistencies between earlier and later output to which non-autoregressive translation is prone, and yielding higher-quality translation results.
Example 2
Referring to fig. 2, embodiment 2 of the present invention further provides a knowledge-enhanced non-autoregressive neural machine translation apparatus, including:
the first processing module 1 is used for carrying out data preprocessing and word vector encoding on bilingual parallel language pairs;
the second processing module 2 is used for inputting the word vector representation of the source language into the encoder network, and the encoder network encodes the document information of the source language to obtain the encoded representation of the input word sequence information;
the third processing module 3 is used for constructing the correspondence between the source language and the target language by using the word alignment model and building a fertility model;
a fourth processing module 4 for constructing input and output coded representations of the decoder model;
and the fifth processing module 5 is used for establishing the dependence among the target language vocabularies through the conditional random field model, and sequentially decoding to generate a final translation result.
In this embodiment, the first processing module 1 includes:
the subword segmentation submodule 11 is used for performing subword segmentation on sentences in all the training corpora by using a BPE algorithm;
the first obtaining submodule 12 is used for predefining a sub-word sequence representing a source language and obtaining word vector coding representation of the source language by using a pre-training model;
a second obtaining submodule 13, configured to obtain a position vector code of the source language input sequence;
and an input code representation submodule 14, configured to add the word vector code and the position vector code to obtain an input code representation in the source language.
In this embodiment, the second processing module 2 includes:
the word sequence matrix submodule 21 is configured to obtain a word sequence matrix of the source language after word vector preprocessing;
and a top-level coded representation sub-module 22 for obtaining the topmost encoded representation of each word through the encoder network using a Transformer layer based on the self-attention mechanism.
In this embodiment, the third processing module 3 includes:
a correspondence construction submodule 31 for predefining word sequences representing different target languages, and constructing a correspondence between a source language word sequence and a target language using a word alignment model;
the fertility sequence submodule 32 is configured to take the number of target language tokens corresponding to each source language token as the fertility sequence according to the correspondence between the source language word sequence and the target language;
a fertility probability distribution submodule 33, configured to compute a softmax over the topmost encoded representation of each obtained word from the encoder network to obtain the fertility probability distribution;
a generation result sub-module 34, configured to select the output corresponding to the maximum probability as the generation result of the encoder network;
in the fertility probability distribution submodule 33, a one-layer linear transformation is applied to the output hidden state, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution.
In this embodiment, the fourth processing module 4 includes:
an input construction submodule 41, configured to construct the input of the decoder end according to the fertility result obtained by the fertility probability distribution submodule;
an input coded representation sub-module 42 for obtaining a decoder input coded representation;
an output encoded representation sub-module 43 for obtaining an output encoded representation of the decoder.
In this embodiment, the fifth processing module 5 includes:
a hidden state linear transformation submodule 51, configured to perform one-layer linear transformation on a hidden state output from the top layer of the decoder network;
an output probability distribution submodule 52, configured to pass the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
and a translation result sub-module 53, configured to select a word corresponding to the maximum probability as a translation result at a specific time.
It should be noted that, for the information interaction, execution process, and other contents between the modules/sub-modules of the apparatus, since the same concept is based on the method embodiment in embodiment 1 of the present application, the technical effect brought by the information interaction, execution process, and other contents are the same as those of the method embodiment of the present application, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not described herein again.
Example 3
Embodiment 3 of the present invention provides a non-transitory computer readable storage medium having stored therein program code of the knowledge-enhanced non-autoregressive neural machine translation method, the program code comprising instructions for performing the knowledge-enhanced non-autoregressive neural machine translation method of embodiment 1 or any possible implementation thereof.
The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Example 4
An embodiment 4 of the present invention provides an electronic device, including: a memory and a processor;
the processor and the memory are communicated with each other through a bus; the memory stores program instructions executable by the processor to invoke the method of knowledge-enhanced non-autoregressive neural machine translation of embodiment 1 or any possible implementation thereof.
Specifically, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated in the processor, located external to the processor, or stand-alone.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although the invention has been described in detail with respect to the general description and the specific embodiments, it will be apparent to those skilled in the art that modifications and improvements may be made based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A knowledge-enhanced non-autoregressive neural machine translation method, comprising the steps of:
(1) carrying out data preprocessing and word vector coding on bilingual parallel language pairs;
(2) the method comprises the steps that word vector representation of a source language is input into an encoder network, and the encoder network encodes source language document information to obtain encoded representation of input word sequence information;
(3) using a word alignment model to construct the correspondence between a source language and a target language and to build a fertility model;
(4) constructing input and output coding representations of a decoder model;
(5) and establishing the dependence among the target language vocabularies through a conditional random field model, and sequentially decoding to generate a final translation result.
2. The knowledge-enhanced non-autoregressive neural machine translation method of claim 1, wherein step (1) comprises:
(11) performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
(12) predefining a sub-word sequence representing a source language, and obtaining word vector coding representation of the source language by using a pre-training model;
(13) acquiring a position vector code of a source language input sequence;
(14) and adding the word vector code and the position vector code to obtain the input code representation of the source language.
3. The knowledge-enhanced non-autoregressive neural machine translation method of claim 2, wherein step (2) comprises:
(21) acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
(22) using a Transformer layer based on the self-attention mechanism to obtain the topmost encoded representation of each word through the encoder network.
4. The knowledge-enhanced non-autoregressive neural machine translation method of claim 3, wherein step (3) comprises:
(31) predefining word sequences representing different target languages, and constructing a corresponding relation between a source language word sequence and a target language by using a word alignment model;
(32) according to the correspondence between the source language word sequence and the target language, taking the number of target language tokens corresponding to each source language token as the fertility sequence;
(33) computing a softmax over the topmost encoded representation of each word obtained in step (22) to obtain the probability distribution of the fertility;
(34) selecting the output corresponding to the maximum probability as a generation result of the encoder network;
the step (33) includes:
(331) carrying out one-layer linear transformation on the output hidden state;
(332) passing the result of the one-layer linear transformation through a softmax to output the fertility probability distribution.
5. The knowledge-enhanced non-autoregressive neural machine translation method of claim 4, wherein step (4) comprises:
(41) constructing the input of the decoder end according to the fertility result obtained in step (33);
(42) obtaining a decoder input encoded representation;
(43) an output encoded representation of the decoder is obtained.
6. The knowledge-enhanced non-autoregressive neural machine translation method of claim 5, wherein step (5) comprises:
(51) performing one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
(52) passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
(53) selecting the word with the maximum probability as the translation result at that time step.
7. A knowledge-enhanced non-autoregressive neural machine translation device, comprising:
the first processing module is used for carrying out data preprocessing and word vector encoding on the bilingual parallel language pair;
the second processing module is used for inputting the word vector representation of the source language into the encoder network, and the encoder network encodes the source language document information to obtain the encoded representation of the input word sequence information;
the third processing module is used for constructing the correspondence between a source language and a target language by using the word alignment model and building a fertility model;
a fourth processing module for constructing input and output encoded representations of the decoder model;
and the fifth processing module is used for establishing the dependence among the target language vocabularies through the conditional random field model, and sequentially decoding to generate a final translation result.
8. The knowledge-enhanced non-autoregressive neural machine translation device of claim 7, wherein the first processing module comprises:
the sub-word segmentation submodule is used for performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
the first obtaining submodule is used for predefining a sub-word sequence representing a source language and obtaining word vector coding representation of the source language by using a pre-training model;
the second obtaining submodule is used for obtaining the position vector code of the source language input sequence;
the input code representation submodule is used for adding the word vector code and the position vector code to obtain input code representation of a source language;
the second processing module comprises:
the word sequence matrix submodule is used for acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
and the top-level coding representation submodule is used for obtaining the top-level coding representation of each word passing through the coder network by using a Transformer layer based on a self-attention mechanism.
9. The knowledge-enhanced non-autoregressive neural machine translation device of claim 8, wherein the third processing module comprises:
the corresponding relation construction submodule is used for predefining word sequences representing different target languages and constructing the corresponding relation between the source language word sequence and the target language by using a word alignment model;
the fertility sequence submodule is used for taking the number of target language tokens corresponding to each source language token as the fertility sequence according to the correspondence between the source language word sequence and the target language;
the fertility probability distribution submodule is used for computing a softmax over the topmost encoded representation of each obtained word from the encoder network to obtain the fertility probability distribution;
a generation result submodule for selecting the output corresponding to the maximum probability as the generation result of the encoder network;
in the fertility probability distribution submodule, a one-layer linear transformation is applied to the output hidden state, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution.
10. The knowledge-enhanced non-autoregressive neural machine translation device of claim 9, wherein the fourth processing module comprises:
the input construction submodule is used for constructing the input of the decoder end according to the fertility result obtained by the fertility probability distribution submodule;
an input encoded representation sub-module for obtaining a decoder input encoded representation;
an output encoded representation sub-module for obtaining an output encoded representation of the decoder;
the fifth processing module includes:
the hidden state linear transformation submodule is used for carrying out one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
the output probability distribution submodule is used for passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
and the translation result submodule is used for selecting the word corresponding to the maximum probability as the translation result at the appointed moment.
CN202210243650.3A 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device Pending CN114611488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210243650.3A CN114611488A (en) 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210243650.3A CN114611488A (en) 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device

Publications (1)

Publication Number Publication Date
CN114611488A true CN114611488A (en) 2022-06-10

Family

ID=81862896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210243650.3A Pending CN114611488A (en) 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device

Country Status (1)

Country Link
CN (1) CN114611488A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095092A (en) * 2021-04-19 2021-07-09 南京大学 Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship

Similar Documents

Publication Publication Date Title
US20210390271A1 (en) Neural machine translation systems
US11544474B2 (en) Generation of text from structured data
CN110110337B (en) Translation model training method, medium, device and computing equipment
CN110619043A (en) Automatic text abstract generation method based on dynamic word vector
CN116072098B (en) Audio signal generation method, model training method, device, equipment and medium
CN113590761B (en) Training method of text processing model, text processing method and related equipment
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
Yoon et al. TutorNet: Towards flexible knowledge distillation for end-to-end speech recognition
CN112163434B (en) Text translation method, device, medium and electronic equipment based on artificial intelligence
Delbrouck et al. Modulating and attending the source image during encoding improves multimodal translation
CN111666756A (en) Sequence model text abstract generation method based on topic fusion
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
CN114611488A (en) Knowledge-enhanced non-autoregressive neural machine translation method and device
CN117877460A (en) Speech synthesis method, device, speech synthesis model training method and device
CN111475635B (en) Semantic completion method and device and electronic equipment
CN117875395A (en) Training method, device and storage medium of multi-mode pre-training model
CN112765968A (en) Grammar error correction method and training method and product for grammar error correction model
CN115357710B (en) Training method and device for table description text generation model and electronic equipment
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof
CN115496134A (en) Traffic scene video description generation method and device based on multi-modal feature fusion
CN114912441A (en) Text error correction model generation method, error correction method, system, device and medium
CN114372140A (en) Layered conference abstract generation model training method, generation method and device
CN113593534A (en) Method and apparatus for multi-accent speech recognition
CN112836526A (en) Multi-language neural machine translation method and device based on gating mechanism
Lin et al. A Generative Adversarial Constraint Encoder-Decoder Model for the Text Summarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination