CN114611488A - Knowledge-enhanced non-autoregressive neural machine translation method and device - Google Patents

Knowledge-enhanced non-autoregressive neural machine translation method and device Download PDF

Info

Publication number
CN114611488A
Authority
CN
China
Prior art keywords
word
source language
language
input
submodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210243650.3A
Other languages
Chinese (zh)
Inventor
王亦宁
刘升平
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202210243650.3A priority Critical patent/CN114611488A/en
Publication of CN114611488A publication Critical patent/CN114611488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A knowledge-enhanced non-autoregressive neural machine translation method and device. The method performs data preprocessing and word vector encoding on bilingual parallel language pairs; inputs the word vector representation of the source language into an encoder network, which encodes the source language text to obtain an encoded representation of the input word sequence; uses a word alignment model to construct the correspondence between the source language and the target language and to build a fertility model (the number of target language tokens generated for each source language token); constructs the input and output encoded representations of the decoder model; and establishes dependencies among target language words through a conditional random field model, decoding in sequence to generate the final translation result. The invention decodes with a conditional random field at the decoding end: the pre-trained language model carries rich contextual information and the conditional random field builds contextual dependencies, alleviating the repeated translations, omitted translations, and inconsistencies between earlier and later output to which non-autoregressive translation is prone, and yielding higher-quality translation results.

Description

Knowledge-enhanced non-autoregressive neural machine translation method and device
Technical Field
The invention belongs to the technical field of machine translation, and particularly relates to a knowledge-enhanced non-autoregressive neural machine translation method and device.
Background
Conventional neural machine translation uses an autoregressive decoding mode in which the target language is generated by decoding sequentially from left to right; because of this characteristic, words at different positions cannot be generated in parallel during decoding. Non-autoregressive translation abandons the temporal ordering of the target-side generation process, so that decoding does not depend on previously generated translations. This yields high inference speed: all target language words can be generated simultaneously during decoding, which greatly accelerates the decoding speed of the model.
Current non-autoregressive translation methods generate the target language words at all positions simultaneously. Although this greatly improves decoding speed, it abandons the dependencies among words, which easily produces translations that are inconsistent between earlier and later parts, omit content, or repeat the same content many times; the resulting translation quality is poor and cannot meet normal high-quality translation requirements.
Disclosure of Invention
Therefore, the invention provides a knowledge-enhanced non-autoregressive neural machine translation method and device, solving the problems that in non-autoregressive neural machine translation the generation of target language text cannot depend on contextual information and is prone to repeated translations, omitted translations, and inconsistency between earlier and later output.
In order to achieve the above purpose, the invention provides the following technical scheme: a knowledge-enhanced non-autoregressive neural machine translation method, comprising the steps of:
(1) carrying out data preprocessing and word vector coding on bilingual parallel language pairs;
(2) inputting the word vector representation of the source language into an encoder network, which encodes the source language text to obtain an encoded representation of the input word sequence;
(3) using a word alignment model to construct the correspondence between the source language and the target language and to build a fertility model;
(4) constructing the input and output encoded representations of the decoder model;
(5) establishing dependencies among target language words through a conditional random field model, and decoding in sequence to generate the final translation result.
As a preferred embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (1) comprises:
(11) performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
(12) predefining a sub-word sequence representing a source language, and obtaining word vector coding representation of the source language by using a pre-training model;
(13) acquiring a position vector code of a source language input sequence;
(14) and adding the word vector code and the position vector code to obtain the input code representation of the source language.
As a preferable embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (2) includes:
(21) acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
(22) using Transformer layers based on the self-attention mechanism to obtain the topmost encoded representation of each word through the encoder network.
As a preferable scheme of the knowledge-enhanced non-autoregressive neural machine translation method, the step (3) comprises the following steps:
(31) predefining word sequences representing different target languages, and constructing a corresponding relation between a source language word sequence and a target language by using a word alignment model;
(32) according to the correspondence between the source language word sequence and the target language, taking the number of target language tokens corresponding to each source language token as the fertility sequence;
(33) computing a softmax over the topmost encoded representation of each word obtained in step (22) to obtain the probability distribution of the fertility;
(34) selecting the output corresponding to the maximum probability as a generation result of the encoder network;
the step (33) includes:
(331) carrying out one-layer linear transformation on the output hidden state;
(332) passing the result of the one-layer linear transformation through a softmax to output the fertility probability distribution.
As a preferable embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (4) includes:
(41) constructing the input of the decoder end according to the fertility result obtained in step (33);
(42) obtaining a decoder input encoded representation;
(43) an output encoded representation of the decoder is obtained.
As a preferable embodiment of the knowledge-enhanced non-autoregressive neural machine translation method, the step (5) includes:
(51) performing one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
(52) passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
(53) selecting the word with the maximum probability as the translation result at that time step.
The invention also provides a knowledge-enhanced non-autoregressive neural machine translation device, comprising:
the first processing module is used for carrying out data preprocessing and word vector encoding on the bilingual parallel language pair;
the second processing module is used for inputting the word vector representation of the source language into the encoder network, and the encoder network encodes the source language document information to obtain the encoded representation of the input word sequence information;
the third processing module is used for constructing the correspondence between a source language and a target language by using the word alignment model and building a fertility model;
a fourth processing module for constructing input and output encoded representations of the decoder model;
and the fifth processing module is used for establishing the dependence between the target language vocabularies through the conditional random field model and sequentially decoding to generate a final translation result.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the first processing module includes:
the sub-word segmentation submodule is used for performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
the first obtaining submodule is used for predefining a sub-word sequence representing a source language and obtaining word vector coding representation of the source language by using a pre-training model;
the second obtaining submodule is used for obtaining the position vector code of the source language input sequence;
and the input code representation submodule is used for adding the word vector code and the position vector code to obtain the input code representation of the source language.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the second processing module includes:
the word sequence matrix submodule is used for acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
and the top-level coding representation submodule is used for obtaining the top-level coding representation of each word passing through the coder network by using a Transformer layer based on a self-attention mechanism.
As a preferable aspect of the knowledge-enhanced non-autoregressive neural machine translation apparatus, the third processing module includes:
the corresponding relation construction submodule is used for predefining word sequences representing different target languages and constructing the corresponding relation between the source language word sequence and the target language by using a word alignment model;
the fertility sequence submodule is used for taking the number of target language tokens corresponding to each source language token as the fertility sequence according to the correspondence between the source language word sequence and the target language;
the fertility probability distribution submodule is used for computing a softmax over the topmost encoded representation of each obtained word from the encoder network to obtain the fertility probability distribution;
a generation result submodule for selecting the output corresponding to the maximum probability as the generation result of the encoder network;
in the fertility probability distribution submodule, a one-layer linear transformation is applied to the output hidden state, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the fourth processing module includes:
the input construction submodule is used for constructing the input of the decoder end according to the fertility result obtained by the fertility probability distribution submodule;
an input encoded representation sub-module for obtaining a decoder input encoded representation;
an output encoded representation sub-module for obtaining an output encoded representation of the decoder.
As a preferable aspect of the knowledge-enhanced non-autoregressive neuro-machine translation apparatus, the fifth processing module includes:
the hidden state linear transformation submodule is used for carrying out one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
the output probability distribution submodule is used for passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
and the translation result submodule is used for selecting the word corresponding to the maximum probability as the translation result at the appointed moment.
The invention has the following advantages: data preprocessing and word vector encoding are performed on bilingual parallel language pairs; the word vector representation of the source language is input into an encoder network, which encodes the source language text to obtain an encoded representation of the input word sequence; a word alignment model is used to construct the correspondence between the source language and the target language and to build a fertility model; the input and output encoded representations of the decoder model are constructed; and dependencies among target language words are established through a conditional random field model, with decoding proceeding in sequence to generate the final translation result. The method exploits pre-trained language model knowledge and decodes with a conditional random field at the decoding end: the pre-trained language model carries rich contextual information and the conditional random field builds contextual dependencies, alleviating the repeated translations, omitted translations, and inconsistencies between earlier and later output to which non-autoregressive translation is prone, and yielding higher-quality translation results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, and sizes shown in this specification are used only to match the content disclosed in the specification so that those skilled in the art can understand and read the invention; they do not limit the conditions under which the invention can be implemented and have no technical significance in themselves. Any structural modification, change of proportion, or adjustment of size that does not affect the functions and purposes of the invention should still fall within the scope of the invention.
FIG. 1 is a flow chart of a knowledge-enhanced non-autoregressive neural machine translation method provided in example 1 of the present invention;
fig. 2 is a schematic diagram of a knowledge-enhanced non-autoregressive neural machine translation apparatus provided in embodiment 2 of the present invention.
Detailed Description
The present invention is described below in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It should be understood that the described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, embodiment 1 of the present invention provides a knowledge-enhanced non-autoregressive neural machine translation method, including the following steps:
s1, carrying out data preprocessing and word vector encoding on the bilingual parallel language pairs;
s2, inputting the word vector representation of the source language into an encoder network, wherein the encoder network encodes the source language document information to obtain the encoded representation of the input word sequence information;
s3, establishing a corresponding relation between a source language and a target language by using a word alignment model, and establishing a reproduction rate model;
s4, constructing input and output coding representation of the decoder model;
and S5, establishing the dependence among the target language vocabularies through the conditional random field model, and sequentially decoding to generate a final translation result.
In this embodiment, step S1 includes:
s11, carrying out sub-word segmentation on sentences in all training corpora by using a BPE algorithm;
s12, predefining a sub-word sequence representing the source language, and obtaining word vector coding representation of the source language by using a pre-training model;
s13, acquiring position vector codes of the source language input sequence;
and S14, adding the word vector code and the position vector code to obtain the input code expression of the source language.
Specifically, in step S11, in order to reduce the influence of out-of-vocabulary words on translation performance, the BPE algorithm is first used to perform subword segmentation on the sentences in all the training corpora, so that the input units of the encoder network and the output units of the decoder network are both subword sequences.
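For illustration, the following is a minimal sketch of how learned BPE merge operations could be applied to a single word. The merge table shown is hypothetical; in practice the merges are learned from the training corpus with a BPE toolkit and shared between the encoder and decoder vocabularies.

```python
# Minimal sketch of applying BPE merges to one word; the merge table below is
# a hypothetical stand-in for merges learned from the training corpus.
from typing import List, Tuple

def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Greedily apply merge operations, in learned order, to one word."""
    # Start from characters, attaching an end-of-word marker to the last one.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    for left, right in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]   # fuse the adjacent pair
            else:
                i += 1
    return symbols

merges = [("l", "o"), ("lo", "w"), ("e", "r</w>")]   # hypothetical learned merges
print(apply_bpe("lower", merges))                    # ['low', 'er</w>']
```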
In step S12, a source language subword sequence x = [x_1, ..., x_n] is first predefined, and a pre-trained model is used to obtain the word vector encoded representation TE_x of the source language:
TE_x = BERT_emb(x) = [v_1, ..., v_n]
In step S13, the position vector encoding PE_x of the source language input sequence is obtained.
In step S14, the word vector encoding TE_x and the position vector encoding PE_x are added to obtain the input encoding vector representation E_x of the source language: E_x = TE_x + PE_x.
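As a concrete illustration of steps S12-S14, the sketch below builds E_x = TE_x + PE_x with PyTorch, using sinusoidal position encodings. The randomly initialised embedding table is only a stand-in for the pre-trained model's word embeddings, and the dimensions are illustrative.

```python
# Minimal sketch of E_x = TE_x + PE_x; the nn.Embedding table stands in for
# the pre-trained language model's word embedding lookup.
import math
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard Transformer sinusoidal position encoding PE_x."""
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

vocab_size, d_model = 32000, 512
word_emb = nn.Embedding(vocab_size, d_model)             # stand-in for the pre-trained TE lookup

token_ids = torch.tensor([[5, 182, 94, 7]])              # one source sentence of 4 subwords
te_x = word_emb(token_ids)                               # TE_x: (1, 4, 512)
pe_x = sinusoidal_positions(token_ids.size(1), d_model)  # PE_x: (4, 512)
e_x = te_x + pe_x                                        # E_x = TE_x + PE_x, broadcast over batch
print(e_x.shape)                                         # torch.Size([1, 4, 512])
```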
In this embodiment, step S2 includes:
s21, acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
s22, obtaining the topmost coded representation of each word passing through the coder network by using a Transformer layer based on a self-attention mechanism.
Specifically, in step S21, E_x = [v_1, ..., v_n] denotes the input word sequence matrix after word vector preprocessing, where v_i is the vector of the i-th subword.
In step S22, using Transformer layers based on the self-attention mechanism, the encoded representation of each word passing through the encoder is computed by the following formula:
V_n = SelfAttn(E_x, E_x, E_x)
where E_x denotes the input encoding vector and V_n denotes the output of the self-attention mechanism. Using the encoder, the topmost encoded representation O_n of each word can be obtained.
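A minimal sketch of step S22, assuming a standard PyTorch Transformer encoder stack; the number of layers, heads, and dimensions are illustrative and are not fixed by the description.

```python
# Minimal sketch of stacking self-attention Transformer encoder layers over E_x
# to obtain the topmost encoded representation O_n of each subword.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                   dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

e_x = torch.randn(1, 4, d_model)   # E_x from the embedding step (batch, length, d_model)
o_n = encoder(e_x)                 # O_n: topmost encoded representation of each subword
print(o_n.shape)                   # torch.Size([1, 4, 512])
```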
In this embodiment, step S3 includes:
s31, predefining word sequences representing different target languages, and constructing a corresponding relation between a source language word sequence and a target language by using a word alignment model;
s32, according to the corresponding relation between the source language word sequence and the target language, taking the token number of the target language corresponding to the source language as a reproduction rate sequence;
s33, calculating softmax for each word obtained in the step S22 through the topmost coding expression of the coder network, and obtaining the probability distribution of the reproduction rate;
and S34, selecting the output corresponding to the maximum probability as the generation result of the encoder network.
Specifically, in step S31, y = [y_1, ..., y_n] is defined to represent the corresponding target language word sequence, and a word alignment model is used to construct the correspondence Map between the source language word sequence x and the target language y.
In step S32, through the correspondence Map, the number of target language tokens corresponding to each source language token is taken as the fertility sequence F:
F = [f_1, ..., f_n]
where each f_i is restricted to a natural number between 0 and 50.
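The fertility sequence of step S32 can be sketched as follows, assuming the word aligner returns alignment links as (source index, target index) pairs; the link set shown is hypothetical.

```python
# Minimal sketch of turning word-alignment links into a fertility sequence F,
# where f_i counts the target tokens aligned to source token x_i, clipped to 0..50.
from collections import Counter
from typing import List, Tuple

def fertility_sequence(src_len: int, alignment: List[Tuple[int, int]]) -> List[int]:
    """alignment is a list of (source_index, target_index) links from a word aligner."""
    counts = Counter(src for src, _ in alignment)
    return [min(counts.get(i, 0), 50) for i in range(src_len)]

# Hypothetical alignment for a 4-token source sentence: token 0 aligns to two
# target tokens, token 2 to none, and the rest to one each.
links = [(0, 0), (0, 1), (1, 2), (3, 3)]
print(fertility_sequence(4, links))   # [2, 1, 0, 1]
```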
In step S33, a softmax is computed over the topmost encoder representation O_n obtained in step S22 to obtain the probability distribution of the fertility. The output hidden state O_n first undergoes a one-layer linear transformation, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution:
softmax(W · O_n + b)
where W and b are trainable parameters of the model and the output dimension of W is 51, one class for each fertility value from 0 to 50.
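A minimal sketch of steps S33-S34, assuming the fertility classifier is a single linear layer over the topmost encoder states followed by a softmax over the 51 fertility classes; tensor sizes are illustrative.

```python
# Minimal sketch of the fertility prediction head: softmax(W . O_n + b) per
# source token, followed by the argmax selection of step S34.
import torch
import torch.nn as nn

d_model, num_fertilities = 512, 51
fertility_head = nn.Linear(d_model, num_fertilities)    # W (512 x 51) and b

o_n = torch.randn(1, 4, d_model)                        # topmost encoder output O_n
probs = torch.softmax(fertility_head(o_n), dim=-1)      # fertility distribution per token
predicted_fertility = probs.argmax(dim=-1)              # step S34: most probable fertility
print(predicted_fertility.shape)                        # torch.Size([1, 4])
```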
In this embodiment, step S4 includes:
s41, constructing the input of the decoder end according to the multiplication rate result obtained in the step S33;
s42, obtaining a decoder input coding representation;
s43, obtaining the output coding representation of the decoder.
Specifically, in step S41, the input of the decoder end is constructed according to the fertility result obtained in step S33: each source language token x_i is copied according to its fertility f_i. For example, if the source language token x_i has fertility f_i = 3, then y_i = [x_i, x_i, x_i].
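The decoder-input construction of step S41 amounts to copying each source subword according to its predicted fertility, as in the following sketch; the sample tokens and fertilities are illustrative.

```python
# Minimal sketch of building the decoder input by copying each source subword
# x_i exactly f_i times, matching the example y_i = [x_i, x_i, x_i] for f_i = 3.
from typing import List

def copy_by_fertility(source: List[str], fertility: List[int]) -> List[str]:
    decoder_input = []
    for token, f in zip(source, fertility):
        decoder_input.extend([token] * f)   # a fertility of 0 drops the token entirely
    return decoder_input

source = ["ich", "liebe", "dich"]           # illustrative source subwords
fertility = [1, 2, 1]                       # illustrative predicted fertilities
print(copy_by_fertility(source, fertility))   # ['ich', 'liebe', 'liebe', 'dich']
```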
In step S42, similar to steps S12-S14, the decoder input encoded representation is obtained; in step S43, similar to steps S21-S22, the output encoded representation of the decoder is obtained.
In this embodiment, step S5 includes:
s51, performing one-layer linear transformation on the hidden state output by the top layer of the decoder network;
s52, outputting the output probability distribution of each moment by the result obtained by the linear transformation of the layer through a CRF linear chain;
and S53, selecting the word corresponding to the maximum probability as the translation result at the specified time.
Specifically, in step S51, the hidden state output by the topmost layer of the decoder network is passed through a one-layer linear transformation.
In step S52, the result of the linear transformation is passed through a linear-chain CRF to output the probability distribution Prob_{y|x} at each time step:
Prob_{y|x} = (1 / Z(x)) · exp( Σ_i s(y_i) + Σ_i T(y_{i-1}, y_i) )
where s denotes the score of the predicted target word y_i, T denotes the transition score between adjacent words, and Z(x) denotes the normalization factor.
In step S53, the word corresponding to the maximum probability max(Prob_{y|x}) is selected as the translation result at time i. Following these steps, decoding proceeds in sequence to generate the final translation result y = [y_1, ..., y_n].
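One standard way to realise the sequential CRF decoding described in steps S52-S53 is Viterbi search over per-position emission scores s and a transition matrix T, as sketched below with NumPy. The vocabulary size and the random scores are placeholders; in the method above, the emission scores would come from the linear transformation of the decoder's topmost hidden states.

```python
# Minimal sketch of linear-chain CRF decoding: Viterbi search over emission
# scores s (length x vocab) and transition scores T (vocab x vocab).
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Return the indices of the highest-scoring word sequence."""
    length, vocab = emissions.shape
    score = emissions[0].copy()                   # best score ending in each word at position 0
    backptr = np.zeros((length, vocab), dtype=int)
    for t in range(1, length):
        # score of moving from every previous word to every current word
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow back-pointers from the best final word
    best = [int(score.argmax())]
    for t in range(length - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 8))       # 5 decoding positions, placeholder vocabulary of 8 words
transitions = rng.normal(size=(8, 8))
print(viterbi_decode(emissions, transitions))   # indices of the decoded word sequence
```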
In conclusion, the invention uses the BPE algorithm to perform subword segmentation on the sentences in all the training corpora; predefines a subword sequence representing the source language and obtains the word vector encoded representation of the source language using a pre-trained model; obtains the position vector encoding of the source language input sequence; adds the word vector encoding and the position vector encoding to obtain the input encoded representation of the source language; uses a word alignment model to construct the correspondence between the source language and the target language and to build a fertility model, predefining the target language word sequences and constructing the correspondence between the source language word sequence and the target language with the word alignment model; takes the number of target language tokens corresponding to each source language token as the fertility sequence according to this correspondence; computes a softmax over the topmost encoded representation of each word from the encoder network to obtain the fertility probability distribution; selects the output with the maximum probability as the generation result of the encoder network; constructs the input and output encoded representations of the decoder model; applies a one-layer linear transformation to the hidden state output by the topmost layer of the decoder network; passes the result of the linear transformation through a linear-chain CRF to output the probability distribution at each time step; and selects the word with the maximum probability as the translation result at each time step. The encoder uses a pre-trained model to encode the input text into semantic vectors carrying contextual information, and the decoder adds a conditional random field at the top of the decoding end to establish contextual and temporal dependencies. The method exploits pre-trained language model knowledge and decodes with a conditional random field at the decoding end: the pre-trained language model carries rich contextual information and the conditional random field builds contextual dependencies, alleviating the repeated translations, omitted translations, and inconsistencies between earlier and later output to which non-autoregressive translation is prone, and yielding higher-quality translation results.
Example 2
Referring to fig. 2, embodiment 2 of the present invention further provides a knowledge-enhanced non-autoregressive neural machine translation apparatus, including:
the first processing module 1 is used for carrying out data preprocessing and word vector encoding on bilingual parallel language pairs;
the second processing module 2 is used for inputting the word vector representation of the source language into the encoder network, and the encoder network encodes the document information of the source language to obtain the encoded representation of the input word sequence information;
the third processing module 3 is used for constructing the correspondence between the source language and the target language by using the word alignment model and building a fertility model;
a fourth processing module 4 for constructing input and output coded representations of the decoder model;
and the fifth processing module 5 is used for establishing the dependence among the target language vocabularies through the conditional random field model, and sequentially decoding to generate a final translation result.
In this embodiment, the first processing module 1 includes:
the subword segmentation submodule 11 is used for performing subword segmentation on sentences in all the training corpora by using a BPE algorithm;
the first obtaining submodule 12 is used for predefining a sub-word sequence representing a source language and obtaining word vector coding representation of the source language by using a pre-training model;
a second obtaining submodule 13, configured to obtain a position vector code of the source language input sequence;
and an input code representation submodule 14, configured to add the word vector code and the position vector code to obtain an input code representation in the source language.
In this embodiment, the second processing module 2 includes:
the word sequence matrix submodule 21 is configured to obtain a word sequence matrix of the source language after word vector preprocessing;
and a top-level coded representation sub-module 22 for obtaining the topmost encoded representation of each word through the encoder network using a Transformer layer based on the self-attention mechanism.
In this embodiment, the third processing module 3 includes:
a correspondence construction submodule 31 for predefining word sequences representing different target languages, and constructing a correspondence between a source language word sequence and a target language using a word alignment model;
the fertility sequence submodule 32 is configured to take the number of target language tokens corresponding to each source language token as the fertility sequence according to the correspondence between the source language word sequence and the target language;
a fertility probability distribution submodule 33, configured to compute a softmax over the topmost encoded representation of each obtained word from the encoder network to obtain the fertility probability distribution;
a generation result sub-module 34, configured to select the output corresponding to the maximum probability as the generation result of the encoder network;
in the fertility probability distribution submodule 33, a one-layer linear transformation is applied to the output hidden state, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution.
In this embodiment, the fourth processing module 4 includes:
an input construction submodule 41, configured to construct the input of the decoder end according to the fertility result obtained by the fertility probability distribution submodule;
an input coded representation sub-module 42 for obtaining a decoder input coded representation;
an output encoded representation sub-module 43 for obtaining an output encoded representation of the decoder.
In this embodiment, the fifth processing module 5 includes:
a hidden state linear transformation submodule 51, configured to perform one-layer linear transformation on a hidden state output from the top layer of the decoder network;
an output probability distribution submodule 52, configured to pass the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
and a translation result sub-module 53, configured to select a word corresponding to the maximum probability as a translation result at a specific time.
It should be noted that, for the information interaction, execution process, and other contents between the modules/sub-modules of the apparatus, since the same concept is based on the method embodiment in embodiment 1 of the present application, the technical effect brought by the information interaction, execution process, and other contents are the same as those of the method embodiment of the present application, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not described herein again.
Example 3
Embodiment 3 of the present invention provides a non-transitory computer readable storage medium having stored therein program code of the knowledge-enhanced non-autoregressive neural machine translation method, the program code comprising instructions for performing the knowledge-enhanced non-autoregressive neural machine translation method of embodiment 1 or any possible implementation thereof.
The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Example 4
An embodiment 4 of the present invention provides an electronic device, including: a memory and a processor;
the processor and the memory are communicated with each other through a bus; the memory stores program instructions executable by the processor to invoke the method of knowledge-enhanced non-autoregressive neural machine translation of embodiment 1 or any possible implementation thereof.
Specifically, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated in the processor, located external to the processor, or stand-alone.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although the invention has been described in detail with respect to the general description and the specific embodiments, it will be apparent to those skilled in the art that modifications and improvements may be made based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A knowledge-enhanced non-autoregressive neural machine translation method, comprising the steps of:
(1) carrying out data preprocessing and word vector coding on bilingual parallel language pairs;
(2) the method comprises the steps that word vector representation of a source language is input into an encoder network, and the encoder network encodes source language document information to obtain encoded representation of input word sequence information;
(3) using a word alignment model to construct the correspondence between a source language and a target language and to build a fertility model;
(4) constructing input and output coding representations of a decoder model;
(5) and establishing the dependence among the target language vocabularies through a conditional random field model, and sequentially decoding to generate a final translation result.
2. The knowledge-enhanced non-autoregressive neural machine translation method of claim 1, wherein step (1) comprises:
(11) performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
(12) predefining a sub-word sequence representing a source language, and obtaining word vector coding representation of the source language by using a pre-training model;
(13) acquiring a position vector code of a source language input sequence;
(14) and adding the word vector code and the position vector code to obtain the input code representation of the source language.
3. The knowledge-enhanced non-autoregressive neural machine translation method of claim 2, wherein step (2) comprises:
(21) acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
(22) using a Transformer layer based on the self-attention mechanism to obtain the topmost encoded representation of each word through the encoder network.
4. The knowledge-enhanced non-autoregressive neural machine translation method of claim 3, wherein step (3) comprises:
(31) predefining word sequences representing different target languages, and constructing a corresponding relation between a source language word sequence and a target language by using a word alignment model;
(32) according to the correspondence between the source language word sequence and the target language, taking the number of target language tokens corresponding to each source language token as the fertility sequence;
(33) computing a softmax over the topmost encoded representation of each word obtained in step (22) to obtain the probability distribution of the fertility;
(34) selecting the output corresponding to the maximum probability as a generation result of the encoder network;
the step (33) includes:
(331) carrying out one-layer linear transformation on the output hidden state;
(332) passing the result of the one-layer linear transformation through a softmax to output the fertility probability distribution.
5. The knowledge-enhanced non-autoregressive neural machine translation method of claim 4, wherein step (4) comprises:
(41) constructing the input of the decoder end according to the fertility result obtained in step (33);
(42) obtaining a decoder input encoded representation;
(43) an output encoded representation of the decoder is obtained.
6. The knowledge-enhanced non-autoregressive neural machine translation method of claim 5, wherein step (5) comprises:
(51) performing one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
(52) passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
(53) selecting the word with the maximum probability as the translation result at that time step.
7. A knowledge-enhanced non-autoregressive neural machine translation device, comprising:
the first processing module is used for carrying out data preprocessing and word vector encoding on the bilingual parallel language pair;
the second processing module is used for inputting the word vector representation of the source language into the encoder network, and the encoder network encodes the source language document information to obtain the encoded representation of the input word sequence information;
the third processing module is used for constructing the correspondence between a source language and a target language by using the word alignment model and building a fertility model;
a fourth processing module for constructing input and output encoded representations of the decoder model;
and the fifth processing module is used for establishing the dependence among the target language vocabularies through the conditional random field model, and sequentially decoding to generate a final translation result.
8. The knowledge-enhanced non-autoregressive neural machine translation device of claim 7, wherein the first processing module comprises:
the sub-word segmentation submodule is used for performing sub-word segmentation on sentences in all the training corpora by using a BPE algorithm;
the first obtaining submodule is used for predefining a sub-word sequence representing a source language and obtaining word vector coding representation of the source language by using a pre-training model;
the second obtaining submodule is used for obtaining the position vector code of the source language input sequence;
the input code representation submodule is used for adding the word vector code and the position vector code to obtain input code representation of a source language;
the second processing module comprises:
the word sequence matrix submodule is used for acquiring a word sequence matrix of a source language subjected to word vector preprocessing;
and the top-level coding representation submodule is used for obtaining the top-level coding representation of each word passing through the coder network by using a Transformer layer based on a self-attention mechanism.
9. The knowledge-enhanced non-autoregressive neural machine translation device of claim 8, wherein the third processing module comprises:
the corresponding relation construction submodule is used for predefining word sequences representing different target languages and constructing the corresponding relation between the source language word sequence and the target language by using a word alignment model;
the fertility sequence submodule is used for taking the number of target language tokens corresponding to each source language token as the fertility sequence according to the correspondence between the source language word sequence and the target language;
the fertility probability distribution submodule is used for computing a softmax over the topmost encoded representation of each obtained word from the encoder network to obtain the fertility probability distribution;
a generation result submodule for selecting the output corresponding to the maximum probability as the generation result of the encoder network;
in the fertility probability distribution submodule, a one-layer linear transformation is applied to the output hidden state, and the result of the linear transformation is passed through a softmax to output the fertility probability distribution.
10. The knowledge-enhanced non-autoregressive neural machine translation device of claim 9, wherein the fourth processing module comprises:
the input construction submodule is used for constructing the input of the decoder end according to the fertility result obtained by the fertility probability distribution submodule;
an input encoded representation sub-module for obtaining a decoder input encoded representation;
an output encoded representation sub-module for obtaining an output encoded representation of the decoder;
the fifth processing module includes:
the hidden state linear transformation submodule is used for carrying out one-layer linear transformation on the hidden state output by the topmost layer of the decoder network;
the output probability distribution submodule is used for passing the result of the one-layer linear transformation through a linear-chain CRF to output the probability distribution at each time step;
and the translation result submodule is used for selecting the word corresponding to the maximum probability as the translation result at the appointed moment.
CN202210243650.3A 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device Pending CN114611488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210243650.3A CN114611488A (en) 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210243650.3A CN114611488A (en) 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device

Publications (1)

Publication Number Publication Date
CN114611488A true CN114611488A (en) 2022-06-10

Family

ID=81862896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210243650.3A Pending CN114611488A (en) 2022-03-12 2022-03-12 Knowledge-enhanced non-autoregressive neural machine translation method and device

Country Status (1)

Country Link
CN (1) CN114611488A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095092A (en) * 2021-04-19 2021-07-09 南京大学 Method for improving translation quality of non-autoregressive neural machine through modeling synergistic relationship

Similar Documents

Publication Publication Date Title
US20210390271A1 (en) Neural machine translation systems
US11544474B2 (en) Generation of text from structured data
CN110110337B (en) Translation model training method, medium, device and computing equipment
CN110619043A (en) Automatic text abstract generation method based on dynamic word vector
CN116072098B (en) Audio signal generation method, model training method, device, equipment and medium
CN113590761B (en) Training method of text processing model, text processing method and related equipment
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
Yoon et al. TutorNet: Towards flexible knowledge distillation for end-to-end speech recognition
CN112163434B (en) Text translation method, device, medium and electronic equipment based on artificial intelligence
Delbrouck et al. Modulating and attending the source image during encoding improves multimodal translation
CN111666756A (en) Sequence model text abstract generation method based on topic fusion
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
CN114611488A (en) Knowledge-enhanced non-autoregressive neural machine translation method and device
CN117877460A (en) Speech synthesis method, device, speech synthesis model training method and device
CN111475635B (en) Semantic completion method and device and electronic equipment
CN117875395A (en) Training method, device and storage medium of multi-mode pre-training model
CN112765968A (en) Grammar error correction method and training method and product for grammar error correction model
CN115357710B (en) Training method and device for table description text generation model and electronic equipment
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof
CN115496134A (en) Traffic scene video description generation method and device based on multi-modal feature fusion
CN114912441A (en) Text error correction model generation method, error correction method, system, device and medium
CN114372140A (en) Layered conference abstract generation model training method, generation method and device
CN113593534A (en) Method and apparatus for multi-accent speech recognition
CN112836526A (en) Multi-language neural machine translation method and device based on gating mechanism
Lin et al. A Generative Adversarial Constraint Encoder-Decoder Model for the Text Summarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination