CN111428519B - Entropy-based neural machine translation dynamic decoding method and system - Google Patents

Entropy-based neural machine translation dynamic decoding method and system

Info

Publication number
CN111428519B
CN111428519B (application CN202010151246.4A)
Authority
CN
China
Prior art keywords
entropy
vector
word
time step
target language
Prior art date
Legal status
Active
Application number
CN202010151246.4A
Other languages
Chinese (zh)
Other versions
CN111428519A (en
Inventor
程学旗
郭嘉丰
范意兴
王素
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010151246.4A
Publication of CN111428519A
Application granted
Publication of CN111428519B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention provides an entropy-based neural machine translation dynamic decoding method and system. By analyzing the relation between sentence entropy and the BLEU score, it is found that the average entropy of the words in sentences with high BLEU scores is smaller than that of the words in sentences with low BLEU scores, and that sentences with low entropy generally obtain higher BLEU scores than sentences with high entropy. Computing the Pearson coefficient between sentence entropy and the BLEU score confirms that the two are correlated. The invention therefore proposes that, at each time step of the decoding stage during training, the model not only samples a real word or a predicted word with a certain probability as context information, but also computes an entropy value from the prediction of the previous time step and dynamically adjusts the weight of the context information according to this entropy. This alleviates the error accumulation caused by the difference in context information between training and inference in the decoding process of neural machine translation models.

Description

Entropy-based neural machine translation dynamic decoding method and system
Technical Field
The invention relates to the technical field of natural language processing and neural machine translation, in particular to a neural machine translation dynamic decoding method and system based on entropy.
Background
Machine translation is an important task in natural language processing, and in recent years, with the rise of deep neural networks, machine translation methods based on neural networks have made great progress and have gradually become mainstream machine translation methods. The neural machine translation model mainly comprises three parts: an encoder network, a decoder network, and an attention network.
The encoder network is responsible for encoding the source language sentence into a list of hidden vectors, with one hidden vector representation for each word. The encoder network is typically a multi-layer bidirectional RNN. The forward RNN $\overrightarrow{f}$ reads the source sentence in order (from $x_1$ to $x_{|x|}$) and computes the forward hidden state sequence $(\overrightarrow{h}_1,\dots,\overrightarrow{h}_{|x|})$; the reverse RNN $\overleftarrow{f}$ reads the source sentence in reverse order (from $x_{|x|}$ to $x_1$) and computes the reverse hidden state sequence $(\overleftarrow{h}_1,\dots,\overleftarrow{h}_{|x|})$. The hidden vector corresponding to word $x_i$ is the concatenation $h_i=[\overrightarrow{h}_i;\overleftarrow{h}_i]$, so $h_i$ contains semantic information from both the preceding and the following words.
The attention network computes a context vector $c_j$ from the list of hidden vectors $(h_1,\dots,h_{|x|})$ generated by the encoder network and the current hidden state vector $s_{j-1}$, and passes it to the decoder network. It first computes the degree of correlation between each hidden vector in $(h_1,\dots,h_{|x|})$ and the current hidden state vector $s_{j-1}$, obtaining a weight list $(\alpha_{1j},\dots,\alpha_{|x|j})$; it then uses these weights to compute the context vector $c_j$ as a weighted sum of the hidden vectors, which is used in the calculation of the next hidden state vector $s_j$.
The decoder network is typically a multi-layer RNN. At each time step, it computes the hidden state vector $s_j$ of the next time step from the current word vector $E(y_{j-1})$, the hidden state vector $s_{j-1}$ and the context vector $c_j$ computed by the attention network, and decodes a target language word $y_j$, until a special end-of-sentence symbol (EOS) is generated.
The architecture of the existing neural machine translation model is shown in FIG. 1. Although existing neural machine translation models have achieved good results, some shortcomings remain. In the prior art, the model decodes the target words one by one according to context information. In the training phase, the model predicts using the real words as context information, whereas in the inference phase it must generate the whole sequence from scratch and can only use the prediction of the previous time step as context information. This difference in context information between training and inference leads to an accumulation of errors, because the model must make predictions in situations it never encountered during training.
In existing neural machine translation models, each time step of the decoding process computes the hidden state vector $s_j$ of the next time step from the current word vector $E(y_{j-1})$, the hidden state vector $s_{j-1}$ and the context vector $c_j$ computed by the attention network, i.e.

$s_j = f\big(s_{j-1}, E(y_{j-1}), c_j\big)$

In the training phase, $y_{j-1}$ is the real target language word $y^*_{j-1}$ from the training corpus, while in the inference phase $y_{j-1}$ is the target language word $\hat{y}_{j-1}$ predicted at the previous time step. To reduce this difference between training and inference, the model samples the context information from both the real sequence and the predicted sequence with a certain probability during training, instead of always selecting the target language word from the real sequence; that is, $y_{j-1}$ is sampled from $\{y^*_{j-1}, \hat{y}_{j-1}\}$. Although this method reduces the gap between the training phase and the inference phase to a certain extent and improves the translation quality, when the sampling selects a word $\hat{y}_{j-1}$ from the predicted sequence, the uncertainty of the prediction introduces prediction errors into the training process and reduces the robustness of the model.
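As a minimal, self-contained illustration of this training/inference mismatch (a toy example, not the patent's model: the tiny vocabulary, the stand-in `toy_step` function and its error rate are all assumptions), the following sketch contrasts teacher-forced decoding, where the context is always the real previous word, with free-running decoding, where each mistake is fed back into the next step:

```python
# Illustrative only: toy_step stands in for one decoder time step.
import random

random.seed(0)

def toy_step(prev_word: str) -> str:
    """Stand-in for the s_j / y_j computation: returns a 'predicted' next word."""
    order = {"<bos>": "the", "the": "cat", "cat": "sat", "sat": "<eos>"}
    # Inject occasional mistakes to mimic prediction uncertainty.
    return order.get(prev_word, "<eos>") if random.random() > 0.2 else "the"

reference = ["the", "cat", "sat", "<eos>"]

# Training-style decoding: the context is always the real previous word y*_{j-1}.
train_ctx, train_out = "<bos>", []
for gold in reference:
    train_out.append(toy_step(train_ctx))
    train_ctx = gold                      # ground truth, so errors cannot accumulate

# Inference-style decoding: the context is the model's own prediction.
infer_ctx, infer_out = "<bos>", []
for _ in range(len(reference)):
    pred = toy_step(infer_ctx)
    infer_out.append(pred)
    infer_ctx = pred                      # mistakes feed forward and accumulate

print("teacher forcing :", train_out)
print("free running    :", infer_out)
```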
Disclosure of Invention
The invention aims to solve the problem of error accumulation in the decoding process of a neural machine translation model caused by the difference in context information between training and inference. The prior art provides a method for keeping training and prediction consistent in machine translation: during training, a correct word or a predicted word is sampled and selected for each decoding position with a certain probability, so that the training and prediction procedures are kept consistent. However, due to the uncertainty of the prediction itself, this method introduces prediction errors into the training process when words from the predicted sequence are selected, which reduces the robustness of the model. The invention analyzes the correlation between sentence entropy and the bilingual evaluation understudy (BLEU) score and, building on the method for keeping training and prediction consistent, dynamically adjusts the weight of the context information according to the entropy value, thereby reducing the influence of uncertainty on the translation result.
Specifically, the invention provides an entropy-based neural machine translation dynamic decoding method, comprising:
Step 1, feeding word vectors of the words in a source language sentence in a training corpus into an encoder network to obtain a list of encoding vectors $(h_1,\dots,h_{|x|})$ of the source language sentence and the hidden state vector $s_{j-1}$ of the (j-1)-th time step;
Step 2, obtaining, by an attention network, a context vector $c_j$ from the list of encoding vectors $(h_1,\dots,h_{|x|})$ and the hidden state vector $s_{j-1}$;
Step 3, obtaining the (j-1)-th real target language word $y^*_{j-1}$ in the training corpus and the target language word $\hat{y}_{j-1}$ predicted at time step j-1, selecting the real target language word $y^*_{j-1}$ with probability p and the predicted target language word $\hat{y}_{j-1}$ with probability 1-p, and obtaining a selection result $y_{j-1}$;
Step 4, obtaining the entropy $e_{j-1}$ of time step j-1 from the selection result $y_{j-1}$ by the following formula, where N is the size of the target language lexicon and $p_{i,j-1}$ is the predicted probability of the i-th target language word:
$e_{j-1} = -\sum_{i=1}^{N} p_{i,j-1}\log p_{i,j-1}$;
Step 5, inputting the selection result $y_{j-1}$, the hidden state vector $s_{j-1}$, the context vector $c_j$ and the entropy value $e_{j-1}$ into a decoder network to obtain the hidden state vector $s_j$ of the current j-th time step;
Step 6, obtaining the target language word $\hat{y}_j$ of the j-th time step from $y_{j-1}$, the hidden state vector $s_j$ and the context vector $c_j$;
Step 7, passing the list of encoding vectors $(h_1,\dots,h_{|x|})$, the hidden state vector $s_j$ of the j-th time step, the j-th real target language word $y^*_j$ in the training corpus and the target language word $\hat{y}_j$ predicted at the j-th time step to the (j+1)-th time step, and continuing the decoding process until a special end-of-sentence symbol EOS is generated.
The neural machine translation dynamic decoding method based on entropy, wherein the step 2 comprises:
obtaining the context vector $c_j$ through the attention network, where the weight $\alpha_{ij}$ reflects the importance of the hidden state $h_i$, relative to the hidden state $s_{j-1}$, in determining the next hidden state $s_j$ and predicting $y_j$:
$u_{ij} = V_a^{\top}\tanh(W_a s_{j-1} + U_a h_i)$;
$\alpha_{ij} = \exp(u_{ij}) \big/ \sum_{k=1}^{|x|}\exp(u_{kj})$;
$c_j = \sum_{i=1}^{|x|}\alpha_{ij} h_i$;
where x is the source language sentence, and $V_a$, $W_a$ and $U_a$ are parameters to be learned in the neural network.
The neural machine translation dynamic decoding method based on entropy, wherein the step 3 comprises:
$p = \dfrac{\mu}{\mu + \exp(e/\mu)}$;
$y_{j-1} = \begin{cases} y^*_{j-1}, & \text{with probability } p \\ \hat{y}_{j-1}, & \text{with probability } 1-p \end{cases}$;
where μ is the hyperparameter and e is the number of training rounds.
The neural machine translation dynamic decoding method based on entropy, wherein the step 6 comprises:
$t_j = g\big(E(y_{j-1}), s_j, c_j\big)$;
$o_j = W_o t_j$;
$P_j = \mathrm{softmax}(o_j)$;
$\hat{y}_j = \arg\max_i\, p_{i,j}$;
where $E(y_{j-1})$ is the word vector of $y_{j-1}$, g is the readout transformation combining $E(y_{j-1})$, $s_j$ and $c_j$, $e_{j-1}$ is the entropy of the probability distribution of the word predicted at time step j-1, and $W_o$ is a parameter to be learned in the neural network.
The neural machine translation dynamic decoding method based on entropy, wherein the step 7 comprises:
the encoder network computes the list of hidden vectors $(h_1,\dots,h_{|x|})$ corresponding to the source language sentence, where $E(x_i)$ is the word vector of the word $x_i$ and $h_i$ is the hidden vector representation corresponding to the word $x_i$:
$\overrightarrow{h}_i = \overrightarrow{f}(\overrightarrow{h}_{i-1}, E(x_i)),\quad \overleftarrow{h}_i = \overleftarrow{f}(\overleftarrow{h}_{i+1}, E(x_i))$;
$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$.
the invention also provides a neural machine translation dynamic decoding system based on entropy, which comprises
Module 1, feeding word vectors of the words in a source language sentence in the training corpus into an encoder network to obtain a list of encoding vectors $(h_1,\dots,h_{|x|})$ of the source language sentence and the hidden state vector $s_{j-1}$ of the (j-1)-th time step;
Module 2, obtaining, through the attention network, a context vector $c_j$ from the list of encoding vectors $(h_1,\dots,h_{|x|})$ and the hidden state vector $s_{j-1}$;
Module 3, obtaining the (j-1)-th real target language word $y^*_{j-1}$ in the training corpus and the target language word $\hat{y}_{j-1}$ predicted at time step j-1, selecting the real target language word $y^*_{j-1}$ with probability p and the predicted target language word $\hat{y}_{j-1}$ with probability 1-p, and obtaining a selection result $y_{j-1}$;
Module 4, obtaining the entropy $e_{j-1}$ of time step j-1 from the selection result $y_{j-1}$ by the following formula, where N is the size of the target language lexicon and $p_{i,j-1}$ is the predicted probability of the i-th target language word:
$e_{j-1} = -\sum_{i=1}^{N} p_{i,j-1}\log p_{i,j-1}$;
Module 5, inputting the selection result $y_{j-1}$, the hidden state vector $s_{j-1}$, the context vector $c_j$ and the entropy value $e_{j-1}$ into the decoder network to obtain the hidden state vector $s_j$ of the current j-th time step;
Module 6, obtaining the target language word $\hat{y}_j$ of the j-th time step from $y_{j-1}$, the hidden state vector $s_j$ and the context vector $c_j$;
Module 7, passing the list of encoding vectors $(h_1,\dots,h_{|x|})$, the hidden state vector $s_j$ of the j-th time step, the j-th real target language word $y^*_j$ in the training corpus and the target language word $\hat{y}_j$ predicted at the j-th time step to the (j+1)-th time step, and continuing the decoding process until a special end-of-sentence symbol EOS is generated.
The neural machine translation dynamic decoding system based on entropy, wherein the module 2 comprises:
obtaining the context vector $c_j$ through the attention network, where the weight $\alpha_{ij}$ reflects the importance of the hidden state $h_i$, relative to the hidden state $s_{j-1}$, in determining the next hidden state $s_j$ and predicting $y_j$:
$u_{ij} = V_a^{\top}\tanh(W_a s_{j-1} + U_a h_i)$;
$\alpha_{ij} = \exp(u_{ij}) \big/ \sum_{k=1}^{|x|}\exp(u_{kj})$;
$c_j = \sum_{i=1}^{|x|}\alpha_{ij} h_i$;
where x is the source language sentence, and $V_a$, $W_a$ and $U_a$ are parameters to be learned in the neural network.
The neural machine translation dynamic decoding system based on entropy, wherein the module 3 comprises:
$p = \dfrac{\mu}{\mu + \exp(e/\mu)}$;
$y_{j-1} = \begin{cases} y^*_{j-1}, & \text{with probability } p \\ \hat{y}_{j-1}, & \text{with probability } 1-p \end{cases}$;
where μ is the hyperparameter and e is the number of training rounds.
The neural machine translation dynamic decoding system based on entropy, wherein the module 6 comprises:
$t_j = g\big(E(y_{j-1}), s_j, c_j\big)$;
$o_j = W_o t_j$;
$P_j = \mathrm{softmax}(o_j)$;
$\hat{y}_j = \arg\max_i\, p_{i,j}$;
where $E(y_{j-1})$ is the word vector of $y_{j-1}$, g is the readout transformation combining $E(y_{j-1})$, $s_j$ and $c_j$, $e_{j-1}$ is the entropy of the probability distribution of the word predicted at time step j-1, and $W_o$ is a parameter to be learned in the neural network.
The neural machine translation dynamic decoding system based on entropy, wherein the module 7 comprises:
the encoder network computes the list of hidden vectors $(h_1,\dots,h_{|x|})$ corresponding to the source language sentence, where $E(x_i)$ is the word vector of the word $x_i$ and $h_i$ is the hidden vector representation corresponding to the word $x_i$:
$\overrightarrow{h}_i = \overrightarrow{f}(\overrightarrow{h}_{i-1}, E(x_i)),\quad \overleftarrow{h}_i = \overleftarrow{f}(\overleftarrow{h}_{i+1}, E(x_i))$;
$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$.
drawings
FIG. 1 is a diagram of a prior art neural machine translation model architecture;
FIG. 2 is a diagram of the sampling of $y_{j-1}$;
FIG. 3 is a GRU calculation schematic;
FIG. 4 is a flowchart of the entropy-based dynamic decoding method.
Detailed Description
While researching neural machine translation technology, the inventors analyzed the relation between sentence entropy and the BLEU score and found that the average entropy of the words in sentences with high BLEU scores is smaller than that of the words in sentences with low BLEU scores, and that sentences with low entropy obtain higher BLEU scores than sentences with high entropy. By computing the Pearson coefficient between sentence entropy and the BLEU score, the inventors found that the two are correlated. The invention therefore proposes that, at each time step of the decoding stage during training, the model not only samples a real word or a predicted word with a certain probability as context information, but also computes an entropy value from the prediction of the previous time step and dynamically adjusts the weight of the context information according to this entropy.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
RNN-based NMT model
Let the source language sentence be $x = (x_1,\dots,x_{|x|})$, and let the corresponding target language sentence be $y^* = (y^*_1,\dots,y^*_{|y|})$.
Encoder:
The encoder network computes the list of hidden vectors $(h_1,\dots,h_{|x|})$ corresponding to the source language sentence, where $E(x_i)$ is the word vector of the word $x_i$ and $h_i$ is the hidden vector representation corresponding to the word $x_i$:

$\overrightarrow{h}_i = \overrightarrow{f}(\overrightarrow{h}_{i-1}, E(x_i)),\quad \overleftarrow{h}_i = \overleftarrow{f}(\overleftarrow{h}_{i+1}, E(x_i))$ (1)

$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$ (2)
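A minimal PyTorch-style sketch of formulas (1)-(2) follows, assuming a single-layer bidirectional GRU as the encoder RNN; the class name, dimensions and batch layout are illustrative assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Sketch of formulas (1)-(2): h_i is the concatenation of forward and reverse states."""
    def __init__(self, vocab_size: int, emb_dim: int = 64, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)        # E(x_i)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True,
                          bidirectional=True)                  # forward + reverse RNN

    def forward(self, src_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embed(src_ids)     # (batch, |x|, emb_dim)
        h, _ = self.rnn(emb)          # (batch, |x|, 2*hid_dim): [forward_h_i ; backward_h_i]
        return h                      # the list (h_1, ..., h_|x|)

enc = BiGRUEncoder(vocab_size=1000)
hs = enc(torch.randint(0, 1000, (2, 7)))   # two sentences of length 7
print(hs.shape)                             # torch.Size([2, 7, 256])
```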
Attention:
The attention network computes the context vector $c_j$. The weight $\alpha_{ij}$ reflects the importance of the hidden state $h_i$, relative to the hidden state $s_{j-1}$, in determining the next hidden state $s_j$ and predicting $y_j$:

$u_{ij} = V_a^{\top}\tanh(W_a s_{j-1} + U_a h_i)$ (3)

$\alpha_{ij} = \dfrac{\exp(u_{ij})}{\sum_{k=1}^{|x|}\exp(u_{kj})}$ (4)

$c_j = \sum_{i=1}^{|x|}\alpha_{ij} h_i$ (5)

where $V_a$, $W_a$ and $U_a$ are parameters to be learned in the neural network.
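The following sketch implements formulas (3)-(5) as reconstructed above (additive attention with parameters $V_a$, $W_a$, $U_a$); the module name and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Sketch of formulas (3)-(5): weights alpha_ij and context vector c_j."""
    def __init__(self, enc_dim: int, dec_dim: int, att_dim: int = 128):
        super().__init__()
        self.W_a = nn.Linear(dec_dim, att_dim, bias=False)   # W_a s_{j-1}
        self.U_a = nn.Linear(enc_dim, att_dim, bias=False)   # U_a h_i
        self.V_a = nn.Linear(att_dim, 1, bias=False)         # V_a^T tanh(...)

    def forward(self, h: torch.Tensor, s_prev: torch.Tensor):
        # h: (batch, |x|, enc_dim), s_prev: (batch, dec_dim)
        scores = self.V_a(torch.tanh(self.W_a(s_prev).unsqueeze(1) + self.U_a(h)))
        alpha = torch.softmax(scores.squeeze(-1), dim=-1)    # (batch, |x|), formula (4)
        c_j = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)    # (batch, enc_dim), formula (5)
        return c_j, alpha

att = AdditiveAttention(enc_dim=256, dec_dim=128)
c, a = att(torch.randn(2, 7, 256), torch.randn(2, 128))
print(c.shape, a.shape)   # torch.Size([2, 256]) torch.Size([2, 7])
```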
Decoder:
The decoder network decodes the target language words in turn until a special end of sentence symbol (EOS) is generated.
$y_{j-1}$ (the (j-1)-th word fed to the decoder as context) is obtained by sampling: the real target language word $y^*_{j-1}$ is selected with probability p, and the predicted target language word $\hat{y}_{j-1}$ is selected with probability 1-p, where p is the sampling probability given by formula (6), $\mu$ is a hyperparameter, and e is the number of training rounds. Concretely, a 0-1 sampling vector is generated according to the probability p and the selection is realized by multiplication: positions where the sampling vector is 1 select $y^*_{j-1}$, and positions where it is 0 select $\hat{y}_{j-1}$. For example, if the probability p is 0.3 and the sampling vector is [1,0,0,1,0,0,1,0,0,0], the first, fourth and seventh positions take the real words and the remaining positions take the predicted words. The sampling of $y_{j-1}$ is illustrated in FIG. 2.

$p = \dfrac{\mu}{\mu + \exp(e/\mu)}$ (6)

$y_{j-1} = \begin{cases} y^*_{j-1}, & \text{with probability } p \\ \hat{y}_{j-1}, & \text{with probability } 1-p \end{cases}$ (7)
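A sketch of the sampling in formulas (6)-(7) as reconstructed above: the probability p of keeping the ground-truth word decays with the number of training rounds, and a 0-1 sampling vector mixes $y^*_{j-1}$ and $\hat{y}_{j-1}$ per position. The value of μ and the example tensors are assumed for illustration:

```python
import math
import torch

def keep_truth_prob(epoch: int, mu: float = 12.0) -> float:
    """Formula (6) as reconstructed: p decays from near 1 toward 0 as training proceeds."""
    return mu / (mu + math.exp(epoch / mu))

def sample_context_words(gold_ids: torch.Tensor, pred_ids: torch.Tensor, p: float) -> torch.Tensor:
    """Formula (7): per position, take the real word with probability p, else the predicted word."""
    mask = torch.bernoulli(torch.full(gold_ids.shape, p))     # 0-1 sampling vector
    return torch.where(mask.bool(), gold_ids, pred_ids)

gold = torch.tensor([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
pred = torch.tensor([[11, 99, 13, 14, 98, 16, 17, 97, 19, 20]])
p = keep_truth_prob(epoch=30)
print(round(p, 3), sample_context_words(gold, pred, p))
```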
The hidden state vector $s_j$ is computed as

$\tilde{s}_j = \mathrm{GRU}_1(s_{j-1}, E(y_{j-1}))$ (8)

$s_j = \mathrm{GRU}_2(\tilde{s}_j, [c_j; e_{j-1}])$ (9)
The calculation principle of $\mathrm{GRU}_2$ is shown in FIG. 3.
In formula (9), $[c_j; e_{j-1}]$ (vector concatenation) corresponds to $x_t$ in FIG. 3, $\tilde{s}_j$ corresponds to $h_{t-1}$ in FIG. 3, and $s_j$ corresponds to $h_t$ in FIG. 3. The weight of $c_j$ is adjusted according to the entropy value $e_{j-1}$: the larger the entropy $e_{j-1}$, the greater the uncertainty and the worse the predicted translation $\hat{y}_{j-1}$, so the next time step makes less use of the information of $\hat{y}_{j-1}$ and more use of the information of $c_j$.
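A sketch of the state update in formulas (8)-(9) as reconstructed above, using two GRU cells: the entropy $e_{j-1}$ (defined by formula (10) below) is concatenated to the context vector $c_j$ before the second cell, so an uncertain previous prediction shifts the update toward the source-side context. The two-cell layout, class name and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class EntropyGatedDecoderCell(nn.Module):
    """Sketch of formulas (8)-(9): s_j = GRU2(GRU1(s_{j-1}, E(y_{j-1})), [c_j ; e_{j-1}])."""
    def __init__(self, emb_dim: int, hid_dim: int, ctx_dim: int):
        super().__init__()
        self.gru1 = nn.GRUCell(emb_dim, hid_dim)          # consumes the selected word vector
        self.gru2 = nn.GRUCell(ctx_dim + 1, hid_dim)      # consumes [c_j ; e_{j-1}]

    def forward(self, y_prev_emb, s_prev, c_j, entropy_prev):
        s_tilde = self.gru1(y_prev_emb, s_prev)                        # formula (8)
        gru2_in = torch.cat([c_j, entropy_prev.unsqueeze(-1)], dim=-1)
        return self.gru2(gru2_in, s_tilde)                             # formula (9)

cell = EntropyGatedDecoderCell(emb_dim=64, hid_dim=128, ctx_dim=256)
s_j = cell(torch.randn(2, 64), torch.randn(2, 128), torch.randn(2, 256), torch.rand(2))
print(s_j.shape)   # torch.Size([2, 128])
```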
The formula for calculating the entropy value is as follows, where N is the size of the target language dictionary. The prediction probability at time step j-1 is denoted $P_{j-1}$, an N-dimensional vector containing the predicted probabilities of all words in the target language lexicon, in which the predicted probability of the i-th target language word is denoted $p_{i,j-1}$:

$e_{j-1} = -\sum_{i=1}^{N} p_{i,j-1}\log p_{i,j-1}$ (10)
The probability distribution $P_j$ over all words in the target language dictionary is computed as follows:

$t_j = g\big(E(y_{j-1}), s_j, c_j\big)$ (11)

$o_j = W_o t_j$ (12)

$P_j = \mathrm{softmax}(o_j)$ (13)

$\hat{y}_j = \arg\max_i\; p_{i,j}$ (14)

where $E(y_{j-1})$ is the word vector of $y_{j-1}$, g is the readout transformation combining $E(y_{j-1})$, $s_j$ and $c_j$, $e_{j-1}$ is the entropy value reflecting the uncertainty of the probability distribution of the word predicted at time step j-1, and $W_o$ is a parameter to be learned in the network.
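A sketch of formulas (10)-(14) as reconstructed above: a readout of $E(y_{j-1})$, $s_j$ and $c_j$ is projected to vocabulary logits, the softmax gives $P_j$, the predicted word $\hat{y}_j$ is the argmax, and the entropy of $P_j$ gates the next time step. The concrete readout (a single tanh layer) and all names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    """Sketch of formulas (11)-(14) plus the entropy of formula (10)."""
    def __init__(self, emb_dim: int, hid_dim: int, ctx_dim: int, vocab_size: int):
        super().__init__()
        self.readout = nn.Linear(emb_dim + hid_dim + ctx_dim, hid_dim)  # t_j (assumed form)
        self.W_o = nn.Linear(hid_dim, vocab_size, bias=False)           # o_j = W_o t_j

    def forward(self, y_prev_emb, s_j, c_j):
        t_j = torch.tanh(self.readout(torch.cat([y_prev_emb, s_j, c_j], dim=-1)))
        P_j = torch.softmax(self.W_o(t_j), dim=-1)                      # formulas (12)-(13)
        y_hat = P_j.argmax(dim=-1)                                      # formula (14), greedy pick
        entropy = -(P_j * torch.log(P_j + 1e-12)).sum(dim=-1)           # formula (10)
        return y_hat, P_j, entropy

out = OutputLayer(emb_dim=64, hid_dim=128, ctx_dim=256, vocab_size=1000)
y_hat, P, e_j = out(torch.randn(2, 64), torch.randn(2, 128), torch.randn(2, 256))
print(y_hat.shape, P.shape, e_j.shape)   # torch.Size([2]) torch.Size([2, 1000]) torch.Size([2])
```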
The use of the above-described entropy-based dynamic decoding technique is explained:
first, word vectors of words in source language sentencesTransmitting into encoder network, and obtaining encoding vector list (h) corresponding to source language sentence according to formulas (1) - (2)1,…,h|x|). The target language words are then decoded in sequence until a special end of sentence symbol (EOS) is generated. The specific decoding process at the jth time step is as follows:
step S1, known quantity: code vector list (h)1,…,h|x|) Hidden state vector s at the j-1 st time stepj-1(the whole decoding process is carried out backward along with the time step, which is a concrete decoding process of the jth time step, so that the first j-1 time steps are all calculated), and the jth-1 real target language word in the training corpus
Figure BDA0002402513420000083
Target language word predicted at j-1 time step
Figure BDA0002402513420000084
Step S2, given the list of encoding vectors $(h_1,\dots,h_{|x|})$ and the hidden state vector $s_{j-1}$, the attention network computes the context vector $c_j$ according to formulas (3)-(5).
Step S3, given the real target language word $y^*_{j-1}$ and the predicted target language word $\hat{y}_{j-1}$, sample and select $y_{j-1}$ according to formulas (6)-(7).
Step S4, given $y_{j-1}$, compute the entropy value $e_{j-1}$ according to formula (10).
Step S5, given the entropy value $e_{j-1}$, the target language word $y_{j-1}$, the hidden state vector $s_{j-1}$ and the context vector $c_j$, the decoder network computes the hidden state vector $s_j$ of the j-th time step according to formulas (8)-(9).
Step S6, given $y_{j-1}$, the hidden state vector $s_j$ and the context vector $c_j$, predict the target language word $\hat{y}_j$ of the j-th time step according to formulas (11)-(14).
Step S7, pass the list of encoding vectors $(h_1,\dots,h_{|x|})$, the hidden state vector $s_j$ of the j-th time step, the j-th real target language word $y^*_j$ in the training corpus and the target language word $\hat{y}_j$ predicted at the j-th time step to the (j+1)-th time step, and continue the decoding process until a special end-of-sentence symbol (EOS) is generated.
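Putting steps S1-S7 together, one training-time decoding step can be sketched as the following function, which reuses the components sketched above (the encoder outputs, attention, entropy-gated decoder cell and output layer); the signature, the decay schedule and all names are assumptions made for illustration:

```python
import math
import torch

def decode_step(h, s_prev, gold_prev, pred_prev, P_prev, epoch,
                embed, attention, decoder_cell, output_layer, mu=12.0):
    """One training-time decoding step following steps S1-S7 (components as in the sketches above)."""
    # S2: context vector c_j from the encoder states and s_{j-1}          (formulas (3)-(5))
    c_j, _ = attention(h, s_prev)
    # S3: sample the context word y_{j-1} from the real / predicted word  (formulas (6)-(7))
    p = mu / (mu + math.exp(epoch / mu))
    mask = torch.bernoulli(torch.full(gold_prev.shape, p)).bool()
    y_prev = torch.where(mask, gold_prev, pred_prev)
    # S4: entropy of the previous prediction distribution                  (formula (10))
    e_prev = -(P_prev * torch.log(P_prev + 1e-12)).sum(dim=-1)
    # S5: entropy-gated state update                                       (formulas (8)-(9))
    s_j = decoder_cell(embed(y_prev), s_prev, c_j, e_prev)
    # S6: predict the j-th word and its distribution P_j                   (formulas (11)-(14))
    y_hat, P_j, _ = output_layer(embed(y_prev), s_j, c_j)
    # S7: (h, s_j, the real word y*_j, y_hat and P_j) are passed on to time step j+1
    return s_j, y_hat, P_j
```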
The following are system examples corresponding to the above method examples, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.
The invention also provides a neural machine translation dynamic decoding system based on entropy, which comprises
Module 1, feeding word vectors of the words in a source language sentence in the training corpus into an encoder network to obtain a list of encoding vectors $(h_1,\dots,h_{|x|})$ of the source language sentence and the hidden state vector $s_{j-1}$ of the (j-1)-th time step;
Module 2, obtaining, through the attention network, a context vector $c_j$ from the list of encoding vectors $(h_1,\dots,h_{|x|})$ and the hidden state vector $s_{j-1}$;
Module 3, obtaining the (j-1)-th real target language word $y^*_{j-1}$ in the training corpus and the target language word $\hat{y}_{j-1}$ predicted at time step j-1, selecting the real target language word $y^*_{j-1}$ with probability p and the predicted target language word $\hat{y}_{j-1}$ with probability 1-p, and obtaining a selection result $y_{j-1}$;
Module 4, obtaining the entropy $e_{j-1}$ of time step j-1 from the selection result $y_{j-1}$ by the following formula, where N is the size of the target language lexicon and $p_{i,j-1}$ is the predicted probability of the i-th target language word:
$e_{j-1} = -\sum_{i=1}^{N} p_{i,j-1}\log p_{i,j-1}$;
Module 5, inputting the selection result $y_{j-1}$, the hidden state vector $s_{j-1}$, the context vector $c_j$ and the entropy value $e_{j-1}$ into the decoder network to obtain the hidden state vector $s_j$ of the current j-th time step;
Module 6, obtaining the target language word $\hat{y}_j$ of the j-th time step from $y_{j-1}$, the hidden state vector $s_j$ and the context vector $c_j$;
Module 7, passing the list of encoding vectors $(h_1,\dots,h_{|x|})$, the hidden state vector $s_j$ of the j-th time step, the j-th real target language word $y^*_j$ in the training corpus and the target language word $\hat{y}_j$ predicted at the j-th time step to the (j+1)-th time step, and continuing the decoding process until a special end-of-sentence symbol EOS is generated.
The neural machine translation dynamic decoding system based on entropy, wherein the module 2 comprises:
obtaining the context vector $c_j$ through the attention network, where the weight $\alpha_{ij}$ reflects the importance of the hidden state $h_i$, relative to the hidden state $s_{j-1}$, in determining the next hidden state $s_j$ and predicting $y_j$:
$u_{ij} = V_a^{\top}\tanh(W_a s_{j-1} + U_a h_i)$;
$\alpha_{ij} = \exp(u_{ij}) \big/ \sum_{k=1}^{|x|}\exp(u_{kj})$;
$c_j = \sum_{i=1}^{|x|}\alpha_{ij} h_i$;
where x is the source language sentence, and $V_a$, $W_a$ and $U_a$ are parameters to be learned in the neural network.
The neural machine translation dynamic decoding system based on entropy, wherein the module 3 comprises:
$p = \dfrac{\mu}{\mu + \exp(e/\mu)}$;
$y_{j-1} = \begin{cases} y^*_{j-1}, & \text{with probability } p \\ \hat{y}_{j-1}, & \text{with probability } 1-p \end{cases}$;
where μ is the hyperparameter and e is the number of training rounds.
The neural machine translation dynamic decoding system based on entropy, wherein the module 6 comprises:
$t_j = g\big(E(y_{j-1}), s_j, c_j\big)$;
$o_j = W_o t_j$;
$P_j = \mathrm{softmax}(o_j)$;
$\hat{y}_j = \arg\max_i\, p_{i,j}$;
where $E(y_{j-1})$ is the word vector of $y_{j-1}$, g is the readout transformation combining $E(y_{j-1})$, $s_j$ and $c_j$, $e_{j-1}$ is the entropy of the probability distribution of the word predicted at time step j-1, and $W_o$ is a parameter to be learned in the neural network.
The neural machine translation dynamic decoding system based on entropy, wherein the module 7 comprises:
the encoder network computes the list of hidden vectors $(h_1,\dots,h_{|x|})$ corresponding to the source language sentence, where $E(x_i)$ is the word vector of the word $x_i$ and $h_i$ is the hidden vector representation corresponding to the word $x_i$:
$\overrightarrow{h}_i = \overrightarrow{f}(\overrightarrow{h}_{i-1}, E(x_i)),\quad \overleftarrow{h}_i = \overleftarrow{f}(\overleftarrow{h}_{i+1}, E(x_i))$;
$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$.

Claims (10)

1. An entropy-based neural machine translation dynamic decoding method, characterized by comprising:
Step 1, feeding word vectors of the words in a source language sentence in a training corpus into an encoder network to obtain a list of encoding vectors $(h_1,\dots,h_{|x|})$ of the source language sentence and the hidden state vector $s_{j-1}$ of the (j-1)-th time step;
Step 2, obtaining, by an attention network, a context vector $c_j$ from the list of encoding vectors $(h_1,\dots,h_{|x|})$ and the hidden state vector $s_{j-1}$;
Step 3, obtaining the (j-1)-th real target language word $y^*_{j-1}$ in the training corpus and the target language word $\hat{y}_{j-1}$ predicted at time step j-1, selecting the real target language word $y^*_{j-1}$ with probability p and the predicted target language word $\hat{y}_{j-1}$ with probability 1-p, and obtaining a selection result $y_{j-1}$;
Step 4, obtaining the entropy $e_{j-1}$ of time step j-1 from the selection result $y_{j-1}$ by the following formula, where N is the size of the target language lexicon and $p_{i,j-1}$ is the predicted probability of the i-th target language word:
$e_{j-1} = -\sum_{i=1}^{N} p_{i,j-1}\log p_{i,j-1}$;
Step 5, inputting the selection result $y_{j-1}$, the hidden state vector $s_{j-1}$, the context vector $c_j$ and the entropy value $e_{j-1}$ into a decoder network to obtain the hidden state vector $s_j$ of the current j-th time step;
Step 6, obtaining the target language word $\hat{y}_j$ of the j-th time step from $y_{j-1}$, the hidden state vector $s_j$ and the context vector $c_j$;
Step 7, passing the list of encoding vectors $(h_1,\dots,h_{|x|})$, the hidden state vector $s_j$ of the j-th time step, the j-th real target language word $y^*_j$ in the training corpus and the target language word $\hat{y}_j$ predicted at the j-th time step to the (j+1)-th time step, and continuing the decoding process until a special end-of-sentence symbol EOS is generated.
2. An entropy-based neural machine translation dynamic decoding method as claimed in claim 1, wherein the step 2 comprises:
obtaining the context vector $c_j$ through the attention network, where the weight $\alpha_{ij}$ reflects the importance of the hidden state $h_i$, relative to the hidden state $s_{j-1}$, in determining the next hidden state $s_j$ and predicting $y_j$:
$u_{ij} = V_a^{\top}\tanh(W_a s_{j-1} + U_a h_i)$;
$\alpha_{ij} = \exp(u_{ij}) \big/ \sum_{k=1}^{|x|}\exp(u_{kj})$;
$c_j = \sum_{i=1}^{|x|}\alpha_{ij} h_i$;
where x is the source language sentence, and $V_a$, $W_a$ and $U_a$ are parameters to be learned in the neural network.
3. An entropy-based neural machine translation dynamic decoding method as claimed in claim 2, wherein the step 3 comprises:
$p = \dfrac{\mu}{\mu + \exp(e/\mu)}$;
$y_{j-1} = \begin{cases} y^*_{j-1}, & \text{with probability } p \\ \hat{y}_{j-1}, & \text{with probability } 1-p \end{cases}$;
where μ is the hyperparameter and e is the number of training rounds.
4. An entropy-based neural machine translation dynamic decoding method as claimed in claim 3, wherein the step 6 comprises:
$t_j = g\big(E(y_{j-1}), s_j, c_j\big)$;
$o_j = W_o t_j$;
$P_j = \mathrm{softmax}(o_j)$;
$\hat{y}_j = \arg\max_i\, p_{i,j}$;
where $E(y_{j-1})$ is the word vector of $y_{j-1}$, g is the readout transformation combining $E(y_{j-1})$, $s_j$ and $c_j$, $e_{j-1}$ is the entropy of the probability distribution of the word predicted at time step j-1, and $W_o$ is a parameter to be learned in the neural network.
5. An entropy-based neural machine translation dynamic decoding method as claimed in claim 4, wherein the step 7 comprises:
the encoder network computes the list of encoding vectors $(h_1,\dots,h_{|x|})$ corresponding to the source language sentence, where $E(x_i)$ is the word vector of the word $x_i$ and $h_i$ is the hidden vector representation corresponding to the word $x_i$:
$\overrightarrow{h}_i = \overrightarrow{f}(\overrightarrow{h}_{i-1}, E(x_i)),\quad \overleftarrow{h}_i = \overleftarrow{f}(\overleftarrow{h}_{i+1}, E(x_i))$;
$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$.
6. An entropy-based neural machine translation dynamic decoding system, characterized by comprising:
Module 1, feeding word vectors of the words in a source language sentence in the training corpus into an encoder network to obtain a list of encoding vectors $(h_1,\dots,h_{|x|})$ of the source language sentence and the hidden state vector $s_{j-1}$ of the (j-1)-th time step;
Module 2, obtaining, through the attention network, a context vector $c_j$ from the list of encoding vectors $(h_1,\dots,h_{|x|})$ and the hidden state vector $s_{j-1}$;
Module 3, obtaining the (j-1)-th real target language word $y^*_{j-1}$ in the training corpus and the target language word $\hat{y}_{j-1}$ predicted at time step j-1, selecting the real target language word $y^*_{j-1}$ with probability p and the predicted target language word $\hat{y}_{j-1}$ with probability 1-p, and obtaining a selection result $y_{j-1}$;
Module 4, obtaining the entropy $e_{j-1}$ of time step j-1 from the selection result $y_{j-1}$ by the following formula, where N is the size of the target language lexicon and $p_{i,j-1}$ is the predicted probability of the i-th target language word:
$e_{j-1} = -\sum_{i=1}^{N} p_{i,j-1}\log p_{i,j-1}$;
Module 5, inputting the selection result $y_{j-1}$, the hidden state vector $s_{j-1}$, the context vector $c_j$ and the entropy value $e_{j-1}$ into the decoder network to obtain the hidden state vector $s_j$ of the current j-th time step;
Module 6, obtaining the target language word $\hat{y}_j$ of the j-th time step from $y_{j-1}$, the hidden state vector $s_j$ and the context vector $c_j$;
Module 7, passing the list of encoding vectors $(h_1,\dots,h_{|x|})$, the hidden state vector $s_j$ of the j-th time step, the j-th real target language word $y^*_j$ in the training corpus and the target language word $\hat{y}_j$ predicted at the j-th time step to the (j+1)-th time step, and continuing the decoding process until a special end-of-sentence symbol EOS is generated.
7. An entropy-based neural machine translation dynamic decoding system as claimed in claim 6, wherein the module 2 comprises:
obtaining the context vector $c_j$ through the attention network, where the weight $\alpha_{ij}$ reflects the importance of the hidden state $h_i$, relative to the hidden state $s_{j-1}$, in determining the next hidden state $s_j$ and predicting $y_j$:
$u_{ij} = V_a^{\top}\tanh(W_a s_{j-1} + U_a h_i)$;
$\alpha_{ij} = \exp(u_{ij}) \big/ \sum_{k=1}^{|x|}\exp(u_{kj})$;
$c_j = \sum_{i=1}^{|x|}\alpha_{ij} h_i$;
where x is the source language sentence, and $V_a$, $W_a$ and $U_a$ are parameters to be learned in the neural network.
8. An entropy-based neural machine translation dynamic decoding system as claimed in claim 7, wherein the module 3 comprises:
$p = \dfrac{\mu}{\mu + \exp(e/\mu)}$;
$y_{j-1} = \begin{cases} y^*_{j-1}, & \text{with probability } p \\ \hat{y}_{j-1}, & \text{with probability } 1-p \end{cases}$;
where μ is the hyperparameter and e is the number of training rounds.
9. An entropy-based neural machine translation dynamic decoding system as claimed in claim 8, wherein the module 6 comprises:
$t_j = g\big(E(y_{j-1}), s_j, c_j\big)$;
$o_j = W_o t_j$;
$P_j = \mathrm{softmax}(o_j)$;
$\hat{y}_j = \arg\max_i\, p_{i,j}$;
where $E(y_{j-1})$ is the word vector of $y_{j-1}$, g is the readout transformation combining $E(y_{j-1})$, $s_j$ and $c_j$, $e_{j-1}$ is the entropy of the probability distribution of the word predicted at time step j-1, and $W_o$ is a parameter to be learned in the neural network.
10. An entropy-based neural machine translation dynamic decoding system as claimed in claim 9, wherein the module 7 comprises:
the encoder network computes the list of encoding vectors $(h_1,\dots,h_{|x|})$ corresponding to the source language sentence, where $E(x_i)$ is the word vector of the word $x_i$ and $h_i$ is the hidden vector representation corresponding to the word $x_i$:
$\overrightarrow{h}_i = \overrightarrow{f}(\overrightarrow{h}_{i-1}, E(x_i)),\quad \overleftarrow{h}_i = \overleftarrow{f}(\overleftarrow{h}_{i+1}, E(x_i))$;
$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$.
CN202010151246.4A 2020-03-06 2020-03-06 Entropy-based neural machine translation dynamic decoding method and system Active CN111428519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010151246.4A CN111428519B (en) 2020-03-06 2020-03-06 Entropy-based neural machine translation dynamic decoding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010151246.4A CN111428519B (en) 2020-03-06 2020-03-06 Entropy-based neural machine translation dynamic decoding method and system

Publications (2)

Publication Number Publication Date
CN111428519A CN111428519A (en) 2020-07-17
CN111428519B (en) 2022-03-29

Family

ID=71547442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010151246.4A Active CN111428519B (en) 2020-03-06 2020-03-06 Entropy-based neural machine translation dynamic decoding method and system

Country Status (1)

Country Link
CN (1) CN111428519B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016332B (en) * 2020-08-26 2021-05-07 华东师范大学 Multi-modal machine translation method based on variational reasoning and multi-task learning
CN112836485B (en) * 2021-01-25 2023-09-19 中山大学 Similar medical record prediction method based on neural machine translation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795912A (en) * 2019-09-19 2020-02-14 平安科技(深圳)有限公司 Method, device and equipment for encoding text based on neural network and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049106B2 (en) * 2017-01-18 2018-08-14 Xerox Corporation Natural language generation through character-based recurrent neural networks with finite-state prior knowledge
CN108984539B (en) * 2018-07-17 2022-05-17 苏州大学 Neural machine translation method based on translation information simulating future moment


Also Published As

Publication number Publication date
CN111428519A (en) 2020-07-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant