CN107368475B - Machine translation method and system based on a generative adversarial neural network - Google Patents


Info

Publication number
CN107368475B
CN107368475B (application CN201710586841.9A)
Authority
CN
China
Prior art keywords
network
generation
machine translation
output
neural network
Prior art date
Legal status
Active
Application number
CN201710586841.9A
Other languages
Chinese (zh)
Other versions
CN107368475A (en
Inventor
李世奇
程国艮
Current Assignee
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd
Priority to CN201710586841.9A
Publication of CN107368475A
Application granted
Publication of CN107368475B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention belongs to the technical field of computers and discloses a machine translation method and system based on a generative adversarial neural network. The method introduces, on top of the original machine translation generation network, a discriminative network trained adversarially against it; the discriminative network judges whether a target-language translation comes from the training parallel corpus or is a machine translation result produced by the generation network, and it adopts a multi-layer perceptron feedforward neural network model to perform binary classification. The system comprises a discriminative network, a generation network, a monolingual corpus, and a parallel corpus. The invention makes full use of manually labeled bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning; because monolingual corpus resources are abundant and easy to obtain, this alleviates the shortage of training corpora required by neural network machine translation models.

Description

Machine translation method and system based on a generative adversarial neural network
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a machine translation method and system based on a generative adversarial neural network.
Background
Machine translation is the process of automatically translating a sentence in a source language into a sentence in a target language using computer algorithms. It is a research direction of artificial intelligence with significant scientific and practical value. With the deepening of globalization and the rapid development of the internet, machine translation technology plays an increasingly important role in political, economic, social, and cultural exchange at home and abroad.
At present, machine translation methods based on deep neural networks are the best-performing methods in the field. They mainly adopt an encoder-decoder structure, in which both the encoder and the decoder use a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units. The translation process is as follows: first, the encoder converts the input source-language sentence into a sequence of word vectors that serve as input to the recurrent neural network, and outputs a dense, fixed-length vector called the context vector. The decoder then uses another recurrent neural network combined with a Softmax classifier, taking the context vector as input, to output a sequence of target-language word vectors. Finally, these word vectors are mapped one by one into target-language words using a dictionary, completing the translation process.
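As a rough illustration of this encode-then-decode flow, the following toy Python sketch replaces the recurrent networks with trivial stand-ins (averaged "embeddings" and linear scores); the vocabularies, dimensions, and scoring rule are invented for illustration only and are not the patent's model:

```python
import math
import random

# Toy vocabularies; a real system learns embeddings over a large vocabulary.
SRC_VOCAB = {"wo": 0, "ai": 1, "ni": 2}
TGT_VOCAB = ["i", "love", "you", "<eos>"]
DIM = 4  # embedding / context size (assumed for illustration)

def embed(word_id):
    # Deterministic pseudo-random stand-in for a learned word vector.
    rnd = random.Random(word_id)
    return [rnd.uniform(-1.0, 1.0) for _ in range(DIM)]

def encode(words):
    # "Encoder": fold the word-vector sequence into one fixed-length
    # context vector (a simple mean here; the patent's encoder is an RNN).
    vecs = [embed(SRC_VOCAB[w]) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode(context, max_steps=4):
    # "Decoder": emit one target word per step from a softmax over the
    # target vocabulary; the scores are toy linear functions of the state.
    state, out = list(context), []
    for step in range(max_steps):
        scores = [sum(state) + 0.1 * (i + step) for i in range(len(TGT_VOCAB))]
        probs = softmax(scores)
        out.append(TGT_VOCAB[max(range(len(probs)), key=probs.__getitem__)])
        state = [0.9 * s for s in state]  # toy state update
    return out

words = decode(encode(["wo", "ai", "ni"]))
print(words)
```

The real model replaces `encode` and `decode` with trained LSTM networks; only the shape of the pipeline is shown here.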
In summary, the problems of the prior art are as follows:
The main defect of the prior art is that training a deep neural network model depends heavily on a large-scale, manually labeled corpus of bilingual parallel sentence pairs. Because manual labeling is expensive and large-scale, high-quality parallel corpora are scarce, the training data of neural machine translation models is insufficient and their performance suffers; this is the bottleneck faced by existing neural machine translation models. For some languages in particular, the parallel corpus resources available for training are extremely scarce, making it difficult to build a high-performance machine translation system.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a machine translation method and system based on a generative adversarial neural network.
The invention is realized as follows: the machine translation method based on a generative adversarial neural network introduces, on top of the original machine translation generation network, a discriminative network adversarial to it, which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network; the discriminative network adopts a multi-layer perceptron feedforward neural network model to perform binary classification.
Further, the binary classification method comprises the following steps:
the hidden layer activation takes the form of a hyperbolic tangent function:
T(x) = (e^x − e^(−x)) / (e^x + e^(−x)),
wherein T(x) is the activation function of the hidden layer and h(x) is the hidden layer function;
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer and takes the form of a sigmoid function:
S(x) = 1 / (1 + e^(−x));
when the multi-layer perceptron feedforward neural network model performs binary classification, the input-layer vector X is substituted into f(x) to compute the output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network.
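The binary classification just described can be sketched in plain Python as follows; the toy layer sizes, random weights, and the 0/1 label convention are assumptions for illustration, not the patent's trained parameters:

```python
import math
import random

def matvec(W, x, b):
    # One affine layer: W·x + b, with W a list of rows.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp(x, W1, b1, W2, b2):
    # f(x) = S(W2 · T(W1·x + b1) + b2): tanh hidden layer, sigmoid output.
    h = [math.tanh(v) for v in matvec(W1, x, b1)]                # h(x) = T(W1·x + b1)
    y = [1.0 / (1.0 + math.exp(-v)) for v in matvec(W2, h, b2)]  # 2-dim output
    return y

rnd = random.Random(1)
n, m = 6, 4  # toy input / hidden widths (assumed)
W1 = [[rnd.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
b1 = [0.0] * m
W2 = [[rnd.uniform(-1, 1) for _ in range(m)] for _ in range(2)]
b2 = [0.0, 0.0]

y = mlp([0.5] * n, W1, b1, W2, b2)
# The larger-valued dimension gives the class, e.g. 0 = corpus, 1 = generated.
label = y.index(max(y))
print(label)
```

In the patent, the weights would come from adversarial training rather than a random seed.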
Further, the generation network consists of an encoder and a decoder. The encoder adopts a bidirectional Long Short-Term Memory (LSTM) neural network structure: it converts the input source-language sentence into a sequence of word vectors that serve as input to the LSTM network, which produces a dense, fixed-length vector called the context vector, the encoder's output.
The decoder then uses another, unidirectional LSTM neural network, taking the context vector output by the encoder as input; a Softmax classifier is stacked on the output layer of the neural machine translation model to output a sequence of target-language word vectors, which are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
Further, for the LSTM unit of the neural machine translation model, the inputs x_t and h_(t−1) respectively represent the input word vector at time t and the output of the LSTM unit at time t−1, and the output h_t represents the output of the LSTM unit at the current time.
Specifically:
i_t = g(W_xi·x_t + W_hi·h_(t−1) + b_i);
f_t = g(W_xf·x_t + W_hf·h_(t−1) + b_f);
o_t = g(W_xo·x_t + W_ho·h_(t−1) + b_o);
c̃_t = tanh(W_xc·x_t + W_hc·h_(t−1) + b_c);
c_t = f_t·c_(t−1) + i_t·c̃_t;
h_t = o_t·tanh(c_t);
wherein i_t, f_t, and o_t respectively represent the input gate, forget gate, and output gate; c_(t−1) represents the neuron state at time t−1; c_t and c̃_t represent the current neuron state and the candidate state; h_t is the output of the LSTM neuron; g is the gate activation function (sigmoid); and the parameters W and b respectively represent the connection weights and biases of each layer.
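A minimal sketch of one LSTM step under these gate equations, with scalar states and arbitrary toy weights (g is taken to be the sigmoid, the usual choice for LSTM gates):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One scalar LSTM unit following the gate equations above.
    i_t = sigmoid(W["xi"] * x_t + W["hi"] * h_prev + b["i"])      # input gate
    f_t = sigmoid(W["xf"] * x_t + W["hf"] * h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["xo"] * x_t + W["ho"] * h_prev + b["o"])      # output gate
    c_hat = math.tanh(W["xc"] * x_t + W["hc"] * h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                              # new cell state
    h_t = o_t * math.tanh(c_t)                                    # unit output
    return h_t, c_t

W = {k: 0.5 for k in ("xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc")}
b = {k: 0.0 for k in ("i", "f", "o", "c")}

h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):  # toy input sequence
    h, c = lstm_step(x, h, c, W, b)
print(round(h, 4), round(c, 4))
```

A real unit operates on vectors, with W as matrices; the scalar form keeps the six equations visible.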
Further, the encoder adopts two LSTM networks, one fed the forward word-vector sequence and the other the reversed word-vector sequence, forming a bidirectional LSTM network; the vectors output by the two networks are concatenated to form the context vector. The decoder adopts an LSTM network that takes the context vector as input and outputs a state sequence, which is then passed through a Softmax classifier of the form:
P(y = i | x) = e^(θ_i·x) / Σ_(j=1..k) e^(θ_j·x),
wherein θ_1, θ_2, …, θ_k are the classifier parameters, k is the total number of classifier categories, and i denotes a particular category; the states output by the decoder are converted one by one into target-language word vectors, and the sequence is then assembled into the translation result.
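A small, numerically stable implementation of this Softmax over precomputed scores θ_i·x (the example scores are arbitrary):

```python
import math

def softmax(logits):
    # P(y = i | x) = exp(theta_i·x) / sum_j exp(theta_j·x); `logits` stands
    # for the precomputed scores theta_i·x, and the maximum is subtracted
    # first for numerical stability (it cancels in the ratio).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

The probabilities sum to one, and the largest score receives the largest probability, which is the category the decoder emits.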
Further, the discriminative network serves, through adversarial training, to synchronously improve the generation network's ability to produce the target language and the discriminative network's ability to judge the source of a translation; during adversarial training, the discriminative network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network.
In the machine translation method based on a generative adversarial neural network, the learning process of the discriminative network is a competition between the generation network and the discriminative network, specifically:
a sample is drawn at random from either the real samples or the samples produced by the generative model, and the discriminative network judges whether it is real;
through this competitive machine learning mechanism, the performance of both the generation network and the discriminative network improves continuously. When the whole network reaches a Nash-equilibrium state, i.e. the parameters of both networks are stable, training is finished; at that point the machine translation results produced by the generation network can fool the discriminative network into believing the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
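The competition described above can be illustrated with a deliberately tiny one-dimensional "GAN": the generator is a single mean parameter and the discriminator is a logistic regressor. All data, learning rates, and dynamics here are toy stand-ins for illustration, not the patent's translation networks:

```python
import math
import random

random.seed(7)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# "Real" data lives around mean 2.0; the generator is just a learned mean mu.
REAL_MEAN = 2.0
mu = -2.0          # generator parameter, starts far from the real data
w, b = 0.0, 0.0    # 1-D logistic-regression "discriminator"
lr = 0.05

for step in range(3000):
    x_real = REAL_MEAN + random.gauss(0.0, 0.3)
    x_fake = mu + random.gauss(0.0, 0.3)
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    for x, y in ((x_real, 1.0), (x_fake, 0.0)):
        p = sigmoid(w * x + b)
        w += lr * (y - p) * x
        b += lr * (y - p)
    w = max(-3.0, min(3.0, w))  # keep the toy dynamics tame
    # Generator step: move mu so the discriminator calls its sample "real".
    p = sigmoid(w * x_fake + b)
    mu += lr * (1.0 - p) * w    # gradient of log D(fake) w.r.t. mu

print(round(mu, 2))
```

As training alternates, mu migrates toward the real mean until the discriminator can no longer separate the two sources, mirroring the Nash-equilibrium stopping condition in the text.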
Further, the machine translation method based on a generative adversarial neural network makes use of manually labeled bilingual parallel corpus resources while also using monolingual corpus resources for semi-supervised learning.
Further, the machine translation method based on a generative adversarial neural network specifically includes:
constructing a bidirectional long short-term memory neural network as the discriminative network;
combining the generation network and the discriminative network into a complete generative adversarial network: the input vector of the encoder in the generation network is connected with the output vector of the decoder and passed as input to the discriminative network, while the discriminative network's output result, 0 or 1, is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network; when the parameters of the generative adversarial network remain stable, training is finished.
After training of the generative adversarial network model is completed, the generation network part is used as the output machine translation model for subsequent use.
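A toy sketch of the wiring in the steps above, reading "connecting" the encoder input vector with the decoder output vector as concatenation (an assumption on the patent's wording) and treating the 0/1 feedback as a plain binary flag:

```python
def discriminator_input(encoder_input_vec, decoder_output_vec):
    # Concatenate the source representation and the candidate-translation
    # representation so the discriminative network sees them side by side.
    return list(encoder_input_vec) + list(decoder_output_vec)

def discriminator_feedback(is_real):
    # The discriminative network's binary verdict, fed back to the
    # generation network as its training signal.
    return 1 if is_real else 0

pair = discriminator_input([0.1, 0.2, 0.3], [0.4, 0.5])
print(pair, discriminator_feedback(False))
```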
Another object of the present invention is to provide a machine translation system based on a generative adversarial neural network, comprising:
a discriminative network, which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network, and which adopts a multi-layer perceptron feedforward neural network model to perform binary classification.
Further, the machine translation system based on a generative adversarial neural network further comprises:
a generation network, combined with the discriminative network to form a complete generative adversarial network: the input vector of the encoder in the generation network is connected with the output vector of the decoder and passed as input to the discriminative network, while the discriminative network's output result, 0 or 1, is fed back to the generation network;
a monolingual corpus, integrated with the parallel corpus to form a semi-supervised corpus that trains the whole adversarial network; when the parameters of the generative adversarial network remain stable, training is finished.
The advantages and positive effects of the invention are as follows:
the invention introduces, on top of the original machine translation generation network (i.e. a neural machine translation model with an encoder-decoder structure), a discriminative network adversarial to it, used to judge whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network.
The invention improves the overall framework of existing machine translation methods based on artificial neural networks. It provides a machine translation method based on a generative adversarial network, giving the neural machine translation model a self-learning capability: manually labeled bilingual parallel corpus resources are fully utilized, while monolingual corpus resources can also be used for semi-supervised learning. Since monolingual corpus resources are abundant and easy to obtain, this alleviates the bottleneck of insufficient training corpora for neural machine translation and can save more than 50% of the cost of manual corpus annotation.
After the model is trained, its parameter scale and running time in practical application are comparable to those of current neural machine translation models, so the complexity of the machine translation model in actual use is not increased.
Drawings
Fig. 1 is a flowchart of the machine translation method based on a generative adversarial neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the machine translation system based on a generative adversarial neural network according to an embodiment of the present invention.
In the figure: 1, discriminative network; 2, generation network; 3, monolingual corpus; 4, parallel corpus.
Fig. 3 is a schematic diagram of a neural machine translation model based on the encoder-decoder structure according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an LSTM neural network unit provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
At present, the most important defect of the prior art is that training a deep neural network model depends heavily on a large-scale, manually labeled corpus of bilingual parallel sentence pairs. Because manual labeling is expensive and large-scale, high-quality parallel corpora are lacking, the training data of neural machine translation models is insufficient and their performance is poor; this is the bottleneck faced by existing neural machine translation models. For some languages in particular, the parallel corpus resources available for training are extremely scarce, and it is difficult to construct a high-performance machine translation system.
The invention adopts a multi-layer perceptron feedforward neural network model to construct the discriminative network and perform binary classification. The multi-layer perceptron neural network model includes an input layer X = {x_1, x_2, …, x_n}, a hidden layer H = {h_1, h_2, …, h_m}, and an output layer Y = {y_1, y_2}.
The hidden layer function h(x) can be formally expressed as:
h(x) = T(W1·x + b1),
wherein the model parameters W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden-layer bias vector; T(x) is the activation function of the hidden layer, for which the invention adopts the hyperbolic tangent function:
T(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
The whole multi-layer perceptron neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector, and S(x) is the activation function of the output layer, for which the invention adopts the sigmoid function:
S(x) = 1 / (1 + e^(−x)).
When the multi-layer perceptron neural network model performs binary classification, the input-layer vector X is substituted into f(x) to compute the two-dimensional output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result.
The application of the principles of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, in the machine translation method based on a generative adversarial neural network according to an embodiment of the present invention,
another artificial neural network, adversarial to the traditional neural machine translation network, is introduced and called the discriminative network; the original LSTM machine translation neural network is called the generation network. In this generative adversarial machine translation model, the generation network adopts a traditional encoder-decoder neural translation model whose function is to generate a corresponding target-language sentence from an input source-language sentence; the discriminative network adopts a multi-layer perceptron feedforward neural network model that performs binary classification, each node of which is a perceptron. The function of the discriminative network is to judge whether a target-language translation comes from the training corpus or is the result of machine translation by the recurrent neural network.
The generative adversarial network introduces a mechanism of competition between the generation network and the discriminative network, and adversarial training synchronously improves the generation network's ability to produce the target language and the discriminative network's ability to judge the source of a translation. During training, the objective of the discriminative network is to judge whether a translation is real data from the corpus or a machine translation result, while the objective of the generation network is to produce translation results that fool the discriminative network into taking machine-translated output for results from the real corpus.
The learning process in the machine translation method based on a generative adversarial neural network provided by the embodiment of the invention thus becomes a competition between the generation network and the discriminative network: a sample is drawn at random from either the real samples or the samples produced by the generative model, and the discriminative network judges whether it is real. Through this competitive machine learning mechanism, the performance of both networks improves continuously. When the whole network reaches a Nash-equilibrium state, i.e. the parameters of both networks essentially stop changing, training is finished; at that point the machine translation results produced by the generation network can fool the discriminative network into believing the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
As shown in fig. 2, the machine translation system based on a generative adversarial neural network according to an embodiment of the present invention includes:
a discriminative network 1, which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network, and which adopts a multi-layer perceptron feedforward neural network model to perform binary classification.
The machine translation system based on a generative adversarial neural network further includes:
a generation network 2, combined with the discriminative network to form a complete generative adversarial network: the input vector of the encoder in the generation network is connected with the output vector of the decoder and passed as input to the discriminative network, while the discriminative network's output result, 0 or 1, is fed back to the generation network;
a monolingual corpus 3, integrated with the parallel corpus 4 to form a semi-supervised corpus that trains the whole adversarial network; when the parameters of the generative adversarial network remain stable, training is finished.
The invention is further described below in connection with its positive effects.
The embodiment of the invention constructs a long short-term memory neural network based on the encoder-decoder structure and then trains the generation network with bilingual parallel corpora.
The embodiment of the invention constructs another, bidirectional long short-term memory neural network as the discriminative network.
The application of the principles of the present invention will now be described in further detail with reference to specific embodiments.
In the embodiment of the invention, the binary classification method is as follows:
the hidden layer activation takes the form of a hyperbolic tangent function:
T(x) = (e^x − e^(−x)) / (e^x + e^(−x)),
wherein T(x) is the activation function of the hidden layer and h(x) is the hidden layer function;
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer and takes the form of a sigmoid function:
S(x) = 1 / (1 + e^(−x));
when the multi-layer perceptron feedforward neural network model performs binary classification, the input-layer vector X is substituted into f(x) to compute the output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network.
As shown in fig. 3, the generation network consists of two parts, an encoder and a decoder. The encoder adopts a bidirectional Long Short-Term Memory (LSTM) neural network structure: it converts the input source-language sentence into a sequence of word vectors that serve as input to the LSTM network, which produces a dense, fixed-length vector called the context vector, the encoder's output.
The decoder then uses another, unidirectional LSTM neural network, taking the context vector output by the encoder as input; a Softmax classifier is stacked on the output layer of the neural machine translation model to output a sequence of target-language word vectors, which are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
As shown in fig. 4, the inputs x_t and h_(t−1) of the LSTM unit respectively represent the input word vector at time t and the output of the LSTM neural network unit at time t−1, and the output h_t represents the output of the LSTM unit at the current time.
Specifically:
i_t = g(W_xi·x_t + W_hi·h_(t−1) + b_i);
f_t = g(W_xf·x_t + W_hf·h_(t−1) + b_f);
o_t = g(W_xo·x_t + W_ho·h_(t−1) + b_o);
c̃_t = tanh(W_xc·x_t + W_hc·h_(t−1) + b_c);
c_t = f_t·c_(t−1) + i_t·c̃_t;
h_t = o_t·tanh(c_t);
wherein i_t, f_t, and o_t respectively represent the input gate, forget gate, and output gate; c_(t−1) represents the neuron state at time t−1; c_t and c̃_t represent the current neuron state and the candidate state; h_t is the output of the LSTM neuron; g is the gate activation function (sigmoid); and the parameters W and b respectively represent the connection weights and biases of each layer.
the encoder adopts two LSTM networks, one of which inputs a forward word vector sequence and the other inputs a reverse word vector sequence to form a bidirectional LSTM network, and the vectors output by the two networks are connected to form a context vector; the decoder adopts an LSTM network, inputs the context vector and outputs a state sequence; and then passing through a Softmax classifier, wherein the function form is as follows:
Figure BDA0001353756230000101
wherein (theta)12,…,θk) K is the total number of categories of the classifier, i represents a certain classification category; will solveAnd converting the output states of the decoder into word vectors of the target language one by one, and then integrating the sequences to form a translation result.
The discriminative network serves, through adversarial training, to synchronously improve the generation network's ability to produce the target language and the discriminative network's ability to judge the source of a translation; during adversarial training, the discriminative network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network.
In the machine translation method based on a generative adversarial neural network, the learning process of the discriminative network is a competition between the generation network and the discriminative network, specifically:
a sample is drawn at random from either the real samples or the samples produced by the generative model, and the discriminative network judges whether it is real;
through this competitive machine learning mechanism, the performance of both networks improves continuously. When the whole network reaches a Nash-equilibrium state, i.e. the parameters of both networks are stable, training is finished; at that point the machine translation results produced by the generation network can fool the discriminative network into believing the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
The machine translation method based on a generative adversarial neural network makes use of manually labeled bilingual parallel corpus resources while also using monolingual corpus resources for semi-supervised learning.
The machine translation method based on a generative adversarial neural network specifically includes:
constructing a bidirectional long short-term memory neural network as the discriminative network;
combining the generation network and the discriminative network into a complete generative adversarial network: the input vector of the encoder in the generation network is connected with the output vector of the decoder and passed as input to the discriminative network, while the discriminative network's output result, 0 or 1, is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network; when the parameters of the generative adversarial network remain stable, training is finished.
After training of the generative adversarial network model is completed, the generation network part is used as the output machine translation model for subsequent use.
The invention combines the generation network and the discriminative network to form a complete generative adversarial network. Specifically, the input vector of the encoder in the generation network is connected with the output vector of the decoder and passed as input to the discriminative network; meanwhile, the output result of the discriminative network (0 or 1) is fed back to the generation network.
The invention integrates the parallel corpus and the monolingual corpus to form a large-scale semi-supervised corpus, which is used to train the whole generative adversarial network. When the parameters of the generative adversarial network remain stable, training is finished.
After training of the generative adversarial network model is completed, the generation network part can be used as the output machine translation model. The specific usage is as follows: the source language is segmented into words, the segmentation result is input to the encoder of the generation network with each source-language word fed in turn to the corresponding neural network node, and the output of the generation network's decoder is the corresponding target-language translation.
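The usage flow just described can be sketched as follows; `segment` and `generator_translate` are hypothetical stand-ins for the real word segmenter and the trained generation network:

```python
def segment(sentence):
    # Toy whitespace segmentation; a real Chinese pipeline would use a
    # dedicated word segmenter.
    return sentence.split()

def generator_translate(source_words):
    # Hypothetical stand-in for the trained generation network: a simple
    # word-for-word dictionary lookup instead of the encoder-decoder.
    toy_dict = {"wo": "i", "ai": "love", "ni": "you"}
    return [toy_dict.get(w, "<unk>") for w in source_words]

translation = " ".join(generator_translate(segment("wo ai ni")))
print(translation)
```

Only the interface matters here: segmented source words in, target-language words out, exactly the contract the deployed generation network fulfills.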
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A machine translation method based on a generative adversarial neural network, characterized in that, on top of an original machine translation generator network, the method introduces a discriminator network adversarial to that generator network; the discriminator network is used to judge whether a target-language translation comes from the training corpus or is a machine translation result produced by the original generator network; the discriminator network adopts a multilayer perceptron feedforward neural network model to perform binary classification;
the binary classification method comprises the following steps:
the hidden-layer activation takes the form of a hyperbolic tangent function:
T(x) = (e^x − e^(−x)) / (e^x + e^(−x)), with h(x) = T(W_1·x + b_1),
wherein T(x) is the activation function of the hidden layer, h(x) is the hidden layer function, and W_1 and b_1 are the input-to-hidden weight matrix and the hidden-layer bias vector;
the whole multilayer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W_2·h(x) + b_2) = S(W_2·T(W_1·x + b_1) + b_2),
wherein the model parameters W_2 and b_2 denote the hidden-to-output weight matrix and the output-layer bias vector respectively; S(x) is the activation function of the output layer and takes the form of a sigmoid function:
S(x) = 1 / (1 + e^(−x));
when the multilayer perceptron feedforward neural network model performs binary classification, the input-layer vector x is substituted into f(x) to compute the output vector y; the category represented by the dimension of y with the larger value is selected as the classification result, indicating whether the translation comes from the training corpus or from the generator network;
the generator network consists of an encoder and a decoder; the encoder adopts a bidirectional long short-term memory (LSTM) neural network structure; the encoder converts an input source-language sentence into a word-vector sequence serving as the input of the LSTM network, and the network produces a fixed-length dense vector, called the context vector, as the encoder's output;
the decoder then uses another, unidirectional LSTM neural network, taking the context vector output by the encoder as its input; a Softmax classifier is superposed on the output layer of the neural machine translation model to output the word-vector sequence of the target language; the word vectors are mapped one by one to target-language words through a dictionary, completing the automatic translation process.
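The two-layer perceptron discriminator described above can be sketched in a few lines of numpy. The dimensions and weights below are illustrative placeholders, not trained parameters.

```python
import numpy as np

def T(x):
    # Hidden-layer activation: T(x) = (e^x - e^-x) / (e^x + e^-x)
    return np.tanh(x)

def S(x):
    # Output-layer activation: S(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def f(x, W1, b1, W2, b2):
    """Two-layer perceptron f(x) = S(W2 . T(W1 x + b1) + b2)."""
    h = T(W1 @ x + b1)        # hidden layer h(x)
    return S(W2 @ h + b2)     # output vector y

# Toy dimensions: 4-dim input, 3 hidden units, 2 output classes
# (translation from corpus vs. from generator); weights are random.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
y = f(rng.normal(size=4), W1, b1, W2, b2)
decision = int(np.argmax(y))  # class of the larger component of y
```

The `argmax` over the 2-dimensional output implements the "dimension with the larger value" decision rule of the claim.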
2. The machine translation method based on a generative adversarial neural network of claim 1, characterized in that the inputs of the neural machine translation model, x_t and h_{t−1}, denote the input word vector and the output of the LSTM neural network unit at time t−1 respectively, and the output h_t denotes the output of the LSTM neural network unit at the current time;
the method specifically comprises the following steps:
i_t = g(W_{xi}·x_t + W_{hi}·h_{t−1} + b_i);
f_t = g(W_{xf}·x_t + W_{hf}·h_{t−1} + b_f);
o_t = g(W_{xo}·x_t + W_{ho}·h_{t−1} + b_o);
c̃_t = tanh(W_{xc}·x_t + W_{hc}·h_{t−1} + b_c);
c_t = f_t·c_{t−1} + i_t·c̃_t;
h_t = o_t·tanh(c_t);
where g denotes the gate activation function; i_t, f_t and o_t denote the input gate, the forget gate and the output gate respectively; c_{t−1} denotes the neuron state at time t−1, c_t denotes the current neuron state, c̃_t denotes the candidate (hidden) neuron state, and h_t is the output of the LSTM neuron; the parameters W and b denote the connection weights and biases of each layer; specifically, W_{xi}, W_{xf}, W_{xo} and W_{xc} denote the input-to-hidden weight matrices corresponding to the input gate, the forget gate, the output gate and the candidate state respectively; W_{hi}, W_{hf}, W_{ho} and W_{hc} denote the recurrent (hidden-to-hidden) weight matrices corresponding to the input gate, the forget gate, the output gate and the candidate state respectively; and b_i, b_f, b_o and b_c denote the corresponding bias vectors.
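The six gate equations of claim 2 can be sketched as a single numpy step function. The sigmoid choice for g and the toy dimensions are assumptions for illustration; the weights are random placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations above; p holds the
    weight matrices W_* and bias vectors b_*."""
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])    # input gate
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])    # forget gate
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])    # output gate
    c_hat = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                               # new cell state
    h_t = o_t * np.tanh(c_t)                                       # unit output
    return h_t, c_t

# Toy sizes: 4-dim word vectors, 5 hidden units.
rng = np.random.default_rng(1)
n_in, n_h = 4, 5
p = {k: rng.normal(scale=0.1, size=(n_h, n_in if k[1] == "x" else n_h))
     for k in ("Wxi", "Whi", "Wxf", "Whf", "Wxo", "Who", "Wxc", "Whc")}
p.update({k: np.zeros(n_h) for k in ("bi", "bf", "bo", "bc")})
h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.normal(size=(3, n_in)):   # a 3-word input sequence
    h, c = lstm_step(x_t, h, c, p)
```

Running the step over a word-vector sequence, as the encoder does, simply threads (h, c) from one time step to the next.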
3. The machine translation method based on a generative adversarial neural network of claim 1, characterized in that the encoder uses two LSTM networks, one fed the forward word-vector sequence and the other the backward word-vector sequence, forming a bidirectional LSTM network; the vectors output by the two networks are concatenated to form the context vector; the decoder adopts an LSTM network that takes the context vector as input and outputs a state sequence, which is then passed through a Softmax classifier of the following functional form:
p(y = i | x) = e^(θ_i·x) / Σ_{j=1}^{k} e^(θ_j·x),
wherein (θ_1, θ_2, …, θ_k) are the model parameters, k is the total number of classifier categories, and i denotes a particular category; the states output by the decoder are converted one by one into target-language word vectors, and the resulting sequence is assembled into the translation result.
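The Softmax classifier of claim 3 reduces to a few lines of numpy. The max-subtraction step is a standard numerical-stability addition not stated in the claim, and the sizes below are illustrative.

```python
import numpy as np

def softmax_classify(state, theta):
    """p(y = i | x) = exp(theta_i . x) / sum_j exp(theta_j . x)."""
    logits = theta @ state          # one score per target-language class
    logits = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(logits)
    return e / e.sum()

# Toy sizes: k = 6 classes (target words), 5-dim decoder state; the rows
# of theta stand in for the trained parameters (theta_1, ..., theta_k).
rng = np.random.default_rng(2)
theta = rng.normal(size=(6, 5))
probs = softmax_classify(rng.normal(size=5), theta)
word_id = int(np.argmax(probs))     # index of the selected target word
```

In the decoder, `word_id` would be mapped through the dictionary to a target-language word at each output step.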
4. The machine translation method based on a generative adversarial neural network of claim 1, characterized in that, through adversarial training, the discriminator network is used to synchronously improve the generator network's ability to produce the target language and the discriminator network's ability to judge the source of a translation; during adversarial training, the discriminator network judges whether a translation result is real data from the corpus or a machine translation result produced by the original generator network;
in the machine translation method based on a generative adversarial neural network, the discriminator network's learning process is a competition between the generator network and the discriminator network; specifically:
one sample is taken at random from either the real samples or the samples produced by the generative model, and the discriminator network judges whether it is real;
through this competitive machine-learning mechanism, the performance of both the generator network and the discriminator network improves continuously; training ends when the whole network reaches a Nash equilibrium, i.e. when the parameters of both networks are stable; at that point the machine translation results produced by the generator network can fool the discriminator network into judging that the translations come from the parallel corpus, and the generator network model can then be used as the output machine translation model.
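The competition described in claim 4 can be caricatured with a one-parameter "generator" and a logistic "discriminator". Everything here — REAL_MEAN, the learning rate, the 1-D Gaussian samples — is an illustrative toy showing only the alternating-update dynamics, not the patented networks.

```python
import numpy as np

# "Real" translations are numbers near REAL_MEAN; the generator is a
# single learnable mean mu; the discriminator is a logistic regressor.
rng = np.random.default_rng(3)
REAL_MEAN = 2.0
mu = 0.0          # generator parameter
w, b = 0.0, 0.0   # discriminator parameters
lr = 0.05

def d(x):
    # Discriminator's estimate that sample x is real.
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

for step in range(2000):
    # Randomly take either a real sample or a generated one, and let the
    # discriminator judge it (label 1 = real, 0 = generated).
    if rng.random() < 0.5:
        x, label = rng.normal(REAL_MEAN, 0.5), 1.0
    else:
        x, label = rng.normal(mu, 0.5), 0.0
    p = d(x)
    w += lr * (label - p) * x    # discriminator: log-loss gradient step
    b += lr * (label - p)
    g = rng.normal(mu, 0.5)      # generator: move mu so that d(g) rises
    mu += lr * (1.0 - d(g)) * w
```

As the loop runs, mu drifts toward REAL_MEAN until the discriminator can no longer separate the two sources, a toy analogue of the parameter-stable equilibrium at which training stops.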
5. The machine translation method based on a generative adversarial neural network of claim 1, characterized in that the method performs semi-supervised learning, making use of monolingual corpus resources in addition to manually annotated bilingual parallel corpus resources.
6. The machine translation method based on a generative adversarial neural network of claim 1, specifically comprising:
constructing a bidirectional long short-term memory neural network as the discriminator network;
combining the generator network and the discriminator network to form a complete generative adversarial network: the input vector of the encoder in the generator network is concatenated with the output vector of the decoder and passed to the discriminator network as its input, while the discriminator network's output (0 or 1) is fed back to the generator network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network, training ending when the parameters of the generative adversarial network become stable;
and, after training of the generative adversarial network model is complete, using the generator network part as the output machine translation model in subsequent use.
7. A machine translation system based on a generative adversarial neural network, implementing the machine translation method of claim 1, characterized in that the system comprises:
a discriminator network, used to judge whether a target-language translation comes from the training corpus or is a machine translation result produced by the original generator network, and adopting a multilayer perceptron feedforward neural network model to perform binary classification.
8. The machine translation system based on a generative adversarial neural network of claim 7, further comprising:
a generator network, combined with the discriminator network to form a complete generative adversarial network, wherein the input vector of the encoder in the generator network is concatenated with the output vector of the decoder and passed to the discriminator network as its input, while the discriminator network's output (0 or 1) is fed back to the generator network;
and a semi-supervised corpus formed by integrating the monolingual corpus with the parallel corpus, used to train the whole adversarial network, training ending when the parameters of the generative adversarial network become stable.
CN201710586841.9A 2017-07-18 2017-07-18 Machine translation method and system based on generation of antagonistic neural network Active CN107368475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710586841.9A CN107368475B (en) 2017-07-18 2017-07-18 Machine translation method and system based on generation of antagonistic neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710586841.9A CN107368475B (en) 2017-07-18 2017-07-18 Machine translation method and system based on generation of antagonistic neural network

Publications (2)

Publication Number Publication Date
CN107368475A CN107368475A (en) 2017-11-21
CN107368475B true CN107368475B (en) 2021-06-04

Family

ID=60308088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710586841.9A Active CN107368475B (en) 2017-07-18 2017-07-18 Machine translation method and system based on generation of antagonistic neural network

Country Status (1)

Country Link
CN (1) CN107368475B (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109887494B (en) 2017-12-01 2022-08-16 腾讯科技(深圳)有限公司 Method and apparatus for reconstructing a speech signal
CN107991876A (en) * 2017-12-14 2018-05-04 南京航空航天大学 Aero-engine condition monitoring data creation method based on production confrontation network
CN108304390B (en) * 2017-12-15 2020-10-16 腾讯科技(深圳)有限公司 Translation model-based training method, training device, translation method and storage medium
CN108388549B (en) * 2018-02-26 2021-02-19 腾讯科技(深圳)有限公司 Information conversion method, information conversion device, storage medium and electronic device
CN108415906B (en) * 2018-03-28 2021-08-17 中译语通科技股份有限公司 Automatic identification discourse machine translation method and machine translation system based on field
CN108734276B (en) * 2018-04-28 2021-12-31 同济大学 Simulated learning dialogue generation method based on confrontation generation network
CN108829685A (en) * 2018-05-07 2018-11-16 内蒙古工业大学 A Mongolian-Chinese inter-translation method based on monolingual training
CN108897740A (en) * 2018-05-07 2018-11-27 内蒙古工业大学 A Mongolian-Chinese machine translation method based on an adversarial neural network
CN108874978B (en) * 2018-06-08 2021-09-10 杭州一知智能科技有限公司 Method for solving conference content abstract task based on layered adaptive segmented network
CN108846130B (en) * 2018-06-29 2021-02-05 北京百度网讯科技有限公司 Question text generation method, device, equipment and medium
CN110750997A (en) * 2018-07-05 2020-02-04 普天信息技术有限公司 Machine translation method and device based on generation countermeasure learning
CN110852066B (en) * 2018-07-25 2021-06-01 清华大学 Multi-language entity relation extraction method and system based on confrontation training mechanism
CN109241540B (en) * 2018-08-07 2020-09-15 中国科学院计算技术研究所 Automatic Chinese-to-Braille conversion method and system based on a deep neural network
CN110874537B (en) * 2018-08-31 2023-06-27 阿里巴巴集团控股有限公司 Method for generating multilingual translation model, translation method and equipment
CN110895935B (en) * 2018-09-13 2023-10-27 阿里巴巴集团控股有限公司 Speech recognition method, system, equipment and medium
US11151334B2 (en) * 2018-09-26 2021-10-19 Huawei Technologies Co., Ltd. Systems and methods for multilingual text generation field
CN109523021B (en) * 2018-09-28 2020-12-11 浙江工业大学 Dynamic network structure prediction method based on long-time and short-time memory network
CN109410179B (en) * 2018-09-28 2021-07-23 合肥工业大学 Image anomaly detection method based on generation countermeasure network
CN109547320B (en) * 2018-09-29 2022-08-30 创新先进技术有限公司 Social contact method, device and equipment
CN109670180B (en) * 2018-12-21 2020-05-08 语联网(武汉)信息技术有限公司 Method and device for translating individual characteristics of vectorized translator
CN109887047B (en) * 2018-12-28 2023-04-07 浙江工业大学 Signal-image translation method based on generation type countermeasure network
CN109902310A (en) * 2019-01-15 2019-06-18 深圳中兴网信科技有限公司 Vocabulary detection method, vocabulary detection system and computer readable storage medium
CN110110337B (en) * 2019-05-08 2023-04-18 网易有道信息技术(北京)有限公司 Translation model training method, medium, device and computing equipment
CN110069790B (en) * 2019-05-10 2022-12-06 东北大学 Machine translation system and method for contrasting original text through translated text retranslation
CN110309512A (en) * 2019-07-05 2019-10-08 北京邮电大学 A kind of Chinese grammer error correction method thereof based on generation confrontation network
CN110334361B (en) * 2019-07-12 2022-11-22 电子科技大学 Neural machine translation method for Chinese language
CN110555247A (en) * 2019-08-16 2019-12-10 华南理工大学 structure damage early warning method based on multipoint sensor data and BilSTM
CN110472255B (en) * 2019-08-20 2021-03-02 腾讯科技(深圳)有限公司 Neural network machine translation method, model, electronic terminal, and storage medium
CN110598221B (en) * 2019-08-29 2020-07-07 内蒙古工业大学 Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN110866395B (en) * 2019-10-30 2023-05-05 语联网(武汉)信息技术有限公司 Word vector generation method and device based on translator editing behaviors
CN110866404B (en) * 2019-10-30 2023-05-05 语联网(武汉)信息技术有限公司 Word vector generation method and device based on LSTM neural network
CN111178094B (en) * 2019-12-20 2023-04-07 沈阳雅译网络技术有限公司 Pre-training-based scarce resource neural machine translation training method
CN111178097B (en) * 2019-12-24 2023-07-04 语联网(武汉)信息技术有限公司 Method and device for generating Zhongtai bilingual corpus based on multistage translation model
CN111310480B (en) * 2020-01-20 2021-12-28 昆明理工大学 Weakly supervised Hanyue bilingual dictionary construction method based on English pivot
CN113283249A (en) * 2020-02-19 2021-08-20 阿里巴巴集团控股有限公司 Machine translation method, device and computer readable storage medium
CN111523308B (en) * 2020-03-18 2024-01-26 大箴(杭州)科技有限公司 Chinese word segmentation method and device and computer equipment
CN111460837A (en) * 2020-03-31 2020-07-28 广州大学 Character-level confrontation sample generation method and device for neural machine translation
CN111914552A (en) * 2020-07-31 2020-11-10 平安科技(深圳)有限公司 Training method and device of data enhancement model
CN112633018B (en) * 2020-12-28 2022-04-15 内蒙古工业大学 Mongolian Chinese neural machine translation method based on data enhancement
CN113343719B (en) * 2021-06-21 2023-03-14 哈尔滨工业大学 Unsupervised bilingual translation dictionary acquisition method for collaborative training by using different word embedding models
CN113642341A (en) * 2021-06-30 2021-11-12 深译信息科技(横琴)有限公司 Deep confrontation generation method for solving scarcity of medical text data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101154221A (en) * 2006-09-28 2008-04-02 株式会社东芝 Apparatus performing translation process from inputted speech
DE202017102381U1 (en) * 2017-04-21 2017-05-11 Robert Bosch Gmbh Device for improving the robustness against "Adversarial Examples"

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101154221A (en) * 2006-09-28 2008-04-02 株式会社东芝 Apparatus performing translation process from inputted speech
DE202017102381U1 (en) * 2017-04-21 2017-05-11 Robert Bosch Gmbh Device for improving the robustness against "Adversarial Examples"

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Lijun Wu et al., "Adversarial Neural Machine Translation", ResearchGate, 2017 *
yunyoubars, "A Google Brain scientist explains LSTM: a story about 'forgetting' and 'memory'", Docin (豆丁网), 2017 *

Also Published As

Publication number Publication date
CN107368475A (en) 2017-11-21

Similar Documents

Publication Publication Date Title
CN107368475B (en) Machine translation method and system based on generation of antagonistic neural network
Koncel-Kedziorski et al. Text generation from knowledge graphs with graph transformers
CN110163299B (en) Visual question-answering method based on bottom-up attention mechanism and memory network
Rastgoo et al. Sign language production: A review
CN108897740A (en) A Mongolian-Chinese machine translation method based on an adversarial neural network
CN107729311B (en) Chinese text feature extraction method fusing text moods
Li et al. Sentiment infomation based model for chinese text sentiment analysis
Li et al. Insufficient data can also rock! learning to converse using smaller data with augmentation
CN110807320A (en) Short text emotion analysis method based on CNN bidirectional GRU attention mechanism
Ling et al. Context-controlled topic-aware neural response generation for open-domain dialog systems
CN111985205A (en) Aspect level emotion classification model
CN115099409A (en) Text-image enhanced multi-mode knowledge map embedding method
He et al. MF-BERT: Multimodal fusion in pre-trained BERT for sentiment analysis
CN113901208B (en) Method for analyzing emotion tendentiousness of mid-cross language comments blended with theme characteristics
Huo et al. Terg: Topic-aware emotional response generation for chatbot
Maslennikova ELMo Word Representations For News Protection.
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism
CN115730232A (en) Topic-correlation-based heterogeneous graph neural network cross-language text classification method
CN115169348A (en) Event extraction method based on hybrid neural network
CN114444481A (en) Sentiment analysis and generation method of news comments
CN113642630A (en) Image description method and system based on dual-path characteristic encoder
CN113255360A (en) Document rating method and device based on hierarchical self-attention network
Jiang et al. An affective chatbot with controlled specific emotion expression
Tong et al. Text classification based on graph convolutional network with attention
Frias et al. Attention-based Bilateral LSTM-CNN for the Sentiment Analysis of Code-mixed Filipino-English Social Media Texts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100040 Shijingshan Road, Shijingshan District, Beijing, No. 20, 16 layer 1601

Applicant after: Global Tone Communication Technology Co., Ltd.

Address before: 100040 Shijingshan District railway building, Beijing, the 16 floor

Applicant before: Mandarin Technology (Beijing) Co., Ltd.

GR01 Patent grant