CN107368475B - Machine translation method and system based on generation of antagonistic neural network - Google Patents
Machine translation method and system based on generation of antagonistic neural network
- Publication number
- CN107368475B CN107368475B CN201710586841.9A CN201710586841A CN107368475B CN 107368475 B CN107368475 B CN 107368475B CN 201710586841 A CN201710586841 A CN 201710586841A CN 107368475 B CN107368475 B CN 107368475B
- Authority
- CN
- China
- Prior art keywords
- network
- generation
- machine translation
- output
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention belongs to the technical field of computers, and discloses a machine translation method and system based on a generative adversarial neural network. The method introduces, on the basis of the original machine-translation generation network, a discrimination network adversarial to it; the discrimination network judges whether a target-language translation comes from the training parallel corpus or is a machine translation result produced by the generation network, and adopts a multi-layer perceptron feedforward neural network model to realize binary classification. The system comprises: a discrimination network, a generation network, a monolingual corpus, and a parallel corpus. The invention makes full use of manually annotated bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning; monolingual corpus resources are abundant and easy to obtain, which alleviates the shortage of training corpora faced by neural network machine translation models.
Description
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a machine translation method and system based on a generative adversarial neural network.
Background
Machine translation is the process of automatically translating a sentence in a source language into a sentence in a target language using computer algorithms. Machine translation is a research direction of artificial intelligence with very important scientific and practical value. With the deepening of globalization and the rapid development of the Internet, machine translation technology plays an increasingly important role in political, economic, social, and cultural exchanges at home and abroad.
At present, methods based on deep neural networks are the best-performing methods in the field of machine translation. They mainly adopt an encoder-decoder structure comprising an encoder and a decoder, both of which use Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network structures. The translation process is as follows: first, the encoder converts the input source-language sentence into a word-vector sequence serving as the input of the recurrent neural network, and outputs a dense vector of fixed length called the context vector. The decoder then uses another recurrent neural network together with a Softmax classifier, taking the context vector as input, to output a word-vector sequence in the target language. Finally, the word vectors are mapped one by one into target-language words using a dictionary, completing the whole translation process.
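The encode-decode-lookup pipeline described above can be sketched with a toy example. The vocabularies, one-hot embeddings, and mean-pooled context below are illustrative stand-ins for the trained RNN/LSTM networks, not the patented model:

```python
import numpy as np

src_vocab = {"ich": 0, "liebe": 1, "dich": 2}   # hypothetical source vocabulary
tgt_words = ["i", "love", "you"]                # hypothetical target dictionary
E_src = np.eye(3)                               # toy source word embeddings (one-hot)
W_dec = np.eye(3)                               # toy decoder projection matrix

def translate(sentence):
    # Step 1: the encoder turns the sentence into a word-vector sequence
    vecs = [E_src[src_vocab[w]] for w in sentence.split()]
    # ... and compresses it into one fixed-length dense "context vector"
    context = np.mean(vecs, axis=0)
    out = []
    # Step 2: the decoder emits target-language scores step by step,
    # conditioned on the context vector
    for v in vecs:
        scores = W_dec @ (v + context)
        # Step 3: dictionary lookup maps each output vector to a target word
        out.append(tgt_words[int(np.argmax(scores))])
    return " ".join(out)
```

With the identity embeddings above, each source word maps straight to its aligned target word, so the sketch only illustrates the data flow, not translation quality.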
In summary, the problems of the prior art are as follows:
The main defect of the prior art is that training the deep neural network model depends heavily on large-scale, manually annotated bilingual parallel sentence-pair corpora. Because manual annotation is expensive and large-scale, high-quality manually annotated bilingual parallel corpora are lacking, the training data of neural network machine translation models are insufficient and their performance suffers; this is the bottleneck faced by existing neural network machine translation models. For some languages in particular, the parallel corpus resources available for training neural network models are extremely scarce, making it difficult to build a high-performance machine translation system.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a machine translation method and system based on a generative adversarial neural network.
The invention is realized as a machine translation method based on a generative adversarial neural network, which introduces, on the basis of the original machine-translation generation network, a discrimination network adversarial to the generation network; the discrimination network is used for judging whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network; the discrimination network adopts a multi-layer perceptron feedforward neural network model to realize binary classification.
Further, the binary classification method comprises the following steps:
the hidden layer function h(x) is formally expressed as:
h(x) = T(W1·x + b1),
wherein the model parameters W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden-layer bias vector; T(x) is the activation function of the hidden layer, which takes the form of the hyperbolic tangent function:
T(x) = (e^x - e^(-x)) / (e^x + e^(-x));
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer, which takes the form of the sigmoid function:
S(x) = 1 / (1 + e^(-x));
when the multi-layer perceptron feedforward neural network model performs binary classification, the input-layer vector X is substituted into f(X) to compute the two-dimensional output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network.
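A minimal sketch of this discriminator, assuming illustrative layer sizes and random weights: a feedforward MLP with a tanh hidden layer and a sigmoid output layer whose two-dimensional output is read off by taking the larger dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 16                       # input and hidden sizes (illustrative only)
W1, b1 = rng.normal(size=(m, n)), np.zeros(m)   # input-to-hidden weights and bias
W2, b2 = rng.normal(size=(2, m)), np.zeros(2)   # hidden-to-output weights and bias

def S(z):
    # sigmoid activation of the output layer
    return 1.0 / (1.0 + np.exp(-z))

def f(x):
    # f(x) = S(W2 . tanh(W1 . x + b1) + b2), as in the text
    return S(W2 @ np.tanh(W1 @ x + b1) + b2)

def discriminate(x):
    y = f(x)                       # two-dimensional output vector Y
    # larger-valued dimension decides the class: corpus vs generated
    return "corpus" if np.argmax(y) == 0 else "generated"
```

Which index means "corpus" and which means "generated" is an assumed convention here; the patent text only specifies a binary decision.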
Further, the generation network consists of an encoder and a decoder. The encoder adopts a bidirectional Long Short-Term Memory (LSTM) neural network structure: it converts an input source-language sentence into a word-vector sequence serving as the input of the LSTM network, and the network produces a dense vector of fixed length, called the context vector, which is the encoder's output.
The decoder then uses another unidirectional LSTM neural network, taking the context vector output by the encoder as input; a Softmax classifier is stacked on the output layer of the neural network machine translation model to output the word-vector sequence of the target language; and the word vectors are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
Further, in the neural network machine translation model, the inputs xt and ht-1 respectively represent the input word vector at time t and the output of the LSTM neural network unit at time t-1; the output ht represents the output of the LSTM neural network unit at the current time.
The method specifically comprises the following steps:
it = g(Wxi·xt + Whi·ht-1 + bi);
ft = g(Wxf·xt + Whf·ht-1 + bf);
ot = g(Wxo·xt + Who·ht-1 + bo);
c̃t = tanh(Wxc·xt + Whc·ht-1 + bc);
ct = ft·ct-1 + it·c̃t;
ht = ot·tanh(ct);
wherein it, ft and ot respectively represent the input gate, the forget gate and the output gate; g is the sigmoid gate activation; ct-1 represents the neuron state at time t-1; ct and c̃t respectively represent the neuron state and the candidate state; ht is the output of the LSTM neuron; the parameters W and b respectively represent the connection weights and biases of each layer.
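The gate equations above can be written directly in numpy. The candidate-state and cell-state updates follow the standard LSTM form assumed here, and the weight shapes are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                        # hidden size (illustrative)

def g(z):
    # sigmoid gate activation
    return 1.0 / (1.0 + np.exp(-z))

# small random weight matrices for the eight connections, zero biases
P = {k: rng.normal(scale=0.1, size=(d, d)) for k in
     ("Wxi", "Whi", "Wxf", "Whf", "Wxo", "Who", "Wxc", "Whc")}
b = {k: np.zeros(d) for k in ("bi", "bf", "bo", "bc")}

def lstm_step(x_t, h_prev, c_prev):
    i_t = g(P["Wxi"] @ x_t + P["Whi"] @ h_prev + b["bi"])   # input gate
    f_t = g(P["Wxf"] @ x_t + P["Whf"] @ h_prev + b["bf"])   # forget gate
    o_t = g(P["Wxo"] @ x_t + P["Who"] @ h_prev + b["bo"])   # output gate
    c_tilde = np.tanh(P["Wxc"] @ x_t + P["Whc"] @ h_prev + b["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                      # new cell state
    h_t = o_t * np.tanh(c_t)                                # unit output
    return h_t, c_t
```

Running `lstm_step` over a word-vector sequence, carrying `h_t` and `c_t` forward, produces the encoder state sequence described in the text.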
Further, the encoder adopts two LSTM networks, one taking the forward word-vector sequence as input and the other the reversed word-vector sequence, forming a bidirectional LSTM network; the vectors output by the two networks are concatenated to form the context vector. The decoder adopts an LSTM network that takes the context vector as input and outputs a state sequence, which then passes through a Softmax classifier of the following functional form:
P(y = i | x; θ) = exp(θi·x) / Σ(j=1..k) exp(θj·x),
wherein (θ1, θ2, …, θk) are the classifier parameters, k is the total number of categories of the classifier, and i denotes a particular category; the states output by the decoder are converted one by one into target-language word vectors, and the sequence is then assembled into the translation result.
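The Softmax step that turns a decoder state into category probabilities can be sketched as follows (the max-subtraction is a standard numerical-stability trick, not part of the patent text):

```python
import numpy as np

def softmax_probs(theta, s):
    # theta: (k, d) matrix stacking the k class parameter vectors
    # s: decoder state vector of dimension d
    z = theta @ s                 # score of each category
    e = np.exp(z - z.max())       # subtract max for numerical stability
    return e / e.sum()            # normalized probabilities, sum to 1
```

The category with the highest probability selects the target-language word vector at each decoding step.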
Further, through adversarial training, the generation network's ability to produce the target language and the discrimination network's ability to judge the source of a translation are improved synchronously; during adversarial training, the discrimination network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network.
In the machine translation method based on the generative adversarial neural network, the learning process of the discrimination network is a competition between the generation network and the discrimination network; it specifically comprises:
randomly taking either a real sample or a sample produced by the generation model, and letting the discrimination network judge whether it is real;
through this competitive machine learning mechanism, the performance of both the generation network and the discrimination network improves continuously. Training ends when the whole network reaches a Nash-equilibrium state, i.e., when the parameters of the two networks become stable. At that point, the machine translation results produced by the generation network can deceive the discrimination network into believing that the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
Further, the machine translation method based on the generative adversarial neural network makes full use of manually annotated bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning.
Further, the machine translation method based on the generative adversarial neural network specifically comprises:
constructing a bidirectional long short-term memory neural network as the discrimination network;
combining the generation network and the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
After the training of the generative adversarial network model is completed, the generation-network part of the network is used as the output machine translation model for subsequent use.
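The adversarial training loop above can be caricatured with scalar stand-ins for the two networks (an assumed toy dynamic, not the patent's actual training algorithm): the "discriminator" tracks a decision threshold between the real and generated statistics, the "generator" chases that threshold, and training stops once both parameters stabilize, mirroring the Nash-equilibrium stopping criterion in the text:

```python
def train_adversarial(steps=300, lr=0.1):
    real_mean = 1.0          # statistic of "real" corpus translations (toy)
    g, d = 0.0, 0.0          # scalar stand-ins for network parameters
    for _ in range(steps):
        fake = g                                   # generator's current output statistic
        d += lr * ((real_mean + fake) / 2 - d)     # discriminator: threshold between real and fake
        g += lr * (d - g)                          # generator: move to fool the discriminator
    return g, d
```

In this toy both parameters settle near the real-data statistic, at which point the generated output is indistinguishable from real data by the discriminator and the "generator" would be kept as the output model.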
Another object of the present invention is to provide a machine translation system based on a generative adversarial neural network, comprising:
a discrimination network for judging whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network; the discrimination network adopts a multi-layer perceptron feedforward neural network model to realize binary classification.
Further, the machine translation system based on the generative adversarial neural network further comprises:
a generation network, which is combined with the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
a monolingual corpus, which is integrated with the parallel corpus to form a semi-supervised corpus used to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
The invention has the advantages and positive effects that:
the invention introduces a discrimination network which is confronted with the original machine translation generation network on the basis of the original machine translation generation network, namely, a coding-decoding structure neural network machine translation model; and the method is used for judging whether the translation of the target language is from the training corpus or is translated by the original machine to generate a network machine translation result.
The invention improves the whole framework system of the existing machine translation method based on the artificial neural network. A machine translation method based on a generation countermeasure network is provided, so that a neural network machine translation model has self-learning capability. The manually labeled bilingual parallel corpus resources are fully utilized, and meanwhile, the monolingual corpus resources can be utilized for semi-supervised learning. The monolingual corpus resources are very rich and easy to obtain, the bottleneck problem that training corpora required by neural network machine translation are insufficient is solved, and the cost of manually marking the corpora can be saved by more than 50%.
After the model is trained, the parameter scale and the operation time of the model in the invention are equivalent to those of the current neural network machine translation model in practical application, and the complexity of the machine translation model in practical use cannot be increased.
Drawings
Fig. 1 is a flowchart of a machine translation method based on a generative adversarial neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a machine translation system based on a generative adversarial neural network according to an embodiment of the present invention.
In the figure: 1. discrimination network; 2. generation network; 3. monolingual corpus; 4. parallel corpus.
Fig. 3 is a schematic diagram of a neural network machine translation model based on an "encoding-decoding" structure according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an LSTM neural network unit provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
At present, the most serious defect of the prior art is that training a deep neural network model depends heavily on large-scale, manually annotated bilingual parallel sentence-pair corpora. Because manual annotation is expensive and large-scale, high-quality manually annotated bilingual parallel corpora are lacking, the training data of neural network machine translation models are insufficient and their performance suffers; this is the bottleneck faced by existing neural network machine translation models. For some languages in particular, the parallel corpus resources available for training neural network models are extremely scarce, making it difficult to build a high-performance machine translation system.
The invention adopts a multi-layer perceptron feedforward neural network model to construct the discrimination network and realize binary classification. The multi-layer perceptron neural network model comprises an input layer X: {x1, x2, …, xn}, a hidden layer H: {h1, h2, …, hm}, and an output layer Y: {y1, y2}.
The hidden layer function h(x) can be formally expressed as:
h(x) = T(W1·x + b1),
wherein the model parameters W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden-layer bias vector; T(x) is the activation function of the hidden layer, for which the invention adopts the hyperbolic tangent function:
T(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
The whole multi-layer perceptron neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer, for which the invention adopts the sigmoid function:
S(x) = 1 / (1 + e^(-x)).
When the multi-layer perceptron neural network model performs binary classification, the input-layer vector X is substituted into f(X) to compute the two-dimensional output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result.
The application of the principles of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the machine translation method based on a generative adversarial neural network according to the embodiment of the present invention is as follows.
On the basis of conventional neural network machine translation, another artificial neural network adversarial to it is introduced, called the discrimination network; the original machine-translation LSTM neural network is called the generation network. In this generative adversarial machine translation model, the generation network adopts a conventional encoder-decoder neural network translation model, whose function is to generate the corresponding target-language sentence from an input source-language sentence; the discrimination network adopts a multi-layer perceptron feedforward neural network model realizing a binary classification function, in which each node is a perceptron. The function of the discrimination network is to judge whether a target-language translation comes from the training corpus or is the result of recurrent-neural-network-based machine translation.
The generative adversarial network introduces a mechanism of competition between the generation network and the discrimination network; through adversarial training, the generation network's ability to produce the target language and the discrimination network's ability to judge the source of a translation improve synchronously. During training, the objective of the discrimination network is to judge whether a sample is real data from the corpus or a machine translation result, while the objective of the generation network is to produce translation results that deceive the discrimination network into regarding machine-translated output as coming from the real corpus.
The learning process in the machine translation method based on the generative adversarial neural network provided by the embodiment of the invention thus becomes a competition between the generation network and the discrimination network: either a real sample or a sample produced by the generation model is chosen at random, and the discrimination network judges whether it is real. Through this competitive machine learning mechanism, the performance of both networks improves continuously. Training ends when the whole network reaches a Nash-equilibrium state, i.e., when the parameters of the two networks essentially stop changing. At that point, the machine translation results produced by the generation network can deceive the discrimination network into believing that the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
As shown in fig. 2, the machine translation system based on a generative adversarial neural network according to an embodiment of the present invention includes:
a discrimination network 1, which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network, and which adopts a multi-layer perceptron feedforward neural network model to realize binary classification.
The machine translation system based on the generative adversarial neural network further comprises:
a generation network 2, which is combined with the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
a monolingual corpus 3, which is integrated with the parallel corpus 4 to form a semi-supervised corpus used to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
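The merging of the parallel and monolingual corpora into one semi-supervised training set might look like the following sketch; the tuple layout and the `None` target for monolingual sentences are assumed conventions for illustration, not specified by the patent:

```python
# Hypothetical sentences; in practice these come from corpus files.
parallel = [("ein haus", "a house"), ("ein hund", "a dog")]   # (source, reference) pairs
monolingual = ["eine katze", "ein vogel"]                     # source-only sentences

# Parallel pairs keep their reference translation; monolingual sentences get a
# None target, to be filled by the generation network during adversarial training.
semi_supervised = (
    [(src, tgt, "parallel") for src, tgt in parallel]
    + [(src, None, "monolingual") for src in monolingual]
)
```

Tagging each example with its origin lets the training loop treat the two corpus types differently while drawing from a single shuffled dataset.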
The invention is further described below in connection with the positive effects.
The embodiment of the invention constructs a long short-term memory neural network based on the encoder-decoder structure and then trains the generation network with bilingual parallel corpora.
The embodiment of the invention constructs another bidirectional long short-term memory neural network as the discrimination network.
The application of the principles of the present invention will now be described in further detail with reference to specific embodiments.
In the embodiment of the invention, the binary classification method comprises the following steps:
the hidden layer function h(x) is formally expressed as:
h(x) = T(W1·x + b1),
wherein T(x) is the activation function of the hidden layer, which takes the form of the hyperbolic tangent function:
T(x) = (e^x - e^(-x)) / (e^x + e^(-x));
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer, which takes the form of the sigmoid function:
S(x) = 1 / (1 + e^(-x));
when the multi-layer perceptron feedforward neural network model performs binary classification, the input-layer vector X is substituted into f(X) to compute the output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network.
As shown in fig. 3, the generation network consists of two parts, an encoder and a decoder. The encoder adopts a bidirectional Long Short-Term Memory (LSTM) neural network structure: it converts an input source-language sentence into a word-vector sequence serving as the input of the LSTM network, and the network produces a dense vector of fixed length, called the context vector, which is the encoder's output.
The decoder then uses another unidirectional LSTM neural network, taking the context vector output by the encoder as input; a Softmax classifier is stacked on the output layer of the neural network machine translation model to output the word-vector sequence of the target language; and the word vectors are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
As shown in fig. 4, in the neural network machine translation model, the inputs xt and ht-1 respectively represent the input word vector at time t and the output of the LSTM neural network unit at time t-1; the output ht represents the output of the LSTM neural network unit at the current time.
The method specifically comprises the following steps:
it = g(Wxi·xt + Whi·ht-1 + bi);
ft = g(Wxf·xt + Whf·ht-1 + bf);
ot = g(Wxo·xt + Who·ht-1 + bo);
c̃t = tanh(Wxc·xt + Whc·ht-1 + bc);
ct = ft·ct-1 + it·c̃t;
ht = ot·tanh(ct);
wherein it, ft and ot respectively represent the input gate, the forget gate and the output gate; g is the sigmoid gate activation; ct-1 represents the neuron state at time t-1; ct and c̃t respectively represent the neuron state and the candidate state; ht is the output of the LSTM neuron; the parameters W and b respectively represent the connection weights and biases of each layer.
the encoder adopts two LSTM networks, one of which inputs a forward word vector sequence and the other inputs a reverse word vector sequence to form a bidirectional LSTM network, and the vectors output by the two networks are connected to form a context vector; the decoder adopts an LSTM network, inputs the context vector and outputs a state sequence; and then passing through a Softmax classifier, wherein the function form is as follows:
wherein (theta)1,θ2,…,θk) K is the total number of categories of the classifier, i represents a certain classification category; will solveAnd converting the output states of the decoder into word vectors of the target language one by one, and then integrating the sequences to form a translation result.
Through adversarial training, the generation network's ability to produce the target language and the discrimination network's ability to judge the source of a translation are improved synchronously; during adversarial training, the discrimination network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network.
In the machine translation method based on the generative adversarial neural network, the learning process of the discrimination network is a competition between the generation network and the discrimination network; it specifically comprises:
randomly taking either a real sample or a sample produced by the generation model, and letting the discrimination network judge whether it is real;
through this competitive machine learning mechanism, the performance of both the generation network and the discrimination network improves continuously. Training ends when the whole network reaches a Nash-equilibrium state, i.e., when the parameters of the two networks become stable. At that point, the machine translation results produced by the generation network can deceive the discrimination network into believing that the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
The machine translation method based on the generative adversarial neural network makes full use of manually annotated bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning.
The machine translation method based on the generative adversarial neural network specifically comprises:
constructing a bidirectional long short-term memory neural network as the discrimination network;
combining the generation network and the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
After the training of the generative adversarial network model is completed, the generation-network part of the network is used as the output machine translation model for subsequent use.
The invention combines the generation network and the discrimination network to form a complete generative adversarial network. Specifically, the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network.
The invention integrates the parallel corpus and the monolingual corpus into a large-scale semi-supervised corpus and uses it to train the whole generative adversarial network. Training ends when the parameters of the generative adversarial network remain stable.
After the training of the generative adversarial network model is completed, the generation-network part of the network can be used as the output machine translation model for subsequent use, as follows: the source language is segmented into words, the segmentation result is input to the encoder of the generation network, each source-language word is fed in turn into the corresponding neural network node, and the output of the generation network's decoder is the corresponding target-language translation.
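The stated usage procedure (segment the source sentence, feed the tokens to the trained generation network's encoder, read the decoder output as the translation) can be illustrated with a hypothetical stand-in for the trained network:

```python
def segment(sentence):
    # word segmentation; whitespace split is a toy stand-in for a real segmenter
    return sentence.split()

# hypothetical word-level lookup standing in for the trained generation network
toy_model = {"bonjour": "hello", "monde": "world"}

def generator_translate(tokens):
    # each source-language token is fed in turn; unknown tokens map to <unk>
    return [toy_model.get(t, "<unk>") for t in tokens]

translation = " ".join(generator_translate(segment("bonjour monde")))
```

In the real system, `generator_translate` would run the encoder over the token sequence and decode step by step; the lookup table here only mirrors the input/output contract.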
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (8)
1. A machine translation method based on a generation antagonistic neural network, characterized in that, on the basis of an original machine translation generation network, the method introduces a discrimination network that is adversarial to the generation network; the discrimination network judges whether a target-language translation comes from the training parallel corpus or is a machine translation result produced by the generation network; the discrimination network adopts a multi-layer perceptron feedforward neural network model to realize binary classification;
the binary classification method comprises the following steps:
the hidden layer function is h(x)=T(W1x+b1), wherein W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden layer bias vector, and the hidden layer activation function T(x) takes the form of the hyperbolic tangent function:
T(x)=tanh(x)=(e^x-e^(-x))/(e^x+e^(-x));
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x)=S(W2·h(x)+b2)=S(W2·T(W1x+b1)+b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output layer bias vector, and S(x) is the activation function of the output layer, which takes the form of the sigmoid function:
S(x)=1/(1+e^(-x));
when the multi-layer perceptron feedforward neural network model carries out binary classification, the input layer vector X is substituted into f(X) to compute the output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network;
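As a concrete sketch of the classifier f(x)=S(W2·T(W1x+b1)+b2), the function below applies the tanh hidden layer, the sigmoid output layer, and the larger-dimension decision rule. The weight shapes and the two-dimensional output are assumptions for illustration; class 0 stands for "from the training corpus" and class 1 for "from the generation network".

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_discriminate(x, W1, b1, W2, b2):
    # Hidden layer with tanh activation: h(x) = T(W1 x + b1)
    h = np.tanh(W1 @ x + b1)
    # Output layer with sigmoid activation: f(x) = S(W2 h(x) + b2)
    y = sigmoid(W2 @ h + b2)
    # The dimension of Y with the larger value decides the class:
    # 0 -> translation from the training corpus, 1 -> from the generation network.
    return int(np.argmax(y))
```

For example, with zero hidden weights and an output bias favoring dimension 0, every input is classified as coming from the training corpus.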
the generation network consists of an encoder and a decoder; the encoder adopts a bidirectional long short-term memory (LSTM) neural network structure; the encoder converts an input source-language sentence into a word vector sequence that serves as the input of the LSTM network, and the network produces a dense vector of fixed length, called the context vector, which is the output of the encoder;
the decoder then uses another, unidirectional LSTM neural network that takes the context vector output by the encoder as its input; a Softmax classifier is superposed on the output layer of the neural network machine translation model to output the word vector sequence of the target language; the word vectors are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
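The bidirectional encoding can be sketched as follows. A plain tanh recurrence is used here as a stand-in for the LSTM cell, so this illustrates only the forward/backward reading and the concatenation into a context vector, not the patent's exact network.

```python
import numpy as np

def run_rnn(vecs, W, U, b):
    # Simple tanh recurrence used as a stand-in for an LSTM cell;
    # returns the final hidden state after reading the whole sequence.
    h = np.zeros_like(b)
    for v in vecs:
        h = np.tanh(W @ v + U @ h + b)
    return h

def bidirectional_context(word_vecs, W, U, b):
    # One network reads the word-vector sequence forward, the other backward;
    # their final states are concatenated into the fixed-length context vector.
    fwd = run_rnn(word_vecs, W, U, b)
    bwd = run_rnn(word_vecs[::-1], W, U, b)
    return np.concatenate([fwd, bwd])
```

The context vector therefore has twice the hidden dimension, one half summarizing the sentence left-to-right and the other right-to-left.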
2. The method for machine translation based on generation of an antagonistic neural network of claim 1, characterized in that the neural network machine translation model has inputs xt and ht-1, respectively representing the input word vector at time t and the output of the LSTM neural network unit at time t-1; the output ht represents the output of the LSTM neural network unit at the current time;
the method specifically comprises the following steps:
it = g(Wxi·xt + Whi·ht-1 + bi);
ft = g(Wxf·xt + Whf·ht-1 + bf);
ot = g(Wxo·xt + Who·ht-1 + bo);
c̃t = tanh(Wxc·xt + Whc·ht-1 + bc);
ct = ft·ct-1 + it·c̃t;
ht = ot·tanh(ct);
where g denotes the gate activation function; it, ft and ot respectively represent the input gate, the forget gate and the output gate; ct-1 represents the state of the neuron at time t-1, ct represents the new state of the neuron, c̃t represents the candidate (hidden) state of the neuron, and ht is the output of the LSTM neuron; the parameters W and b represent the connection weights and biases of each layer; in particular, Wxi, Wxf, Wxo and Wxc respectively represent the input-to-hidden weight matrices corresponding to the input gate, the forget gate, the output gate and the candidate state; Whi, Whf, Who and Whc respectively represent the recurrent (hidden-to-hidden) weight matrices corresponding to the input gate, the forget gate, the output gate and the candidate state; and bi, bf, bo and bc respectively represent the bias vectors corresponding to the input gate, the forget gate, the output gate and the candidate state.
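The six equations above can be checked against a direct NumPy transcription. Parameter names mirror the claim; the dictionary layout of the parameters is an assumption made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations of claim 2;
    p holds the weight matrices W* and bias vectors b* from the claim."""
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])     # input gate
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])     # forget gate
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])     # output gate
    c_cand = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_cand                               # new cell state
    h_t = o_t * np.tanh(c_t)                                        # unit output
    return h_t, c_t
```

The cell state ct is what carries long-range information: the forget gate scales the old state and the input gate scales the candidate before they are summed.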
3. The method of machine translation based on generation of an antagonistic neural network of claim 1, in which the encoder uses two LSTM networks, one fed the forward word vector sequence and the other the backward word vector sequence, forming a bidirectional LSTM network; the vectors output by the two networks are concatenated to form the context vector; the decoder adopts an LSTM network that takes the context vector as input and outputs a state sequence, which then passes through a Softmax classifier of the following functional form:
P(y=i|x) = e^(θi·x) / Σj=1..k e^(θj·x),
wherein (θ1, θ2, ..., θk) are the parameters of the classifier, k is the total number of categories of the classifier, and i denotes a particular category; the states output by the decoder are converted one by one into word vectors of the target language, and the sequence is then assembled to form the translation result.
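The Softmax normalization can be written as a small, numerically stable function; this is the standard formulation rather than anything specific to the patent.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum before exponentiating avoids overflow
    # without changing the result (the shared factor cancels in the ratio).
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

Applied to the decoder's output scores, the resulting probabilities sum to 1 and the highest-probability index selects the target-language word.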
4. The machine translation method based on generation of an antagonistic neural network as claimed in claim 1, wherein adversarial training is used to synchronously improve the ability of the generation network to generate the target language and the ability of the discrimination network to judge the source of a translation; during adversarial training, the discrimination network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network;
in this method, the learning process of the discrimination network is a competition between the generation network and the discrimination network; the process specifically comprises:
randomly taking either a real sample or a sample produced by the generation model, and letting the discrimination network judge whether it is real;
the performance of both the generation network and the discrimination network is continuously improved through this competitive machine-learning mechanism; training ends when the whole network reaches a Nash equilibrium, i.e., when the parameters of both networks are stable; at that point the machine translation results produced by the generation network can fool the discrimination network into judging that the translations come from the parallel corpus, and the generation network model can then be used as the output machine translation model.
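The competition described in this claim can be illustrated with a deliberately simple, runnable toy. The sentence data and the punctuation-based discriminator rule below are hypothetical; they only show the "draw a sample at random, judge its origin" loop, not a trainable model.

```python
import random

def heuristic_discriminator(sample):
    # Stand-in discriminator (hypothetical rule): in this toy, real corpus
    # sentences end with punctuation and generated ones do not.
    return sample.endswith(".")

def adversarial_round(real_samples, generated_samples, rounds=100, seed=0):
    """One phase of the competition in claim 4: each round the discriminator
    is shown, at random, either a real sample or a generated one, and must
    judge whether it is real. Returns its accuracy over the rounds."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        is_real = rng.random() < 0.5
        pool = real_samples if is_real else generated_samples
        sample = rng.choice(pool)
        if heuristic_discriminator(sample) == is_real:
            correct += 1
    return correct / rounds
```

With a weak generator the discriminator scores near 1.0; once generated samples become indistinguishable from real ones its accuracy falls toward 0.5, the Nash-equilibrium point at which, per the claim, training stops.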
5. The machine translation method based on generation of an antagonistic neural network of claim 1, wherein the method performs semi-supervised learning by utilizing both the manually labeled bilingual parallel corpus resources and the monolingual corpus resources.
6. The machine translation method based on generation of an antagonistic neural network according to claim 1, specifically comprising:
constructing a bidirectional long short-term memory neural network as the discrimination network;
combining the generation network and the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network, while the output result of the discrimination network, 0 or 1, is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus to form a semi-supervised corpus, and training the whole adversarial network with the semi-supervised corpus; training ends when the parameters of the generative adversarial network become stable;
and after training of the generative adversarial network model is completed, using the generation network part as the output machine translation model for subsequent use.
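The corpus-integration step can be sketched as a small helper. The pair/sentence representation here is a hypothetical format chosen for illustration, not anything specified by the patent.

```python
def build_semi_supervised_corpus(parallel_pairs, monolingual_sentences):
    """Merge manually labeled (source, target) pairs with unlabeled
    monolingual source sentences into one training corpus; the third field
    records whether a gold target translation is available for the example."""
    corpus = [(src, tgt, True) for src, tgt in parallel_pairs]
    corpus += [(src, None, False) for src in monolingual_sentences]
    return corpus
```

During adversarial training, the labeled pairs supply real samples for the discriminator, while the unlabeled sources feed the generation network.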
7. A machine translation system based on generation of an antagonistic neural network, implementing the machine translation method of claim 1, characterized in that the system comprises:
a discrimination network, which adopts a multi-layer perceptron feedforward neural network model to realize binary classification, and which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network.
8. The machine translation system based on generation of an antagonistic neural network of claim 7, further comprising:
a generation network, combined with the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network, while the output result of the discrimination network, 0 or 1, is fed back to the generation network;
a monolingual corpus and a parallel corpus, integrated to form a semi-supervised corpus with which the whole adversarial network is trained; training ends when the parameters of the generative adversarial network become stable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710586841.9A CN107368475B (en) | 2017-07-18 | 2017-07-18 | Machine translation method and system based on generation of antagonistic neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107368475A CN107368475A (en) | 2017-11-21 |
CN107368475B true CN107368475B (en) | 2021-06-04 |
Family
ID=60308088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710586841.9A Active CN107368475B (en) | 2017-07-18 | 2017-07-18 | Machine translation method and system based on generation of antagonistic neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107368475B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109887494B (en) | 2017-12-01 | 2022-08-16 | 腾讯科技(深圳)有限公司 | Method and apparatus for reconstructing a speech signal |
CN107991876A (en) * | 2017-12-14 | 2018-05-04 | 南京航空航天大学 | Aero-engine condition monitoring data creation method based on production confrontation network |
CN108304390B (en) * | 2017-12-15 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Translation model-based training method, training device, translation method and storage medium |
CN108388549B (en) * | 2018-02-26 | 2021-02-19 | 腾讯科技(深圳)有限公司 | Information conversion method, information conversion device, storage medium and electronic device |
CN108415906B (en) * | 2018-03-28 | 2021-08-17 | 中译语通科技股份有限公司 | Automatic identification discourse machine translation method and machine translation system based on field |
CN108734276B (en) * | 2018-04-28 | 2021-12-31 | 同济大学 | Simulated learning dialogue generation method based on confrontation generation network |
CN108829685A (en) * | 2018-05-07 | 2018-11-16 | A Mongolian-Chinese inter-translation method based on monolingual training |
CN108897740A (en) * | 2018-05-07 | 2018-11-27 | A Mongolian-Chinese machine translation method based on an adversarial neural network |
CN108874978B (en) * | 2018-06-08 | 2021-09-10 | 杭州一知智能科技有限公司 | Method for solving conference content abstract task based on layered adaptive segmented network |
CN108846130B (en) * | 2018-06-29 | 2021-02-05 | 北京百度网讯科技有限公司 | Question text generation method, device, equipment and medium |
CN110750997A (en) * | 2018-07-05 | 2020-02-04 | 普天信息技术有限公司 | Machine translation method and device based on generation countermeasure learning |
CN110852066B (en) * | 2018-07-25 | 2021-06-01 | 清华大学 | Multi-language entity relation extraction method and system based on confrontation training mechanism |
CN109241540B (en) * | 2018-08-07 | 2020-09-15 | Chinese-to-Braille automatic conversion method and system based on deep neural network |
CN110874537B (en) * | 2018-08-31 | 2023-06-27 | 阿里巴巴集团控股有限公司 | Method for generating multilingual translation model, translation method and equipment |
CN110895935B (en) * | 2018-09-13 | 2023-10-27 | 阿里巴巴集团控股有限公司 | Speech recognition method, system, equipment and medium |
US11151334B2 (en) * | 2018-09-26 | 2021-10-19 | Huawei Technologies Co., Ltd. | Systems and methods for multilingual text generation field |
CN109523021B (en) * | 2018-09-28 | 2020-12-11 | 浙江工业大学 | Dynamic network structure prediction method based on long-time and short-time memory network |
CN109410179B (en) * | 2018-09-28 | 2021-07-23 | 合肥工业大学 | Image anomaly detection method based on generation countermeasure network |
CN109547320B (en) * | 2018-09-29 | 2022-08-30 | 创新先进技术有限公司 | Social contact method, device and equipment |
CN109670180B (en) * | 2018-12-21 | 2020-05-08 | 语联网(武汉)信息技术有限公司 | Method and device for translating individual characteristics of vectorized translator |
CN109887047B (en) * | 2018-12-28 | 2023-04-07 | 浙江工业大学 | Signal-image translation method based on generation type countermeasure network |
CN109902310A (en) * | 2019-01-15 | 2019-06-18 | 深圳中兴网信科技有限公司 | Vocabulary detection method, vocabulary detection system and computer readable storage medium |
CN110110337B (en) * | 2019-05-08 | 2023-04-18 | 网易有道信息技术(北京)有限公司 | Translation model training method, medium, device and computing equipment |
CN110069790B (en) * | 2019-05-10 | 2022-12-06 | 东北大学 | Machine translation system and method for contrasting original text through translated text retranslation |
CN110309512A (en) * | 2019-07-05 | 2019-10-08 | A Chinese grammar error correction method based on a generative adversarial network |
CN110334361B (en) * | 2019-07-12 | 2022-11-22 | 电子科技大学 | Neural machine translation method for Chinese language |
CN110555247A (en) * | 2019-08-16 | 2019-12-10 | 华南理工大学 | structure damage early warning method based on multipoint sensor data and BilSTM |
CN110472255B (en) * | 2019-08-20 | 2021-03-02 | 腾讯科技(深圳)有限公司 | Neural network machine translation method, model, electronic terminal, and storage medium |
CN110598221B (en) * | 2019-08-29 | 2020-07-07 | 内蒙古工业大学 | Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network |
CN110866395B (en) * | 2019-10-30 | 2023-05-05 | 语联网(武汉)信息技术有限公司 | Word vector generation method and device based on translator editing behaviors |
CN110866404B (en) * | 2019-10-30 | 2023-05-05 | 语联网(武汉)信息技术有限公司 | Word vector generation method and device based on LSTM neural network |
CN111178094B (en) * | 2019-12-20 | 2023-04-07 | 沈阳雅译网络技术有限公司 | Pre-training-based scarce resource neural machine translation training method |
CN111178097B (en) * | 2019-12-24 | 2023-07-04 | 语联网(武汉)信息技术有限公司 | Method and device for generating Zhongtai bilingual corpus based on multistage translation model |
CN111310480B (en) * | 2020-01-20 | 2021-12-28 | 昆明理工大学 | Weakly supervised Hanyue bilingual dictionary construction method based on English pivot |
CN113283249A (en) * | 2020-02-19 | 2021-08-20 | 阿里巴巴集团控股有限公司 | Machine translation method, device and computer readable storage medium |
CN111523308B (en) * | 2020-03-18 | 2024-01-26 | 大箴(杭州)科技有限公司 | Chinese word segmentation method and device and computer equipment |
CN111460837A (en) * | 2020-03-31 | 2020-07-28 | 广州大学 | Character-level confrontation sample generation method and device for neural machine translation |
CN111914552A (en) * | 2020-07-31 | 2020-11-10 | 平安科技(深圳)有限公司 | Training method and device of data enhancement model |
CN112633018B (en) * | 2020-12-28 | 2022-04-15 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on data enhancement |
CN113343719B (en) * | 2021-06-21 | 2023-03-14 | 哈尔滨工业大学 | Unsupervised bilingual translation dictionary acquisition method for collaborative training by using different word embedding models |
CN113642341A (en) * | 2021-06-30 | 2021-11-12 | 深译信息科技(横琴)有限公司 | Deep confrontation generation method for solving scarcity of medical text data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101154221A (en) * | 2006-09-28 | 2008-04-02 | 株式会社东芝 | Apparatus performing translation process from inputted speech |
DE202017102381U1 (en) * | 2017-04-21 | 2017-05-11 | Robert Bosch Gmbh | Device for improving the robustness against "Adversarial Examples" |
Non-Patent Citations (2)
Title |
---|
Lijun Wu et al. "Adversarial Neural Machine Translation". ResearchGate. 2017, *
yunyoubars. "A Google Brain scientist explains LSTM: a story about 'forgetting' and 'memory'". Docin (豆丁网). 2017, *
Also Published As
Publication number | Publication date |
---|---|
CN107368475A (en) | 2017-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107368475B (en) | Machine translation method and system based on generation of antagonistic neural network | |
Koncel-Kedziorski et al. | Text generation from knowledge graphs with graph transformers | |
CN110163299B (en) | Visual question-answering method based on bottom-up attention mechanism and memory network | |
Rastgoo et al. | Sign language production: A review | |
CN108897740A (en) | A Mongolian-Chinese machine translation method based on an adversarial neural network |
CN107729311B (en) | Chinese text feature extraction method fusing text moods | |
Li et al. | Sentiment infomation based model for chinese text sentiment analysis | |
Li et al. | Insufficient data can also rock! learning to converse using smaller data with augmentation | |
CN110807320A (en) | Short text emotion analysis method based on CNN bidirectional GRU attention mechanism | |
Ling et al. | Context-controlled topic-aware neural response generation for open-domain dialog systems | |
CN111985205A (en) | Aspect level emotion classification model | |
CN115099409A (en) | Text-image enhanced multi-mode knowledge map embedding method | |
He et al. | MF-BERT: Multimodal fusion in pre-trained BERT for sentiment analysis | |
CN113901208B (en) | Method for analyzing emotion tendentiousness of mid-cross language comments blended with theme characteristics | |
Huo et al. | Terg: Topic-aware emotional response generation for chatbot | |
Maslennikova | ELMo Word Representations For News Protection. | |
CN116662924A (en) | Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism | |
CN115730232A (en) | Topic-correlation-based heterogeneous graph neural network cross-language text classification method | |
CN115169348A (en) | Event extraction method based on hybrid neural network | |
CN114444481A (en) | Sentiment analysis and generation method of news comments | |
CN113642630A (en) | Image description method and system based on dual-path characteristic encoder | |
CN113255360A (en) | Document rating method and device based on hierarchical self-attention network | |
Jiang et al. | An affective chatbot with controlled specific emotion expression | |
Tong et al. | Text classification based on graph convolutional network with attention | |
Frias et al. | Attention-based Bilateral LSTM-CNN for the Sentiment Analysis of Code-mixed Filipino-English Social Media Texts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: Room 1601, 16th floor, No. 20 Shijingshan Road, Shijingshan District, Beijing 100040
Applicant after: Global Tone Communication Technology Co., Ltd.
Address before: 16th floor, Railway Building, Shijingshan District, Beijing 100040
Applicant before: Mandarin Technology (Beijing) Co., Ltd.
|
GR01 | Patent grant | ||