CN107368475B - Machine translation method and system based on generation of antagonistic neural network - Google Patents
Machine translation method and system based on generation of antagonistic neural network
- Publication number
- CN107368475B CN107368475B CN201710586841.9A CN201710586841A CN107368475B CN 107368475 B CN107368475 B CN 107368475B CN 201710586841 A CN201710586841 A CN 201710586841A CN 107368475 B CN107368475 B CN 107368475B
- Authority
- CN
- China
- Prior art keywords
- network
- generation
- machine translation
- output
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention belongs to the technical field of computers, and discloses a machine translation method and system based on a generative adversarial neural network. The method introduces, on the basis of the original machine-translation generation network, a discrimination network adversarial to it; the discrimination network judges whether a target-language translation comes from the training parallel corpus or is a machine translation result produced by the generation network, and adopts a multi-layer perceptron feedforward neural network model to realize binary classification. The system comprises: a discrimination network, a generation network, a monolingual corpus, and a parallel corpus. The invention makes full use of manually annotated bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning; monolingual corpus resources are abundant and easy to obtain, which alleviates the shortage of training corpora faced by neural network machine translation models.
Description
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a machine translation method and system based on a generative adversarial neural network.
Background
Machine translation is the process of automatically translating a sentence in a source language into a sentence in a target language using computer algorithms. Machine translation is a research direction of artificial intelligence with very important scientific and practical value. With the deepening of globalization and the rapid development of the Internet, machine translation technology plays an increasingly important role in political, economic, social, and cultural exchanges at home and abroad.
At present, methods based on deep neural networks are the best-performing methods in the field of machine translation. They mainly adopt an encoder-decoder structure comprising an encoder and a decoder, both of which use Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network structures. The translation process is as follows: first, the encoder converts the input source-language sentence into a word-vector sequence serving as the input of the recurrent neural network, and outputs a dense vector of fixed length called the context vector. The decoder then uses another recurrent neural network together with a Softmax classifier, taking the context vector as input, to output a word-vector sequence in the target language. Finally, the word vectors are mapped one by one into target-language words using a dictionary, completing the whole translation process.
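The encode-decode-lookup pipeline described above can be sketched with a toy example. The vocabularies, one-hot embeddings, and mean-pooled context below are illustrative stand-ins for the trained RNN/LSTM networks, not the patented model:

```python
import numpy as np

src_vocab = {"ich": 0, "liebe": 1, "dich": 2}   # hypothetical source vocabulary
tgt_words = ["i", "love", "you"]                # hypothetical target dictionary
E_src = np.eye(3)                               # toy source word embeddings (one-hot)
W_dec = np.eye(3)                               # toy decoder projection matrix

def translate(sentence):
    # Step 1: the encoder turns the sentence into a word-vector sequence
    vecs = [E_src[src_vocab[w]] for w in sentence.split()]
    # ... and compresses it into one fixed-length dense "context vector"
    context = np.mean(vecs, axis=0)
    out = []
    # Step 2: the decoder emits target-language scores step by step,
    # conditioned on the context vector
    for v in vecs:
        scores = W_dec @ (v + context)
        # Step 3: dictionary lookup maps each output vector to a target word
        out.append(tgt_words[int(np.argmax(scores))])
    return " ".join(out)
```

With the identity embeddings above, each source word maps straight to its aligned target word, so the sketch only illustrates the data flow, not translation quality.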
In summary, the problems of the prior art are as follows:
The main defect of the prior art is that training the deep neural network model depends heavily on large-scale, manually annotated bilingual parallel sentence-pair corpora. Because manual annotation is expensive and large-scale, high-quality manually annotated bilingual parallel corpora are lacking, the training data of neural network machine translation models are insufficient and their performance suffers; this is the bottleneck faced by existing neural network machine translation models. For some languages in particular, the parallel corpus resources available for training neural network models are extremely scarce, making it difficult to build a high-performance machine translation system.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a machine translation method and system based on a generative adversarial neural network.
The invention is realized as a machine translation method based on a generative adversarial neural network, which introduces, on the basis of the original machine-translation generation network, a discrimination network adversarial to the generation network; the discrimination network is used for judging whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network; the discrimination network adopts a multi-layer perceptron feedforward neural network model to realize binary classification.
Further, the binary classification method comprises the following steps:
the hidden layer function h(x) is formally expressed as:
h(x) = T(W1·x + b1),
wherein the model parameters W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden-layer bias vector; T(x) is the activation function of the hidden layer, which takes the form of the hyperbolic tangent function:
T(x) = (e^x - e^(-x)) / (e^x + e^(-x));
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer, which takes the form of the sigmoid function:
S(x) = 1 / (1 + e^(-x));
when the multi-layer perceptron feedforward neural network model performs binary classification, the input-layer vector X is substituted into f(X) to compute the two-dimensional output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network.
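A minimal sketch of this discriminator, assuming illustrative layer sizes and random weights: a feedforward MLP with a tanh hidden layer and a sigmoid output layer whose two-dimensional output is read off by taking the larger dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 16                       # input and hidden sizes (illustrative only)
W1, b1 = rng.normal(size=(m, n)), np.zeros(m)   # input-to-hidden weights and bias
W2, b2 = rng.normal(size=(2, m)), np.zeros(2)   # hidden-to-output weights and bias

def S(z):
    # sigmoid activation of the output layer
    return 1.0 / (1.0 + np.exp(-z))

def f(x):
    # f(x) = S(W2 . tanh(W1 . x + b1) + b2), as in the text
    return S(W2 @ np.tanh(W1 @ x + b1) + b2)

def discriminate(x):
    y = f(x)                       # two-dimensional output vector Y
    # larger-valued dimension decides the class: corpus vs generated
    return "corpus" if np.argmax(y) == 0 else "generated"
```

Which index means "corpus" and which means "generated" is an assumed convention here; the patent text only specifies a binary decision.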
Further, the generation network consists of an encoder and a decoder. The encoder adopts a bidirectional Long Short-Term Memory (LSTM) neural network structure: it converts an input source-language sentence into a word-vector sequence serving as the input of the LSTM network, and the network produces a dense vector of fixed length, called the context vector, which is the encoder's output.
The decoder then uses another unidirectional LSTM neural network, taking the context vector output by the encoder as input; a Softmax classifier is stacked on the output layer of the neural network machine translation model to output the word-vector sequence of the target language; and the word vectors are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
Further, in the neural network machine translation model, the inputs xt and ht-1 respectively represent the input word vector at time t and the output of the LSTM neural network unit at time t-1; the output ht represents the output of the LSTM neural network unit at the current time.
The method specifically comprises the following steps:
it = g(Wxi·xt + Whi·ht-1 + bi);
ft = g(Wxf·xt + Whf·ht-1 + bf);
ot = g(Wxo·xt + Who·ht-1 + bo);
c̃t = tanh(Wxc·xt + Whc·ht-1 + bc);
ct = ft·ct-1 + it·c̃t;
ht = ot·tanh(ct);
wherein it, ft and ot respectively represent the input gate, the forget gate and the output gate; g is the sigmoid gate activation; ct-1 represents the neuron state at time t-1; ct and c̃t respectively represent the neuron state and the candidate state; ht is the output of the LSTM neuron; the parameters W and b respectively represent the connection weights and biases of each layer.
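The gate equations above can be written directly in numpy. The candidate-state and cell-state updates follow the standard LSTM form assumed here, and the weight shapes are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                        # hidden size (illustrative)

def g(z):
    # sigmoid gate activation
    return 1.0 / (1.0 + np.exp(-z))

# small random weight matrices for the eight connections, zero biases
P = {k: rng.normal(scale=0.1, size=(d, d)) for k in
     ("Wxi", "Whi", "Wxf", "Whf", "Wxo", "Who", "Wxc", "Whc")}
b = {k: np.zeros(d) for k in ("bi", "bf", "bo", "bc")}

def lstm_step(x_t, h_prev, c_prev):
    i_t = g(P["Wxi"] @ x_t + P["Whi"] @ h_prev + b["bi"])   # input gate
    f_t = g(P["Wxf"] @ x_t + P["Whf"] @ h_prev + b["bf"])   # forget gate
    o_t = g(P["Wxo"] @ x_t + P["Who"] @ h_prev + b["bo"])   # output gate
    c_tilde = np.tanh(P["Wxc"] @ x_t + P["Whc"] @ h_prev + b["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                      # new cell state
    h_t = o_t * np.tanh(c_t)                                # unit output
    return h_t, c_t
```

Running `lstm_step` over a word-vector sequence, carrying `h_t` and `c_t` forward, produces the encoder state sequence described in the text.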
Further, the encoder adopts two LSTM networks, one taking the forward word-vector sequence as input and the other the reversed word-vector sequence, forming a bidirectional LSTM network; the vectors output by the two networks are concatenated to form the context vector. The decoder adopts an LSTM network that takes the context vector as input and outputs a state sequence, which then passes through a Softmax classifier of the following functional form:
P(y = i | x; θ) = exp(θi·x) / Σ(j=1..k) exp(θj·x),
wherein (θ1, θ2, …, θk) are the classifier parameters, k is the total number of categories of the classifier, and i denotes a particular category; the states output by the decoder are converted one by one into target-language word vectors, and the sequence is then assembled into the translation result.
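The Softmax step that turns a decoder state into category probabilities can be sketched as follows (the max-subtraction is a standard numerical-stability trick, not part of the patent text):

```python
import numpy as np

def softmax_probs(theta, s):
    # theta: (k, d) matrix stacking the k class parameter vectors
    # s: decoder state vector of dimension d
    z = theta @ s                 # score of each category
    e = np.exp(z - z.max())       # subtract max for numerical stability
    return e / e.sum()            # normalized probabilities, sum to 1
```

The category with the highest probability selects the target-language word vector at each decoding step.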
Further, through adversarial training, the generation network's ability to produce the target language and the discrimination network's ability to judge the source of a translation are improved synchronously; during adversarial training, the discrimination network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network.
In the machine translation method based on the generative adversarial neural network, the learning process of the discrimination network is a competition between the generation network and the discrimination network; it specifically comprises:
randomly taking either a real sample or a sample produced by the generation model, and letting the discrimination network judge whether it is real;
through this competitive machine learning mechanism, the performance of both the generation network and the discrimination network improves continuously. Training ends when the whole network reaches a Nash-equilibrium state, i.e., when the parameters of the two networks become stable. At that point, the machine translation results produced by the generation network can deceive the discrimination network into believing that the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
Further, the machine translation method based on the generative adversarial neural network makes full use of manually annotated bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning.
Further, the machine translation method based on the generative adversarial neural network specifically comprises:
constructing a bidirectional long short-term memory neural network as the discrimination network;
combining the generation network and the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
After the training of the generative adversarial network model is completed, the generation-network part of the network is used as the output machine translation model for subsequent use.
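The adversarial training loop above can be caricatured with scalar stand-ins for the two networks (an assumed toy dynamic, not the patent's actual training algorithm): the "discriminator" tracks a decision threshold between the real and generated statistics, the "generator" chases that threshold, and training stops once both parameters stabilize, mirroring the Nash-equilibrium stopping criterion in the text:

```python
def train_adversarial(steps=300, lr=0.1):
    real_mean = 1.0          # statistic of "real" corpus translations (toy)
    g, d = 0.0, 0.0          # scalar stand-ins for network parameters
    for _ in range(steps):
        fake = g                                   # generator's current output statistic
        d += lr * ((real_mean + fake) / 2 - d)     # discriminator: threshold between real and fake
        g += lr * (d - g)                          # generator: move to fool the discriminator
    return g, d
```

In this toy both parameters settle near the real-data statistic, at which point the generated output is indistinguishable from real data by the discriminator and the "generator" would be kept as the output model.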
Another object of the present invention is to provide a machine translation system based on a generative adversarial neural network, comprising:
a discrimination network for judging whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network; the discrimination network adopts a multi-layer perceptron feedforward neural network model to realize binary classification.
Further, the machine translation system based on the generative adversarial neural network further comprises:
a generation network, which is combined with the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
a monolingual corpus, which is integrated with the parallel corpus to form a semi-supervised corpus used to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
The invention has the advantages and positive effects that:
the invention introduces a discrimination network which is confronted with the original machine translation generation network on the basis of the original machine translation generation network, namely, a coding-decoding structure neural network machine translation model; and the method is used for judging whether the translation of the target language is from the training corpus or is translated by the original machine to generate a network machine translation result.
The invention improves the whole framework system of the existing machine translation method based on the artificial neural network. A machine translation method based on a generation countermeasure network is provided, so that a neural network machine translation model has self-learning capability. The manually labeled bilingual parallel corpus resources are fully utilized, and meanwhile, the monolingual corpus resources can be utilized for semi-supervised learning. The monolingual corpus resources are very rich and easy to obtain, the bottleneck problem that training corpora required by neural network machine translation are insufficient is solved, and the cost of manually marking the corpora can be saved by more than 50%.
After the model is trained, the parameter scale and the operation time of the model in the invention are equivalent to those of the current neural network machine translation model in practical application, and the complexity of the machine translation model in practical use cannot be increased.
Drawings
Fig. 1 is a flowchart of a machine translation method based on a generative adversarial neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a machine translation system based on a generative adversarial neural network according to an embodiment of the present invention.
In the figure: 1. discrimination network; 2. generation network; 3. monolingual corpus; 4. parallel corpus.
Fig. 3 is a schematic diagram of a neural network machine translation model based on an "encoding-decoding" structure according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an LSTM neural network unit provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
At present, the most serious defect of the prior art is that training a deep neural network model depends heavily on large-scale, manually annotated bilingual parallel sentence-pair corpora. Because manual annotation is expensive and large-scale, high-quality manually annotated bilingual parallel corpora are lacking, the training data of neural network machine translation models are insufficient and their performance suffers; this is the bottleneck faced by existing neural network machine translation models. For some languages in particular, the parallel corpus resources available for training neural network models are extremely scarce, making it difficult to build a high-performance machine translation system.
The invention adopts a multi-layer perceptron feedforward neural network model to construct the discrimination network and realize binary classification. The multi-layer perceptron neural network model comprises an input layer X: {x1, x2, …, xn}, a hidden layer H: {h1, h2, …, hm}, and an output layer Y: {y1, y2}.
The hidden layer function h(x) can be formally expressed as:
h(x) = T(W1·x + b1),
wherein the model parameters W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden-layer bias vector; T(x) is the activation function of the hidden layer, for which the invention adopts the hyperbolic tangent function:
T(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
The whole multi-layer perceptron neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer, for which the invention adopts the sigmoid function:
S(x) = 1 / (1 + e^(-x)).
When the multi-layer perceptron neural network model performs binary classification, the input-layer vector X is substituted into f(X) to compute the two-dimensional output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result.
The application of the principles of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the machine translation method based on a generative adversarial neural network according to the embodiment of the present invention is as follows.
On the basis of conventional neural network machine translation, another artificial neural network adversarial to it is introduced, called the discrimination network; the original machine-translation LSTM neural network is called the generation network. In this generative adversarial machine translation model, the generation network adopts a conventional encoder-decoder neural network translation model, whose function is to generate the corresponding target-language sentence from an input source-language sentence; the discrimination network adopts a multi-layer perceptron feedforward neural network model realizing a binary classification function, in which each node is a perceptron. The function of the discrimination network is to judge whether a target-language translation comes from the training corpus or is the result of recurrent-neural-network-based machine translation.
The generative adversarial network introduces a mechanism of competition between the generation network and the discrimination network; through adversarial training, the generation network's ability to produce the target language and the discrimination network's ability to judge the source of a translation improve synchronously. During training, the objective of the discrimination network is to judge whether a sample is real data from the corpus or a machine translation result, while the objective of the generation network is to produce translation results that deceive the discrimination network into regarding machine-translated output as coming from the real corpus.
The learning process in the machine translation method based on the generative adversarial neural network provided by the embodiment of the invention thus becomes a competition between the generation network and the discrimination network: either a real sample or a sample produced by the generation model is chosen at random, and the discrimination network judges whether it is real. Through this competitive machine learning mechanism, the performance of both networks improves continuously. Training ends when the whole network reaches a Nash-equilibrium state, i.e., when the parameters of the two networks essentially stop changing. At that point, the machine translation results produced by the generation network can deceive the discrimination network into believing that the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
As shown in fig. 2, the machine translation system based on a generative adversarial neural network according to an embodiment of the present invention includes:
a discrimination network 1, which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network, and which adopts a multi-layer perceptron feedforward neural network model to realize binary classification.
The machine translation system based on the generative adversarial neural network further comprises:
a generation network 2, which is combined with the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
a monolingual corpus 3, which is integrated with the parallel corpus 4 to form a semi-supervised corpus used to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
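The merging of the parallel and monolingual corpora into one semi-supervised training set might look like the following sketch; the tuple layout and the `None` target for monolingual sentences are assumed conventions for illustration, not specified by the patent:

```python
# Hypothetical sentences; in practice these come from corpus files.
parallel = [("ein haus", "a house"), ("ein hund", "a dog")]   # (source, reference) pairs
monolingual = ["eine katze", "ein vogel"]                     # source-only sentences

# Parallel pairs keep their reference translation; monolingual sentences get a
# None target, to be filled by the generation network during adversarial training.
semi_supervised = (
    [(src, tgt, "parallel") for src, tgt in parallel]
    + [(src, None, "monolingual") for src in monolingual]
)
```

Tagging each example with its origin lets the training loop treat the two corpus types differently while drawing from a single shuffled dataset.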
The invention is further described below in connection with the positive effects.
The embodiment of the invention constructs a long short-term memory neural network based on the encoder-decoder structure and then trains the generation network with bilingual parallel corpora.
The embodiment of the invention constructs another bidirectional long short-term memory neural network as the discrimination network.
The application of the principles of the present invention will now be described in further detail with reference to specific embodiments.
In the embodiment of the invention, the binary classification method comprises the following steps:
the hidden layer function h(x) is formally expressed as:
h(x) = T(W1·x + b1),
wherein T(x) is the activation function of the hidden layer, which takes the form of the hyperbolic tangent function:
T(x) = (e^x - e^(-x)) / (e^x + e^(-x));
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x) = S(W2·h(x) + b2) = S(W2·T(W1·x + b1) + b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output-layer bias vector; S(x) is the activation function of the output layer, which takes the form of the sigmoid function:
S(x) = 1 / (1 + e^(-x));
when the multi-layer perceptron feedforward neural network model performs binary classification, the input-layer vector X is substituted into f(X) to compute the output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network.
As shown in fig. 3, the generation network consists of two parts, an encoder and a decoder. The encoder adopts a bidirectional Long Short-Term Memory (LSTM) neural network structure: it converts an input source-language sentence into a word-vector sequence serving as the input of the LSTM network, and the network produces a dense vector of fixed length, called the context vector, which is the encoder's output.
The decoder then uses another unidirectional LSTM neural network, taking the context vector output by the encoder as input; a Softmax classifier is stacked on the output layer of the neural network machine translation model to output the word-vector sequence of the target language; and the word vectors are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
As shown in fig. 4, in the neural network machine translation model, the inputs xt and ht-1 respectively represent the input word vector at time t and the output of the LSTM neural network unit at time t-1; the output ht represents the output of the LSTM neural network unit at the current time.
The method specifically comprises the following steps:
it = g(Wxi·xt + Whi·ht-1 + bi);
ft = g(Wxf·xt + Whf·ht-1 + bf);
ot = g(Wxo·xt + Who·ht-1 + bo);
c̃t = tanh(Wxc·xt + Whc·ht-1 + bc);
ct = ft·ct-1 + it·c̃t;
ht = ot·tanh(ct);
wherein it, ft and ot respectively represent the input gate, the forget gate and the output gate; g is the sigmoid gate activation; ct-1 represents the neuron state at time t-1; ct and c̃t respectively represent the neuron state and the candidate state; ht is the output of the LSTM neuron; the parameters W and b respectively represent the connection weights and biases of each layer.
the encoder adopts two LSTM networks, one of which inputs a forward word vector sequence and the other inputs a reverse word vector sequence to form a bidirectional LSTM network, and the vectors output by the two networks are connected to form a context vector; the decoder adopts an LSTM network, inputs the context vector and outputs a state sequence; and then passing through a Softmax classifier, wherein the function form is as follows:
wherein (theta)1,θ2,…,θk) K is the total number of categories of the classifier, i represents a certain classification category; will solveAnd converting the output states of the decoder into word vectors of the target language one by one, and then integrating the sequences to form a translation result.
Through adversarial training, the generation network's ability to produce the target language and the discrimination network's ability to judge the source of a translation are improved synchronously; during adversarial training, the discrimination network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network.
In the machine translation method based on the generative adversarial neural network, the learning process of the discrimination network is a competition between the generation network and the discrimination network; it specifically comprises:
randomly taking either a real sample or a sample produced by the generation model, and letting the discrimination network judge whether it is real;
through this competitive machine learning mechanism, the performance of both the generation network and the discrimination network improves continuously. Training ends when the whole network reaches a Nash-equilibrium state, i.e., when the parameters of the two networks become stable. At that point, the machine translation results produced by the generation network can deceive the discrimination network into believing that the translations come from the parallel corpus, and the generation network model can be used as the output machine translation model.
The machine translation method based on the generative adversarial neural network makes full use of manually annotated bilingual parallel corpus resources while also exploiting monolingual corpus resources for semi-supervised learning.
The machine translation method based on the generative adversarial neural network specifically comprises:
constructing a bidirectional long short-term memory neural network as the discrimination network;
combining the generation network and the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus into a semi-supervised corpus and using it to train the whole adversarial network; training ends when the parameters of the generative adversarial network remain stable.
After the training of the generative adversarial network model is completed, the generation-network part of the network is used as the output machine translation model for subsequent use.
The invention combines the generation network and the discrimination network to form a complete generative adversarial network. Specifically, the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network; meanwhile, the output of the discrimination network (0 or 1) is fed back to the generation network.
The invention integrates the parallel corpus and the monolingual corpus into a large-scale semi-supervised corpus and uses it to train the whole generative adversarial network. Training ends when the parameters of the generative adversarial network remain stable.
After the training of the generative adversarial network model is completed, the generation-network part of the network can be used as the output machine translation model for subsequent use, as follows: the source language is segmented into words, the segmentation result is input to the encoder of the generation network, each source-language word is fed in turn into the corresponding neural network node, and the output of the generation network's decoder is the corresponding target-language translation.
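The stated usage procedure (segment the source sentence, feed the tokens to the trained generation network's encoder, read the decoder output as the translation) can be illustrated with a hypothetical stand-in for the trained network:

```python
def segment(sentence):
    # word segmentation; whitespace split is a toy stand-in for a real segmenter
    return sentence.split()

# hypothetical word-level lookup standing in for the trained generation network
toy_model = {"bonjour": "hello", "monde": "world"}

def generator_translate(tokens):
    # each source-language token is fed in turn; unknown tokens map to <unk>
    return [toy_model.get(t, "<unk>") for t in tokens]

translation = " ".join(generator_translate(segment("bonjour monde")))
```

In the real system, `generator_translate` would run the encoder over the token sequence and decode step by step; the lookup table here only mirrors the input/output contract.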
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (8)
1. A machine translation method based on a generation antagonistic neural network, characterized in that, on the basis of an original machine translation generation network, the method introduces a discrimination network that is adversarial to the generation network; the discrimination network judges whether a target-language translation comes from the training parallel corpus or is a machine translation result produced by the generation network; the discrimination network adopts a multi-layer perceptron feedforward neural network model to realize binary classification;
the binary classification method comprises the following steps:
the hidden layer function is h(x)=T(W1x+b1), wherein W1 and b1 respectively represent the weight matrix from the input layer to the hidden layer and the hidden layer bias vector, and the hidden layer activation function T(x) takes the form of the hyperbolic tangent function:
T(x)=tanh(x)=(e^x-e^(-x))/(e^x+e^(-x));
the whole multi-layer perceptron feedforward neural network model function f(x) can be formally expressed as:
f(x)=S(W2·h(x)+b2)=S(W2·T(W1x+b1)+b2),
wherein the model parameters W2 and b2 respectively represent the weight matrix from the hidden layer to the output layer and the output layer bias vector, and S(x) is the activation function of the output layer, which takes the form of the sigmoid function:
S(x)=1/(1+e^(-x));
when the multi-layer perceptron feedforward neural network model carries out binary classification, the input layer vector X is substituted into f(X) to compute the output vector Y, and the category represented by the larger-valued dimension of Y is selected as the classification result, indicating whether the translation comes from the training corpus or from the generation network;
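As a concrete sketch of the classifier f(x)=S(W2·T(W1x+b1)+b2), the function below applies the tanh hidden layer, the sigmoid output layer, and the larger-dimension decision rule. The weight shapes and the two-dimensional output are assumptions for illustration; class 0 stands for "from the training corpus" and class 1 for "from the generation network".

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_discriminate(x, W1, b1, W2, b2):
    # Hidden layer with tanh activation: h(x) = T(W1 x + b1)
    h = np.tanh(W1 @ x + b1)
    # Output layer with sigmoid activation: f(x) = S(W2 h(x) + b2)
    y = sigmoid(W2 @ h + b2)
    # The dimension of Y with the larger value decides the class:
    # 0 -> translation from the training corpus, 1 -> from the generation network.
    return int(np.argmax(y))
```

For example, with zero hidden weights and an output bias favoring dimension 0, every input is classified as coming from the training corpus.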
the generation network consists of an encoder and a decoder; the encoder adopts a bidirectional long short-term memory (LSTM) neural network structure; the encoder converts an input source-language sentence into a word vector sequence that serves as the input of the LSTM network, and the network produces a dense vector of fixed length, called the context vector, which is the output of the encoder;
the decoder then uses another, unidirectional LSTM neural network that takes the context vector output by the encoder as its input; a Softmax classifier is superposed on the output layer of the neural network machine translation model to output the word vector sequence of the target language; the word vectors are mapped one by one into target-language words through a dictionary, completing the automatic translation process.
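The bidirectional encoding can be sketched as follows. A plain tanh recurrence is used here as a stand-in for the LSTM cell, so this illustrates only the forward/backward reading and the concatenation into a context vector, not the patent's exact network.

```python
import numpy as np

def run_rnn(vecs, W, U, b):
    # Simple tanh recurrence used as a stand-in for an LSTM cell;
    # returns the final hidden state after reading the whole sequence.
    h = np.zeros_like(b)
    for v in vecs:
        h = np.tanh(W @ v + U @ h + b)
    return h

def bidirectional_context(word_vecs, W, U, b):
    # One network reads the word-vector sequence forward, the other backward;
    # their final states are concatenated into the fixed-length context vector.
    fwd = run_rnn(word_vecs, W, U, b)
    bwd = run_rnn(word_vecs[::-1], W, U, b)
    return np.concatenate([fwd, bwd])
```

The context vector therefore has twice the hidden dimension, one half summarizing the sentence left-to-right and the other right-to-left.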
2. The method for machine translation based on generation of an antagonistic neural network of claim 1, characterized in that the neural network machine translation model has inputs xt and ht-1, respectively representing the input word vector at time t and the output of the LSTM neural network unit at time t-1; the output ht represents the output of the LSTM neural network unit at the current time;
the method specifically comprises the following steps:
it = g(Wxi·xt + Whi·ht-1 + bi);
ft = g(Wxf·xt + Whf·ht-1 + bf);
ot = g(Wxo·xt + Who·ht-1 + bo);
c̃t = tanh(Wxc·xt + Whc·ht-1 + bc);
ct = ft·ct-1 + it·c̃t;
ht = ot·tanh(ct);
where g denotes the gate activation function; it, ft and ot respectively represent the input gate, the forget gate and the output gate; ct-1 represents the state of the neuron at time t-1, ct represents the new state of the neuron, c̃t represents the candidate (hidden) state of the neuron, and ht is the output of the LSTM neuron; the parameters W and b represent the connection weights and biases of each layer; in particular, Wxi, Wxf, Wxo and Wxc respectively represent the input-to-hidden weight matrices corresponding to the input gate, the forget gate, the output gate and the candidate state; Whi, Whf, Who and Whc respectively represent the recurrent (hidden-to-hidden) weight matrices corresponding to the input gate, the forget gate, the output gate and the candidate state; and bi, bf, bo and bc respectively represent the bias vectors corresponding to the input gate, the forget gate, the output gate and the candidate state.
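The six equations above can be checked against a direct NumPy transcription. Parameter names mirror the claim; the dictionary layout of the parameters is an assumption made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations of claim 2;
    p holds the weight matrices W* and bias vectors b* from the claim."""
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])     # input gate
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])     # forget gate
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])     # output gate
    c_cand = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_cand                               # new cell state
    h_t = o_t * np.tanh(c_t)                                        # unit output
    return h_t, c_t
```

The cell state ct is what carries long-range information: the forget gate scales the old state and the input gate scales the candidate before they are summed.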
3. The method of machine translation based on generation of an antagonistic neural network of claim 1, in which the encoder uses two LSTM networks, one fed the forward word vector sequence and the other the backward word vector sequence, forming a bidirectional LSTM network; the vectors output by the two networks are concatenated to form the context vector; the decoder adopts an LSTM network that takes the context vector as input and outputs a state sequence, which then passes through a Softmax classifier of the following functional form:
P(y=i|x) = e^(θi·x) / Σj=1..k e^(θj·x),
wherein (θ1, θ2, ..., θk) are the parameters of the classifier, k is the total number of categories of the classifier, and i denotes a particular category; the states output by the decoder are converted one by one into word vectors of the target language, and the sequence is then assembled to form the translation result.
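The Softmax normalization can be written as a small, numerically stable function; this is the standard formulation rather than anything specific to the patent.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum before exponentiating avoids overflow
    # without changing the result (the shared factor cancels in the ratio).
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

Applied to the decoder's output scores, the resulting probabilities sum to 1 and the highest-probability index selects the target-language word.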
4. The machine translation method based on generation of an antagonistic neural network as claimed in claim 1, wherein adversarial training is used to synchronously improve the ability of the generation network to generate the target language and the ability of the discrimination network to judge the source of a translation; during adversarial training, the discrimination network judges whether a translation result is real data from the corpus or a machine translation result produced by the generation network;
in this method, the learning process of the discrimination network is a competition between the generation network and the discrimination network; the process specifically comprises:
randomly taking either a real sample or a sample produced by the generation model, and letting the discrimination network judge whether it is real;
the performance of both the generation network and the discrimination network is continuously improved through this competitive machine-learning mechanism; training ends when the whole network reaches a Nash equilibrium, i.e., when the parameters of both networks are stable; at that point the machine translation results produced by the generation network can fool the discrimination network into judging that the translations come from the parallel corpus, and the generation network model can then be used as the output machine translation model.
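The competition described in this claim can be illustrated with a deliberately simple, runnable toy. The sentence data and the punctuation-based discriminator rule below are hypothetical; they only show the "draw a sample at random, judge its origin" loop, not a trainable model.

```python
import random

def heuristic_discriminator(sample):
    # Stand-in discriminator (hypothetical rule): in this toy, real corpus
    # sentences end with punctuation and generated ones do not.
    return sample.endswith(".")

def adversarial_round(real_samples, generated_samples, rounds=100, seed=0):
    """One phase of the competition in claim 4: each round the discriminator
    is shown, at random, either a real sample or a generated one, and must
    judge whether it is real. Returns its accuracy over the rounds."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        is_real = rng.random() < 0.5
        pool = real_samples if is_real else generated_samples
        sample = rng.choice(pool)
        if heuristic_discriminator(sample) == is_real:
            correct += 1
    return correct / rounds
```

With a weak generator the discriminator scores near 1.0; once generated samples become indistinguishable from real ones its accuracy falls toward 0.5, the Nash-equilibrium point at which, per the claim, training stops.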
5. The machine translation method based on generation of an antagonistic neural network of claim 1, wherein the method performs semi-supervised learning by utilizing both the manually labeled bilingual parallel corpus resources and the monolingual corpus resources.
6. The machine translation method based on generation of an antagonistic neural network according to claim 1, specifically comprising:
constructing a bidirectional long short-term memory neural network as the discrimination network;
combining the generation network and the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network, while the output result of the discrimination network, 0 or 1, is fed back to the generation network;
integrating the parallel corpus and the monolingual corpus to form a semi-supervised corpus, and training the whole adversarial network with the semi-supervised corpus; training ends when the parameters of the generative adversarial network become stable;
and after training of the generative adversarial network model is completed, using the generation network part as the output machine translation model for subsequent use.
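The corpus-integration step can be sketched as a small helper. The pair/sentence representation here is a hypothetical format chosen for illustration, not anything specified by the patent.

```python
def build_semi_supervised_corpus(parallel_pairs, monolingual_sentences):
    """Merge manually labeled (source, target) pairs with unlabeled
    monolingual source sentences into one training corpus; the third field
    records whether a gold target translation is available for the example."""
    corpus = [(src, tgt, True) for src, tgt in parallel_pairs]
    corpus += [(src, None, False) for src in monolingual_sentences]
    return corpus
```

During adversarial training, the labeled pairs supply real samples for the discriminator, while the unlabeled sources feed the generation network.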
7. A machine translation system based on generation of an antagonistic neural network, implementing the machine translation method of claim 1, characterized in that the system comprises:
a discrimination network, which adopts a multi-layer perceptron feedforward neural network model to realize binary classification, and which judges whether a target-language translation comes from the training corpus or is a machine translation result produced by the generation network.
8. The machine translation system based on generation of an antagonistic neural network of claim 7, further comprising:
a generation network, combined with the discrimination network to form a complete generative adversarial network: the input vector of the encoder in the generation network is concatenated with the output vector of the decoder and passed as input to the discrimination network, while the output result of the discrimination network, 0 or 1, is fed back to the generation network;
a monolingual corpus and a parallel corpus, integrated to form a semi-supervised corpus with which the whole adversarial network is trained; training ends when the parameters of the generative adversarial network become stable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710586841.9A CN107368475B (en) | 2017-07-18 | 2017-07-18 | Machine translation method and system based on generation of antagonistic neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107368475A CN107368475A (en) | 2017-11-21 |
CN107368475B true CN107368475B (en) | 2021-06-04 |
Family
ID=60308088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710586841.9A Active CN107368475B (en) | 2017-07-18 | 2017-07-18 | Machine translation method and system based on generation of antagonistic neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107368475B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109887494B (en) | 2017-12-01 | 2022-08-16 | 腾讯科技(深圳)有限公司 | Method and apparatus for reconstructing a speech signal |
CN107991876A (en) * | 2017-12-14 | 2018-05-04 | 南京航空航天大学 | Aero-engine condition monitoring data creation method based on production confrontation network |
CN108304390B (en) * | 2017-12-15 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Translation model-based training method, training device, translation method and storage medium |
CN108388549B (en) * | 2018-02-26 | 2021-02-19 | 腾讯科技(深圳)有限公司 | Information conversion method, information conversion device, storage medium and electronic device |
CN108415906B (en) * | 2018-03-28 | 2021-08-17 | 中译语通科技股份有限公司 | Automatic identification discourse machine translation method and machine translation system based on field |
CN108734276B (en) * | 2018-04-28 | 2021-12-31 | 同济大学 | Simulated learning dialogue generation method based on confrontation generation network |
CN108829685A (en) * | 2018-05-07 | 2018-11-16 | A Mongolian-Chinese inter-translation method based on monolingual training |
CN108897740A (en) * | 2018-05-07 | 2018-11-27 | A Mongolian-Chinese machine translation method based on an adversarial neural network |
CN108874978B (en) * | 2018-06-08 | 2021-09-10 | 杭州一知智能科技有限公司 | Method for solving conference content abstract task based on layered adaptive segmented network |
CN108846130B (en) * | 2018-06-29 | 2021-02-05 | 北京百度网讯科技有限公司 | Question text generation method, device, equipment and medium |
CN110750997A (en) * | 2018-07-05 | 2020-02-04 | 普天信息技术有限公司 | Machine translation method and device based on generation countermeasure learning |
CN110852066B (en) * | 2018-07-25 | 2021-06-01 | 清华大学 | Multi-language entity relation extraction method and system based on confrontation training mechanism |
CN109241540B (en) * | 2018-08-07 | 2020-09-15 | Chinese-to-Braille automatic conversion method and system based on deep neural network |
CN110874537B (en) * | 2018-08-31 | 2023-06-27 | 阿里巴巴集团控股有限公司 | Method for generating multilingual translation model, translation method and equipment |
CN110895935B (en) * | 2018-09-13 | 2023-10-27 | 阿里巴巴集团控股有限公司 | Speech recognition method, system, equipment and medium |
US11151334B2 (en) * | 2018-09-26 | 2021-10-19 | Huawei Technologies Co., Ltd. | Systems and methods for multilingual text generation field |
CN109523021B (en) * | 2018-09-28 | 2020-12-11 | 浙江工业大学 | Dynamic network structure prediction method based on long-time and short-time memory network |
CN109410179B (en) * | 2018-09-28 | 2021-07-23 | 合肥工业大学 | Image anomaly detection method based on generation countermeasure network |
CN109547320B (en) * | 2018-09-29 | 2022-08-30 | 创新先进技术有限公司 | Social contact method, device and equipment |
CN109670180B (en) * | 2018-12-21 | 2020-05-08 | 语联网(武汉)信息技术有限公司 | Method and device for translating individual characteristics of vectorized translator |
CN109887047B (en) * | 2018-12-28 | 2023-04-07 | 浙江工业大学 | Signal-image translation method based on generation type countermeasure network |
CN109902310A (en) * | 2019-01-15 | 2019-06-18 | 深圳中兴网信科技有限公司 | Vocabulary detection method, vocabulary detection system and computer readable storage medium |
CN110110337B (en) * | 2019-05-08 | 2023-04-18 | 网易有道信息技术(北京)有限公司 | Translation model training method, medium, device and computing equipment |
CN110069790B (en) * | 2019-05-10 | 2022-12-06 | 东北大学 | Machine translation system and method for contrasting original text through translated text retranslation |
CN110309512A (en) * | 2019-07-05 | 2019-10-08 | A Chinese grammar error correction method based on a generative adversarial network |
CN110334361B (en) * | 2019-07-12 | 2022-11-22 | 电子科技大学 | Neural machine translation method for Chinese language |
CN110555247A (en) * | 2019-08-16 | 2019-12-10 | 华南理工大学 | structure damage early warning method based on multipoint sensor data and BilSTM |
CN110472255B (en) * | 2019-08-20 | 2021-03-02 | 腾讯科技(深圳)有限公司 | Neural network machine translation method, model, electronic terminal, and storage medium |
CN110598221B (en) * | 2019-08-29 | 2020-07-07 | 内蒙古工业大学 | Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network |
CN110866395B (en) * | 2019-10-30 | 2023-05-05 | 语联网(武汉)信息技术有限公司 | Word vector generation method and device based on translator editing behaviors |
CN110866404B (en) * | 2019-10-30 | 2023-05-05 | 语联网(武汉)信息技术有限公司 | Word vector generation method and device based on LSTM neural network |
CN111178094B (en) * | 2019-12-20 | 2023-04-07 | 沈阳雅译网络技术有限公司 | Pre-training-based scarce resource neural machine translation training method |
CN111178097B (en) * | 2019-12-24 | 2023-07-04 | 语联网(武汉)信息技术有限公司 | Method and device for generating Zhongtai bilingual corpus based on multistage translation model |
CN111310480B (en) * | 2020-01-20 | 2021-12-28 | 昆明理工大学 | Weakly supervised Hanyue bilingual dictionary construction method based on English pivot |
CN113283249A (en) * | 2020-02-19 | 2021-08-20 | 阿里巴巴集团控股有限公司 | Machine translation method, device and computer readable storage medium |
CN111523308B (en) * | 2020-03-18 | 2024-01-26 | 大箴(杭州)科技有限公司 | Chinese word segmentation method and device and computer equipment |
CN111460837A (en) * | 2020-03-31 | 2020-07-28 | 广州大学 | Character-level confrontation sample generation method and device for neural machine translation |
CN111914552A (en) * | 2020-07-31 | 2020-11-10 | 平安科技(深圳)有限公司 | Training method and device of data enhancement model |
CN112633018B (en) * | 2020-12-28 | 2022-04-15 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on data enhancement |
CN113343719B (en) * | 2021-06-21 | 2023-03-14 | 哈尔滨工业大学 | Unsupervised bilingual translation dictionary acquisition method for collaborative training by using different word embedding models |
CN113642341A (en) * | 2021-06-30 | 2021-11-12 | 深译信息科技(横琴)有限公司 | Deep confrontation generation method for solving scarcity of medical text data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101154221A (en) * | 2006-09-28 | 2008-04-02 | 株式会社东芝 | Apparatus performing translation process from inputted speech |
DE202017102381U1 (en) * | 2017-04-21 | 2017-05-11 | Robert Bosch Gmbh | Device for improving the robustness against "Adversarial Examples" |
Non-Patent Citations (2)
Title |
---|
Lijun Wu et al. "Adversarial Neural Machine Translation". ResearchGate. 2017, *
yunyoubars. "A Google Brain scientist explains LSTM: a story about 'forgetting' and 'memory'". Docin (豆丁网). 2017, *
Also Published As
Publication number | Publication date |
---|---|
CN107368475A (en) | 2017-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107368475B (en) | Machine translation method and system based on generation of antagonistic neural network | |
Koncel-Kedziorski et al. | Text generation from knowledge graphs with graph transformers | |
CN110163299B (en) | Visual question-answering method based on bottom-up attention mechanism and memory network | |
Rastgoo et al. | Sign language production: A review | |
CN108897740A (en) | A Mongolian-Chinese machine translation method based on an adversarial neural network |
CN107729311B (en) | Chinese text feature extraction method fusing text moods | |
Li et al. | Sentiment infomation based model for chinese text sentiment analysis | |
Li et al. | Insufficient data can also rock! learning to converse using smaller data with augmentation | |
CN110807320A (en) | Short text emotion analysis method based on CNN bidirectional GRU attention mechanism | |
Ling et al. | Context-controlled topic-aware neural response generation for open-domain dialog systems | |
CN111985205A (en) | Aspect level emotion classification model | |
CN115099409A (en) | Text-image enhanced multi-mode knowledge map embedding method | |
He et al. | MF-BERT: Multimodal fusion in pre-trained BERT for sentiment analysis | |
CN113901208B (en) | Method for analyzing emotion tendentiousness of mid-cross language comments blended with theme characteristics | |
Huo et al. | Terg: Topic-aware emotional response generation for chatbot | |
Maslennikova | ELMo Word Representations For News Protection. | |
CN116662924A (en) | Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism | |
CN115730232A (en) | Topic-correlation-based heterogeneous graph neural network cross-language text classification method | |
CN115169348A (en) | Event extraction method based on hybrid neural network | |
CN114444481A (en) | Sentiment analysis and generation method of news comments | |
CN113642630A (en) | Image description method and system based on dual-path characteristic encoder | |
CN113255360A (en) | Document rating method and device based on hierarchical self-attention network | |
Jiang et al. | An affective chatbot with controlled specific emotion expression | |
Tong et al. | Text classification based on graph convolutional network with attention | |
Frias et al. | Attention-based Bilateral LSTM-CNN for the Sentiment Analysis of Code-mixed Filipino-English Social Media Texts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: Room 1601, 16th floor, No. 20 Shijingshan Road, Shijingshan District, Beijing 100040
Applicant after: Global Tone Communication Technology Co., Ltd.
Address before: 16th floor, Railway Building, Shijingshan District, Beijing 100040
Applicant before: Mandarin Technology (Beijing) Co., Ltd.
|
GR01 | Patent grant | ||