CN108897740A - A Mongolian-Chinese machine translation method based on an adversarial neural network - Google Patents
A Mongolian-Chinese machine translation method based on an adversarial neural network
- Publication number
- CN108897740A CN108897740A CN201810428132.2A CN201810428132A CN108897740A CN 108897740 A CN108897740 A CN 108897740A CN 201810428132 A CN201810428132 A CN 201810428132A CN 108897740 A CN108897740 A CN 108897740A
- Authority
- CN
- China
- Prior art keywords
- network
- translation
- machine translation
- training
- adversarial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
A Mongolian-Chinese machine translation method based on an adversarial neural network introduces, on the basis of the generator network G of the original machine translation system, a discriminator network D that competes with the generator network G. The discriminator network D mainly performs binary classification on the output of the generator network G: it judges a target-language translation and returns 1 if the translation comes from the training parallel corpus, and 0 if it is a machine-translation result of the generator network G. When the probability distribution of the real data cannot be computed, or is difficult to compute (for example, when source-language parallel corpus data are scarce), the adversarial training mechanism between generator and discriminator can drive the generator to approximate that intractable probability distribution.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a Mongolian-Chinese machine translation method based on an adversarial neural network.
Background art
Machine translation (MT) is one of the earliest research branches in the field of natural language processing. It is the process of using a computer (machine) to transform one natural language into another natural language with the same meaning. Machine translation is a research direction of artificial intelligence with great scientific and practical value. With the continual deepening of globalization and the rapid development of the internet, machine translation technology plays an increasingly important role in political, economic, social, and cultural exchanges at home and abroad.
At present, neural-network-based methods achieve the best translation quality in the machine translation field. They mainly use an "encoder-decoder" structure consisting of two parts, an encoder and a decoder, both built from recurrent neural network (RNN) and long short-term memory (LSTM) network structures. The translation process is as follows. First, the encoder converts the input source-language sentence into a word-vector sequence that serves as the input of the recurrent neural network, and outputs a dense vector of fixed length, called the context vector. Then the decoder takes the context vector as input and, using another recurrent neural network combined with a Softmax classifier, outputs the word-vector sequence of the target language. Finally, each word vector is mapped to a target-language word via a dictionary, completing the entire translation process.
However, existing RNN and LSTM neural network models require a large, manually annotated bilingual parallel corpus. The practical problem is that manually annotating parallel corpora involves a heavy workload and high cost, and high-quality manually annotated parallel corpora are scarce. These shortcomings largely limit the final translation quality and constitute the bottleneck faced by existing neural machine translation models. The problems are especially acute for low-resource languages such as Mongolian: little parallel corpus data is available for training neural network models, which makes it difficult to train and build high-quality, high-performance machine translation systems.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide a Mongolian-Chinese machine translation method based on an adversarial neural network.
To achieve this goal, the technical solution adopted by the present invention is as follows:
A Mongolian-Chinese translation method based on an adversarial neural network introduces, on the basis of the generator network G of the original machine translation system, a discriminator network D that competes with the generator network G. The discriminator network D mainly performs binary classification on the output of the generator network G: it judges a target-language translation and returns 1 if the translation comes from the training parallel corpus, and 0 if it is a machine-translation result of the generator network G.
The discriminator network D uses a multilayer perceptron feedforward (BP) neural network model, and the binary classification method is as follows. The activation uses the leaky rectified linear unit (leaky ReLU) form, f(x) = x for x >= 0 and f(x) = αx otherwise, where x is the input signal from the generator network G to the discriminator network D, and α is a small adjustable constant, e.g. α = 0.001.
The multilayer perceptron feedforward neural network propagates information by the following formula:

a(l) = f_l(W(l)·a(l-1) + b(l))

where a(l) denotes the output of the layer-l neurons, f_l denotes the activation function of layer l, W(l) denotes the weight matrix from layer l-1 to layer l, and b(l) denotes the bias from layer l-1 to layer l; the activation function used is the sigmoid function σ(z) = 1/(1 + e^(-z)).
When the multilayer perceptron feedforward network model performs binary classification, the input-layer vector X is taken as the first-layer input a(0) and substituted into f(Wa + b); the computed output a(l) is the output vector Y of the whole function. The category represented by the dimension of Y with the larger value is selected as the classification result, indicating whether the translation comes from the parallel corpus or from the generator network G.
The generator network G uses a convolutional neural network (CNN) and consists of two parts, an encoder and a decoder. The encoder and the decoder are both multilayer deep CNNs: CNN convolution kernels capture short-range dependency information, and increasing the CNN depth captures long-range dependency information; each decoder layer is equipped with an attention mechanism.
The discriminator network D and the generator network G are trained adversarially, simultaneously improving the ability of the generator network G to generate the target language and the ability of the discriminator network D to judge the source of a translation. During adversarial training, the discriminator network D judges whether a translation comes from the parallel corpus data or is a machine-translation result of the generator network G.

The learning process of the discriminator network D is a competition between the generator network G and the discriminator network D, and specifically includes:

A sample is drawn at random from the real samples and the samples generated by the generator network G, and the discriminator network D judges whether it is real.

Through this competitive machine-learning mechanism, the performance of both the generator network G and the discriminator network D keeps improving. Training is complete when the whole network reaches a Nash equilibrium state, i.e., when the parameters of the two networks are stable. At this point the machine-translation results generated by the generator network G are able to fool the discriminator network D into believing the translation comes from the parallel corpus, and the generator network G model can serve as the output machine translation model.
The adversarial training process can be regarded as the following optimization task:

V(D, G) = E_{x~P_data(x)}[log D(x)] + E_{z~P_z(z)}[log(1 - D(G(z)))]

This optimization task concerns the value function of the discriminator network D and the generator network G, where x denotes data in the parallel corpus; z denotes source-language data input to the generator network; E denotes the expectation of the event; G(z) denotes the translation data generated by the generator network G; D(x) is the probability that the discriminator network D judges x to come from the parallel corpus; and D(G(z)) is the probability that the discriminator network D judges the translation generated by the generator network G to come from the parallel corpus. During training, one side is fixed while the parameters of the other network are updated, iterating alternately so as to maximize the opponent's error. Eventually the generator network G can estimate the distribution of the sample data: the generative model implicitly defines a probability distribution P(g), and the real data distribution is P(data); an optimal solution exists when P(g) = P(data), i.e., when a Nash equilibrium is reached, at which point the generator network G has recovered the distribution of the training data and the accuracy of the discriminative model equals 50%.
A complete generative adversarial network is formed by training the generator network G and the discriminator network D in series: the input vector of the encoder and the output vector of the decoder in the generator network G are concatenated and passed as input to the discriminator network D; at the same time, the output result of the discriminator network D (0 or 1) is fed back to the generator network G.

The training parallel corpus is a bilingual/multilingual corpus consisting of source texts and their parallel corresponding translations, and it participates in the training of the discriminator network D.
Compared with the prior art, the beneficial effects of the invention are as follows. When the probability distribution of the real data cannot be computed, or is difficult to compute (for example, when source-language parallel corpus data are scarce), traditional neural network models cannot be used directly. An adversarial neural network, however, can still be used: through the adversarial training mechanism between generator and discriminator, the generator can be driven to approximate the intractable probability distribution.
Detailed description of the invention
Fig. 1 is a flowchart of the Mongolian-Chinese machine translation method based on an adversarial neural network according to the present invention.
Fig. 2 shows the convolutional neural network (CNN) and the discriminator network structure.
Fig. 3 is a schematic diagram of the encoder using a convolutional neural network.
Fig. 4 is a schematic diagram of the decoder using a convolutional neural network.
Specific embodiment
The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.
As shown in Fig. 1, the present invention is a Mongolian-Chinese machine translation method based on an adversarial neural network. On the basis of the generator network G of the original machine translation system, a discriminator network D competing with the generator network G is introduced. By means of a binary classification method, it judges a target-language translation: if the translation comes from the training parallel corpus, the return value is 1; if it is a machine-translation result of the generator network G, the return value is 0.
The discriminator network D uses a multilayer perceptron feedforward (BP) neural network model, and the binary classification method is as follows. The activation uses the leaky rectified linear unit (leaky ReLU) form, f(x) = x for x >= 0 and f(x) = αx otherwise, with a small adjustable constant α. Leaky ReLU is chosen because, compared with tanh, it converges faster under gradient descent and has a lower computational cost.
The whole multilayer perceptron feedforward neural network propagates information by the following formulas:

z(l) = W(l)·a(l-1) + b(l)   (1)

a(l) = f_l(z(l))   (2)

Merging (1) and (2):

a(l) = f_l(W(l)·a(l-1) + b(l))   (3)

where z(l) denotes the input of the layer-l neurons, a(l) denotes the output of the layer-l neurons, f_l denotes the activation function of layer l, W(l) denotes the weight matrix from layer l-1 to layer l, and b(l) denotes the bias from layer l-1 to layer l. The activation function f_l is the sigmoid function σ(z) = 1/(1 + e^(-z)).
When the multilayer perceptron feedforward network model performs binary classification, the input-layer vector X is taken as the first-layer input a(0) and substituted into f(Wa + b); the computed output a(l) is the output vector of the whole function, which is then fed as the input vector x into a softmax classifier.
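The forward pass of such a multilayer perceptron discriminator can be sketched in pure Python. This is a minimal illustration only; the weights, layer sizes, and helper names below are hypothetical and not taken from the patent.

```python
import math

def leaky_relu(x, alpha=0.001):
    # Leaky ReLU of the patent text: identity for positive inputs,
    # a small slope alpha for negative ones.
    return x if x > 0 else alpha * x

def sigmoid(z):
    # Squashes a score into (0, 1), read here as a probability.
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(a_prev, W, b, activation):
    # One layer of a(l) = f_l(W(l) . a(l-1) + b(l)), with W given as a
    # list of rows and b as the bias vector.
    return [activation(sum(w * a for w, a in zip(row, a_prev)) + bias)
            for row, bias in zip(W, b)]

def discriminator_forward(x, layers):
    # layers: list of (W, b, activation) triples; the last layer should
    # use sigmoid so the output lies in (0, 1).
    a = x
    for W, b, act in layers:
        a = layer_forward(a, W, b, act)
    return a
```

In practice the final activations would feed the softmax classifier described next; the sketch only shows how information propagates layer by layer.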
Further, the softmax classifier function has the following form:

P(y = i | x; θ) = exp(θ_i·x) / Σ_{j=1..k} exp(θ_j·x)

where i denotes a classification category, (θ1, θ2, …, θk) are the classifier parameters of the respective categories, k is the total number of categories, and P(y = i | x; θ) denotes the probability that the input x belongs to category i.
Further, in the softmax classifier applied in the present invention, i ∈ {1, 2} and k = 2, where i = 1 denotes the category "from the parallel corpus" and i = 2 denotes the category "from the generator network". The final output vector has the form Y = [P(G), P(C)], where P(G) denotes the probability that the translation comes from the generator network G and P(C) denotes the probability that it comes from the parallel corpus. The category represented by the dimension of Y with the larger value is selected as the classification result: if P(G) is larger, P(G) is selected and the discriminator feedback value is 0; otherwise P(C) is selected and the feedback value is 1.
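The two-class softmax decision and the resulting 0/1 feedback value can be illustrated with a small sketch. The parameter vectors `theta_g` and `theta_c` are hypothetical stand-ins for the trained classifier parameters θ1 and θ2.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_feedback(theta_g, theta_c, x):
    # Two-class softmax over linear scores theta_i . x:
    # P(G) = probability the translation came from the generator network,
    # P(C) = probability it came from the parallel corpus.
    scores = [sum(t * xi for t, xi in zip(theta, x))
              for theta in (theta_g, theta_c)]
    p_g, p_c = softmax(scores)
    # Feedback value: 0 when the generator class wins, 1 otherwise.
    return 0 if p_g > p_c else 1
```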
The generator network G uses a convolutional neural network (CNN) and consists of two parts, an encoder and a decoder. Both parts use CNN convolution kernels to capture short-range dependency information and increased CNN depth to capture long-range dependency information; the encoder and the decoder are therefore multilayer deep CNNs, and each decoder layer is equipped with an attention mechanism.
The discriminator network D and the generator network G are trained adversarially, simultaneously improving the ability of the generator network G to generate the target language and the ability of the discriminator network D to judge the source of a translation. During adversarial training, the discriminator network D judges whether a translation comes from the parallel corpus data or is a machine-translation result of the generator network G.

The learning process of the discriminator network D is a competition between the generator network G and the discriminator network D. It specifically includes:

A sample is drawn at random from the real samples and the samples generated by the generator network, and the discriminator network D judges whether it is real.

Through this competitive machine-learning mechanism, the performance of both the generator network G and the discriminator network D keeps improving. Training is complete when the whole network reaches a Nash equilibrium state, i.e., when the parameters of the two networks are stable. At this point the machine-translation results generated by the generator network G are able to fool the discriminator network D into believing the translation comes from the parallel corpus, and the model of the generator network G can serve as the output machine translation model.
The generator network G and the discriminator network D are trained in series to form a complete generative adversarial network: the input vector of the encoder and the output vector of the decoder in the generator network G are concatenated and passed as input to the discriminator network D; at the same time, the output result of the discriminator network D (0 or 1) is fed back to the generator network G.
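The alternating series training described above can be sketched schematically. The `discriminator_step` and `generator_step` callables are hypothetical stand-ins for the actual parameter-update procedures of D and G; only the fix-one-update-the-other alternation is illustrated.

```python
import random

def train_adversarial(discriminator_step, generator_step, corpus, sources,
                      epochs=3, d_steps=1, g_steps=1):
    # Alternating iteration: within each epoch, first update the
    # discriminator (generator fixed), then the generator (discriminator
    # fixed), as in the patent's series training of D and G.
    history = []
    for _ in range(epochs):
        for _ in range(d_steps):
            real = random.choice(corpus)   # translation from the parallel corpus
            src = random.choice(sources)   # source sentence fed to G
            history.append(discriminator_step(real, src))
        for _ in range(g_steps):
            history.append(generator_step(random.choice(sources)))
    return history
```

A real implementation would stop when the two networks' parameters stabilize (the Nash equilibrium condition) rather than after a fixed number of epochs.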
The adversarial training process of the present invention is regarded as the following optimization task:

V(D, G) = E_{x~P_data(x)}[log(D(x))] + E_{z~P_z(z)}[log(1 - D(G(z)))]

This optimization task can be regarded as optimizing two sub-tasks:

1. Optimization of D: max_D V(D, G)

2. Optimization of G: min_G V(D, G)

where x denotes data in the parallel corpus; z denotes source-language data input to the generator network; G(z) denotes the translation data generated by the generator network G; D(x) is the probability that the discriminator network D judges x to come from the parallel corpus; and D(G(z)) is the probability that the discriminator network D judges the translation generated by the generator network G to come from the parallel corpus.
It can be seen that, for the optimization of D, the stronger the ability of the discriminator network D, the larger D(x) should be and the smaller D(G(z)) should be, so V(D, G) increases; the optimization for D therefore tends toward its maximum, max_D. For the generator network G, the purpose of the network is to generate translations good enough to confuse the discriminator network, i.e., more accurate translations, so the value of D(G(z)) should increase as much as possible; V(D, G) then decreases, and the optimization for G tends toward its minimum, min_G.
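These optimization directions can be checked numerically on the value function. This is a toy calculation with made-up discriminator probabilities, assuming the two expectations are estimated by sample means.

```python
import math

def value_function(d_real, d_fake):
    # V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], with the
    # expectations estimated by sample means over the given probabilities.
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# A strong discriminator pushes D(x) toward 1 and D(G(z)) toward 0,
# which raises V(D, G) -- the direction of the max_D optimization.
v_strong = value_function([0.9, 0.95], [0.05, 0.1])

# At the Nash equilibrium the discriminator outputs 0.5 everywhere,
# giving V = 2 log(1/2) -- the value the generator drives V toward.
v_nash = value_function([0.5, 0.5], [0.5, 0.5])
```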
The invention is further explained below with reference to an embodiment.
As shown in Fig. 2, the generator network consists of two parts, a decoder and an encoder. The encoder uses a convolutional neural network (Convolutional Neural Networks): it first converts the input source-language sentence into a word-vector sequence and performs feature processing through convolutional layers, activation functions, pooling layers, and fully connected layers. The decoder then uses another convolutional neural network: taking the processed word vectors output by the encoder as input, it decodes a word-vector sequence through the convolutional neural network and finally outputs the translation. The output vector of the decoder is then passed as input to the discriminator network, and at the same time the output result of the discriminator network (0 or 1) is fed back to the generator network.
Here, the convolutional layers perform feature extraction on the word vectors; the pooling layers perform feature compression, extracting the main features and reducing computational complexity; and the fully connected layers connect all the features.
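The roles of the convolutional and pooling layers can be illustrated on a toy one-dimensional sequence. The kernel values and window width below are made up; real layers operate on word-vector matrices with learned kernels.

```python
def conv1d(seq, kernel):
    # Slide the kernel across the sequence: each window mixes a few
    # neighbouring values, capturing short-range dependencies.
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def max_pool(features, width=2):
    # Compress the feature map, keeping the dominant feature in each
    # window and reducing later computation.
    return [max(features[i:i + width]) for i in range(0, len(features), width)]
```

Stacking several such convolution layers widens the receptive field, which is how the deep CNN captures the long-range dependencies mentioned above.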
For example, take the source-language sentence "Tomorrow it will rain" and translate it into the target language, Mongolian. The sentence is first input into the generator network and enters the encoder, which uses a convolutional neural network. The encoder encodes the sentence by converting the source-language sentence into a word-vector sequence; features are then extracted from the word-vector sequence by the convolutional layers, compressed by the pooling layers, and connected by the fully connected layers before being passed to the decoder, as shown in Fig. 3. The decoder likewise uses a convolutional neural network: it outputs the target-language word-vector sequence through the decoding process, after which the word vectors are mapped one by one to target-language words via a dictionary, finally completing the translation process and outputting the Mongolian translation, as shown in Fig. 4. The output vector of the decoder is passed as input to the discriminator network, which feeds 0 or 1 back to the generator network through binary classification. If the feedback value is 1, the discriminator network believes the translation comes from the parallel corpus, and the training objective of the generator network has been reached; if the feedback value is 0, the discriminator network believes the translation comes from the generator network, the translation model trained by the generator network is not yet complete, and training must continue.
Claims (7)
1. A Mongolian-Chinese translation method based on an adversarial neural network, characterized in that, on the basis of the generator network G of the original machine translation system, a discriminator network D competing with the generator network G is introduced; the discriminator network D mainly performs binary classification on the output of the generator network G, judging a target-language translation: if it comes from the training parallel corpus, the return value is 1; if it is a machine-translation result of the generator network G, the return value is 0.
2. The Mongolian-Chinese machine translation method based on an adversarial neural network according to claim 1, characterized in that the discriminator network D uses a multilayer perceptron feedforward (BP) neural network model, and the binary classification method is as follows:
the activation uses the leaky rectified linear unit (leaky ReLU) form, f(x) = x for x >= 0 and f(x) = αx otherwise, where x is the input signal from the generator network G to the discriminator network D and α is an adjustable constant;
the multilayer perceptron feedforward neural network propagates information by the following formula:
a(l) = f_l(W(l)·a(l-1) + b(l))
where a(l) denotes the output of the layer-l neurons, f_l denotes the activation function of layer l, W(l) denotes the weight matrix from layer l-1 to layer l, and b(l) denotes the bias from layer l-1 to layer l, the activation function being the sigmoid function;
when the multilayer perceptron feedforward network model performs binary classification, the input-layer vector X is taken as the first-layer input a(0) and substituted into f(Wa + b); the computed output a(l) is the output vector Y of the whole function, and the category represented by the dimension of Y with the larger value is selected as the classification result, indicating whether the translation comes from the parallel corpus or from the generator network G.
3. The Mongolian-Chinese machine translation method based on an adversarial neural network according to claim 1, characterized in that the generator network G uses a convolutional neural network (CNN) and consists of two parts, an encoder and a decoder; the encoder and the decoder are both multilayer deep CNNs, using CNN convolution kernels to capture short-range dependency information and increased CNN depth to capture long-range dependency information, and each decoder layer is equipped with an attention mechanism.
4. The Mongolian-Chinese machine translation method based on an adversarial neural network according to claim 1, characterized in that the discriminator network D and the generator network G are trained adversarially, simultaneously improving the ability of the generator network G to generate the target language and the ability of the discriminator network D to judge the source of a translation; during adversarial training, the discriminator network D judges whether a translation comes from the parallel corpus data or is a machine-translation result of the generator network G;
the learning process of the discriminator network D is a competition between the generator network G and the discriminator network D, and specifically includes:
drawing a sample at random from the real samples and the samples generated by the generator network G, and letting the discriminator network D judge whether it is real;
through this competitive machine-learning mechanism, continuously improving the performance of both the generator network G and the discriminator network D; training is complete when the whole network reaches a Nash equilibrium state, i.e., when the parameters of the two networks are stable; at this point the machine-translation results generated by the generator network G are able to fool the discriminator network D into believing the translation comes from the parallel corpus, and the generator network G model can serve as the output machine translation model.
5. The Mongolian-Chinese machine translation method based on an adversarial neural network according to claim 4, characterized in that the adversarial training process is regarded as the following optimization task:
V(D, G) = E_{x~P_data(x)}[log D(x)] + E_{z~P_z(z)}[log(1 - D(G(z)))]
which is the value function of the discriminator network D and the generator network G, where x denotes data in the parallel corpus; z denotes source-language data input to the generator network; E denotes the expectation of the event; G(z) denotes the translation data generated by the generator network G; D(x) is the probability that the discriminator network D judges x to come from the parallel corpus; and D(G(z)) is the probability that the discriminator network D judges the translation generated by the generator network G to come from the parallel corpus; during training, one side is fixed while the parameters of the other network are updated, iterating alternately so as to maximize the opponent's error; eventually the generator network G can estimate the distribution of the sample data: the generative model implicitly defines a probability distribution P(g), the real data distribution is P(data), and an optimal solution exists when P(g) = P(data), i.e., when a Nash equilibrium is reached, at which point the generator network G has recovered the distribution of the training data and the accuracy of the discriminative model equals 50%.
6. The Mongolian-Chinese machine translation method based on an adversarial neural network according to claim 4, characterized in that the generator network G and the discriminator network D are trained in series to form a complete generative adversarial network: the input vector of the encoder and the output vector of the decoder in the generator network G are concatenated and passed as input to the discriminator network D; at the same time, the output result of the discriminator network D (0 or 1) is fed back to the generator network G.
7. The Mongolian-Chinese machine translation method based on an adversarial neural network according to claim 1, characterized in that the training parallel corpus is a bilingual/multilingual corpus consisting of source texts and their parallel corresponding translations, and participates in the training of the discriminator network D.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428132.2A CN108897740A (en) | 2018-05-07 | 2018-05-07 | A Mongolian-Chinese machine translation method based on an adversarial neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428132.2A CN108897740A (en) | 2018-05-07 | 2018-05-07 | A Mongolian-Chinese machine translation method based on an adversarial neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN108897740A true CN108897740A (en) | 2018-11-27 |
Family
ID=64342627
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810428132.2A Pending CN108897740A (en) | 2018-05-07 | 2018-05-07 | A Mongolian-Chinese machine translation method based on an adversarial neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108897740A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543827A (en) * | 2018-12-02 | 2019-03-29 | 清华大学 | Generative adversarial network device and training method |
CN109670036A (en) * | 2018-12-17 | 2019-04-23 | 广州大学 | Automatic news comment generation method and device |
CN109670178A (en) * | 2018-12-20 | 2019-04-23 | 龙马智芯(珠海横琴)科技有限公司 | Sentence-level bilingual alignment method and device, computer readable storage medium |
CN109887047A (en) * | 2018-12-28 | 2019-06-14 | 浙江工业大学 | Signal-to-image translation method based on a generative adversarial network |
CN109993678A (en) * | 2019-03-26 | 2019-07-09 | 南京联创北斗技术应用研究院有限公司 | Robust steganography method based on deep adversarial generative networks |
CN110288609A (en) * | 2019-05-30 | 2019-09-27 | 南京师范大学 | Attention-mechanism-guided multi-modal whole-heart image segmentation method |
CN110309512A (en) * | 2019-07-05 | 2019-10-08 | 北京邮电大学 | Chinese grammatical error correction method based on a generative adversarial network |
CN110334361A (en) * | 2019-07-12 | 2019-10-15 | 电子科技大学 | Neural machine translation method for low-resource languages |
CN110472238A (en) * | 2019-07-25 | 2019-11-19 | 昆明理工大学 | Text summarization method based on hierarchical interactive attention |
CN110598221A (en) * | 2019-08-29 | 2019-12-20 | 内蒙古工业大学 | Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network |
CN111046178A (en) * | 2019-11-29 | 2020-04-21 | 北京邮电大学 | Text sequence generation method and system |
CN111414770A (en) * | 2020-02-24 | 2020-07-14 | 内蒙古工业大学 | Semi-supervised Mongolian-Chinese neural machine translation method based on co-training |
CN111859995A (en) * | 2020-06-16 | 2020-10-30 | 北京百度网讯科技有限公司 | Training method and device for a machine translation model, electronic device, and storage medium |
CN112100908A (en) * | 2020-08-31 | 2020-12-18 | 西安工程大学 | Garment design method based on a multi-condition deep convolutional generative adversarial network |
CN112489158A (en) * | 2021-01-13 | 2021-03-12 | 河北大学 | Enhancement method for low-dose PET images using a cGAN-based adaptive network |
CN113657129A (en) * | 2021-09-06 | 2021-11-16 | 内蒙古工业大学 | Mongolian-Chinese neural machine translation method combining a generative adversarial network and a proximal optimization strategy |
CN115660037A (en) * | 2022-10-26 | 2023-01-31 | 广东东方思维科技有限公司 | Discrimination method using a deep neural network model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368475A (en) * | 2017-07-18 | 2017-11-21 | 中译语通科技(北京)有限公司 | Machine translation method and system based on a generative adversarial neural network |
CN107464210A (en) * | 2017-07-06 | 2017-12-12 | 浙江工业大学 | Image style transfer method based on a generative adversarial network |
- 2018-05-07: CN application CN201810428132.2A filed; published as CN108897740A (en); status: Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107464210A (en) * | 2017-07-06 | 2017-12-12 | 浙江工业大学 | Image style transfer method based on generative adversarial network |
CN107368475A (en) * | 2017-07-18 | 2017-11-21 | 中译语通科技(北京)有限公司 | Machine translation method and system based on generative adversarial neural network |
Non-Patent Citations (3)
Title |
---|
LIJUN WU et al.: "Adversarial Neural Machine Translation", https://arxiv.org/abs/1704.06933v3 * |
洪洋 et al.: "A Survey of Deep Convolutional Adversarial Generative Networks", Proceedings of the 18th China Annual Conference on System Simulation Technology and Its Application * |
蒋昂波 et al.: "Research on Optimization of the ReLU Activation Function", Transducer and Microsystem Technologies * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543827B (en) * | 2018-12-02 | 2020-12-29 | 清华大学 | Generative adversarial network device and training method |
CN109543827A (en) * | 2018-12-02 | 2019-03-29 | 清华大学 | Generative adversarial network device and training method |
US11574199B2 (en) | 2018-12-02 | 2023-02-07 | Tsinghua University | Generative adversarial network device and training method thereof |
CN109670036A (en) * | 2018-12-17 | 2019-04-23 | 广州大学 | Automatic news comment generation method and device |
CN109670178B (en) * | 2018-12-20 | 2019-10-08 | 龙马智芯(珠海横琴)科技有限公司 | Sentence-level bilingual alignment method and device, computer readable storage medium |
CN109670178A (en) * | 2018-12-20 | 2019-04-23 | 龙马智芯(珠海横琴)科技有限公司 | Sentence-level bilingual alignment method and device, computer readable storage medium |
CN109887047B (en) * | 2018-12-28 | 2023-04-07 | 浙江工业大学 | Signal-to-image translation method based on generative adversarial network |
CN109887047A (en) * | 2018-12-28 | 2019-06-14 | 浙江工业大学 | Signal-to-image translation method based on generative adversarial network |
CN109993678A (en) * | 2019-03-26 | 2019-07-09 | 南京联创北斗技术应用研究院有限公司 | Robust steganography method based on deep adversarial generative network |
CN109993678B (en) * | 2019-03-26 | 2020-04-07 | 南京联创北斗技术应用研究院有限公司 | Robust information hiding method based on deep adversarial generative network |
CN110288609A (en) * | 2019-05-30 | 2019-09-27 | 南京师范大学 | Attention-mechanism-guided multi-modal whole-heart image segmentation method |
CN110288609B (en) * | 2019-05-30 | 2021-06-08 | 南京师范大学 | Multi-modal whole-heart image segmentation method guided by attention mechanism |
CN110309512A (en) * | 2019-07-05 | 2019-10-08 | 北京邮电大学 | Chinese grammar error correction method based on generative adversarial network |
CN110334361B (en) * | 2019-07-12 | 2022-11-22 | 电子科技大学 | Neural machine translation method for low-resource languages |
CN110334361A (en) * | 2019-07-12 | 2019-10-15 | 电子科技大学 | Neural machine translation method for low-resource languages |
CN110472238A (en) * | 2019-07-25 | 2019-11-19 | 昆明理工大学 | Text summarization method based on hierarchical interaction attention |
CN110472238B (en) * | 2019-07-25 | 2022-11-18 | 昆明理工大学 | Text summarization method based on hierarchical interaction attention |
CN110598221A (en) * | 2019-08-29 | 2019-12-20 | 内蒙古工业大学 | Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network |
CN111046178A (en) * | 2019-11-29 | 2020-04-21 | 北京邮电大学 | Text sequence generation method and system |
CN111046178B (en) * | 2019-11-29 | 2023-06-20 | 北京邮电大学 | Text sequence generation method and system |
CN111414770A (en) * | 2020-02-24 | 2020-07-14 | 内蒙古工业大学 | Semi-supervised Mongolian-Chinese neural machine translation method based on co-training |
CN111859995B (en) * | 2020-06-16 | 2024-01-23 | 北京百度网讯科技有限公司 | Training method and apparatus for a machine translation model, electronic device, and storage medium |
CN111859995A (en) * | 2020-06-16 | 2020-10-30 | 北京百度网讯科技有限公司 | Training method and apparatus for a machine translation model, electronic device, and storage medium |
CN112100908B (en) * | 2020-08-31 | 2024-03-22 | 西安工程大学 | Garment design method based on multi-condition deep convolutional generative adversarial network |
CN112100908A (en) * | 2020-08-31 | 2020-12-18 | 西安工程大学 | Garment design method based on multi-condition deep convolutional generative adversarial network |
CN112489158A (en) * | 2021-01-13 | 2021-03-12 | 河北大学 | Enhancement method for low-dose PET images using a cGAN-based adaptive network |
CN112489158B (en) * | 2021-01-13 | 2023-05-12 | 河北大学 | Enhancement method for low-dose PET images using a cGAN-based adaptive network |
CN113657129A (en) * | 2021-09-06 | 2021-11-16 | 内蒙古工业大学 | Mongolian-Chinese neural machine translation method combining a generative adversarial network and a proximal policy optimization strategy |
CN115660037A (en) * | 2022-10-26 | 2023-01-31 | 广东东方思维科技有限公司 | Discrimination method using a deep neural network model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108897740A (en) | Mongolian-Chinese machine translation method based on adversarial neural network | |
CN107368475B (en) | Machine translation method and system based on generative adversarial neural network | |
CN109783657B (en) | Multi-step self-attention cross-media retrieval method and system based on limited text space | |
Herzig et al. | Neural semantic parsing over multiple knowledge-bases | |
CN108829684A (en) | Mongolian-Chinese neural machine translation method based on a transfer learning strategy | |
CN109766427B (en) | Intelligent question-answering method based on collaborative attention for virtual learning environment | |
CN108416065A (en) | Image-to-sentence description generation system and method based on hierarchical neural network | |
CN108932232A (en) | Mongolian-Chinese mutual translation method based on LSTM neural network | |
CN110489554B (en) | Aspect-level sentiment classification method based on a position-aware mutual attention network model | |
CN110427616A (en) | Text sentiment analysis method based on deep learning | |
Zhang et al. | Learning sentiment-inherent word embedding for word-level and sentence-level sentiment analysis | |
CN112527966A (en) | Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism | |
CN111782788A (en) | Automatic emotional reply generation method for open-domain dialogue systems | |
Luo et al. | Hierarchical transfer learning architecture for low-resource neural machine translation | |
CN116663578A (en) | Neural machine translation method based on an improved policy gradient method | |
CN113657125B (en) | Mongolian non-autoregressive machine translation method based on knowledge graph | |
CN111563148A (en) | Dialog generation method based on phrase diversity | |
Wang et al. | Information-enhanced hierarchical self-attention network for multiturn dialog generation | |
Maslennikova | ELMo Word Representations For News Protection. | |
CN114328866A (en) | Strongly anthropomorphic intelligent dialogue robot with smooth and accurate responses | |
CN115408515A (en) | Automatic classification method for online collaborative discussion interactive text | |
Wang | Short Sequence Chinese‐English Machine Translation Based on Generative Adversarial Networks of Emotion | |
Wang et al. | Emotional conversation generation with bilingual interactive decoding | |
CN116976361A (en) | RLHF-based self-adaptive machine translation method and storage medium | |
CN116258147A (en) | Multimode comment emotion analysis method and system based on heterogram convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-11-27 |