CN110738062A - GRU neural network Mongolian Chinese machine translation method - Google Patents

GRU neural network Mongolian Chinese machine translation method

Info

Publication number
CN110738062A
CN110738062A (application CN201910940595.1A)
Authority
CN
China
Prior art keywords
coding
sentence
translation
encoder
decoder
Prior art date
Legal status
Pending
Application number
CN201910940595.1A
Other languages
Chinese (zh)
Inventor
苏依拉
卞乐乐
赵旭
薛媛
范婷婷
张振
Current Assignee
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN201910940595.1A priority Critical patent/CN110738062A/en
Publication of CN110738062A publication Critical patent/CN110738062A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A GRU neural network Mongolian-Chinese machine translation method: the translation languages are first preprocessed, an Encoder-Decoder model is then built and trained on a Mongolian-Chinese bilingual corpus of a certain scale, the bilingual corpus is processed by the coding system, and the translation result is finally obtained from the Encoder-Decoder model. The Encoder-Decoder model is constructed from neural networks: one network is an LSTM responsible for Encoder coding and adopts a bidirectional coding setup, i.e. the source language is encoded in both the forward and reverse directions, converting the source sentence into two fixed-length vectors coded in different directions; the other network is a GRU responsible for Decoder decoding, which decodes from both the forward and reverse directions, i.e. context information is automatically integrated when the target language is decoded and output, so that the vectors produced by coding are converted into a target sentence. The invention combines the characteristics of the Mongolian-Chinese machine translation system, makes the output of the translation system more fluent and closer to human expression, and reduces semantic loss and confusion of the translation during the translation process.

Description

GRU neural network Mongolian Chinese machine translation method
Technical Field
The invention belongs to the technical field of machine translation, relates to Mongolian-Chinese machine translation, and particularly relates to a GRU neural network Mongolian-Chinese machine translation method.
Background
At present, with the rapid development of the internet industry, IT industries including information technology continue to grow, and machine translation, as a natural language processing task, promotes the development of the whole internet industry. Large search-service companies such as Google and Baidu carry out large-scale research in the field of machine translation for the development of the industry, and research continues in pursuit of consistently high-quality translations.
Machine translation output, however, is still relatively stiff: even when the programs are well designed, the probability of errors in translation is high, and various grammatical errors sometimes occur. Translations of long paragraphs are hard to understand and do not follow normal logic; the readability of the translated text is poor; the grammatical features of sentences are not reflected; translated documents are coarse and difficult to understand; simple words are translated word by word while difficult words remain obscure; only simple short sentences are translated well; and low translation quality caused by differences in the handling of ambiguous words and grammatical structures is particularly prominent in machine translation.
Although many problems of the statistical translation method, such as unclear decoding, mistranslation, and the handling of out-of-vocabulary words, can be alleviated to some extent by understanding the meaning of sentences, the results are still inferior in accuracy to manual translation.
In a translation system, the computational complexity of the encoder and decoder is high. Due to limits on computation and GPU memory, a neural machine translation model needs to fix a commonly used vocabulary of limited size in advance; neural machine translation systems therefore often restrict the vocabulary to high-frequency words and treat all other low-frequency words as unknown words. Mongolian is an agglutinative language, whose characteristic is that other word components are attached to the prefixes, infixes and suffixes of word roots to derive new words; consequently, Mongolian word components and their morphological variations are very rich, and loanwords and out-of-vocabulary words occur frequently.
Disclosure of Invention
In order to solve the problems of missing translation, mistranslation and unknown-word handling that mainly exist in the translation process in the prior art, the invention aims to provide a GRU neural network Mongolian-Chinese machine translation method. In the method, a CPU and a GPU process the corpus in parallel, which significantly improves the processing speed; the corpus is learned with a set learning rate, which effectively relieves the local-optimum problem in the semantic representation of the learned corpus and the problem of low coding quality caused by overly fast convergence; the quality of the whole system is improved by dedicated structures and algorithms; and, in view of the current situation of scarce data and small dictionaries in a small corpus, the system complexity is reduced and the translation quality for the user is guaranteed under a system structure that remains intuitive to the user, so that the Mongolian-Chinese machine translation system is improved and better translation is achieved.
In order to achieve the purpose, the invention adopts the technical scheme that:
The GRU neural network Mongolian-Chinese machine translation method comprises the steps of preprocessing the translation languages, building and training an Encoder-Decoder model on a Mongolian-Chinese bilingual corpus of a certain scale, processing the Mongolian-Chinese bilingual corpus with the coding system, and obtaining translation results based on the Encoder-Decoder model.
The preprocessing of the translation language is to perform word segmentation on the translation language by using an NLPIR word segmentation technology.
The Encoder-Decoder model is a neural machine translation model constructed from neural networks. One network is an LSTM responsible for Encoder coding; specifically, a bidirectional coding setup is adopted, i.e. the source language is encoded in the forward and reverse directions, converting the source sentence into two fixed-length vectors coded in different directions. The other network is a GRU responsible for Decoder decoding. Because the Encoder outputs two coded vectors, the Decoder also needs to decode from the forward and reverse directions; because the two vectors to be decoded contain all the context information, the context information can be automatically integrated when the Decoder outputs the target language, so that the fixed-length vectors produced by coding are converted into a target sentence.
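To illustrate the bidirectional encoding idea just described, the following minimal sketch runs a simple recurrent cell over the source sentence once forward and once over the reversed sequence, producing two fixed-length vectors. A plain tanh RNN cell stands in for the LSTM encoder; all names, dimensions and weights are illustrative assumptions, not the patented implementation.

import numpy as np

# Minimal sketch of forward + reverse encoding into two fixed-length vectors.
rng = np.random.default_rng(0)
d_x, d_h = 8, 16                      # assumed embedding and hidden sizes
W = rng.normal(scale=0.1, size=(d_h, d_x))
U = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

def encode(xs):
    """Run the recurrent cell over a sequence and return the final hidden state."""
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)   # h_t = f(x_t, h_{t-1})
    return h

source = [rng.normal(size=d_x) for _ in range(5)]   # embedded source sentence (toy)
h_forward = encode(source)            # forward encoding
h_reverse = encode(source[::-1])      # reverse encoding
# The two fixed-length vectors, one per direction, are handed to the decoder.
print(h_forward.shape, h_reverse.shape)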
The calculation formula of the Encoder code is as follows:
h_t = f(x_t, h_{t-1})
That is, the hidden layer output h_t at the current time is computed from the current input x_t and the hidden layer output h_{t-1} at time t-1; the output at each time is obtained through Encoder coding, and the feature representation of the final source sentence context is then obtained by calculation, i.e. the hidden layer output at the final time represents the context of the source sentence;
the calculation formula of the Decoder decoding is as follows:
p(y_1, …, y_{T'} | x_1, …, x_T) = ∏_{t=1}^{T'} p(y_t | V, y_1, …, y_{t-1})
where x_1, …, x_T is the input sequence, y_1, …, y_{T'} is the output sequence, V is the initial value of the decoder, i.e. the encoding of x_1, …, x_T, T is the length of the input sentence, T' is the length of the output sentence, and T and T' are typically not equal in length;
the target function of the model is the probability that the source sentence is correctly translated into the target sentence;
the model training process is the process of maximizing the probability of correctly translating the source sentence into the target sentence in the training sample, and for each time i, the probability that the current output is a correct result is calculated as follows
p(yi|{y1,…,yi-1})=g(yi-1,si,c)
Where g denotes the transformation function of the intermediate semantic representation of the entire sentence, siIs the feature vector that has been obtained and c is the source sentence context.
The Encoder coding part of the Encoder-Decoder model is unchanged, while the Decoder decoding part introduces an attention mechanism: the context information relied on in the decoding calculation is computed from the hidden layer of the Decoder network at the previous moment and the hidden layers of the Encoder network at all moments, and the context information corresponding to different moments is different. For each moment i, the probability that the current output is a correct result is calculated as follows
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
where C denotes the intermediate semantic coding. At this time the source sentence context is distinguished for the Decoder at different moments and denoted c_i; the calculation formulas of c_i are as follows:
c_i = Σ_{j=1}^{T} a_ij · h_j
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
e_ij = a(s_{i-1}, h_j)
c_i is calculated as the weighted sum of the hidden layer outputs of the Encoder coding part at all moments; T denotes the length of the input sentence, a_ij denotes the attention distribution coefficient of the j-th word of the input sentence when the target outputs the i-th word, s denotes the intermediate coding vector, h_j denotes the semantic code of the j-th word of the input sentence, a(s_{i-1}, h_j) denotes the composite coding function, and e_ij denotes the resulting score. The Decoder decoding has different corresponding weights at different moments, and c_i is likewise used in the calculation of the Decoder decoding hidden layer output, helping the hidden layer to be better expressed.
The invention uses the BLEU algorithm score to judge the translation effect.
In the invention, a reinforcement learning mechanism can be added to the translation process. In this mechanism the Encoder-Decoder translation framework acts as the perceptron (Agent) and the BLEU score acts as the Environment. When a source sentence X = (x_1, x_2, …, x_{n-1}, x_n) is input to the encoder and mapped to a coding vector Z = (z_1, z_2, …, z_{n-1}, z_n), the translation framework translates the source sentence into Y = (y_1, y_2, …, y_{n-1}, y_n) through bidirectional decoding. In this process reinforcement learning uses the principle of instant evaluation: each time a sentence is translated it interacts with the BLEU algorithm, and for the translated sentence y_t the reward value R(y_t, s_t) is obtained according to the reward mechanism algorithm. R(y_t, s_t) is the quality evaluation of the translated sentence, i.e. the current BLEU score; through continuous interaction between the Agent and the Environment the data R(y_t, s_t) are obtained, and the maximum value of R(y_t, s_t) indicates that the current translation is closest to the real sentence.
Compared with the prior art, the invention has the beneficial effects that:
the system architecture formed by an encoder formed by LSTM and a decoder formed by self-attention mechanism and GRU combines the characteristics of Mongolian language and Chinese language, the expression capability of a Mongolian Chinese machine translation system is smoother by steps, the expression capability of the Mongolian Chinese machine translation system is closer to the expression of human, and the semantic loss and the translation disorder degree in the translation process are reduced.
Drawings
Fig. 1 is a schematic diagram of the operation mechanism of LSTM.
Fig. 2 is a schematic diagram of the operation mechanism of the forget gate in the LSTM.
Fig. 3 is a diagram illustrating the calculation of the cell state at the current time in the LSTM.
Fig. 4 is a schematic diagram of the operation mechanism of the output gate in the LSTM.
Fig. 5 is a schematic diagram of an operation mechanism of the GRU neural network.
Fig. 6 is a schematic diagram of the operation mechanism of the update gate in the GRU neural network.
Fig. 7 is a schematic diagram of the operation mechanism of the reset gate in the GRU neural network.
Fig. 8 is a schematic diagram of calculation of current memory content in the GRU neural network.
FIG. 9 is a diagram of the calculation of the final memory of the current time step in the GRU neural network.
FIG. 10 is an inventive technique flow diagram.
FIG. 11 is a schematic diagram of a reinforcement learning mechanism.
Fig. 12 is a simulation of reinforcement learning reward mechanisms in the translation system.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention is based on a GRU neural network, which is a variant of the LSTM neural network. The LSTM neural network was proposed to overcome the inability of RNNs to model long-distance dependencies. Unlike a traditional neural network, the LSTM network modifies the original neural network unit into a CEC memory cell; the additive mechanism of the CEC memory cell allows the gradient to be preserved and errors to be propagated, solving the vanishing (dispersing) gradient problem.
The structure of the repeating LSTM network module is considerably more complex: it implements three gates, namely the forget gate, the input gate and the output gate. The forget gate decides how much of the cell state at the previous time is retained in the cell state at the current time; the input gate decides how much of the input at the current time is saved into the cell state at the current time; the output gate decides how much of the cell state at the current time is output.
The forget gate, as shown in FIG. 2, calculates which information needs to be forgotten; after sigmoid processing the values lie between 0 and 1, where 1 means keep everything and 0 means forget everything:
f_t = σ(W_f·[h_{t-1}, x_t] + b_f)
where the square brackets indicate that the two vectors are concatenated and merged, W_f is the weight matrix of the forget gate, σ is the sigmoid function, and b_f is the bias term of the forget gate. If the dimension of the input layer is d_x, the dimension of the hidden layer is d_h and the dimension of the cell state is d_c, then W_f has dimension d_c × (d_h + d_x), and [h_{t-1}, x_t] denotes the concatenation of the two vectors into a larger vector.
The input gate is used to calculate which information is stored in the state unit, and is divided into two parts.
The first part is i_t = σ(W_i·[h_{t-1}, x_t] + b_i), where b_i is the bias term of the input gate; this part can be seen as deciding how much of the current input needs to be saved into the cell state.
The second part is:
c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)
This part can be considered the new information generated by the current input to be added to the cell state.
The cell state at the current time is obtained by adding the product of the forget gate and the state at the previous time to the product of the two parts of the input gate, where c_{t-1} denotes the cell state at the previous time, as in FIG. 3, i.e.
c_t = f_t * c_{t-1} + i_t * c̃_t
The output gate calculates which information needs to be output through a sigmoid function, and the result is multiplied by the value of the current cell state passed through a tanh function to obtain the output, where W_o is a weight matrix and b_o is the bias term, as in FIG. 4:
o_t = σ(W_o·[h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(c_t)
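Putting the three gates together, the sketch below implements one LSTM time step exactly as written in the formulas above (sigmoid and tanh applied to the concatenated [h_{t-1}, x_t]). The weight values, dimensions and helper names are placeholder assumptions, not trained parameters of the patented system.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the forget/input/output gate formulas above."""
    W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o = params
    hx = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ hx + b_f)                 # forget gate
    i_t = sigmoid(W_i @ hx + b_i)                 # input gate
    c_tilde = np.tanh(W_c @ hx + b_c)             # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde            # new cell state
    o_t = sigmoid(W_o @ hx + b_o)                 # output gate
    h_t = o_t * np.tanh(c_t)                      # hidden output
    return h_t, c_t

rng = np.random.default_rng(1)
d_x, d_h = 8, 16                                  # assumed dimensions
mk = lambda: (rng.normal(scale=0.1, size=(d_h, d_h + d_x)), np.zeros(d_h))
params = sum((mk() for _ in range(4)), ())        # W_f,b_f, W_i,b_i, W_c,b_c, W_o,b_o
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_x), h, c, params)
print(h.shape, c.shape)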
The LSTM has three gate structures and a cell state, while the GRU has only two gate structures, the update gate and the reset gate, and is structurally simpler. When a GRU and an LSTM achieve equally good results, the GRU may be preferable: it has fewer parameters during training, is relatively easy to train and can help prevent overfitting. The operation mechanism of the GRU neural network is shown in FIG. 5.
1. Update gate:
z_t = σ(W^(z) x_t + U^(z) h_{t-1})
where x_t is the input vector at the t-th time step, i.e. the t-th component of the input sequence X, which undergoes a linear transformation (multiplication by the weight matrix W^(z)); h_{t-1} holds the information of the previous time step t-1 and also undergoes a linear transformation (multiplication by the weight matrix U^(z)). The update gate adds these two pieces of information and feeds them into the sigmoid activation function, compressing the activation result between 0 and 1, as shown in FIG. 6.
2. Reset gate:
The reset gate mainly determines how much past information needs to be forgotten, as shown in FIG. 7. The reset gate operates similarly to the update gate and can be calculated with the following expression:
r_t = σ(W^(r) x_t + U^(r) h_{t-1})
where W^(r) and U^(r) denote different weight matrices.
3. Current memory content:
Using the reset gate, the new memory stores the relevant past information; it is calculated as:
h'_t = tanh(W x_t + r_t ⊙ U h_{t-1})
The input x_t and the previous time-step information h_{t-1} first undergo linear transformations, i.e. they are multiplied by the matrices W and U respectively.
Then the Hadamard (element-wise) product of the reset gate r_t and U h_{t-1} is computed. Since the previously calculated reset gate is a vector of values between 0 and 1, it measures how far the gate is opened; for example, if an element of the gate has value 0, the corresponding element's information is completely forgotten.
4. Final memory of the current time step:
In the last step the network calculates h_t, the vector that retains the information of the current unit and is passed on to the next unit. In this process the update gate is needed: it determines what needs to be collected from the current memory content h'_t and from the previous time step h_{t-1}. This process is expressed as:
h_t = z_t * h_{t-1} + (1 - z_t) * h'_t
z_t is the activation result of the update gate, which controls the flow of information in gated form. The Hadamard product of z_t and h_{t-1} represents the information from the previous time step retained in the final memory; together with the information retained in the final memory from the current memory content, it equals the content output by the final gated recurrent unit, as shown in FIG. 9.
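The four GRU steps above can be collected into one function. The sketch below follows the update-gate convention used in the text (h_t = z_t * h_{t-1} + (1 - z_t) * h'_t); the weights, dimensions and the short input sequence are placeholder assumptions for illustration only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step following the update/reset gate formulas above."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate
    h_cand = np.tanh(W_h @ x_t + r_t * (U_h @ h_prev))   # current memory content h'_t
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand            # final memory of the time step
    return h_t

rng = np.random.default_rng(2)
d_x, d_h = 8, 16                                         # assumed dimensions
params = (rng.normal(scale=0.1, size=(d_h, d_x)), rng.normal(scale=0.1, size=(d_h, d_h)),
          rng.normal(scale=0.1, size=(d_h, d_x)), rng.normal(scale=0.1, size=(d_h, d_h)),
          rng.normal(scale=0.1, size=(d_h, d_x)), rng.normal(scale=0.1, size=(d_h, d_h)))
h = np.zeros(d_h)
for x in [rng.normal(size=d_x) for _ in range(5)]:       # a short toy input sequence
    h = gru_step(x, h, params)
print(h.shape)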
Based on the above principles, the Mongolian-Chinese machine translation system with GRU neural networks is constructed. Referring to FIG. 10, the translation languages are first preprocessed and the NLPIR word segmentation technology is used to segment the Mongolian text; the model is then built and trained on a Mongolian-Chinese bilingual corpus of a certain scale, with a self-attention mechanism added to improve the translation effect; the Mongolian-Chinese bilingual corpus is processed by the coding system; and finally solving, optimization and evaluation are carried out with the BLEU score.
1. NLPIR word segmentation technology
The NLPIR word segmentation technology performs very well and is widely applied. Its principle is a word segmentation method based on word-frequency statistics; Chinese word segmentation is realized with a layered, cascaded Markov model and comprises five steps: sentence segmentation, atomic segmentation, preliminary segmentation, N-shortest-path segmentation, and generation of the optimal segmentation result.
(1) Sentence segmentation
Sentence break means that the source sentence is divided into a plurality of short sentences according to standard sentence division marks such as punctuation marks, division marks and the like. The short sentences obtained after sentence breaking are convenient for word segmentation processing, and finally word segmentation results of all the short sentences are connected to form word segmentation results of the whole sentence.
(2) Atom splitting
The atomic segmentation divides the short sentence into independent minimum morpheme units to prepare for the subsequent preliminary segmentation.
(3) Preliminary segmentation
The preliminary segmentation comprises two nested loops: the outer loop traverses all atoms of the short sentence, and the inner loop continually combines the current atom with the atoms following it and consults the dictionary to check whether the current combination is a meaningful phrase. If the current phrase is found in the dictionary it is recorded; otherwise the inner loop is exited and the outer loop continues. The preliminary segmentation yields all possible atom combinations.
(4) N-shortest-path segmentation
The basic idea of N-shortest-path segmentation is to retain the N results with the largest segmentation probability as the candidate set of segmentation results used to generate the optimal segmentation (a toy sketch of this step is given after step (5) below). Based on the preliminary segmentation result, a directed acyclic graph is constructed for the current sentence, in which the nodes represent characters or words, the edges represent connections between adjacent characters or words, and the edge weights represent the probability of the corresponding character or word occurring given the current character or word; N-shortest-path segmentation keeps the N segmentations with the largest probability product as the candidate set. N candidate segmentation results are obtained through N-shortest-path segmentation.
(5) Optimal segmentation result
After identifying unregistered words such as person names and place names (unregistered words are words that are not included in the segmentation word list but need to be segmented as independent words in the current context, generally including person names, place names, proper nouns and the like), the paths are scored to obtain the optimal path, i.e. the final segmentation result.
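The N-shortest-path step can be illustrated with a small dynamic program over the segmentation DAG. The toy dictionary, probabilities and sentence below are invented purely for illustration and do not come from NLPIR; the sketch only keeps, at each position, the N partial segmentations with the largest probability product.

import heapq
import math

# Toy N-shortest-path word segmentation over a segmentation DAG (illustrative only).
word_prob = {"我": 0.05, "爱": 0.04, "中": 0.02, "国": 0.02, "中国": 0.06, "爱中国": 0.001}
sentence = "我爱中国"
N = 3  # keep the N highest-probability segmentations as the candidate set

def n_best_segmentations(sent, probs, n):
    """Keep the n segmentations with the largest probability product (smallest -log cost)."""
    best = {0: [(0.0, [])]}                            # position -> list of (cost, words)
    for i in range(1, len(sent) + 1):
        candidates = []
        for j in range(i):
            piece = sent[j:i]
            if piece in probs and j in best:
                cost = -math.log(probs[piece])
                for prev_cost, words in best[j]:
                    candidates.append((prev_cost + cost, words + [piece]))
        if candidates:
            best[i] = heapq.nsmallest(n, candidates)   # prune to the n best paths
    return best.get(len(sent), [])

for cost, words in n_best_segmentations(sentence, word_prob, N):
    print(words, "prob =", round(math.exp(-cost), 6))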
2. Encoder-Decoder model
The Encoder-Decoder model is composed of two parts, Encoder coding and Decoder decoding. The neural machine translation model is composed of two neural networks: the network responsible for Encoder coding is an LSTM to which a bidirectional coding technique is added, i.e. the source language to be translated is encoded in the forward and reverse directions, converting source sentences into fixed-length coded vectors in two different directions; the GRU responsible for Decoder decoding, since the coding is carried out from two directions, must likewise decode the coded vectors from both the forward and the reverse direction.
(1) The calculation formula of the Encoder coding part is h_t = f(x_t, h_{t-1}).
The hidden layer output at the current time is calculated from the input at the current time and the hidden layer output at the previous time; the output at each time is obtained through Encoder coding, and the feature representation c of the final source sentence context is then obtained by calculation:
c = h_T
where the hidden layer output at the final time represents the context of the source sentence.
(2) The calculation formula of the Decoder decoding part is as follows:
p(y_1, …, y_{T'} | x_1, …, x_T) = ∏_{t=1}^{T'} p(y_t | V, y_1, …, y_{t-1})
The objective function of the model is the probability that the source sentence is correctly translated into the target sentence; the model training process is the process of maximizing the probability that the source sentences in the training samples are correctly translated into the target sentences, and at each moment i the probability that the current output is the correct result is calculated as follows.
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c)
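The sentence-level objective just stated, maximizing the probability of the correct target sentence, amounts to summing log p(y_i | y_1..y_{i-1}, C) over the target positions. The sketch below shows that bookkeeping with a toy softmax decoder; the vocabulary, the projection matrix and the pretend decoder states are illustrative assumptions, not the patented model.

import numpy as np

# Sketch of the sentence-level objective: log-probability of the reference target sentence.
rng = np.random.default_rng(3)
vocab_size, d_h = 10, 16
W_out = rng.normal(scale=0.1, size=(vocab_size, d_h))    # assumed output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sentence_log_prob(decoder_states, target_ids):
    """Sum of log p(y_i | y_1..y_{i-1}, C): one softmax per decoder state s_i."""
    log_p = 0.0
    for s_i, y_i in zip(decoder_states, target_ids):
        p = softmax(W_out @ s_i)       # g(y_{i-1}, s_i, c) collapsed into one projection
        log_p += np.log(p[y_i])
    return log_p

states = [rng.normal(size=d_h) for _ in range(4)]   # pretend decoder hidden states s_i
target = [2, 5, 1, 7]                               # pretend target word indices
print("log P(target | source) =", sentence_log_prob(states, target))
# Training maximizes this quantity (equivalently minimizes its negative) over the corpus.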
3. Encoder-Decoder model with a self-attention mechanism added
The model based on the attention mechanism (Attention) is built on the Encoder-Decoder model: the Encoder coding part is unchanged, and the Decoder decoding part introduces an attention mechanism, which realizes soft alignment between words and improves the translation effect.
The objective function of the Decoder part of the model is the probability of correctly translating the source sentence into the target sentence, and the model training process maximizes that probability; however, the calculation of the probability that the current output is the correct result at each moment i differs greatly from the original Decoder and is carried out as follows:
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
At this time the source sentence context is no longer simply represented by the output of the hidden layer at the last moment; instead, the source sentence context is distinguished for the Decoder at different moments and denoted c_i. The calculation formula of c_i is as follows:
c_i = Σ_{j=1}^{T} a_ij · h_j
This can be interpreted as a weighted sum of the hidden layer outputs of the Encoder part at all moments, with different weights at different moments of the Decoder, i.e. an alignment in the sense of meaning. c_i is also used in the calculation of the Decoder hidden layer output, helping the hidden layer to be better expressed. The weight formulas are as follows:
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
e_ij = a(s_{i-1}, h_j)
context information dependent on calculation of a hidden layer in a Decoder partial model based on an attention mechanism model is calculated according to an time hidden layer on a Decoder network and an Encoder network hidden layer at all times, and the context information corresponding to different times is different.
4. BLEU scoring algorithm
In recent years machine translation technology has developed rapidly and several automatic evaluation standards for translation have been proposed. The evaluation standard most widely applied and accepted at present is scoring with the BLEU algorithm, which is a reference standard for evaluating machine translation at the present stage. The basic idea of the algorithm is that the more n-grams (a type of statistical language model, including unigram, bigram, trigram, 4-gram models and so on) the translation to be evaluated has in common with the provided reference translation, the higher the quality of the machine translation result. The BLEU calculation is shown below, where BP is a piecewise function:
BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )
BP = 1 if c > r;  BP = e^{1 − r/c} if c ≤ r
where c represents the length of the translation to be evaluated, r represents the length of the reference translation, and the piecewise function BP is the length penalty factor, which depends on the size relationship between c and r. N denotes the highest n-gram order (one model for each n), w_n denotes the weight of the corresponding n-gram model, usually taken as 1/N, and N is usually set to 4 in most cases; p_n denotes the matching precision of the corresponding model (i.e. the proportion of co-occurring n-grams). If any n-gram model has no match, the BLEU value is 0, in which case the score is meaningless; therefore the BLEU algorithm is not suitable for measuring the translation of a single sentence, but is suitable for evaluating translation over many sentences.
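A simplified BLEU computation following the formula above, for a single candidate against a single reference: clipped n-gram precisions p_n with uniform weights 1/N and the brevity penalty BP. The example sentences are invented for illustration, and, as the text notes, sentence-level BLEU is only a rough measure (any unmatched n-gram order makes the score 0 here).

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, N=4):
    """Sentence-level BLEU sketch: clipped n-gram precisions, weights 1/N, brevity penalty."""
    log_p_sum = 0.0
    for n in range(1, N + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        match = sum(min(c, ref[g]) for g, c in cand.items())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        if match == 0:
            return 0.0              # any unmatched n-gram order gives BLEU = 0
        log_p_sum += (1.0 / N) * math.log(match / total)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / max(c, 1))       # brevity penalty BP
    return bp * math.exp(log_p_sum)

cand = "the small cat is on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(cand, ref), 4))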
Detailed description of the invention
The whole operation specific algorithm is as follows:
1:loop
2: selecting Mongolian and Chinese bilingual corpus, and segmenting the Mongolian by using an NLPIR (non-linear regression with fuzzy inference engine) segmentation technology;
3: vectorizing Mongolian Chinese corpus;
4: performing modeling operation on the Mongolia according to an Encoder-Decoder model;
5: adopting an output function to carry out operation of output characteristics;
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
6: end loop.
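Read as pseudocode, the loop above is a thin orchestration layer over the components sketched in the preceding subsections: segment, vectorize, encode-decode, output. The skeleton below strings those stages together; every function body is a toy placeholder assumption (NLPIR, the vectorizer and the trained Encoder-Decoder are not reimplemented here).

# Skeleton of the overall loop (steps 1-6 above); all stage bodies are stand-ins.
def segment(sentence):
    return sentence.split()                          # stand-in for NLPIR segmentation

def vectorize(tokens):
    return [hash(t) % 1000 for t in tokens]          # stand-in for corpus vectorization

def encode_decode(ids):
    return ["<target-%d>" % i for i in ids[:3]]      # stand-in for the Encoder-Decoder

def translate_corpus(corpus):
    results = []
    for sentence in corpus:                          # 1: loop over the bilingual corpus
        tokens = segment(sentence)                   # 2: word segmentation
        ids = vectorize(tokens)                      # 3: vectorize the corpus
        target = encode_decode(ids)                  # 4-5: model + output function
        results.append(target)
    return results                                   # 6: end loop

print(translate_corpus(["我 爱 中国", "机器 翻译"]))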
Further, the invention can add a reinforcement learning mechanism to the translation process. Reinforcement learning means taking actions based on the environment so as to obtain the maximum expected benefit; at each step the current signal and stimulus affect the following signal and stimulus. The reward mechanism of reinforcement learning feeds back rewards according to the obtained translation content, pushing the translation effect in a good direction, and integrating the idea of reinforcement learning into the machine translation framework makes the translation quality more accurate and reliable.
The reinforcement learning interaction block diagram is shown in FIG. 11, which shows the serialized interaction process of reinforcement learning. Agent denotes an abstract unit capable of sensing the external Environment, i.e. the perceptron; State denotes the current state; Action is the action taken in the current state; Reward denotes the feedback reward mechanism for the action currently taken; and Environment denotes the environment of the current perceptron. The Agent issues an action A_t based on the current state, and the Environment then responds, generating a new state and the corresponding reward, so that through the interactive reward mechanism the perceptron can intelligently execute an action in each state.
In the encoder-decoder Mongolian-Chinese translation framework, the translation framework is the Agent. When the source sentence X = (x_1, x_2, …, x_{n-1}, x_n) is input to the encoder and mapped to the coding vector Z = (z_1, z_2, …, z_{n-1}, z_n), the translation framework translates the source sentence into Y = (y_1, y_2, …, y_{n-1}, y_n) through bidirectional decoding. In this process reinforcement learning uses the principle of instant evaluation: each time a sentence is translated, the system interacts with the BLEU algorithm (BLEU acting as the Environment of reinforcement learning), and for the translated sentence y_t reinforcement learning automatically obtains the reward value R(y_t, s_t) according to the reward mechanism algorithm; in other words, R(y_t, s_t) is the quality evaluation of the translated sentence, i.e. the current BLEU score. The Agent (the translation framework) interacts continuously with the Environment (the BLEU scoring standard) to obtain the data R(y_t, s_t), the maximum value of R(y_t, s_t) indicates that the current translation is closest to the real sentence, and the system selects the translation with the maximum R(y_t, s_t). As shown in FIG. 12, a simulation of a simplified reinforcement-learning reward mechanism in the translation system, the Chinese sentence "I love China" is translated into Mongolian, and under the action of reinforcement learning the encoder-decoder framework converges after three iterations. The first iteration translates "I love China" into a Mongolian sentence (shown in FIG. 12) with reward value R(y_t, s_t) = -5; the second iteration translates it into another Mongolian sentence (shown in FIG. 12) with reward value R(y_t, s_t) = 1; by the third iteration the reinforcement learning has converged and "I love China" is translated into a Mongolian sentence (shown in FIG. 12), the reward value when all iterations are complete being R(y_t, s_t) = 10. Comparing the reward values R(y_t, s_t) of the iterations through interaction, the maximum reward value is 10, and the system takes the translated Mongolian sentence with the maximum reward value as the final translation, i.e. the translation of "I love China" into Mongolian is determined to be the sentence shown in FIG. 12. At this point the translation of the sentence "I love China" is complete, and the best translation found is the one with the largest reward value.
Therefore, by adding reinforcement learning and iterating several times, optimized translation can be carried out according to the data obtained through self-learning to obtain optimally translated sentences. The translation system combines bidirectional encoding and bidirectional decoding, incorporates the reinforcement idea modeled on human thinking, and uses the data generated by the translation system itself for learning, thereby improving the translation effect for low-resource languages.
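The reward loop described above can be sketched as: generate a candidate translation per iteration, score it against the reference (the Environment's reward R(y_t, s_t)), and keep the candidate with the highest reward. The candidate list, the toy overlap-based scorer and the iteration count below are illustrative assumptions; in the patent the reward is the BLEU score of the current translation.

# Sketch of the reward-driven selection loop: the translation framework (Agent) produces
# a candidate each iteration, the scorer acts as the Environment's reward R(y_t, s_t),
# and the candidate with the largest reward is kept as the final translation.
def reward(candidate, reference):
    """Toy stand-in for the BLEU-based reward: token overlap ratio."""
    cand, ref = candidate.split(), reference.split()
    return sum(1 for w in cand if w in ref) / max(len(ref), 1)

reference = "mongolian reference translation of I love China"
iterations = [                       # pretend outputs of three decoding iterations
    "some poor first attempt",
    "mongolian translation of China",
    "mongolian reference translation of I love China",
]

best, best_r = None, float("-inf")
for t, y_t in enumerate(iterations, start=1):
    r = reward(y_t, reference)       # interaction with the Environment (scoring)
    print("iteration", t, "reward =", round(r, 3))
    if r > best_r:
        best, best_r = y_t, r        # keep the translation with the maximum reward

print("final translation:", best)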

Claims (7)

1. A GRU neural network Mongolian-Chinese machine translation method, in which the translation languages are first preprocessed, an Encoder-Decoder model is then built and trained on a Mongolian-Chinese bilingual corpus of a certain scale, the Mongolian-Chinese bilingual corpus is processed by the coding system, and the translation result is finally obtained based on the Encoder-Decoder model, characterized in that the Encoder-Decoder model is a neural machine translation model constructed from neural networks, wherein one neural network is an LSTM responsible for Encoder coding, adopting a bidirectional coding setup, i.e. the source language is encoded in the forward and reverse directions and the source sentence is converted into two fixed-length vectors coded in different directions, the two vectors to be decoded containing all the context information; the other neural network is a GRU responsible for Decoder decoding, decoding from the forward and reverse directions, i.e. the context information is automatically integrated when the target language is decoded and output, so that the fixed-length vectors produced by coding are converted into a target sentence.
2. The GRU neural network Mongolian Chinese machine translation method according to claim 1, wherein the preprocessing of the translation language is to perform word segmentation on the translation language by using NLPIR word segmentation technology.
3. The GRU neural network Mongolian-Chinese machine translation method according to claim 1, wherein the calculation formula of the Encoder coding is as follows:
h_t = f(x_t, h_{t-1})
that is, the hidden layer output h_t at the current time is computed from the current input x_t and the hidden layer output h_{t-1} at time t-1; the output at each time is obtained through Encoder coding, and the feature representation of the final source sentence context is then obtained by calculation, i.e. the hidden layer output at the final time represents the context of the source sentence;
the calculation formula of the Decoder decoding is as follows:
p(y_1, …, y_{T'} | x_1, …, x_T) = ∏_{t=1}^{T'} p(y_t | V, y_1, …, y_{t-1})
where x_1, …, x_T is the input sequence, y_1, …, y_{T'} is the output sequence, V is the initial value of the decoder, i.e. the encoding of x_1, …, x_T, T is the length of the input sentence, T' is the length of the output sentence, and T and T' are typically not equal in length;
the target function of the model is the probability that the source sentence is correctly translated into the target sentence;
the model training process is the process of maximizing the probability of correctly translating the source sentence into the target sentence in the training sample, and for each time i, the probability that the current output is a correct result is calculated as follows
p(yi|{y1,…,yi-1})=g(yi-1,si,c)
Where g denotes the transformation function of the intermediate semantic representation of the entire sentence, siIs the feature vector that has been obtained and c is the source sentence context.
4. The GRU neural network Mongolian-Chinese machine translation method according to claim 3, characterized in that the Encoder coding part of the Encoder-Decoder model is unchanged, the Decoder decoding part introduces a self-attention mechanism, the context information relied on in the decoding calculation is computed from the hidden layer of the Decoder network at the previous moment and the hidden layers of the Encoder network at all moments, and the context information is different at different moments, wherein for each moment i the probability that the current output is a correct result is calculated as follows
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
wherein C denotes the intermediate semantic coding, at which time the source sentence context is distinguished for the Decoder at different moments and denoted c_i, the calculation formulas of c_i being as follows:
c_i = Σ_{j=1}^{T} a_ij · h_j
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
e_ij = a(s_{i-1}, h_j)
c_i is calculated as the weighted sum of the hidden layer outputs of the Encoder coding part at all moments, T denotes the length of the input sentence, a_ij denotes the attention distribution coefficient of the j-th word of the input sentence when the target outputs the i-th word, s denotes the intermediate coding vector, h_j denotes the semantic code of the j-th word of the input sentence, a(s_{i-1}, h_j) denotes the composite coding function, and e_ij denotes the resulting score; the Decoder decoding has different corresponding weights at different moments, and c_i is likewise used in the calculation of the Decoder decoding hidden layer output, helping the hidden layer to be better expressed.
5. The GRU neural network Mongolian Chinese machine translation method as claimed in claim 3, wherein BLEU algorithm scoring is used for translation effect evaluation.
6. The GRU neural network Mongolian Chinese machine translation method according to claim 3, wherein a reinforcement learning mechanism is added in the translation process.
7. The GRU neural network Mongolian-Chinese machine translation method according to claim 6, wherein in the reinforcement learning mechanism the Encoder-Decoder translation framework serves as the perceptron and the BLEU score serves as the Environment; when a source sentence X = (x_1, x_2, …, x_{n-1}, x_n) is input to the encoder and mapped to a coding vector Z = (z_1, z_2, …, z_{n-1}, z_n), the translation framework translates the source sentence into Y = (y_1, y_2, …, y_{n-1}, y_n) through bidirectional decoding; in this process reinforcement learning uses the principle of instant evaluation, each translated sentence interacts with the BLEU algorithm, and for the translated sentence y_t the reward value R(y_t, s_t) is obtained according to the reward mechanism algorithm, R(y_t, s_t) being the quality evaluation of the translated sentence, i.e. the current BLEU score; through continuous interaction between the Agent and the Environment the data R(y_t, s_t) are obtained, and the maximum value of R(y_t, s_t) indicates that the current translation is closest to the real sentence.
CN201910940595.1A 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method Pending CN110738062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940595.1A CN110738062A (en) 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910940595.1A CN110738062A (en) 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method

Publications (1)

Publication Number Publication Date
CN110738062A true CN110738062A (en) 2020-01-31

Family

ID=69268384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940595.1A Pending CN110738062A (en) 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method

Country Status (1)

Country Link
CN (1) CN110738062A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109791631A (en) * 2016-08-25 2019-05-21 谷歌有限责任公司 Reward enhancing model training
CN108920468A (en) * 2018-05-07 2018-11-30 内蒙古工业大学 A Mongolian-Chinese bilingual inter-translation method based on reinforcement learning
CN109472024A (en) * 2018-10-25 2019-03-15 安徽工业大学 A text classification method based on a bidirectional recurrent attention neural network
CN109508462A (en) * 2018-10-25 2019-03-22 内蒙古工业大学 A neural network Mongolian-Chinese machine translation method based on an encoder-decoder
CN109598002A (en) * 2018-11-15 2019-04-09 重庆邮电大学 Neural machine translation method and system based on a bidirectional recurrent neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
乌云塔那: "Research on Mongolian-Chinese Machine Translation Based on Neural Networks", China Excellent Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291534A (en) * 2020-02-03 2020-06-16 苏州科技大学 Global coding method for automatic summarization of Chinese long text
CN112070208A (en) * 2020-08-05 2020-12-11 同济大学 Tool wear prediction method based on encoder-decoder stage attention mechanism
CN112070208B (en) * 2020-08-05 2022-08-30 同济大学 Tool wear prediction method based on encoder-decoder stage attention mechanism
CN112215017B (en) * 2020-10-22 2022-04-29 内蒙古工业大学 Mongolian Chinese machine translation method based on pseudo parallel corpus construction
CN112215017A (en) * 2020-10-22 2021-01-12 内蒙古工业大学 Mongolian Chinese machine translation method based on pseudo parallel corpus construction
CN112329760A (en) * 2020-11-17 2021-02-05 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112329760B (en) * 2020-11-17 2021-12-21 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112580372A (en) * 2020-12-26 2021-03-30 内蒙古工业大学 Mongolian Chinese neural machine translation method based on Actor-Critic
CN112597780A (en) * 2020-12-28 2021-04-02 焦点科技股份有限公司 Multi-language mixed heterogeneous neural network machine learning translation method
CN113205792A (en) * 2021-04-08 2021-08-03 内蒙古工业大学 Mongolian speech synthesis method based on Transformer and WaveNet
CN113408781A (en) * 2021-04-30 2021-09-17 南通大学 Encoder-Decoder-based long-term traffic flow prediction method
CN113191165A (en) * 2021-07-01 2021-07-30 南京新一代人工智能研究院有限公司 Method for avoiding duplication of machine translation fragments
CN113191165B (en) * 2021-07-01 2021-09-24 南京新一代人工智能研究院有限公司 Method for avoiding duplication of machine translation fragments

Similar Documents

Publication Publication Date Title
CN110738062A (en) GRU neural network Mongolian Chinese machine translation method
CN109359294B (en) Ancient Chinese translation method based on neural machine translation
CN111368565A (en) Text translation method, text translation device, storage medium and computer equipment
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN110674646A (en) Mongolian Chinese machine translation system based on byte pair encoding technology
CN110688862A (en) Mongolian-Chinese inter-translation method based on transfer learning
CN109271629B (en) Method for generating text abstract based on reinforcement learning
CN108415906B (en) Automatic identification discourse machine translation method and machine translation system based on field
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN111651589B (en) Two-stage text abstract generation method for long document
CN110032638B (en) Encoder-decoder-based generative abstract extraction method
CN110569505B (en) Text input method and device
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN111666756B (en) Sequence model text abstract generation method based on theme fusion
CN110717345B (en) Translation realignment recurrent neural network cross-language machine translation method
CN110532555B (en) Language evaluation generation method based on reinforcement learning
CN111708877B (en) Text abstract generation method based on key information selection and variational potential variable modeling
CN111144410B (en) Cross-modal image semantic extraction method, system, equipment and medium
CN111309896B (en) Deep learning text abstract generation method based on secondary attention
CN114691858B (en) Improved UNILM digest generation method
CN116663578A (en) Neural machine translation method based on strategy gradient method improvement
CN113076718B (en) Commodity attribute extraction method and system
CN109918484B (en) Dialog generation method and device
CN116432637A (en) Multi-granularity extraction-generation hybrid abstract method based on reinforcement learning
CN115719072A (en) Chapter-level neural machine translation method and system based on mask mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200131