CN110738062A - GRU neural network Mongolian Chinese machine translation method - Google Patents

GRU neural network Mongolian Chinese machine translation method

Info

Publication number
CN110738062A
CN110738062A (application CN201910940595.1A)
Authority
CN
China
Prior art keywords
coding
sentence
translation
encoder
decoder
Prior art date
Legal status
Pending
Application number
CN201910940595.1A
Other languages
Chinese (zh)
Inventor
苏依拉
卞乐乐
赵旭
薛媛
范婷婷
张振
Current Assignee
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN201910940595.1A priority Critical patent/CN110738062A/en
Publication of CN110738062A publication Critical patent/CN110738062A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A GRU neural network Mongolian-Chinese machine translation method: the translation languages are first preprocessed, an Encoder-Decoder model is then built and trained on a Mongolian-Chinese bilingual corpus of a certain scale, the bilingual corpus is processed by the coding system, and the translation result is finally obtained from the Encoder-Decoder model. The Encoder-Decoder model is constructed from neural networks: one network is an LSTM responsible for Encoder coding and adopts a bidirectional coding setup, i.e. the source language is encoded in both the forward and reverse directions, converting the source sentence into two fixed-length vectors coded in different directions; the other network is a GRU responsible for Decoder decoding, which decodes from both the forward and reverse directions, i.e. context information is automatically integrated when the target language is decoded and output, so that the vectors produced by coding are converted into a target sentence. The invention combines the characteristics of the Mongolian-Chinese machine translation system, makes the output of the translation system more fluent and closer to human expression, and reduces semantic loss and confusion of the translation during the translation process.

Description

GRU neural network Mongolian Chinese machine translation method
Technical Field
The invention belongs to the technical field of machine translation, relates to Mongolian-Chinese machine translation, and particularly relates to a GRU neural network Mongolian-Chinese machine translation method.
Background
At present, with the rapid development of the internet industry, IT industries including information technology continue to grow, and machine translation, as a natural language processing task, promotes the development of the whole internet industry. Large search-service companies such as Google and Baidu carry out large-scale research in the field of machine translation for the development of the industry, and research continues in pursuit of consistently high-quality translations.
Machine translation output, however, is still relatively stiff: even when the programs are well designed, the probability of errors in translation is high, and various grammatical errors sometimes occur. Translations of long paragraphs are hard to understand and do not follow normal logic; the readability of the translated text is poor; the grammatical features of sentences are not reflected; translated documents are coarse and difficult to understand; simple words are translated word by word while difficult words remain obscure; only simple short sentences are translated well; and low translation quality caused by differences in the handling of ambiguous words and grammatical structures is particularly prominent in machine translation.
Although many problems of the statistical translation method, such as unclear decoding, mistranslation, and the handling of out-of-vocabulary words, can be alleviated to some extent by understanding the meaning of sentences, the results are still inferior in accuracy to manual translation.
In a translation system, the computational complexity of the encoder and decoder is high. Due to limits on computation and GPU memory, a neural machine translation model needs to fix a commonly used vocabulary of limited size in advance; neural machine translation systems therefore often restrict the vocabulary to high-frequency words and treat all other low-frequency words as unknown words. Mongolian is an agglutinative language, whose characteristic is that other word components are attached to the prefixes, infixes and suffixes of word roots to derive new words; consequently, Mongolian word components and their morphological variations are very rich, and loanwords and out-of-vocabulary words occur frequently.
Disclosure of Invention
In order to solve the problems of missing translation, mistranslation and unknown-word handling that mainly exist in the translation process in the prior art, the invention aims to provide a GRU neural network Mongolian-Chinese machine translation method. In the method, a CPU and a GPU process the corpus in parallel, which significantly improves the processing speed; the corpus is learned with a set learning rate, which effectively relieves the local-optimum problem in the semantic representation of the learned corpus and the problem of low coding quality caused by overly fast convergence; the quality of the whole system is improved by dedicated structures and algorithms; and, in view of the current situation of scarce data and small dictionaries in a small corpus, the system complexity is reduced and the translation quality for the user is guaranteed under a system structure that remains intuitive to the user, so that the Mongolian-Chinese machine translation system is improved and better translation is achieved.
In order to achieve the purpose, the invention adopts the technical scheme that:
The GRU neural network Mongolian-Chinese machine translation method comprises the steps of preprocessing the translation languages, building and training an Encoder-Decoder model on a Mongolian-Chinese bilingual corpus of a certain scale, processing the Mongolian-Chinese bilingual corpus with the coding system, and obtaining translation results based on the Encoder-Decoder model.
The preprocessing of the translation language is to perform word segmentation on the translation language by using an NLPIR word segmentation technology.
The Encoder-Decoder model is a neural machine translation model constructed from neural networks. One network is an LSTM responsible for Encoder coding; specifically, a bidirectional coding setup is adopted, i.e. the source language is encoded in the forward and reverse directions, converting the source sentence into two fixed-length vectors coded in different directions. The other network is a GRU responsible for Decoder decoding. Because the Encoder outputs two coded vectors, the Decoder also needs to decode from the forward and reverse directions; because the two vectors to be decoded contain all the context information, the context information can be automatically integrated when the Decoder outputs the target language, so that the fixed-length vectors produced by coding are converted into a target sentence.
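To illustrate the bidirectional encoding idea just described, the following minimal sketch runs a simple recurrent cell over the source sentence once forward and once over the reversed sequence, producing two fixed-length vectors. A plain tanh RNN cell stands in for the LSTM encoder; all names, dimensions and weights are illustrative assumptions, not the patented implementation.

import numpy as np

# Minimal sketch of forward + reverse encoding into two fixed-length vectors.
rng = np.random.default_rng(0)
d_x, d_h = 8, 16                      # assumed embedding and hidden sizes
W = rng.normal(scale=0.1, size=(d_h, d_x))
U = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

def encode(xs):
    """Run the recurrent cell over a sequence and return the final hidden state."""
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)   # h_t = f(x_t, h_{t-1})
    return h

source = [rng.normal(size=d_x) for _ in range(5)]   # embedded source sentence (toy)
h_forward = encode(source)            # forward encoding
h_reverse = encode(source[::-1])      # reverse encoding
# The two fixed-length vectors, one per direction, are handed to the decoder.
print(h_forward.shape, h_reverse.shape)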
The calculation formula of the Encoder code is as follows:
h_t = f(x_t, h_{t-1})
That is, the hidden layer output h_t at the current time is computed from the current input x_t and the hidden layer output h_{t-1} at time t-1; the output at each time is obtained through Encoder coding, and the feature representation of the final source sentence context is then obtained by calculation, i.e. the hidden layer output at the final time represents the context of the source sentence;
the calculation formula of the Decoder decoding is as follows:
p(y_1, …, y_{T'} | x_1, …, x_T) = ∏_{t=1}^{T'} p(y_t | V, y_1, …, y_{t-1})
where x_1, …, x_T is the input sequence, y_1, …, y_{T'} is the output sequence, V is the initial value of the decoder, i.e. the encoding of x_1, …, x_T, T is the length of the input sentence, T' is the length of the output sentence, and T and T' are typically not equal in length;
the target function of the model is the probability that the source sentence is correctly translated into the target sentence;
the model training process is the process of maximizing the probability of correctly translating the source sentence into the target sentence in the training sample, and for each time i, the probability that the current output is a correct result is calculated as follows
p(yi|{y1,…,yi-1})=g(yi-1,si,c)
Where g denotes the transformation function of the intermediate semantic representation of the entire sentence, siIs the feature vector that has been obtained and c is the source sentence context.
The Encoder coding part of the Encoder-Decoder model is unchanged, while the Decoder decoding part introduces an attention mechanism: the context information relied on in the decoding calculation is computed from the hidden layer of the Decoder network at the previous moment and the hidden layers of the Encoder network at all moments, and the context information corresponding to different moments is different. For each moment i, the probability that the current output is a correct result is calculated as follows
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
where C denotes the intermediate semantic coding. At this time the source sentence context is distinguished for the Decoder at different moments and denoted c_i; the calculation formulas of c_i are as follows:
c_i = Σ_{j=1}^{T} a_ij · h_j
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
e_ij = a(s_{i-1}, h_j)
c_i is calculated as the weighted sum of the hidden layer outputs of the Encoder coding part at all moments; T denotes the length of the input sentence, a_ij denotes the attention distribution coefficient of the j-th word of the input sentence when the target outputs the i-th word, s denotes the intermediate coding vector, h_j denotes the semantic code of the j-th word of the input sentence, a(s_{i-1}, h_j) denotes the composite coding function, and e_ij denotes the resulting score. The Decoder decoding has different corresponding weights at different moments, and c_i is likewise used in the calculation of the Decoder decoding hidden layer output, helping the hidden layer to be better expressed.
The invention uses the BLEU algorithm score to judge the translation effect.
In the invention, a reinforcement learning mechanism can be added to the translation process. In this mechanism the Encoder-Decoder translation framework acts as the perceptron (Agent) and the BLEU score acts as the Environment. When a source sentence X = (x_1, x_2, …, x_{n-1}, x_n) is input to the encoder and mapped to a coding vector Z = (z_1, z_2, …, z_{n-1}, z_n), the translation framework translates the source sentence into Y = (y_1, y_2, …, y_{n-1}, y_n) through bidirectional decoding. In this process reinforcement learning uses the principle of instant evaluation: each time a sentence is translated it interacts with the BLEU algorithm, and for the translated sentence y_t the reward value R(y_t, s_t) is obtained according to the reward mechanism algorithm. R(y_t, s_t) is the quality evaluation of the translated sentence, i.e. the current BLEU score; through continuous interaction between the Agent and the Environment the data R(y_t, s_t) are obtained, and the maximum value of R(y_t, s_t) indicates that the current translation is closest to the real sentence.
Compared with the prior art, the invention has the beneficial effects that:
the system architecture formed by an encoder formed by LSTM and a decoder formed by self-attention mechanism and GRU combines the characteristics of Mongolian language and Chinese language, the expression capability of a Mongolian Chinese machine translation system is smoother by steps, the expression capability of the Mongolian Chinese machine translation system is closer to the expression of human, and the semantic loss and the translation disorder degree in the translation process are reduced.
Drawings
Fig. 1 is a schematic diagram of the operation mechanism of LSTM.
Fig. 2 is a schematic diagram of the operation mechanism of the forget gate in the LSTM.
Fig. 3 is a diagram illustrating the calculation of the cell state at the current time in the LSTM.
Fig. 4 is a schematic diagram of the operation mechanism of the output gate in the LSTM.
Fig. 5 is a schematic diagram of an operation mechanism of the GRU neural network.
Fig. 6 is a schematic diagram of the operation mechanism of the update gate in the GRU neural network.
Fig. 7 is a schematic diagram of the operation mechanism of the reset gate in the GRU neural network.
Fig. 8 is a schematic diagram of calculation of current memory content in the GRU neural network.
FIG. 9 is a diagram of the calculation of the final memory of the current time step in the GRU neural network.
FIG. 10 is an inventive technique flow diagram.
FIG. 11 is a schematic diagram of a reinforcement learning mechanism.
Fig. 12 is a simulation of reinforcement learning reward mechanisms in the translation system.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention is based on a GRU neural network, which is a variant of the LSTM neural network. The LSTM neural network was proposed to overcome the inability of RNNs to model long-distance dependencies. Unlike a traditional neural network, the LSTM network modifies the original neural network unit into a CEC memory cell; the additive mechanism of the CEC memory cell allows the gradient to be preserved and errors to be propagated, solving the vanishing (dispersing) gradient problem.
The structure of the repeating LSTM network module is considerably more complex: it implements three gates, namely the forget gate, the input gate and the output gate. The forget gate decides how much of the cell state at the previous time is retained in the cell state at the current time; the input gate decides how much of the input at the current time is saved into the cell state at the current time; the output gate decides how much of the cell state at the current time is output.
The forget gate, as shown in FIG. 2, calculates which information needs to be forgotten; after sigmoid processing the values lie between 0 and 1, where 1 means keep everything and 0 means forget everything:
f_t = σ(W_f·[h_{t-1}, x_t] + b_f)
where the square brackets indicate that the two vectors are concatenated and merged, W_f is the weight matrix of the forget gate, σ is the sigmoid function, and b_f is the bias term of the forget gate. If the dimension of the input layer is d_x, the dimension of the hidden layer is d_h and the dimension of the cell state is d_c, then W_f has dimension d_c × (d_h + d_x), and [h_{t-1}, x_t] denotes the concatenation of the two vectors into a larger vector.
The input gate is used to calculate which information is stored in the state unit, and is divided into two parts.
The first part is i_t = σ(W_i·[h_{t-1}, x_t] + b_i), where b_i is the bias term of the input gate; this part can be seen as deciding how much of the current input needs to be saved into the cell state.
The second part is:
c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)
This part can be considered the new information generated by the current input to be added to the cell state.
The cell state at the current time is obtained by adding the product of the forget gate and the state at the previous time to the product of the two parts of the input gate, where c_{t-1} denotes the cell state at the previous time, as in FIG. 3, i.e.
c_t = f_t * c_{t-1} + i_t * c̃_t
The output gate calculates which information needs to be output through a sigmoid function, and the result is multiplied by the value of the current cell state passed through a tanh function to obtain the output, where W_o is a weight matrix and b_o is the bias term, as in FIG. 4:
o_t = σ(W_o·[h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(c_t)
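Putting the three gates together, the sketch below implements one LSTM time step exactly as written in the formulas above (sigmoid and tanh applied to the concatenated [h_{t-1}, x_t]). The weight values, dimensions and helper names are placeholder assumptions, not trained parameters of the patented system.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the forget/input/output gate formulas above."""
    W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o = params
    hx = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ hx + b_f)                 # forget gate
    i_t = sigmoid(W_i @ hx + b_i)                 # input gate
    c_tilde = np.tanh(W_c @ hx + b_c)             # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde            # new cell state
    o_t = sigmoid(W_o @ hx + b_o)                 # output gate
    h_t = o_t * np.tanh(c_t)                      # hidden output
    return h_t, c_t

rng = np.random.default_rng(1)
d_x, d_h = 8, 16                                  # assumed dimensions
mk = lambda: (rng.normal(scale=0.1, size=(d_h, d_h + d_x)), np.zeros(d_h))
params = sum((mk() for _ in range(4)), ())        # W_f,b_f, W_i,b_i, W_c,b_c, W_o,b_o
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_x), h, c, params)
print(h.shape, c.shape)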
The LSTM has three gate structures and a cell state, while the GRU has only two gate structures, the update gate and the reset gate, and is structurally simpler. When a GRU and an LSTM achieve equally good results, the GRU may be preferable: it has fewer parameters during training, is relatively easy to train and can help prevent overfitting. The operation mechanism of the GRU neural network is shown in FIG. 5.
1. Update gate:
z_t = σ(W^(z) x_t + U^(z) h_{t-1})
where x_t is the input vector at the t-th time step, i.e. the t-th component of the input sequence X, which undergoes a linear transformation (multiplication by the weight matrix W^(z)); h_{t-1} holds the information of the previous time step t-1 and also undergoes a linear transformation (multiplication by the weight matrix U^(z)). The update gate adds these two pieces of information and feeds them into the sigmoid activation function, compressing the activation result between 0 and 1, as shown in FIG. 6.
2. Reset gate:
The reset gate mainly determines how much past information needs to be forgotten, as shown in FIG. 7. The reset gate operates similarly to the update gate and can be calculated with the following expression:
r_t = σ(W^(r) x_t + U^(r) h_{t-1})
where W^(r) and U^(r) denote different weight matrices.
3. Current memory content:
Using the reset gate, the new memory stores the relevant past information; it is calculated as:
h'_t = tanh(W x_t + r_t ⊙ U h_{t-1})
The input x_t and the previous time-step information h_{t-1} first undergo linear transformations, i.e. they are multiplied by the matrices W and U respectively.
Then the Hadamard (element-wise) product of the reset gate r_t and U h_{t-1} is computed. Since the previously calculated reset gate is a vector of values between 0 and 1, it measures how far the gate is opened; for example, if an element of the gate has value 0, the corresponding element's information is completely forgotten.
4. Final memory of the current time step:
In the last step the network calculates h_t, the vector that retains the information of the current unit and is passed on to the next unit. In this process the update gate is needed: it determines what needs to be collected from the current memory content h'_t and from the previous time step h_{t-1}. This process is expressed as:
h_t = z_t * h_{t-1} + (1 - z_t) * h'_t
z_t is the activation result of the update gate, which controls the flow of information in gated form. The Hadamard product of z_t and h_{t-1} represents the information from the previous time step retained in the final memory; together with the information retained in the final memory from the current memory content, it equals the content output by the final gated recurrent unit, as shown in FIG. 9.
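The four GRU steps above can be collected into one function. The sketch below follows the update-gate convention used in the text (h_t = z_t * h_{t-1} + (1 - z_t) * h'_t); the weights, dimensions and the short input sequence are placeholder assumptions for illustration only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step following the update/reset gate formulas above."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate
    h_cand = np.tanh(W_h @ x_t + r_t * (U_h @ h_prev))   # current memory content h'_t
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand            # final memory of the time step
    return h_t

rng = np.random.default_rng(2)
d_x, d_h = 8, 16                                         # assumed dimensions
params = (rng.normal(scale=0.1, size=(d_h, d_x)), rng.normal(scale=0.1, size=(d_h, d_h)),
          rng.normal(scale=0.1, size=(d_h, d_x)), rng.normal(scale=0.1, size=(d_h, d_h)),
          rng.normal(scale=0.1, size=(d_h, d_x)), rng.normal(scale=0.1, size=(d_h, d_h)))
h = np.zeros(d_h)
for x in [rng.normal(size=d_x) for _ in range(5)]:       # a short toy input sequence
    h = gru_step(x, h, params)
print(h.shape)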
Based on the above principles, the Mongolian-Chinese machine translation system with GRU neural networks is constructed. Referring to FIG. 10, the translation languages are first preprocessed and the NLPIR word segmentation technology is used to segment the Mongolian text; the model is then built and trained on a Mongolian-Chinese bilingual corpus of a certain scale, with a self-attention mechanism added to improve the translation effect; the Mongolian-Chinese bilingual corpus is processed by the coding system; and finally solving, optimization and evaluation are carried out with the BLEU score.
1. NLPIR word segmentation technology
The NLPIR word segmentation technology performs very well and is widely applied. Its principle is a word segmentation method based on word-frequency statistics; Chinese word segmentation is realized with a layered, cascaded Markov model and comprises five steps: sentence segmentation, atomic segmentation, preliminary segmentation, N-shortest-path segmentation, and generation of the optimal segmentation result.
(1) Sentence segmentation
Sentence break means that the source sentence is divided into a plurality of short sentences according to standard sentence division marks such as punctuation marks, division marks and the like. The short sentences obtained after sentence breaking are convenient for word segmentation processing, and finally word segmentation results of all the short sentences are connected to form word segmentation results of the whole sentence.
(2) Atom splitting
The atomic segmentation divides the short sentence into independent minimum morpheme units to prepare for the subsequent preliminary segmentation.
(3) Preliminary segmentation
The preliminary segmentation comprises two nested loops: the outer loop traverses all atoms of the short sentence, and the inner loop continually combines the current atom with the atoms following it and consults the dictionary to check whether the current combination is a meaningful phrase. If the current phrase is found in the dictionary it is recorded; otherwise the inner loop is exited and the outer loop continues. The preliminary segmentation yields all possible atom combinations.
(4) N-shortest-path segmentation
The basic idea of N-shortest-path segmentation is to retain the N results with the largest segmentation probability as the candidate set of segmentation results used to generate the optimal segmentation (a toy sketch of this step is given after step (5) below). Based on the preliminary segmentation result, a directed acyclic graph is constructed for the current sentence, in which the nodes represent characters or words, the edges represent connections between adjacent characters or words, and the edge weights represent the probability of the corresponding character or word occurring given the current character or word; N-shortest-path segmentation keeps the N segmentations with the largest probability product as the candidate set. N candidate segmentation results are obtained through N-shortest-path segmentation.
(5) Optimal segmentation result
After identifying unregistered words such as person names and place names (unregistered words are words that are not included in the segmentation word list but need to be segmented as independent words in the current context, generally including person names, place names, proper nouns and the like), the paths are scored to obtain the optimal path, i.e. the final segmentation result.
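The N-shortest-path step can be illustrated with a small dynamic program over the segmentation DAG. The toy dictionary, probabilities and sentence below are invented purely for illustration and do not come from NLPIR; the sketch only keeps, at each position, the N partial segmentations with the largest probability product.

import heapq
import math

# Toy N-shortest-path word segmentation over a segmentation DAG (illustrative only).
word_prob = {"我": 0.05, "爱": 0.04, "中": 0.02, "国": 0.02, "中国": 0.06, "爱中国": 0.001}
sentence = "我爱中国"
N = 3  # keep the N highest-probability segmentations as the candidate set

def n_best_segmentations(sent, probs, n):
    """Keep the n segmentations with the largest probability product (smallest -log cost)."""
    best = {0: [(0.0, [])]}                            # position -> list of (cost, words)
    for i in range(1, len(sent) + 1):
        candidates = []
        for j in range(i):
            piece = sent[j:i]
            if piece in probs and j in best:
                cost = -math.log(probs[piece])
                for prev_cost, words in best[j]:
                    candidates.append((prev_cost + cost, words + [piece]))
        if candidates:
            best[i] = heapq.nsmallest(n, candidates)   # prune to the n best paths
    return best.get(len(sent), [])

for cost, words in n_best_segmentations(sentence, word_prob, N):
    print(words, "prob =", round(math.exp(-cost), 6))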
2. Encoder-Decoder model
The Encoder-Decoder model is composed of two parts, Encoder coding and Decoder decoding. The neural machine translation model is composed of two neural networks: the network responsible for Encoder coding is an LSTM to which a bidirectional coding technique is added, i.e. the source language to be translated is encoded in the forward and reverse directions, converting source sentences into fixed-length coded vectors in two different directions; the GRU responsible for Decoder decoding, since the coding is carried out from two directions, must likewise decode the coded vectors from both the forward and the reverse direction.
(1) The calculation formula of the Encoder coding part is h_t = f(x_t, h_{t-1}).
The hidden layer output at the current time is calculated from the input at the current time and the hidden layer output at the previous time; the output at each time is obtained through Encoder coding, and the feature representation c of the final source sentence context is then obtained by calculation:
c = h_T
where the hidden layer output at the final time represents the context of the source sentence.
(2) The calculation formula of the Decoder decoding part is as follows:
p(y_1, …, y_{T'} | x_1, …, x_T) = ∏_{t=1}^{T'} p(y_t | V, y_1, …, y_{t-1})
The objective function of the model is the probability that the source sentence is correctly translated into the target sentence; the model training process is the process of maximizing the probability that the source sentences in the training samples are correctly translated into the target sentences, and at each moment i the probability that the current output is the correct result is calculated as follows.
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c)
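The sentence-level objective just stated, maximizing the probability of the correct target sentence, amounts to summing log p(y_i | y_1..y_{i-1}, C) over the target positions. The sketch below shows that bookkeeping with a toy softmax decoder; the vocabulary, the projection matrix and the pretend decoder states are illustrative assumptions, not the patented model.

import numpy as np

# Sketch of the sentence-level objective: log-probability of the reference target sentence.
rng = np.random.default_rng(3)
vocab_size, d_h = 10, 16
W_out = rng.normal(scale=0.1, size=(vocab_size, d_h))    # assumed output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sentence_log_prob(decoder_states, target_ids):
    """Sum of log p(y_i | y_1..y_{i-1}, C): one softmax per decoder state s_i."""
    log_p = 0.0
    for s_i, y_i in zip(decoder_states, target_ids):
        p = softmax(W_out @ s_i)       # g(y_{i-1}, s_i, c) collapsed into one projection
        log_p += np.log(p[y_i])
    return log_p

states = [rng.normal(size=d_h) for _ in range(4)]   # pretend decoder hidden states s_i
target = [2, 5, 1, 7]                               # pretend target word indices
print("log P(target | source) =", sentence_log_prob(states, target))
# Training maximizes this quantity (equivalently minimizes its negative) over the corpus.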
3. Encoder-Decoder model with a self-attention mechanism added
The model based on the attention mechanism (Attention) is built on the Encoder-Decoder model: the Encoder coding part is unchanged, and the Decoder decoding part introduces an attention mechanism, which realizes soft alignment between words and improves the translation effect.
The objective function of the Decoder part of the model is the probability of correctly translating the source sentence into the target sentence, and the model training process maximizes that probability; however, the calculation of the probability that the current output is the correct result at each moment i differs greatly from the original Decoder and is carried out as follows:
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
At this time the source sentence context is no longer simply represented by the output of the hidden layer at the last moment; instead, the source sentence context is distinguished for the Decoder at different moments and denoted c_i. The calculation formula of c_i is as follows:
c_i = Σ_{j=1}^{T} a_ij · h_j
This can be interpreted as a weighted sum of the hidden layer outputs of the Encoder part at all moments, with different weights at different moments of the Decoder, i.e. an alignment in the sense of meaning. c_i is also used in the calculation of the Decoder hidden layer output, helping the hidden layer to be better expressed. The weight formulas are as follows:
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
e_ij = a(s_{i-1}, h_j)
context information dependent on calculation of a hidden layer in a Decoder partial model based on an attention mechanism model is calculated according to an time hidden layer on a Decoder network and an Encoder network hidden layer at all times, and the context information corresponding to different times is different.
4. BLEU scoring algorithm
In recent years machine translation technology has developed rapidly and several automatic evaluation standards for translation have been proposed. The evaluation standard most widely applied and accepted at present is scoring with the BLEU algorithm, which is a reference standard for evaluating machine translation at the present stage. The basic idea of the algorithm is that the more n-grams (a type of statistical language model, including unigram, bigram, trigram, 4-gram models and so on) the translation to be evaluated has in common with the provided reference translation, the higher the quality of the machine translation result. The BLEU calculation is shown below, where BP is a piecewise function:
BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )
BP = 1 if c > r;  BP = e^{1 − r/c} if c ≤ r
where c represents the length of the translation to be evaluated, r represents the length of the reference translation, and the piecewise function BP is the length penalty factor, which depends on the size relationship between c and r. N denotes the highest n-gram order (one model for each n), w_n denotes the weight of the corresponding n-gram model, usually taken as 1/N, and N is usually set to 4 in most cases; p_n denotes the matching precision of the corresponding model (i.e. the proportion of co-occurring n-grams). If any n-gram model has no match, the BLEU value is 0, in which case the score is meaningless; therefore the BLEU algorithm is not suitable for measuring the translation of a single sentence, but is suitable for evaluating translation over many sentences.
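A simplified BLEU computation following the formula above, for a single candidate against a single reference: clipped n-gram precisions p_n with uniform weights 1/N and the brevity penalty BP. The example sentences are invented for illustration, and, as the text notes, sentence-level BLEU is only a rough measure (any unmatched n-gram order makes the score 0 here).

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, N=4):
    """Sentence-level BLEU sketch: clipped n-gram precisions, weights 1/N, brevity penalty."""
    log_p_sum = 0.0
    for n in range(1, N + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        match = sum(min(c, ref[g]) for g, c in cand.items())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        if match == 0:
            return 0.0              # any unmatched n-gram order gives BLEU = 0
        log_p_sum += (1.0 / N) * math.log(match / total)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / max(c, 1))       # brevity penalty BP
    return bp * math.exp(log_p_sum)

cand = "the small cat is on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(cand, ref), 4))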
Detailed description of the invention
The whole operation specific algorithm is as follows:
1:loop
2: selecting Mongolian and Chinese bilingual corpus, and segmenting the Mongolian by using an NLPIR (non-linear regression with fuzzy inference engine) segmentation technology;
3: vectorizing Mongolian Chinese corpus;
4: performing modeling operation on the Mongolia according to an Encoder-Decoder model;
5: adopting an output function to carry out operation of output characteristics;
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
6: end loop.
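Read as pseudocode, the loop above is a thin orchestration layer over the components sketched in the preceding subsections: segment, vectorize, encode-decode, output. The skeleton below strings those stages together; every function body is a toy placeholder assumption (NLPIR, the vectorizer and the trained Encoder-Decoder are not reimplemented here).

# Skeleton of the overall loop (steps 1-6 above); all stage bodies are stand-ins.
def segment(sentence):
    return sentence.split()                          # stand-in for NLPIR segmentation

def vectorize(tokens):
    return [hash(t) % 1000 for t in tokens]          # stand-in for corpus vectorization

def encode_decode(ids):
    return ["<target-%d>" % i for i in ids[:3]]      # stand-in for the Encoder-Decoder

def translate_corpus(corpus):
    results = []
    for sentence in corpus:                          # 1: loop over the bilingual corpus
        tokens = segment(sentence)                   # 2: word segmentation
        ids = vectorize(tokens)                      # 3: vectorize the corpus
        target = encode_decode(ids)                  # 4-5: model + output function
        results.append(target)
    return results                                   # 6: end loop

print(translate_corpus(["我 爱 中国", "机器 翻译"]))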
Further, the invention can add a reinforcement learning mechanism to the translation process. Reinforcement learning means taking actions based on the environment so as to obtain the maximum expected benefit; at each step the current signal and stimulus affect the following signal and stimulus. The reward mechanism of reinforcement learning feeds back rewards according to the obtained translation content, pushing the translation effect in a good direction, and integrating the idea of reinforcement learning into the machine translation framework makes the translation quality more accurate and reliable.
The reinforcement learning interaction block diagram is shown in FIG. 11, which shows the serialized interaction process of reinforcement learning. Agent denotes an abstract unit capable of sensing the external Environment, i.e. the perceptron; State denotes the current state; Action is the action taken in the current state; Reward denotes the feedback reward mechanism for the action currently taken; and Environment denotes the environment of the current perceptron. The Agent issues an action A_t based on the current state, and the Environment then responds, generating a new state and the corresponding reward, so that through the interactive reward mechanism the perceptron can intelligently execute an action in each state.
In the encoder-decoder Mongolian-Chinese translation framework, the translation framework is the Agent. When the source sentence X = (x_1, x_2, …, x_{n-1}, x_n) is input to the encoder and mapped to the coding vector Z = (z_1, z_2, …, z_{n-1}, z_n), the translation framework translates the source sentence into Y = (y_1, y_2, …, y_{n-1}, y_n) through bidirectional decoding. In this process reinforcement learning uses the principle of instant evaluation: each time a sentence is translated, the system interacts with the BLEU algorithm (BLEU acting as the Environment of reinforcement learning), and for the translated sentence y_t reinforcement learning automatically obtains the reward value R(y_t, s_t) according to the reward mechanism algorithm; in other words, R(y_t, s_t) is the quality evaluation of the translated sentence, i.e. the current BLEU score. The Agent (the translation framework) interacts continuously with the Environment (the BLEU scoring standard) to obtain the data R(y_t, s_t), the maximum value of R(y_t, s_t) indicates that the current translation is closest to the real sentence, and the system selects the translation with the maximum R(y_t, s_t). As shown in FIG. 12, a simulation of a simplified reinforcement-learning reward mechanism in the translation system, the Chinese sentence "I love China" is translated into Mongolian, and under the action of reinforcement learning the encoder-decoder framework converges after three iterations. The first iteration translates "I love China" into a Mongolian sentence (shown in FIG. 12) with reward value R(y_t, s_t) = -5; the second iteration translates it into another Mongolian sentence (shown in FIG. 12) with reward value R(y_t, s_t) = 1; by the third iteration the reinforcement learning has converged and "I love China" is translated into a Mongolian sentence (shown in FIG. 12), the reward value when all iterations are complete being R(y_t, s_t) = 10. Comparing the reward values R(y_t, s_t) of the iterations through interaction, the maximum reward value is 10, and the system takes the translated Mongolian sentence with the maximum reward value as the final translation, i.e. the translation of "I love China" into Mongolian is determined to be the sentence shown in FIG. 12. At this point the translation of the sentence "I love China" is complete, and the best translation found is the one with the largest reward value.
Therefore, by adding reinforcement learning and iterating several times, optimized translation can be carried out according to the data obtained through self-learning to obtain optimally translated sentences. The translation system combines bidirectional encoding and bidirectional decoding, incorporates the reinforcement idea modeled on human thinking, and uses the data generated by the translation system itself for learning, thereby improving the translation effect for low-resource languages.
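The reward loop described above can be sketched as: generate a candidate translation per iteration, score it against the reference (the Environment's reward R(y_t, s_t)), and keep the candidate with the highest reward. The candidate list, the toy overlap-based scorer and the iteration count below are illustrative assumptions; in the patent the reward is the BLEU score of the current translation.

# Sketch of the reward-driven selection loop: the translation framework (Agent) produces
# a candidate each iteration, the scorer acts as the Environment's reward R(y_t, s_t),
# and the candidate with the largest reward is kept as the final translation.
def reward(candidate, reference):
    """Toy stand-in for the BLEU-based reward: token overlap ratio."""
    cand, ref = candidate.split(), reference.split()
    return sum(1 for w in cand if w in ref) / max(len(ref), 1)

reference = "mongolian reference translation of I love China"
iterations = [                       # pretend outputs of three decoding iterations
    "some poor first attempt",
    "mongolian translation of China",
    "mongolian reference translation of I love China",
]

best, best_r = None, float("-inf")
for t, y_t in enumerate(iterations, start=1):
    r = reward(y_t, reference)       # interaction with the Environment (scoring)
    print("iteration", t, "reward =", round(r, 3))
    if r > best_r:
        best, best_r = y_t, r        # keep the translation with the maximum reward

print("final translation:", best)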

Claims (7)

1. A GRU neural network Mongolian-Chinese machine translation method, in which the translation languages are first preprocessed, an Encoder-Decoder model is then built and trained on a Mongolian-Chinese bilingual corpus of a certain scale, the Mongolian-Chinese bilingual corpus is processed by the coding system, and the translation result is finally obtained based on the Encoder-Decoder model, characterized in that the Encoder-Decoder model is a neural machine translation model constructed from neural networks, wherein one neural network is an LSTM responsible for Encoder coding, adopting a bidirectional coding setup, i.e. the source language is encoded in the forward and reverse directions and the source sentence is converted into two fixed-length vectors coded in different directions, the two vectors to be decoded containing all the context information; the other neural network is a GRU responsible for Decoder decoding, decoding from the forward and reverse directions, i.e. the context information is automatically integrated when the target language is decoded and output, so that the fixed-length vectors produced by coding are converted into a target sentence.
2. The GRU neural network Mongolian Chinese machine translation method according to claim 1, wherein the preprocessing of the translation language is to perform word segmentation on the translation language by using NLPIR word segmentation technology.
3. The GRU neural network Mongolian-Chinese machine translation method according to claim 1, wherein the calculation formula of the Encoder coding is as follows:
h_t = f(x_t, h_{t-1})
that is, the hidden layer output h_t at the current time is computed from the current input x_t and the hidden layer output h_{t-1} at time t-1; the output at each time is obtained through Encoder coding, and the feature representation of the final source sentence context is then obtained by calculation, i.e. the hidden layer output at the final time represents the context of the source sentence;
the calculation formula of the Decoder decoding is as follows:
p(y_1, …, y_{T'} | x_1, …, x_T) = ∏_{t=1}^{T'} p(y_t | V, y_1, …, y_{t-1})
where x_1, …, x_T is the input sequence, y_1, …, y_{T'} is the output sequence, V is the initial value of the decoder, i.e. the encoding of x_1, …, x_T, T is the length of the input sentence, T' is the length of the output sentence, and T and T' are typically not equal in length;
the target function of the model is the probability that the source sentence is correctly translated into the target sentence;
the model training process is the process of maximizing the probability of correctly translating the source sentence into the target sentence in the training sample, and for each time i, the probability that the current output is a correct result is calculated as follows
p(yi|{y1,…,yi-1})=g(yi-1,si,c)
Where g denotes the transformation function of the intermediate semantic representation of the entire sentence, siIs the feature vector that has been obtained and c is the source sentence context.
4. The GRU neural network Mongolian-Chinese machine translation method according to claim 3, characterized in that the Encoder coding part of the Encoder-Decoder model is unchanged, the Decoder decoding part introduces a self-attention mechanism, the context information relied on in the decoding calculation is computed from the hidden layer of the Decoder network at the previous moment and the hidden layers of the Encoder network at all moments, and the context information is different at different moments, wherein for each moment i the probability that the current output is a correct result is calculated as follows
p(y_i | {y_1, …, y_{i-1}}, C) = g(y_{i-1}, s_i, c_i)
wherein C denotes the intermediate semantic coding, at which time the source sentence context is distinguished for the Decoder at different moments and denoted c_i, the calculation formulas of c_i being as follows:
c_i = Σ_{j=1}^{T} a_ij · h_j
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
e_ij = a(s_{i-1}, h_j)
c_i is calculated as the weighted sum of the hidden layer outputs of the Encoder coding part at all moments, T denotes the length of the input sentence, a_ij denotes the attention distribution coefficient of the j-th word of the input sentence when the target outputs the i-th word, s denotes the intermediate coding vector, h_j denotes the semantic code of the j-th word of the input sentence, a(s_{i-1}, h_j) denotes the composite coding function, and e_ij denotes the resulting score; the Decoder decoding has different corresponding weights at different moments, and c_i is likewise used in the calculation of the Decoder decoding hidden layer output, helping the hidden layer to be better expressed.
5. The GRU neural network Mongolian Chinese machine translation method as claimed in claim 3, wherein BLEU algorithm scoring is used for translation effect evaluation.
6. The GRU neural network Mongolian Chinese machine translation method according to claim 3, wherein a reinforcement learning mechanism is added in the translation process.
7. The GRU neural network Mongolian-Chinese machine translation method according to claim 6, wherein in the reinforcement learning mechanism the Encoder-Decoder translation framework serves as the perceptron and the BLEU score serves as the Environment; when a source sentence X = (x_1, x_2, …, x_{n-1}, x_n) is input to the encoder and mapped to a coding vector Z = (z_1, z_2, …, z_{n-1}, z_n), the translation framework translates the source sentence into Y = (y_1, y_2, …, y_{n-1}, y_n) through bidirectional decoding; in this process reinforcement learning uses the principle of instant evaluation, each translated sentence interacts with the BLEU algorithm, and for the translated sentence y_t the reward value R(y_t, s_t) is obtained according to the reward mechanism algorithm, R(y_t, s_t) being the quality evaluation of the translated sentence, i.e. the current BLEU score; through continuous interaction between the Agent and the Environment the data R(y_t, s_t) are obtained, and the maximum value of R(y_t, s_t) indicates that the current translation is closest to the real sentence.
CN201910940595.1A 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method Pending CN110738062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940595.1A CN110738062A (en) 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910940595.1A CN110738062A (en) 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method

Publications (1)

Publication Number Publication Date
CN110738062A true CN110738062A (en) 2020-01-31

Family

ID=69268384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940595.1A Pending CN110738062A (en) 2019-09-30 2019-09-30 GRU neural network Mongolian Chinese machine translation method

Country Status (1)

Country Link
CN (1) CN110738062A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109791631A (en) * 2016-08-25 2019-05-21 谷歌有限责任公司 Reward enhancing model training
CN108920468A (en) * 2018-05-07 2018-11-30 内蒙古工业大学 A Mongolian-Chinese bilingual inter-translation method based on reinforcement learning
CN109472024A (en) * 2018-10-25 2019-03-15 安徽工业大学 A text classification method based on a bidirectional recurrent attention neural network
CN109508462A (en) * 2018-10-25 2019-03-22 内蒙古工业大学 A neural network Mongolian-Chinese machine translation method based on an encoder-decoder
CN109598002A (en) * 2018-11-15 2019-04-09 重庆邮电大学 Neural machine translation method and system based on a bidirectional recurrent neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
乌云塔那: "Research on Mongolian-Chinese Machine Translation Based on Neural Networks", China Excellent Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291534A (en) * 2020-02-03 2020-06-16 苏州科技大学 Global coding method for automatic summarization of Chinese long text
CN112070208A (en) * 2020-08-05 2020-12-11 同济大学 Tool wear prediction method based on encoder-decoder stage attention mechanism
CN112070208B (en) * 2020-08-05 2022-08-30 同济大学 Tool wear prediction method based on encoder-decoder stage attention mechanism
CN112215017B (en) * 2020-10-22 2022-04-29 内蒙古工业大学 Mongolian Chinese machine translation method based on pseudo parallel corpus construction
CN112215017A (en) * 2020-10-22 2021-01-12 内蒙古工业大学 Mongolian Chinese machine translation method based on pseudo parallel corpus construction
CN112329760A (en) * 2020-11-17 2021-02-05 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112329760B (en) * 2020-11-17 2021-12-21 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112580372A (en) * 2020-12-26 2021-03-30 内蒙古工业大学 Mongolian Chinese neural machine translation method based on Actor-Critic
CN112597780A (en) * 2020-12-28 2021-04-02 焦点科技股份有限公司 Multi-language mixed heterogeneous neural network machine learning translation method
CN113205792A (en) * 2021-04-08 2021-08-03 内蒙古工业大学 Mongolian speech synthesis method based on Transformer and WaveNet
CN113408781A (en) * 2021-04-30 2021-09-17 南通大学 Encoder-Decoder-based long-term traffic flow prediction method
CN113191165A (en) * 2021-07-01 2021-07-30 南京新一代人工智能研究院有限公司 Method for avoiding duplication of machine translation fragments
CN113191165B (en) * 2021-07-01 2021-09-24 南京新一代人工智能研究院有限公司 Method for avoiding duplication of machine translation fragments

Similar Documents

Publication Publication Date Title
CN110738062A (en) GRU neural network Mongolian Chinese machine translation method
CN109359294B (en) Ancient Chinese translation method based on neural machine translation
CN111368565A (en) Text translation method, text translation device, storage medium and computer equipment
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN110674646A (en) Mongolian Chinese machine translation system based on byte pair encoding technology
CN110688862A (en) Mongolian-Chinese inter-translation method based on transfer learning
CN109271629B (en) Method for generating text abstract based on reinforcement learning
CN108415906B (en) Automatic identification discourse machine translation method and machine translation system based on field
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN111651589B (en) Two-stage text abstract generation method for long document
CN110032638B (en) Encoder-decoder-based generative abstract extraction method
CN110569505B (en) Text input method and device
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN111666756B (en) Sequence model text abstract generation method based on theme fusion
CN110717345B (en) Translation realignment recurrent neural network cross-language machine translation method
CN110532555B (en) Language evaluation generation method based on reinforcement learning
CN111708877B (en) Text abstract generation method based on key information selection and variational potential variable modeling
CN111144410B (en) Cross-modal image semantic extraction method, system, equipment and medium
CN111309896B (en) Deep learning text abstract generation method based on secondary attention
CN114691858B (en) Improved UNILM digest generation method
CN116663578A (en) Neural machine translation method based on strategy gradient method improvement
CN113076718B (en) Commodity attribute extraction method and system
CN109918484B (en) Dialog generation method and device
CN116432637A (en) Multi-granularity extraction-generation hybrid abstract method based on reinforcement learning
CN115719072A (en) Chapter-level neural machine translation method and system based on mask mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200131