CN112364668A - Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer - Google Patents

Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer

Info

Publication number
CN112364668A
CN112364668A (application CN202011250507.4A)
Authority
CN
China
Prior art keywords
memory
vector
mongolian
matrix
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011250507.4A
Other languages
Chinese (zh)
Inventor
苏依拉
赵旭
薛媛
卞乐乐
范婷婷
仁庆道尔吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202011250507.4A
Publication of CN112364668A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer first segments the Chinese text, then constructs a Mongolian-Chinese bilingual dictionary and obtains a Mongolian-Chinese bilingual word vector matrix; a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, is initialized with the MAML method; and a Mongolian-Chinese translation model is built on these initialization parameters using a differentiable neural computer. The invention combines a differentiable neural computer with a model-agnostic meta-learning strategy: the meta-learning strategy initializes the parameters, while the differentiable neural computer, reconstructed from RNN and LSTM, handles long-term semantics through a memory management mechanism and selective reading, further advancing the solution of sequence problems and improving translation performance. In particular, with respect to the data sparsity and undersized generated dictionaries of low-resource corpora and the loss of semantic information during translation, the method can further improve a Mongolian-Chinese machine translation system and achieve better translation performance.

Description

Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer
Technical Field
The invention belongs to the technical field of deep learning machine translation, and particularly relates to a Mongolian-Chinese machine translation method based on a Model-Agnostic Meta-Learning (MAML) strategy and a Differentiable Neural Computer (DNC).
Background
The field of deep learning continues to advance, and natural language processing has developed greatly with it; in the era of artificial intelligence in particular, machine translation plays an increasingly important role in the development of the internet.
Nevertheless, problems remain in natural language processing. Although many researchers strive continuously to improve translation quality, issues such as ambiguous word handling, the detection and representation of unknown words, missing semantic information, errors in bilingual word vector correspondences, and the scarcity of corpora are unavoidable and greatly affect the quality of machine translation. Many well-known universities and research institutes have attempted to solve these problems, and various schemes have been proposed and implemented, but most existing solutions address one aspect of the problem rather than the problem as a whole. Driven by deep learning, machine translation has progressed far beyond earlier statistics-based machine translation, yet problems such as ambiguity, unknown words and missing semantics still lack good solutions.
At present, attention-based machine translation has become the mainstream model and achieves the best performance to date; fundamentally, however, it does not solve these problems deeply and can only alleviate them to a certain extent.
Recent research shows that pre-trained models occupy a dominant position in current machine translation, and the various pre-trained models proposed by companies and research institutes do solve certain problems of statistical machine translation, but no system that completely solves the translation problem has appeared. On the premise that coarse translation is available, how to perform fine translation is therefore the current focus of work; the lack of semantic information and word vectors in low-resource languages such as Mongolian requires especially careful treatment.
Disclosure of Invention
In order to overcome the drawbacks of the prior art and further improve machine translation performance, the present invention provides a Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer. It uses a Differentiable Neural Computer (DNC) and a Model-Agnostic Meta-Learning (MAML) strategy, where the meta-learning strategy initializes the parameters and the DNC, reconstructed from RNN and LSTM, processes long-term semantics through a memory management mechanism and selective reading, further advancing the solution of sequence problems and improving translation performance. In particular, with respect to the data sparsity and undersized generated dictionaries of low-resource corpora and the loss of semantic information during translation, the method can further improve a Mongolian-Chinese machine translation system and achieve better translation performance.
In order to achieve the purpose, the invention adopts the technical scheme that:
A Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer comprises the following steps:
step 1, segmenting the Chinese text, then constructing a Mongolian-Chinese bilingual dictionary and acquiring a Mongolian-Chinese bilingual word vector matrix;
step 2, initializing a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, with the MAML method;
and step 3, building a Mongolian-Chinese translation model with a differentiable neural computer based on the model initialization parameters.
In step 1, a Chinese word segmentation method is used in accurate mode to segment the Chinese corpus: keywords are extracted from the sentence to be processed, part-of-speech tagging is performed, and a stop-word list is loaded, thereby achieving the word segmentation.
In step 1, after word segmentation is completed, fast_align is used to process the Mongolian-Chinese corpus, and a Mongolian-Chinese bilingual dictionary is first constructed as follows:
1) merging the Mongolian and Chinese corpora, with each line joining a source-language sentence and its target-language sentence, separated by the delimiter '|||' with a leading and a trailing space;
2) performing the Mongolian-Chinese bilingual alignment with the fast_align tool;
3) constructing the Mongolian-Chinese bilingual dictionary from the aligned Mongolian-Chinese bilingual corpus.
After the Mongolian-Chinese bilingual dictionary is built, Mongolian-Chinese bilingual word vectors, i.e., word embedding vectors, are generated with the skip-gram model of the fastText tool, as follows:
1) the skip-gram model processes the Mongolian-Chinese bilingual dictionary at the input layer, using the word at the current position to predict the c words before and after it so as to obtain long-range information; the prediction result, i.e., the probability of inferring the words within c context windows from the current word, is expressed as p(w_{t±c} | w_t), where w_t denotes the word at the current position t and w_{t±c} denotes the c consecutive words before and after position t;
2) the word vectors at each position are aggregated by summation, i.e.
(1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} p(w_{t+j} | w_t)
where t indexes the position of the central word, T is the number of positions, and C is the number of words before and after the current word, i.e., the size of the front and back windows;
3) the dictionary at the output layer is encoded with a Huffman tree, whose codes are assigned from the root to the leaf nodes according to word frequency;
4) the logarithm of the summed result is taken:
l = (1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} log p(w_{t+j} | w_t)
where w_t refers to the word at the current position;
5) the partial derivatives of l are computed and the weights are updated step by step with a gradient descent algorithm; the trained vectors w_{t±c} form the word vector matrix, and the vectors of all words are integrated accordingly, i.e., the context vector of each word is sorted and collected, finally yielding the Mongolian-Chinese bilingual word vector matrix.
The specific steps of step 2 are as follows:
1) initial training is performed with the aid of some high-resource language tasks, with the task distribution denoted p(τ);
2) the learning rates α and β of gradient descent are initialized;
3) a parameter θ is randomly initialized according to previous experimental records or experience, and tasks are sampled: τ_i denotes the task numbered i, with τ_i ∈ p(τ), where p(τ) is the overall task distribution;
4) for each task τ_i, its gradient ∇_θ L_{τ_i}(f_θ) is computed, where L_{τ_i} is the loss function and ∇_θ denotes the gradient operator;
5) a gradient update is performed, obtaining θ'_i = θ − α·∇_θ L_{τ_i}(f_θ), where θ'_i is the new parameter after the gradient update;
6) after every task τ_i in p(τ) has been executed, a second, final gradient update is performed according to the formula:
θ_f = θ − β·∇_θ Σ_{τ_i∼p(τ)} L_{τ_i}(f_{θ'_i})
where θ_f is the locally optimal task parameter finally obtained by gradient descent.
In the step 3, a final translation model is obtained through repeated simulation training and fine tuning.
The differentiable neural computer stores memory in vectors, each row of the memory matrix corresponding to a different memory; its processor uses an interface vector in_t to control one write head and several read heads that interact with the memory. One row vector of the memory matrix represents one group of memories, and N rows mean the memory matrix can hold at most N groups of memories. At each time step the differentiable neural computer receives the read-head information stream of the previous moment together with the external input stream of the current moment, which form the external input stream of the generalized differentiable neural computer. After processing into the hidden state, an output vector and an interface vector are generated; the interface vector controls the read heads and interacts with the external memory matrix through a read-write mechanism to generate the write information of the current moment, the matrix is updated to obtain the read information of the current moment, and the read information and the output vector are combined linearly to generate the final output vector ou_t of the current moment. The memory consists of individual memories whose storage form is the memory matrix.
The processor consists of several neural networks and is responsible for interacting with the input and output, where the input in_t is a single controller input formed from the read vectors r and the input vector x_t, i.e., the processor input vector
in_t = [x_t; r_{t−1}^1; …; r_{t−1}^d]
where r_{t−1}^1, …, r_{t−1}^d denote the set of read vectors in the memory matrix at the previous moment and d denotes the number of vectors in the set;
the obtained vector is used for writing and reading, and read-write operations are performed on the memory to update its content, the write operation being:
M_f[i,j] = M[i,j](1 − w^w[i]·era[i]) + w^w[i]·val[i]
that is, an erase-and-rewrite operation is performed on the matrix in the memory, where M[i,j] denotes dimension j of the current row i of the memory, w^w is the write weight, era[i] is the erase vector, i.e., the erase performed on dimension j of the current row, and val[i] is the write vector, i.e., the content added at the erased location;
the read vector r is defined as
r = Σ_{i=1..N} M[i,·]·w^r[i]
where M[i,·] denotes the memory matrix, i is the location information, here row i of the memory, · spans all vector dimensions at the current location, N denotes the number of rows in the memory, and w^r[i] is the read weight on row i;
based on content addressing and dynamic memory allocation, the write location in the memory is determined, and based on content addressing and the temporal link matrix, the read location is determined, according to the formula:
C(M, k, β)[i] = exp(β·D(k, M[i,·])) / Σ_{j=1..N} exp(β·D(k, M[j,·]))
C(M, k, β)[i] defines the normalized probability distribution over the memory locations, i.e., the judgment of location accuracy, where D is the cosine similarity function computed over the dimensions 1 to W of each row, k is the lookup key value, and β is a focusing parameter expressing the strength of the key, i.e., how suitable the current location is;
after the read-write operations on the memory are completed, the processor obtains two vectors ε_t and v_t, with (v_t, ε_t) = NN([in_1; …; in_t]; θ_f), where NN denotes the processor.
The differentiable neural computer uses an auxiliary matrix to record the previous word order or semantic order so as to ensure that the reading and writing order is correct. The auxiliary matrix comprises two parts, a usage vector and a temporal link matrix: the usage vector records the locations used so far, and the temporal link matrix records the order in which locations were written, thereby ensuring that related semantic information is stored and read correctly, marking used matrix locations, preventing overwriting and ensuring semantic accuracy. The prediction finally output by the differentiable neural computer is
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
where W_r is the output read weight and r_t^1, …, r_t^d are the read vectors, i.e., C-dimensional vectors at the current step t.
At each time step t, based on the information stream at time t−1 and the prediction information of the two parts after the information stream has been exchanged with the memory, the trainable processor of the differentiable neural computer determines the final output prediction ou_t. The final output prediction ou_t is fed into the Mongolian-Chinese machine translation model (corresponding to the decoder part), and translation is performed according to the formula:
S = argmax_s Π_{t=1..T} p(s_t | ou_t; θ)
where S is the final sentence generated by the translation, ou_t is the output feature of the sentence at time t, i.e., the final prediction obtained from the differentiable neural computer, θ is the network-related parameter, and s_t is the word generated at time t; the expression maximizing the probability of the semantic features, i.e., the optimal rendering of the translated sentence, is thereby obtained.
Compared with the prior art, the invention has the following beneficial effects:
1. The Mongolian-Chinese bilingual corpus is processed on a Linux system in GPU working mode, roughly doubling the speed; this architectural advantage, together with a dedicated translation network structure and a translation evaluation algorithm, raises the quality of the whole system and further improves machine translation quality.
2. The order of semantic information can be stored by means of the differentiable neural computer, and its assignments are updated in real time, ensuring accuracy during processing. Semantic information is enriched through the processor and the memory, improving the translation effect.
3. The initial parameters are optimized by the MAML method, yielding locally optimal parameter values and providing a good starting point for the downstream task, i.e., the translation process, which further improves the translation effect.
Drawings
FIG. 1 is a schematic diagram of the word segmentation process of the present invention.
Fig. 2 is a schematic diagram of the overall flow principle of MAML.
FIG. 3 is a schematic diagram of an implementation of MAML, i.e., the Model-Agnostic Meta-Learning method.
Fig. 4 is a schematic diagram of the detailed structure of DNC.
FIG. 5 is a schematic diagram of a processor obtaining a final output vector.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention discloses a Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer. First, with the MAML strategy, gradient descent over some high-resource language tasks yields a locally optimal initialization parameter, which serves as a general starting point for the translation task. Then, by means of the memory management mechanism of the differentiable neural computer, dynamic memory allocation achieves an accurate fusion of long-term semantic information and accurately identifies the relations between related semantics, so as to enhance the translation effect. Finally, the model is verified and solved, and the translation effect is evaluated with the cross entropy and the BLEU score.
The steps of the present invention are explained in detail below, including:
step 1, pretreatment process: the method comprises the steps of segmenting Chinese, then constructing a Mongolian Chinese bilingual dictionary, and obtaining a vector matrix of the Mongolian Chinese bilingual words.
Specifically, the jieba Chinese word segmentation tool is used in accurate mode to segment the Chinese corpus. Referring to fig. 1, keywords are extracted from the sentence to be processed and part-of-speech tagging is performed; to reduce the influence of punctuation and function words, a stop-word list is loaded for filtering, which removes the influence of punctuation and function words and enhances the relevant semantic information.
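As an illustration, a minimal sketch of this segmentation step with the jieba library follows; the stop-word file name and the example sentence are hypothetical and not part of the invention.

```python
import jieba
import jieba.analyse
import jieba.posseg as pseg

# Load a custom stop-word list for keyword extraction (path is hypothetical).
jieba.analyse.set_stop_words("stopwords_zh.txt")

sentence = "机器翻译在人工智能时代发挥着越来越重要的作用"

# Accurate-mode segmentation (cut_all=False is jieba's accurate mode).
words = jieba.lcut(sentence, cut_all=False)

# Part-of-speech tagging.
tagged = [(pair.word, pair.flag) for pair in pseg.cut(sentence)]

# Keyword extraction; stop words are filtered out during this step.
keywords = jieba.analyse.extract_tags(sentence, topK=5)

print(words, tagged, keywords)
```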
After word segmentation is finished, fast_align is used to process the Mongolian-Chinese corpus, and a Mongolian-Chinese bilingual dictionary is first constructed as follows (a code sketch follows the list below):
1) merging the Mongolian-Chinese corpus, each line joining a source-language sentence and its target-language translation, separated by the delimiter '|||' with a leading and a trailing space;
2) performing the Mongolian-Chinese bilingual alignment with the fast_align tool;
3) constructing the Mongolian-Chinese bilingual dictionary from the aligned Mongolian-Chinese bilingual corpus.
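A minimal sketch of this pipeline is given below, assuming the fast_align binary is available on the PATH; all file names are hypothetical. fast_align expects one sentence pair per line in the form "source ||| target".

```python
import subprocess

# Merge the Mongolian and Chinese corpora into fast_align's input format
# (file names hypothetical): one pair per line, "source ||| target".
with open("corpus.mn") as f_src, open("corpus.zh") as f_tgt, \
        open("corpus.mn-zh", "w") as f_out:
    for src, tgt in zip(f_src, f_tgt):
        f_out.write(f"{src.strip()} ||| {tgt.strip()}\n")

# Run fast_align in the forward direction; -d, -o, -v are the settings
# recommended by the fast_align README (add -r for the reverse direction).
with open("forward.align", "w") as f_align:
    subprocess.run(
        ["fast_align", "-i", "corpus.mn-zh", "-d", "-o", "-v"],
        stdout=f_align, check=True,
    )
```

The resulting word alignments can then be tallied into source-target word pairs to build the bilingual dictionary.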
After the Mongolian-Chinese bilingual dictionary is constructed, Mongolian-Chinese bilingual word vectors, i.e., word embedding vectors, are generated with the fastText tool, using the skip-gram model it provides. The specific process is as follows (a code sketch follows the list):
1) the skip-gram model processes the Mongolian-Chinese bilingual dictionary at the input layer, using the word at the current position to predict the c words before and after it so as to obtain long-range information; the prediction result, i.e., the probability of inferring the words within c context windows from the current word, is expressed as p(w_{t±c} | w_t), where w_t denotes the word at the current position t and w_{t±c} denotes the c consecutive words before and after position t;
2) the word vectors at each position are aggregated by summation, i.e.
(1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} p(w_{t+j} | w_t)
where t indexes the position of the central word, T is the number of positions, and C is the number of words before and after the current word, i.e., the size of the front and back windows;
3) the dictionary at the output layer is encoded with a Huffman tree, whose codes are assigned from the root to the leaf nodes according to word frequency;
4) the logarithm of the summed result is taken:
l = (1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} log p(w_{t+j} | w_t)
where w_t refers to the word at the current position;
5) the partial derivatives of l are computed and the weights are updated step by step with a gradient descent algorithm; the trained vectors w_{t±c} form the corresponding word vector matrix, and the vectors of all words are integrated accordingly, i.e., the context vector of each word is sorted and collected, finally yielding the Mongolian-Chinese bilingual word vector matrix.
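A minimal sketch of this step with the fasttext Python package follows; the corpus file name and the hyper-parameter values (dimension, window size, epochs) are illustrative assumptions rather than values fixed by the invention.

```python
import fasttext
import numpy as np

# Train skip-gram word vectors on the merged Mongolian-Chinese corpus
# (file name hypothetical); ws is the context window c, dim the vector size.
# loss="hs" selects hierarchical softmax, matching the Huffman-tree
# output layer described above.
model = fasttext.train_unsupervised(
    "corpus.mn-zh.txt", model="skipgram",
    dim=300, ws=5, epoch=10, loss="hs",
)

# Collect the vector of every word in the vocabulary into a matrix:
# the bilingual word vector matrix.
vocab = model.get_words()
matrix = np.stack([model.get_word_vector(w) for w in vocab])
print(matrix.shape)  # (vocabulary size, 300)
```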
Next, the following operations are carried out by means of the model-agnostic meta-learning strategy (MAML) and the differentiable neural computer (DNC); the two are introduced separately below.
Step 2: a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, is initialized with the MAML method.
For low-resource languages such as Mongolian, a general translation strategy independent of the original tasks can be obtained through MAML by gradient-descent training on some high-resource languages; the low-resource language system is then repeatedly simulation-trained and fine-tuned from this learned starting point, i.e., the initial locally optimal task parameters, finally obtaining a suitable translation model. Referring to fig. 2 and 3, the MAML method is embodied as follows:
1) initial training is performed with the aid of some high-resource language tasks, with the task distribution denoted p(τ);
2) the learning rates α and β of gradient descent are initialized;
3) a parameter θ is randomly initialized according to previous experimental records or experience, and tasks are sampled: τ_i denotes the task numbered i, with τ_i ∈ p(τ), where p(τ) is the overall task distribution;
4) for each task τ_i, its gradient ∇_θ L_{τ_i}(f_θ) is computed, where L_{τ_i} is the loss function and ∇_θ denotes the gradient operator;
5) a gradient update is performed, obtaining θ'_i = θ − α·∇_θ L_{τ_i}(f_θ), where θ'_i is the new parameter after the gradient update;
6) after every task τ_i in p(τ) has been executed, a second, final gradient update is performed according to the formula:
θ_f = θ − β·∇_θ Σ_{τ_i∼p(τ)} L_{τ_i}(f_{θ'_i})
where θ_f is the locally optimal task parameter finally obtained by gradient descent (a code sketch of this two-level update is given below).
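The two-level gradient update of steps 4) to 6) can be sketched in plain NumPy as follows, with a toy quadratic loss standing in for the per-task translation loss. For brevity the sketch uses the first-order MAML approximation, which drops the second-derivative term of the exact meta-gradient; all names and sizes are illustrative.

```python
import numpy as np

def loss_and_grad(theta, task):
    """Toy per-task loss L_tau(f_theta) = ||theta - task||^2 and its
    gradient; stands in for the translation loss of task tau_i."""
    return np.sum((theta - task) ** 2), 2.0 * (theta - task)

alpha, beta = 0.01, 0.001                        # inner / outer learning rates
theta = np.random.randn(8)                       # randomly initialized parameters
tasks = [np.random.randn(8) for _ in range(16)]  # sampled tau_i ~ p(tau)

for _ in range(100):                             # meta-training iterations
    meta_grad = np.zeros_like(theta)
    for task in tasks:
        # Inner update: theta'_i = theta - alpha * grad L_tau_i(f_theta).
        _, g = loss_and_grad(theta, task)
        theta_i = theta - alpha * g
        # Gradient of L_tau_i(f_theta'_i); first-order approximation.
        _, g_outer = loss_and_grad(theta_i, task)
        meta_grad += g_outer
    # Outer update: theta_f = theta - beta * grad sum_i L_tau_i(f_theta'_i).
    theta = theta - beta * meta_grad
```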
Thus, by means of the MAML method, an initial optimal parameter θ_f is obtained, and the subsequent translation of the low-resource language pair, i.e., the Mongolian-Chinese translation task, can start training uniformly from the better initial parameter θ_f.
The initial parameter θ_f obtained through MAML's gradient descent adapts well and quickly to the new task, i.e., the Mongolian-Chinese translation task.
Step 3: based on the model initialization parameters, the translation model mentioned in step 2 is built with a differentiable neural computer. The final translation model is then obtained through repeated simulation training and fine-tuning (Fine-Tune), which further improves the quality of Mongolian-Chinese translation and addresses the lack of low-resource corpora, insufficient semantic information, poor long-sentence translation and similar problems.
As the name suggests, the differentiable neural computer (DNC) is a neural computer that is differentiable. Differentiability is important in machine learning: computation inside a conventional computer is absolute, either 0 or 1, operating on logic values or integers, whereas most neural networks and machine learning methods work with real numbers and smoother curves, which makes training easier, comes closer to the real situation, and preserves the accuracy of the data. Differentiability meets this requirement, allowing a gradual approach toward an optimal value that is closer to the real situation and achieves a better result.
A differentiable neural computer (DNC) combines a neural network processor with a dynamic memory. This hybrid machine has the advantage that the neural network can learn from data while also storing the learned knowledge, i.e., complex structured data. The dynamic memory can be written and read selectively, allows the memory contents to be modified iteratively, enlarges the memory range, and remedies the inability of plain neural networks to store data over long periods.
The specific structure of the differentiable neural computer (DNC) is shown in fig. 4. It stores memory in vectors, each row of the memory matrix corresponding to a different memory. The processor uses an interface vector in_t to control one write head and several read heads (each read head is formed as a linear combination of two addressing mechanisms; the structural design places no constraint on the number of read heads) that interact with the memory. One row vector of the memory matrix represents one group of memories, and N rows mean the memory matrix can hold at most N groups of memories. The differentiable neural computer receives the read-head information stream of the previous moment and the external input stream of the current moment, which form the external input stream of the generalized differentiable neural computer (corresponding to the external inputs a conventional LSTM receives at each step). After processing into the hidden state, an output vector and an interface vector are generated; the interface vector controls the read heads and interacts with the external memory matrix through a read-write mechanism to generate the write information of the current moment, the matrix is updated to obtain the read information of the current moment, and the read information and the output vector are combined linearly to generate the final output vector ou_t of the current moment. The memory consists of individual memories whose storage form is the memory matrix.
Specifically, referring to FIG. 5, the processor consists of several neural networks responsible for interacting with the inputs and outputs. The input in_t is a single controller input formed from the read vectors r and the input vector x_t, i.e., the processor input vector
in_t = [x_t; r_{t−1}^1; …; r_{t−1}^d]
where r_{t−1}^1, …, r_{t−1}^d denote the set of read vectors in the memory matrix at the previous moment and d denotes the number of vectors in the set;
based on the obtained vector, the memory content can be updated by read-write operations on the memory, here a write operation followed by a read operation, the write operation being:
M_f[i,j] = M[i,j](1 − w^w[i]·era[i]) + w^w[i]·val[i]
that is, an erase-and-rewrite operation is performed on the matrix in the memory, where M[i,j] denotes dimension j of the current row i of the memory, w^w is the write weight, era[i] is the erase vector, i.e., the erase performed on dimension j of the current row, and val[i] is the write vector, i.e., the content added at the erased location;
the read vector r is defined as
r = Σ_{i=1..N} M[i,·]·w^r[i]
where M[i,·] denotes the memory matrix, i is the location information, here row i of the memory, · spans all vector dimensions at the current location, N denotes the number of rows in the memory, and w^r[i] is the read weight on row i;
the dynamic memory (the memory in the differentiable neural machine, generally called as dynamic memory, performs the writing and erasing of the memory by means of the reading and writing operation) can read the relevant information more accurately through the memory addressing, and obtains higher accuracy. The formula is as follows:
Figure BDA0002771437620000111
c (M, k, beta) i defines the normalized probability distribution on the memory position in the memory, namely the judgment of position accuracy, wherein D is a cosine similarity function, key is a search key value, beta is a focusing parameter and represents the strength of the key, namely the suitable degree of the current position, and j is the distribution set of all dimensions in the ith row of the memory and represents the dimensions from 1 to W;
the differentiable neural machine adopts an auxiliary matrix to record the previous word sequence or semantic sequence to ensure that the reading and writing sequence is correct, the auxiliary matrix belongs to a part of the differentiable neural machine, and is mainly used for recording the previous word sequence or semantic sequence, and is similar to a linked list to ensure that the reading and writing sequence is correct. The method comprises two parts, wherein one part is a using vector, the other part is a time link matrix, the using vector records the used position up to now, the time link matrix records the sequence of writing the position, thereby ensuring the correct storage and reading of the related semantic information, simultaneously identifying the used matrix position, preventing the occurrence of coverage and ensuring the semantic accuracy, when the reading operation is carried out, the time link matrix is used for interconnecting the correct sequence, the DNC can recall the events of the last step under a certain instant memory and the last step of the last step, and so on, namely, the events required to be done by the time link matrix can be traversed, a linked list is formed according to the front and back sequence, and the output of the word sequence is correct.
After the read-write operations on the memory are completed, the processor obtains two vectors ε_t and v_t, with (v_t, ε_t) = NN([in_1; …; in_t]; θ_f), where NN denotes the processor. ε_t interacts with the memory matrix through memory addressing to generate the output read-head memory r_t, and v_t is combined linearly with the read heads into the final output vector
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
i.e., the prediction finally output by the neural computer, where W_r is the output read weight and r_t^1, …, r_t^d are the read vectors, i.e., C-dimensional vectors at the current step t.
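The write, read and content-addressing operations described above can be sketched in NumPy as follows; the memory dimensions and test values are illustrative, and the sketch omits dynamic memory allocation and the temporal link matrix.

```python
import numpy as np

N, W = 16, 8          # memory rows and row width
M = np.zeros((N, W))  # memory matrix

def write(M, w_w, era, val):
    """M_f[i,j] = M[i,j] * (1 - w_w[i]*era[j]) + w_w[i]*val[j]."""
    return M * (1 - np.outer(w_w, era)) + np.outer(w_w, val)

def read(M, w_r):
    """r = sum_i M[i,:] * w_r[i]."""
    return M.T @ w_r

def content_address(M, key, beta):
    """C(M,k,beta)[i]: softmax over rows of beta * cosine(key, M[i,:])."""
    eps = 1e-8
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    e = np.exp(beta * sims - np.max(beta * sims))
    return e / e.sum()

w_w = np.eye(N)[0]                             # write entirely to row 0
M = write(M, w_w, era=np.ones(W), val=np.random.randn(W))
w_r = content_address(M, key=M[0], beta=10.0)  # look up by content
r = read(M, w_r)                               # read vector approximates row 0
```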
At each time step t, based on the information stream at time t−1 and the prediction information of the two parts after the information stream has been exchanged with the memory, the trainable processor of the differentiable neural computer determines the final output prediction ou_t. The final output prediction ou_t is fed into the Mongolian-Chinese machine translation model (corresponding to the decoder part), and translation is performed according to the formula:
S = argmax_s Π_{t=1..T} p(s_t | ou_t; θ)
where S is the final sentence generated by the translation, ou_t is the output feature of the sentence at time t, i.e., the final prediction obtained from the differentiable neural computer, θ is the network-related parameter, and s_t is the word generated at time t; the expression maximizing the probability of the semantic features, i.e., the optimal rendering of the translated sentence, is thereby obtained.
The obtained sentence feature results are evaluated with the cross entropy. Given an input sequence x, a network output sequence y and a target sequence z, converted into the corresponding two-dimensional vector distributions with target sentence length T, the following cross-entropy loss function is generated:
L(x, y, z) = − Σ_{t=1..T} G(t) · Σ_d log Pr(z_t^d | y_t)
The input vector has a size of 92 dimensions and the target vector 90. The network has 90 output units, corresponding to 9 individual softmax distributions over 10 dimensions each; the log probability of correctly predicting the whole target triple thus decomposes into the sum of the 9 individual log probabilities of the correct classifications.
Here G(t) is an indicator function whose value is 1 when time t lies in the generation phase and 0 otherwise, d ranges over the individual output distributions, and Pr is the conditional probability.
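A minimal sketch of this masked cross entropy, assuming a single softmax distribution per time step (shapes are illustrative):

```python
import numpy as np

def masked_cross_entropy(probs, targets, gen_mask):
    """-sum_t G(t) * log Pr(z_t | y_t): probs is a (T, V) softmax output,
    targets is a (T,) array of target indices, gen_mask is G(t) in {0, 1}."""
    T = targets.shape[0]
    log_p = np.log(probs[np.arange(T), targets] + 1e-12)
    return -(gen_mask * log_p).sum()
```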
The cross entropy is analyzed and verified to obtain the cost of the translation result; when the obtained cost is small enough, the final evaluation of the translation effect can proceed with the common BLEU algorithm.
BLEU scoring algorithm
The BLEU algorithm is the current reference for evaluating machine translation; its basic idea is to compare the translation to be evaluated with the provided reference translation and judge the accuracy of the translation. The BLEU computation is shown below, where BP is a piecewise function:
BP = 1, if c > r; BP = e^(1 − r/c), if c ≤ r
BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
where c denotes the length of the translation to be evaluated, r denotes the length of the reference translation, and the piecewise function BP is a length penalty factor depending on the size relationship between c and r; p_n is the n-gram precision and w_n its weight.
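A minimal sketch of the brevity penalty and the BLEU score under the usual uniform weights w_n = 1/N; the n-gram precisions p_n are assumed to be computed elsewhere:

```python
import math

def bleu(p_ngrams, c, r):
    """BLEU = BP * exp(sum_n w_n * log p_n) with w_n = 1/N.
    p_ngrams: n-gram precisions p_1..p_N; c, r: candidate/reference lengths."""
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    n = len(p_ngrams)
    return bp * math.exp(sum(math.log(p) for p in p_ngrams) / n)

print(bleu([0.8, 0.6, 0.4, 0.3], c=18, r=20))
```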
The steps of the invention can be summarized as follows:
1: loop
2: Select the Mongolian-Chinese bilingual corpus and process it with the jieba word segmentation method and fast_align to obtain a Mongolian-Chinese bilingual dictionary.
3: Further generate word vectors with the fastText skip-gram model.
4: Using the MAML method and the differentiable neural computer DNC, obtain the locally optimal task parameters with MAML, then perform the output processing with the DNC;
5: Use the following output function to compute the output features:
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
6: Evaluate by combining the cross entropy and the BLEU translation-quality algorithm:
L(x, y, z) = − Σ_{t=1..T} G(t) · Σ_d log Pr(z_t^d | y_t)
BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
7: end loop.

Claims (10)

1. A Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer, characterized by comprising the following steps:
step 1, segmenting the Chinese text, then constructing a Mongolian-Chinese bilingual dictionary and acquiring a Mongolian-Chinese bilingual word vector matrix;
step 2, initializing a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, with the model-agnostic meta-learning strategy method;
and step 3, building a Mongolian-Chinese translation model with a differentiable neural computer based on the model initialization parameters.
2. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that in step 1, a Chinese word segmentation method is used in accurate mode to segment the Chinese corpus: keywords are extracted from the sentence to be processed, part-of-speech tagging is performed, and stop words are loaded, thereby achieving the word segmentation.
3. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that in step 1, after word segmentation is finished, fast_align is used to process the Mongolian-Chinese corpus, and a Mongolian-Chinese bilingual dictionary is first constructed as follows:
1) merging the Mongolian-Chinese corpus, each line joining a source-language sentence and its target-language sentence, separated by the delimiter '|||' with a leading and a trailing space;
2) performing the Mongolian-Chinese bilingual alignment with the fast_align tool;
3) constructing the Mongolian-Chinese bilingual dictionary from the aligned Mongolian-Chinese bilingual corpus.
4. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1 or 3, characterized in that after the Mongolian-Chinese bilingual dictionary is constructed, the skip-gram model of the fastText tool is used to generate Mongolian-Chinese bilingual word vectors, i.e., word embedding vectors, as follows:
1) the skip-gram model processes the Mongolian-Chinese bilingual dictionary at the input layer, using the word at the current position to predict the c words before and after it so as to obtain long-range information; the prediction result, i.e., the probability of inferring the words within c context windows from the current word, is expressed as p(w_{t±c} | w_t), where w_t denotes the word at the current position t and w_{t±c} denotes the c consecutive words before and after position t;
2) the word vectors at each position are aggregated by summation, i.e.
(1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} p(w_{t+j} | w_t)
where t indexes the position of the central word, T is the number of positions, and C is the number of words before and after the current word, i.e., the size of the front and back windows;
3) the dictionary at the output layer is encoded with a Huffman tree, whose codes are assigned from the root to the leaf nodes according to word frequency;
4) the logarithm of the summed result is taken:
l = (1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} log p(w_{t+j} | w_t)
where w_t refers to the word at the current position;
5) the partial derivatives of l are computed, the weights are updated step by step with a gradient descent algorithm, and the trained w_{t±c}, i.e., the word vector matrix, is obtained; the vectors of all words are integrated accordingly, i.e., the context vector of each word is sorted and collected, finally yielding the Mongolian-Chinese bilingual word vector matrix.
5. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that the specific steps of step 2 are as follows:
1) initial training is performed with the aid of some high-resource language tasks, with the task distribution denoted p(τ);
2) the learning rates α and β of gradient descent are initialized;
3) a parameter θ is randomly initialized according to previous experimental records or experience, and tasks are sampled: τ_i denotes the task numbered i, with τ_i ∈ p(τ), where p(τ) is the overall task distribution;
4) for each task τ_i, its gradient ∇_θ L_{τ_i}(f_θ) is computed, where L_{τ_i} is the loss function and ∇_θ denotes the gradient operator;
5) a gradient update is performed, obtaining θ'_i = θ − α·∇_θ L_{τ_i}(f_θ), where θ'_i is the new parameter after the gradient update;
6) after every task τ_i in p(τ) has been executed, a second, final gradient update is performed according to the formula:
θ_f = θ − β·∇_θ Σ_{τ_i∼p(τ)} L_{τ_i}(f_{θ'_i})
where θ_f is the locally optimal task parameter finally obtained by gradient descent.
6. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that in step 3, the final translation model is obtained through repeated simulation training and fine-tuning.
7. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1 or 6, characterized in that the differentiable neural computer stores memory in vectors, each row of the memory matrix corresponding to a different memory; its processor uses an interface vector in_t to control one write head and several read heads that interact with the memory; one row vector of the memory matrix represents one group of memories, and N rows mean the memory matrix can hold at most N groups of memories; at each time step the differentiable neural computer receives the read-head information stream of the previous moment and the external input stream of the current moment, which form the external input stream of the generalized differentiable neural computer; after processing into the hidden state, an output vector and an interface vector are generated; the interface vector controls the read heads, interacts with the external memory matrix through a read-write mechanism to generate the write information of the current moment, and updates the matrix to obtain the read information of the current moment; the read information and the output vector are combined linearly to generate the final output vector ou_t of the current moment; the memory consists of individual memories whose storage form is the memory matrix.
8. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 7, characterized in that the processor consists of several neural networks and is responsible for interacting with the input and output, wherein the input in_t is a single controller input formed from the read vectors r and the input vector x_t, i.e., the processor input vector
in_t = [x_t; r_{t−1}^1; …; r_{t−1}^d]
where r_{t−1}^1, …, r_{t−1}^d denote the set of read vectors in the memory matrix at the previous moment and d denotes the number of vectors in the set;
the obtained vector is used for writing and reading, and read-write operations are performed on the memory to update its content, the write operation being:
M_f[i,j] = M[i,j](1 − w^w[i]·era[i]) + w^w[i]·val[i]
that is, an erase-and-rewrite operation is performed on the matrix in the memory, where M[i,j] denotes dimension j of the current row i of the memory, w^w is the write weight, era[i] is the erase vector, i.e., the erase performed on dimension j of the current row, and val[i] is the write vector, i.e., the content added at the erased location;
the read vector r is defined as:
r = Σ_{i=1..N} M[i,·]·w^r[i]
where M[i,·] denotes the memory matrix, i is the location information, here row i of the memory, · spans all vector dimensions at the current location, N denotes the number of rows in the memory, and w^r[i] is the read weight on row i;
based on content addressing and dynamic memory allocation, the write location in the memory is determined, and based on content addressing and the temporal link matrix, the read location is determined, according to the formula:
C(M, k, β)[i] = exp(β·D(k, M[i,·])) / Σ_{j=1..N} exp(β·D(k, M[j,·]))
C(M, k, β)[i] defines the normalized probability distribution over the memory locations, i.e., the judgment of location accuracy, where D is the cosine similarity function computed over the dimensions 1 to W of each row, k is the lookup key value, and β is a focusing parameter expressing the strength of the key, i.e., how suitable the current location is;
after the read-write operations on the memory are completed, the processor obtains two vectors ε_t and v_t, with (v_t, ε_t) = NN([in_1; …; in_t]; θ_f), where NN denotes the processor.
9. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 8, characterized in that the differentiable neural computer uses an auxiliary matrix to record the previous word order or semantic order so as to ensure that the reading and writing order is correct; the auxiliary matrix comprises two parts, a usage vector and a temporal link matrix, the usage vector recording the locations used so far and the temporal link matrix recording the order in which locations were written, thereby ensuring that related semantic information is stored and read correctly, marking used matrix locations, preventing overwriting and ensuring semantic accuracy; the prediction finally output by the differentiable neural computer is
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
where W_r is the output read weight and r_t^1, …, r_t^d are the read vectors, i.e., C-dimensional vectors at the current step t.
10. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 9, characterized in that at each time step t, based on the information stream at time t−1 and the linear combination of the two parts of prediction information after the information stream has been exchanged with the memory, the differentiable neural computer determines the final output prediction ou_t; the final output prediction ou_t is fed into the Mongolian-Chinese machine translation model, and translation is performed according to the formula:
S = argmax_s Π_{t=1..T} p(s_t | ou_t; θ)
where S is the final sentence generated by the translation, ou_t is the output feature of the sentence at time t, i.e., the final prediction obtained from the differentiable neural computer, θ is the network-related parameter, and s_t is the word generated at time t; the expression maximizing the probability of the semantic features, i.e., the optimal rendering of the translated sentence, is thereby obtained.
CN202011250507.4A 2020-11-10 2020-11-10 Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer Pending CN112364668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011250507.4A CN112364668A (en) 2020-11-10 2020-11-10 Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011250507.4A CN112364668A (en) 2020-11-10 2020-11-10 Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer

Publications (1)

Publication Number Publication Date
CN112364668A true CN112364668A (en) 2021-02-12

Family

ID=74510084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011250507.4A Pending CN112364668A (en) 2020-11-10 2020-11-10 Mongolian Chinese machine translation method based on model independent element learning strategy and differentiable neural machine

Country Status (1)

Country Link
CN (1) CN112364668A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065432A (en) * 2021-03-23 2021-07-02 内蒙古工业大学 Handwritten Mongolian recognition method based on data enhancement and ECA-Net

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619127A (en) * 2019-08-29 2019-12-27 内蒙古工业大学 Mongolian Chinese machine translation method based on neural network turing machine
CN111597827A (en) * 2020-04-02 2020-08-28 云知声智能科技股份有限公司 Method and device for improving machine translation accuracy

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619127A (en) * 2019-08-29 2019-12-27 内蒙古工业大学 Mongolian Chinese machine translation method based on neural network turing machine
CN111597827A (en) * 2020-04-02 2020-08-28 云知声智能科技股份有限公司 Method and device for improving machine translation accuracy

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEX GRAVES et al.: "Hybrid computing using a neural network with dynamic external memory", 《NATURE》 *
HULA HOOP: "A brief analysis of the powerful RNN: the differentiable neural computer (DNC)" (in Chinese), 《HTTPS://ZHUANLAN.ZHIHU.COM/P/27773709》 *
WANG Cui: "Classification of isolated Wa-language words based on the MAML method" (in Chinese), 《Journal of Yunnan Minzu University (Natural Sciences Edition)》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065432A (en) * 2021-03-23 2021-07-02 内蒙古工业大学 Handwritten Mongolian recognition method based on data enhancement and ECA-Net

Similar Documents

Publication Publication Date Title
CN109766277B (en) Software fault diagnosis method based on transfer learning and DNN
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN110069778B (en) Commodity emotion analysis method for Chinese merged embedded word position perception
CN111046179B (en) Text classification method for open network question in specific field
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN108052499B (en) Text error correction method and device based on artificial intelligence and computer readable medium
CN111191002B (en) Neural code searching method and device based on hierarchical embedding
CN110619127B (en) Mongolian Chinese machine translation method based on neural network turing machine
CN110688862A (en) Mongolian-Chinese inter-translation method based on transfer learning
CN113190656B (en) Chinese named entity extraction method based on multi-annotation frame and fusion features
CN109086865B (en) Sequence model establishing method based on segmented recurrent neural network
CN112818118B (en) Reverse translation-based Chinese humor classification model construction method
CN112541356A (en) Method and system for recognizing biomedical named entities
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN113190219A (en) Code annotation generation method based on recurrent neural network model
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN113221542A (en) Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening
CN113191150B (en) Multi-feature fusion Chinese medical text named entity identification method
CN111428518B (en) Low-frequency word translation method and device
CN112765996B (en) Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation
CN112364668A (en) Mongolian Chinese machine translation method based on model independent element learning strategy and differentiable neural machine
CN114880022B (en) Bash code annotation generation method based on CodeBERT fine tuning and retrieval enhancement
CN115659172A (en) Generation type text summarization method based on key information mask and copy
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210212