CN112364668A - Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer - Google Patents

Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer

Info

Publication number
CN112364668A
CN112364668A (application CN202011250507.4A)
Authority
CN
China
Prior art keywords
memory
vector
mongolian
matrix
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011250507.4A
Other languages
Chinese (zh)
Inventor
苏依拉
赵旭
薛媛
卞乐乐
范婷婷
仁庆道尔吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202011250507.4A
Publication of CN112364668A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer first segments the Chinese text, then constructs a Mongolian-Chinese bilingual dictionary and obtains a Mongolian-Chinese bilingual word vector matrix; a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, is initialized with the MAML method; and a Mongolian-Chinese translation model is built on these initialization parameters using a differentiable neural computer. The invention combines a differentiable neural computer with a model-agnostic meta-learning strategy: the meta-learning strategy initializes the parameters, while the differentiable neural computer, reconstructed from RNN and LSTM, handles long-term semantics through a memory management mechanism and selective reading, further advancing the solution of sequence problems and improving translation performance. In particular, with respect to the data sparsity and undersized generated dictionaries of low-resource corpora and the loss of semantic information during translation, the method can further improve a Mongolian-Chinese machine translation system and achieve better translation performance.

Description

Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer
Technical Field
The invention belongs to the technical field of deep learning machine translation, and particularly relates to a Mongolian-Chinese machine translation method based on a Model-Agnostic Meta-Learning (MAML) strategy and a Differentiable Neural Computer (DNC).
Background
The field of deep learning continues to advance, and natural language processing has developed greatly with it; in the era of artificial intelligence in particular, machine translation plays an increasingly important role in the development of the internet.
Nevertheless, problems remain in natural language processing. Although many researchers strive continuously to improve translation quality, issues such as ambiguous word handling, the detection and representation of unknown words, missing semantic information, errors in bilingual word vector correspondences, and the scarcity of corpora are unavoidable and greatly affect the quality of machine translation. Many well-known universities and research institutes have attempted to solve these problems, and various schemes have been proposed and implemented, but most existing solutions address one aspect of the problem rather than the problem as a whole. Driven by deep learning, machine translation has progressed far beyond earlier statistics-based machine translation, yet problems such as ambiguity, unknown words and missing semantics still lack good solutions.
At present, attention-based machine translation has become the mainstream model and achieves the best performance to date; fundamentally, however, it does not solve these problems deeply and can only alleviate them to a certain extent.
Recent research shows that pre-trained models occupy a dominant position in current machine translation, and the various pre-trained models proposed by companies and research institutes do solve certain problems of statistical machine translation, but no system that completely solves the translation problem has appeared. On the premise that coarse translation is available, how to perform fine translation is therefore the current focus of work; the lack of semantic information and word vectors in low-resource languages such as Mongolian requires especially careful treatment.
Disclosure of Invention
In order to overcome the drawbacks of the prior art and further improve machine translation performance, the present invention provides a Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer. It uses a Differentiable Neural Computer (DNC) and a Model-Agnostic Meta-Learning (MAML) strategy, where the meta-learning strategy initializes the parameters and the DNC, reconstructed from RNN and LSTM, processes long-term semantics through a memory management mechanism and selective reading, further advancing the solution of sequence problems and improving translation performance. In particular, with respect to the data sparsity and undersized generated dictionaries of low-resource corpora and the loss of semantic information during translation, the method can further improve a Mongolian-Chinese machine translation system and achieve better translation performance.
In order to achieve the purpose, the invention adopts the technical scheme that:
A Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer comprises the following steps:
step 1, segmenting the Chinese text, then constructing a Mongolian-Chinese bilingual dictionary and acquiring a Mongolian-Chinese bilingual word vector matrix;
step 2, initializing a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, with the MAML method;
and step 3, building a Mongolian-Chinese translation model with a differentiable neural computer based on the model initialization parameters.
In step 1, a Chinese word segmentation method is used in accurate mode to segment the Chinese corpus: keywords are extracted from the sentence to be processed, part-of-speech tagging is performed, and a stop-word list is loaded, thereby achieving the word segmentation.
In step 1, after word segmentation is completed, fast_align is used to process the Mongolian-Chinese corpus, and a Mongolian-Chinese bilingual dictionary is first constructed as follows:
1) merging the Mongolian and Chinese corpora, with each line joining a source-language sentence and its target-language sentence, separated by the delimiter '|||' with a leading and a trailing space;
2) performing the Mongolian-Chinese bilingual alignment with the fast_align tool;
3) constructing the Mongolian-Chinese bilingual dictionary from the aligned Mongolian-Chinese bilingual corpus.
After the Mongolian-Chinese bilingual dictionary is built, Mongolian-Chinese bilingual word vectors, i.e., word embedding vectors, are generated with the skip-gram model of the fastText tool, as follows:
1) the skip-gram model processes the Mongolian-Chinese bilingual dictionary at the input layer, using the word at the current position to predict the c words before and after it so as to obtain long-range information; the prediction result, i.e., the probability of inferring the words within c context windows from the current word, is expressed as p(w_{t±c} | w_t), where w_t denotes the word at the current position t and w_{t±c} denotes the c consecutive words before and after position t;
2) the word vectors at each position are aggregated by summation, i.e.
(1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} p(w_{t+j} | w_t)
where t indexes the position of the central word, T is the number of positions, and C is the number of words before and after the current word, i.e., the size of the front and back windows;
3) the dictionary at the output layer is encoded with a Huffman tree, whose codes are assigned from the root to the leaf nodes according to word frequency;
4) the logarithm of the summed result is taken:
l = (1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} log p(w_{t+j} | w_t)
where w_t refers to the word at the current position;
5) the partial derivatives of l are computed and the weights are updated step by step with a gradient descent algorithm; the trained vectors w_{t±c} form the word vector matrix, and the vectors of all words are integrated accordingly, i.e., the context vector of each word is sorted and collected, finally yielding the Mongolian-Chinese bilingual word vector matrix.
The specific steps of step 2 are as follows:
1) initial training is performed with the aid of some high-resource language tasks, with the task distribution denoted p(τ);
2) the learning rates α and β of gradient descent are initialized;
3) a parameter θ is randomly initialized according to previous experimental records or experience, and tasks are sampled: τ_i denotes the task numbered i, with τ_i ∈ p(τ), where p(τ) is the overall task distribution;
4) for each task τ_i, its gradient ∇_θ L_{τ_i}(f_θ) is computed, where L_{τ_i} is the loss function and ∇_θ denotes the gradient operator;
5) a gradient update is performed, obtaining θ'_i = θ − α·∇_θ L_{τ_i}(f_θ), where θ'_i is the new parameter after the gradient update;
6) after every task τ_i in p(τ) has been executed, a second, final gradient update is performed according to the formula:
θ_f = θ − β·∇_θ Σ_{τ_i∼p(τ)} L_{τ_i}(f_{θ'_i})
where θ_f is the locally optimal task parameter finally obtained by gradient descent.
In the step 3, a final translation model is obtained through repeated simulation training and fine tuning.
The differentiable neural computer stores memory in vectors, each row of the memory matrix corresponding to a different memory; its processor uses an interface vector in_t to control one write head and several read heads that interact with the memory. One row vector of the memory matrix represents one group of memories, and N rows mean the memory matrix can hold at most N groups of memories. At each time step the differentiable neural computer receives the read-head information stream of the previous moment together with the external input stream of the current moment, which form the external input stream of the generalized differentiable neural computer. After processing into the hidden state, an output vector and an interface vector are generated; the interface vector controls the read heads and interacts with the external memory matrix through a read-write mechanism to generate the write information of the current moment, the matrix is updated to obtain the read information of the current moment, and the read information and the output vector are combined linearly to generate the final output vector ou_t of the current moment. The memory consists of individual memories whose storage form is the memory matrix.
The processor consists of several neural networks and is responsible for interacting with the input and output, where the input in_t is a single controller input formed from the read vectors r and the input vector x_t, i.e., the processor input vector
in_t = [x_t; r_{t−1}^1; …; r_{t−1}^d]
where r_{t−1}^1, …, r_{t−1}^d denote the set of read vectors in the memory matrix at the previous moment and d denotes the number of vectors in the set;
the obtained vector is used for writing and reading, and read-write operations are performed on the memory to update its content, the write operation being:
M_f[i,j] = M[i,j](1 − w^w[i]·era[i]) + w^w[i]·val[i]
that is, an erase-and-rewrite operation is performed on the matrix in the memory, where M[i,j] denotes dimension j of the current row i of the memory, w^w is the write weight, era[i] is the erase vector, i.e., the erase performed on dimension j of the current row, and val[i] is the write vector, i.e., the content added at the erased location;
the read vector r is defined as
r = Σ_{i=1..N} M[i,·]·w^r[i]
where M[i,·] denotes the memory matrix, i is the location information, here row i of the memory, · spans all vector dimensions at the current location, N denotes the number of rows in the memory, and w^r[i] is the read weight on row i;
based on content addressing and dynamic memory allocation, the write location in the memory is determined, and based on content addressing and the temporal link matrix, the read location is determined, according to the formula:
C(M, k, β)[i] = exp(β·D(k, M[i,·])) / Σ_{j=1..N} exp(β·D(k, M[j,·]))
C(M, k, β)[i] defines the normalized probability distribution over the memory locations, i.e., the judgment of location accuracy, where D is the cosine similarity function computed over the dimensions 1 to W of each row, k is the lookup key value, and β is a focusing parameter expressing the strength of the key, i.e., how suitable the current location is;
after the read-write operations on the memory are completed, the processor obtains two vectors ε_t and v_t, with (v_t, ε_t) = NN([in_1; …; in_t]; θ_f), where NN denotes the processor.
The differentiable neural computer uses an auxiliary matrix to record the previous word order or semantic order so as to ensure that the reading and writing order is correct. The auxiliary matrix comprises two parts, a usage vector and a temporal link matrix: the usage vector records the locations used so far, and the temporal link matrix records the order in which locations were written, thereby ensuring that related semantic information is stored and read correctly, marking used matrix locations, preventing overwriting and ensuring semantic accuracy. The prediction finally output by the differentiable neural computer is
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
where W_r is the output read weight and r_t^1, …, r_t^d are the read vectors, i.e., C-dimensional vectors at the current step t.
At each time step t, based on the information stream at time t−1 and the prediction information of the two parts after the information stream has been exchanged with the memory, the trainable processor of the differentiable neural computer determines the final output prediction ou_t. The final output prediction ou_t is fed into the Mongolian-Chinese machine translation model (corresponding to the decoder part), and translation is performed according to the formula:
S = argmax_s Π_{t=1..T} p(s_t | ou_t; θ)
where S is the final sentence generated by the translation, ou_t is the output feature of the sentence at time t, i.e., the final prediction obtained from the differentiable neural computer, θ is the network-related parameter, and s_t is the word generated at time t; the expression maximizing the probability of the semantic features, i.e., the optimal rendering of the translated sentence, is thereby obtained.
Compared with the prior art, the invention has the following beneficial effects:
1. The Mongolian-Chinese bilingual corpus is processed on a Linux system in GPU working mode, roughly doubling the speed; this architectural advantage, together with a dedicated translation network structure and a translation evaluation algorithm, raises the quality of the whole system and further improves machine translation quality.
2. The order of semantic information can be stored by means of the differentiable neural computer, and its assignments are updated in real time, ensuring accuracy during processing. Semantic information is enriched through the processor and the memory, improving the translation effect.
3. The initial parameters are optimized by the MAML method, yielding locally optimal parameter values and providing a good starting point for the downstream task, i.e., the translation process, which further improves the translation effect.
Drawings
FIG. 1 is a schematic diagram of the word segmentation process of the present invention.
Fig. 2 is a schematic diagram of the overall flow principle of MAML.
FIG. 3 is a schematic diagram of an implementation of MAML, i.e., the Model-Agnostic Meta-Learning method.
Fig. 4 is a schematic diagram of the detailed structure of DNC.
FIG. 5 is a schematic diagram of a processor obtaining a final output vector.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention discloses a Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer. First, with the MAML strategy, gradient descent over some high-resource language tasks yields a locally optimal initialization parameter, which serves as a general starting point for the translation task. Then, by means of the memory management mechanism of the differentiable neural computer, dynamic memory allocation achieves an accurate fusion of long-term semantic information and accurately identifies the relations between related semantics, so as to enhance the translation effect. Finally, the model is verified and solved, and the translation effect is evaluated with the cross entropy and the BLEU score.
The steps of the present invention are explained in detail below, including:
step 1, pretreatment process: the method comprises the steps of segmenting Chinese, then constructing a Mongolian Chinese bilingual dictionary, and obtaining a vector matrix of the Mongolian Chinese bilingual words.
Specifically, the jieba Chinese word segmentation tool is used in accurate mode to segment the Chinese corpus. Referring to fig. 1, keywords are extracted from the sentence to be processed and part-of-speech tagging is performed; to reduce the influence of punctuation and function words, a stop-word list is loaded for filtering, which removes the influence of punctuation and function words and enhances the relevant semantic information.
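As an illustration, a minimal sketch of this segmentation step with the jieba library follows; the stop-word file name and the example sentence are hypothetical and not part of the invention.

```python
import jieba
import jieba.analyse
import jieba.posseg as pseg

# Load a custom stop-word list for keyword extraction (path is hypothetical).
jieba.analyse.set_stop_words("stopwords_zh.txt")

sentence = "机器翻译在人工智能时代发挥着越来越重要的作用"

# Accurate-mode segmentation (cut_all=False is jieba's accurate mode).
words = jieba.lcut(sentence, cut_all=False)

# Part-of-speech tagging.
tagged = [(pair.word, pair.flag) for pair in pseg.cut(sentence)]

# Keyword extraction; stop words are filtered out during this step.
keywords = jieba.analyse.extract_tags(sentence, topK=5)

print(words, tagged, keywords)
```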
After word segmentation is finished, fast_align is used to process the Mongolian-Chinese corpus, and a Mongolian-Chinese bilingual dictionary is first constructed as follows (a code sketch follows the list below):
1) merging the Mongolian-Chinese corpus, each line joining a source-language sentence and its target-language translation, separated by the delimiter '|||' with a leading and a trailing space;
2) performing the Mongolian-Chinese bilingual alignment with the fast_align tool;
3) constructing the Mongolian-Chinese bilingual dictionary from the aligned Mongolian-Chinese bilingual corpus.
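A minimal sketch of this pipeline is given below, assuming the fast_align binary is available on the PATH; all file names are hypothetical. fast_align expects one sentence pair per line in the form "source ||| target".

```python
import subprocess

# Merge the Mongolian and Chinese corpora into fast_align's input format
# (file names hypothetical): one pair per line, "source ||| target".
with open("corpus.mn") as f_src, open("corpus.zh") as f_tgt, \
        open("corpus.mn-zh", "w") as f_out:
    for src, tgt in zip(f_src, f_tgt):
        f_out.write(f"{src.strip()} ||| {tgt.strip()}\n")

# Run fast_align in the forward direction; -d, -o, -v are the settings
# recommended by the fast_align README (add -r for the reverse direction).
with open("forward.align", "w") as f_align:
    subprocess.run(
        ["fast_align", "-i", "corpus.mn-zh", "-d", "-o", "-v"],
        stdout=f_align, check=True,
    )
```

The resulting word alignments can then be tallied into source-target word pairs to build the bilingual dictionary.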
After the Mongolian-Chinese bilingual dictionary is constructed, Mongolian-Chinese bilingual word vectors, i.e., word embedding vectors, are generated with the fastText tool, using the skip-gram model it provides. The specific process is as follows (a code sketch follows the list):
1) the skip-gram model processes the Mongolian-Chinese bilingual dictionary at the input layer, using the word at the current position to predict the c words before and after it so as to obtain long-range information; the prediction result, i.e., the probability of inferring the words within c context windows from the current word, is expressed as p(w_{t±c} | w_t), where w_t denotes the word at the current position t and w_{t±c} denotes the c consecutive words before and after position t;
2) the word vectors at each position are aggregated by summation, i.e.
(1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} p(w_{t+j} | w_t)
where t indexes the position of the central word, T is the number of positions, and C is the number of words before and after the current word, i.e., the size of the front and back windows;
3) the dictionary at the output layer is encoded with a Huffman tree, whose codes are assigned from the root to the leaf nodes according to word frequency;
4) the logarithm of the summed result is taken:
l = (1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} log p(w_{t+j} | w_t)
where w_t refers to the word at the current position;
5) the partial derivatives of l are computed and the weights are updated step by step with a gradient descent algorithm; the trained vectors w_{t±c} form the corresponding word vector matrix, and the vectors of all words are integrated accordingly, i.e., the context vector of each word is sorted and collected, finally yielding the Mongolian-Chinese bilingual word vector matrix.
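A minimal sketch of this step with the fasttext Python package follows; the corpus file name and the hyper-parameter values (dimension, window size, epochs) are illustrative assumptions rather than values fixed by the invention.

```python
import fasttext
import numpy as np

# Train skip-gram word vectors on the merged Mongolian-Chinese corpus
# (file name hypothetical); ws is the context window c, dim the vector size.
# loss="hs" selects hierarchical softmax, matching the Huffman-tree
# output layer described above.
model = fasttext.train_unsupervised(
    "corpus.mn-zh.txt", model="skipgram",
    dim=300, ws=5, epoch=10, loss="hs",
)

# Collect the vector of every word in the vocabulary into a matrix:
# the bilingual word vector matrix.
vocab = model.get_words()
matrix = np.stack([model.get_word_vector(w) for w in vocab])
print(matrix.shape)  # (vocabulary size, 300)
```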
Next, the following operations are carried out by means of the model-agnostic meta-learning strategy (MAML) and the differentiable neural computer (DNC); the two are introduced separately below.
Step 2: a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, is initialized with the MAML method.
For low-resource languages such as Mongolian, a general translation strategy independent of the original tasks can be obtained through MAML by gradient-descent training on some high-resource languages; the low-resource language system is then repeatedly simulation-trained and fine-tuned from this learned starting point, i.e., the initial locally optimal task parameters, finally obtaining a suitable translation model. Referring to fig. 2 and 3, the MAML method is embodied as follows:
1) initial training is performed with the aid of some high-resource language tasks, with the task distribution denoted p(τ);
2) the learning rates α and β of gradient descent are initialized;
3) a parameter θ is randomly initialized according to previous experimental records or experience, and tasks are sampled: τ_i denotes the task numbered i, with τ_i ∈ p(τ), where p(τ) is the overall task distribution;
4) for each task τ_i, its gradient ∇_θ L_{τ_i}(f_θ) is computed, where L_{τ_i} is the loss function and ∇_θ denotes the gradient operator;
5) a gradient update is performed, obtaining θ'_i = θ − α·∇_θ L_{τ_i}(f_θ), where θ'_i is the new parameter after the gradient update;
6) after every task τ_i in p(τ) has been executed, a second, final gradient update is performed according to the formula:
θ_f = θ − β·∇_θ Σ_{τ_i∼p(τ)} L_{τ_i}(f_{θ'_i})
where θ_f is the locally optimal task parameter finally obtained by gradient descent (a code sketch of this two-level update is given below).
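The two-level gradient update of steps 4) to 6) can be sketched in plain NumPy as follows, with a toy quadratic loss standing in for the per-task translation loss. For brevity the sketch uses the first-order MAML approximation, which drops the second-derivative term of the exact meta-gradient; all names and sizes are illustrative.

```python
import numpy as np

def loss_and_grad(theta, task):
    """Toy per-task loss L_tau(f_theta) = ||theta - task||^2 and its
    gradient; stands in for the translation loss of task tau_i."""
    return np.sum((theta - task) ** 2), 2.0 * (theta - task)

alpha, beta = 0.01, 0.001                        # inner / outer learning rates
theta = np.random.randn(8)                       # randomly initialized parameters
tasks = [np.random.randn(8) for _ in range(16)]  # sampled tau_i ~ p(tau)

for _ in range(100):                             # meta-training iterations
    meta_grad = np.zeros_like(theta)
    for task in tasks:
        # Inner update: theta'_i = theta - alpha * grad L_tau_i(f_theta).
        _, g = loss_and_grad(theta, task)
        theta_i = theta - alpha * g
        # Gradient of L_tau_i(f_theta'_i); first-order approximation.
        _, g_outer = loss_and_grad(theta_i, task)
        meta_grad += g_outer
    # Outer update: theta_f = theta - beta * grad sum_i L_tau_i(f_theta'_i).
    theta = theta - beta * meta_grad
```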
Thus, by means of the MAML method, an initial optimal parameter θ_f is obtained, and the subsequent translation of the low-resource language pair, i.e., the Mongolian-Chinese translation task, can start training uniformly from the better initial parameter θ_f.
The initial parameter θ_f obtained through MAML's gradient descent adapts well and quickly to the new task, i.e., the Mongolian-Chinese translation task.
Step 3: based on the model initialization parameters, the translation model mentioned in step 2 is built with a differentiable neural computer. The final translation model is then obtained through repeated simulation training and fine-tuning (Fine-Tune), which further improves the quality of Mongolian-Chinese translation and addresses the lack of low-resource corpora, insufficient semantic information, poor long-sentence translation and similar problems.
As the name suggests, the differentiable neural computer (DNC) is a neural computer that is differentiable. Differentiability is important in machine learning: computation inside a conventional computer is absolute, either 0 or 1, operating on logic values or integers, whereas most neural networks and machine learning methods work with real numbers and smoother curves, which makes training easier, comes closer to the real situation, and preserves the accuracy of the data. Differentiability meets this requirement, allowing a gradual approach toward an optimal value that is closer to the real situation and achieves a better result.
A differentiable neural computer (DNC) combines a neural network processor with a dynamic memory. This hybrid machine has the advantage that the neural network can learn from data while also storing the learned knowledge, i.e., complex structured data. The dynamic memory can be written and read selectively, allows the memory contents to be modified iteratively, enlarges the memory range, and remedies the inability of plain neural networks to store data over long periods.
The specific structure of the differentiable neural computer (DNC) is shown in fig. 4. It stores memory in vectors, each row of the memory matrix corresponding to a different memory. The processor uses an interface vector in_t to control one write head and several read heads (each read head is formed as a linear combination of two addressing mechanisms; the structural design places no constraint on the number of read heads) that interact with the memory. One row vector of the memory matrix represents one group of memories, and N rows mean the memory matrix can hold at most N groups of memories. The differentiable neural computer receives the read-head information stream of the previous moment and the external input stream of the current moment, which form the external input stream of the generalized differentiable neural computer (corresponding to the external inputs a conventional LSTM receives at each step). After processing into the hidden state, an output vector and an interface vector are generated; the interface vector controls the read heads and interacts with the external memory matrix through a read-write mechanism to generate the write information of the current moment, the matrix is updated to obtain the read information of the current moment, and the read information and the output vector are combined linearly to generate the final output vector ou_t of the current moment. The memory consists of individual memories whose storage form is the memory matrix.
Specifically, referring to FIG. 5, the processor consists of several neural networks responsible for interacting with the inputs and outputs. The input in_t is a single controller input formed from the read vectors r and the input vector x_t, i.e., the processor input vector
in_t = [x_t; r_{t−1}^1; …; r_{t−1}^d]
where r_{t−1}^1, …, r_{t−1}^d denote the set of read vectors in the memory matrix at the previous moment and d denotes the number of vectors in the set;
based on the obtained vector, the memory content can be updated by read-write operations on the memory, here a write operation followed by a read operation, the write operation being:
M_f[i,j] = M[i,j](1 − w^w[i]·era[i]) + w^w[i]·val[i]
that is, an erase-and-rewrite operation is performed on the matrix in the memory, where M[i,j] denotes dimension j of the current row i of the memory, w^w is the write weight, era[i] is the erase vector, i.e., the erase performed on dimension j of the current row, and val[i] is the write vector, i.e., the content added at the erased location;
the read vector r is defined as
r = Σ_{i=1..N} M[i,·]·w^r[i]
where M[i,·] denotes the memory matrix, i is the location information, here row i of the memory, · spans all vector dimensions at the current location, N denotes the number of rows in the memory, and w^r[i] is the read weight on row i;
the dynamic memory (the memory in the differentiable neural machine, generally called as dynamic memory, performs the writing and erasing of the memory by means of the reading and writing operation) can read the relevant information more accurately through the memory addressing, and obtains higher accuracy. The formula is as follows:
Figure BDA0002771437620000111
c (M, k, beta) i defines the normalized probability distribution on the memory position in the memory, namely the judgment of position accuracy, wherein D is a cosine similarity function, key is a search key value, beta is a focusing parameter and represents the strength of the key, namely the suitable degree of the current position, and j is the distribution set of all dimensions in the ith row of the memory and represents the dimensions from 1 to W;
the differentiable neural machine adopts an auxiliary matrix to record the previous word sequence or semantic sequence to ensure that the reading and writing sequence is correct, the auxiliary matrix belongs to a part of the differentiable neural machine, and is mainly used for recording the previous word sequence or semantic sequence, and is similar to a linked list to ensure that the reading and writing sequence is correct. The method comprises two parts, wherein one part is a using vector, the other part is a time link matrix, the using vector records the used position up to now, the time link matrix records the sequence of writing the position, thereby ensuring the correct storage and reading of the related semantic information, simultaneously identifying the used matrix position, preventing the occurrence of coverage and ensuring the semantic accuracy, when the reading operation is carried out, the time link matrix is used for interconnecting the correct sequence, the DNC can recall the events of the last step under a certain instant memory and the last step of the last step, and so on, namely, the events required to be done by the time link matrix can be traversed, a linked list is formed according to the front and back sequence, and the output of the word sequence is correct.
After the read-write operations on the memory are completed, the processor obtains two vectors ε_t and v_t, with (v_t, ε_t) = NN([in_1; …; in_t]; θ_f), where NN denotes the processor. ε_t interacts with the memory matrix through memory addressing to generate the output read-head memory r_t, and v_t is combined linearly with the read heads into the final output vector
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
i.e., the prediction finally output by the neural computer, where W_r is the output read weight and r_t^1, …, r_t^d are the read vectors, i.e., C-dimensional vectors at the current step t.
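The write, read and content-addressing operations described above can be sketched in NumPy as follows; the memory dimensions and test values are illustrative, and the sketch omits dynamic memory allocation and the temporal link matrix.

```python
import numpy as np

N, W = 16, 8          # memory rows and row width
M = np.zeros((N, W))  # memory matrix

def write(M, w_w, era, val):
    """M_f[i,j] = M[i,j] * (1 - w_w[i]*era[j]) + w_w[i]*val[j]."""
    return M * (1 - np.outer(w_w, era)) + np.outer(w_w, val)

def read(M, w_r):
    """r = sum_i M[i,:] * w_r[i]."""
    return M.T @ w_r

def content_address(M, key, beta):
    """C(M,k,beta)[i]: softmax over rows of beta * cosine(key, M[i,:])."""
    eps = 1e-8
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    e = np.exp(beta * sims - np.max(beta * sims))
    return e / e.sum()

w_w = np.eye(N)[0]                             # write entirely to row 0
M = write(M, w_w, era=np.ones(W), val=np.random.randn(W))
w_r = content_address(M, key=M[0], beta=10.0)  # look up by content
r = read(M, w_r)                               # read vector approximates row 0
```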
At each time step t, based on the information stream at time t−1 and the prediction information of the two parts after the information stream has been exchanged with the memory, the trainable processor of the differentiable neural computer determines the final output prediction ou_t. The final output prediction ou_t is fed into the Mongolian-Chinese machine translation model (corresponding to the decoder part), and translation is performed according to the formula:
S = argmax_s Π_{t=1..T} p(s_t | ou_t; θ)
where S is the final sentence generated by the translation, ou_t is the output feature of the sentence at time t, i.e., the final prediction obtained from the differentiable neural computer, θ is the network-related parameter, and s_t is the word generated at time t; the expression maximizing the probability of the semantic features, i.e., the optimal rendering of the translated sentence, is thereby obtained.
The obtained sentence feature results are evaluated with the cross entropy. Given an input sequence x, a network output sequence y and a target sequence z, converted into the corresponding two-dimensional vector distributions with target sentence length T, the following cross-entropy loss function is generated:
L(x, y, z) = − Σ_{t=1..T} G(t) · Σ_d log Pr(z_t^d | y_t)
The input vector has a size of 92 dimensions and the target vector 90. The network has 90 output units, corresponding to 9 individual softmax distributions over 10 dimensions each; the log probability of correctly predicting the whole target triple thus decomposes into the sum of the 9 individual log probabilities of the correct classifications.
Here G(t) is an indicator function whose value is 1 when time t lies in the generation phase and 0 otherwise, d ranges over the individual output distributions, and Pr is the conditional probability.
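A minimal sketch of this masked cross entropy, assuming a single softmax distribution per time step (shapes are illustrative):

```python
import numpy as np

def masked_cross_entropy(probs, targets, gen_mask):
    """-sum_t G(t) * log Pr(z_t | y_t): probs is a (T, V) softmax output,
    targets is a (T,) array of target indices, gen_mask is G(t) in {0, 1}."""
    T = targets.shape[0]
    log_p = np.log(probs[np.arange(T), targets] + 1e-12)
    return -(gen_mask * log_p).sum()
```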
The cross entropy is analyzed and verified to obtain the cost of the translation result; when the obtained cost is small enough, the final evaluation of the translation effect can proceed with the common BLEU algorithm.
BLEU scoring algorithm
The BLEU algorithm is the current reference for evaluating machine translation; its basic idea is to compare the translation to be evaluated with the provided reference translation and judge the accuracy of the translation. The BLEU computation is shown below, where BP is a piecewise function:
BP = 1, if c > r; BP = e^(1 − r/c), if c ≤ r
BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
where c denotes the length of the translation to be evaluated, r denotes the length of the reference translation, and the piecewise function BP is a length penalty factor depending on the size relationship between c and r; p_n is the n-gram precision and w_n its weight.
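A minimal sketch of the brevity penalty and the BLEU score under the usual uniform weights w_n = 1/N; the n-gram precisions p_n are assumed to be computed elsewhere:

```python
import math

def bleu(p_ngrams, c, r):
    """BLEU = BP * exp(sum_n w_n * log p_n) with w_n = 1/N.
    p_ngrams: n-gram precisions p_1..p_N; c, r: candidate/reference lengths."""
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    n = len(p_ngrams)
    return bp * math.exp(sum(math.log(p) for p in p_ngrams) / n)

print(bleu([0.8, 0.6, 0.4, 0.3], c=18, r=20))
```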
The steps of the invention can be summarized as follows:
1: loop
2: Select the Mongolian-Chinese bilingual corpus and process it with the jieba word segmentation method and fast_align to obtain a Mongolian-Chinese bilingual dictionary.
3: Further generate word vectors with the fastText skip-gram model.
4: Using the MAML method and the differentiable neural computer DNC, obtain the locally optimal task parameters with MAML, then perform the output processing with the DNC;
5: Use the following output function to compute the output features:
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
6: Evaluate by combining the cross entropy and the BLEU translation-quality algorithm:
L(x, y, z) = − Σ_{t=1..T} G(t) · Σ_d log Pr(z_t^d | y_t)
BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
7: end loop.

Claims (10)

1. A Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer, characterized by comprising the following steps:
step 1, segmenting the Chinese text, then constructing a Mongolian-Chinese bilingual dictionary and acquiring a Mongolian-Chinese bilingual word vector matrix;
step 2, initializing a locally optimal task parameter, i.e., the model initialization parameter for Mongolian-Chinese translation, with the model-agnostic meta-learning strategy method;
and step 3, building a Mongolian-Chinese translation model with a differentiable neural computer based on the model initialization parameters.
2. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that in step 1, a Chinese word segmentation method is used in accurate mode to segment the Chinese corpus: keywords are extracted from the sentence to be processed, part-of-speech tagging is performed, and stop words are loaded, thereby achieving the word segmentation.
3. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that in step 1, after word segmentation is finished, fast_align is used to process the Mongolian-Chinese corpus, and a Mongolian-Chinese bilingual dictionary is first constructed as follows:
1) merging the Mongolian-Chinese corpus, each line joining a source-language sentence and its target-language sentence, separated by the delimiter '|||' with a leading and a trailing space;
2) performing the Mongolian-Chinese bilingual alignment with the fast_align tool;
3) constructing the Mongolian-Chinese bilingual dictionary from the aligned Mongolian-Chinese bilingual corpus.
4. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1 or 3, characterized in that after the Mongolian-Chinese bilingual dictionary is constructed, the skip-gram model of the fastText tool is used to generate Mongolian-Chinese bilingual word vectors, i.e., word embedding vectors, as follows:
1) the skip-gram model processes the Mongolian-Chinese bilingual dictionary at the input layer, using the word at the current position to predict the c words before and after it so as to obtain long-range information; the prediction result, i.e., the probability of inferring the words within c context windows from the current word, is expressed as p(w_{t±c} | w_t), where w_t denotes the word at the current position t and w_{t±c} denotes the c consecutive words before and after position t;
2) the word vectors at each position are aggregated by summation, i.e.
(1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} p(w_{t+j} | w_t)
where t indexes the position of the central word, T is the number of positions, and C is the number of words before and after the current word, i.e., the size of the front and back windows;
3) the dictionary at the output layer is encoded with a Huffman tree, whose codes are assigned from the root to the leaf nodes according to word frequency;
4) the logarithm of the summed result is taken:
l = (1/T)·Σ_{t=1..T} Σ_{−C≤j≤C, j≠0} log p(w_{t+j} | w_t)
where w_t refers to the word at the current position;
5) the partial derivatives of l are computed, the weights are updated step by step with a gradient descent algorithm, and the trained w_{t±c}, i.e., the word vector matrix, is obtained; the vectors of all words are integrated accordingly, i.e., the context vector of each word is sorted and collected, finally yielding the Mongolian-Chinese bilingual word vector matrix.
5. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that the specific steps of step 2 are as follows:
1) initial training is performed with the aid of some high-resource language tasks, with the task distribution denoted p(τ);
2) the learning rates α and β of gradient descent are initialized;
3) a parameter θ is randomly initialized according to previous experimental records or experience, and tasks are sampled: τ_i denotes the task numbered i, with τ_i ∈ p(τ), where p(τ) is the overall task distribution;
4) for each task τ_i, its gradient ∇_θ L_{τ_i}(f_θ) is computed, where L_{τ_i} is the loss function and ∇_θ denotes the gradient operator;
5) a gradient update is performed, obtaining θ'_i = θ − α·∇_θ L_{τ_i}(f_θ), where θ'_i is the new parameter after the gradient update;
6) after every task τ_i in p(τ) has been executed, a second, final gradient update is performed according to the formula:
θ_f = θ − β·∇_θ Σ_{τ_i∼p(τ)} L_{τ_i}(f_{θ'_i})
where θ_f is the locally optimal task parameter finally obtained by gradient descent.
6. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1, characterized in that in step 3, the final translation model is obtained through repeated simulation training and fine-tuning.
7. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 1 or 6, characterized in that the differentiable neural computer stores memory in vectors, each row of the memory matrix corresponding to a different memory; its processor uses an interface vector in_t to control one write head and several read heads that interact with the memory; one row vector of the memory matrix represents one group of memories, and N rows mean the memory matrix can hold at most N groups of memories; at each time step the differentiable neural computer receives the read-head information stream of the previous moment and the external input stream of the current moment, which form the external input stream of the generalized differentiable neural computer; after processing into the hidden state, an output vector and an interface vector are generated; the interface vector controls the read heads, interacts with the external memory matrix through a read-write mechanism to generate the write information of the current moment, and updates the matrix to obtain the read information of the current moment; the read information and the output vector are combined linearly to generate the final output vector ou_t of the current moment; the memory consists of individual memories whose storage form is the memory matrix.
8. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 7, characterized in that the processor consists of several neural networks and is responsible for interacting with the input and output, wherein the input in_t is a single controller input formed from the read vectors r and the input vector x_t, i.e., the processor input vector
in_t = [x_t; r_{t−1}^1; …; r_{t−1}^d]
where r_{t−1}^1, …, r_{t−1}^d denote the set of read vectors in the memory matrix at the previous moment and d denotes the number of vectors in the set;
the obtained vector is used for writing and reading, and read-write operations are performed on the memory to update its content, the write operation being:
M_f[i,j] = M[i,j](1 − w^w[i]·era[i]) + w^w[i]·val[i]
that is, an erase-and-rewrite operation is performed on the matrix in the memory, where M[i,j] denotes dimension j of the current row i of the memory, w^w is the write weight, era[i] is the erase vector, i.e., the erase performed on dimension j of the current row, and val[i] is the write vector, i.e., the content added at the erased location;
the read vector r is defined as:
r = Σ_{i=1..N} M[i,·]·w^r[i]
where M[i,·] denotes the memory matrix, i is the location information, here row i of the memory, · spans all vector dimensions at the current location, N denotes the number of rows in the memory, and w^r[i] is the read weight on row i;
based on content addressing and dynamic memory allocation, the write location in the memory is determined, and based on content addressing and the temporal link matrix, the read location is determined, according to the formula:
C(M, k, β)[i] = exp(β·D(k, M[i,·])) / Σ_{j=1..N} exp(β·D(k, M[j,·]))
C(M, k, β)[i] defines the normalized probability distribution over the memory locations, i.e., the judgment of location accuracy, where D is the cosine similarity function computed over the dimensions 1 to W of each row, k is the lookup key value, and β is a focusing parameter expressing the strength of the key, i.e., how suitable the current location is;
after the read-write operations on the memory are completed, the processor obtains two vectors ε_t and v_t, with (v_t, ε_t) = NN([in_1; …; in_t]; θ_f), where NN denotes the processor.
9. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 8, characterized in that the differentiable neural computer uses an auxiliary matrix to record the previous word order or semantic order so as to ensure that the reading and writing order is correct; the auxiliary matrix comprises two parts, a usage vector and a temporal link matrix, the usage vector recording the locations used so far and the temporal link matrix recording the order in which locations were written, thereby ensuring that related semantic information is stored and read correctly, marking used matrix locations, preventing overwriting and ensuring semantic accuracy; the prediction finally output by the differentiable neural computer is
ou_t = v_t + W_r·[r_t^1; …; r_t^d]
where W_r is the output read weight and r_t^1, …, r_t^d are the read vectors, i.e., C-dimensional vectors at the current step t.
10. The Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer according to claim 9, characterized in that at each time step t, based on the information stream at time t−1 and the linear combination of the two parts of prediction information after the information stream has been exchanged with the memory, the differentiable neural computer determines the final output prediction ou_t; the final output prediction ou_t is fed into the Mongolian-Chinese machine translation model, and translation is performed according to the formula:
S = argmax_s Π_{t=1..T} p(s_t | ou_t; θ)
where S is the final sentence generated by the translation, ou_t is the output feature of the sentence at time t, i.e., the final prediction obtained from the differentiable neural computer, θ is the network-related parameter, and s_t is the word generated at time t; the expression maximizing the probability of the semantic features, i.e., the optimal rendering of the translated sentence, is thereby obtained.
CN202011250507.4A 2020-11-10 2020-11-10 Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer Pending CN112364668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011250507.4A CN112364668A (en) 2020-11-10 2020-11-10 Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011250507.4A CN112364668A (en) 2020-11-10 2020-11-10 Mongolian-Chinese machine translation method based on a model-agnostic meta-learning strategy and a differentiable neural computer

Publications (1)

Publication Number Publication Date
CN112364668A true CN112364668A (en) 2021-02-12

Family

ID=74510084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011250507.4A Pending CN112364668A (en) 2020-11-10 2020-11-10 Mongolian Chinese machine translation method based on model independent element learning strategy and differentiable neural machine

Country Status (1)

Country Link
CN (1) CN112364668A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065432A (en) * 2021-03-23 2021-07-02 内蒙古工业大学 Handwritten Mongolian recognition method based on data enhancement and ECA-Net

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619127A (en) * 2019-08-29 2019-12-27 内蒙古工业大学 Mongolian Chinese machine translation method based on neural network turing machine
CN111597827A (en) * 2020-04-02 2020-08-28 云知声智能科技股份有限公司 Method and device for improving machine translation accuracy

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619127A (en) * 2019-08-29 2019-12-27 内蒙古工业大学 Mongolian Chinese machine translation method based on neural network turing machine
CN111597827A (en) * 2020-04-02 2020-08-28 云知声智能科技股份有限公司 Method and device for improving machine translation accuracy

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEX GRAVES et al.: "Hybrid computing using a neural network with dynamic external memory", 《NATURE》 *
HULA HOOP: "A brief analysis of the powerful RNN: the differentiable neural computer (DNC)" (in Chinese), 《HTTPS://ZHUANLAN.ZHIHU.COM/P/27773709》 *
WANG Cui: "Classification of isolated Wa-language words based on the MAML method" (in Chinese), 《Journal of Yunnan Minzu University (Natural Sciences Edition)》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065432A (en) * 2021-03-23 2021-07-02 内蒙古工业大学 Handwritten Mongolian recognition method based on data enhancement and ECA-Net

Similar Documents

Publication Publication Date Title
CN109766277B (en) Software fault diagnosis method based on transfer learning and DNN
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN110069778B (en) Commodity emotion analysis method for Chinese merged embedded word position perception
CN111046179B (en) Text classification method for open network question in specific field
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN108052499B (en) Text error correction method and device based on artificial intelligence and computer readable medium
CN111191002B (en) Neural code searching method and device based on hierarchical embedding
CN110619127B (en) Mongolian Chinese machine translation method based on neural network turing machine
CN110688862A (en) Mongolian-Chinese inter-translation method based on transfer learning
CN113190656B (en) Chinese named entity extraction method based on multi-annotation frame and fusion features
CN109086865B (en) Sequence model establishing method based on segmented recurrent neural network
CN112818118B (en) Reverse translation-based Chinese humor classification model construction method
CN112541356A (en) Method and system for recognizing biomedical named entities
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN113190219A (en) Code annotation generation method based on recurrent neural network model
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN113221542A (en) Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening
CN113191150B (en) Multi-feature fusion Chinese medical text named entity identification method
CN111428518B (en) Low-frequency word translation method and device
CN112765996B (en) Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation
CN112364668A (en) Mongolian Chinese machine translation method based on model independent element learning strategy and differentiable neural machine
CN114880022B (en) Bash code annotation generation method based on CodeBERT fine tuning and retrieval enhancement
CN115659172A (en) Generation type text summarization method based on key information mask and copy
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210212