CN111753557A - Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary - Google Patents

Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary

Info

Publication number
CN111753557A
CN111753557A (application CN202010096013.9A)
Authority
CN
China
Prior art keywords
chinese
word
bilingual
emd
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010096013.9A
Other languages
Chinese (zh)
Other versions
CN111753557B (en)
Inventor
余正涛
薛明亚
高盛祥
赖华
翟家欣
朱恩昌
陈玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202010096013.9A
Publication of CN111753557A
Application granted
Publication of CN111753557B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/951 - Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary, belonging to the technical field of machine translation. The invention comprises the following steps: corpus collection, in which Chinese and Vietnamese monolingual sentences are crawled with a web crawler; first, monolingual word embeddings are trained separately for Chinese and Vietnamese, and a Chinese-Vietnamese bilingual dictionary is obtained by training to minimize the EMD between the word embedding distributions; this dictionary is then used as a seed dictionary to train Chinese-Vietnamese bilingual word embeddings; finally, the bilingual word embeddings are applied in a shared-encoder unsupervised machine translation model to construct a Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary. The method can effectively improve the performance of Chinese-Vietnamese unsupervised neural machine translation.

Description

Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary
Technical Field
The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD (Earth Mover's Distance) minimized bilingual dictionary, belonging to the technical field of machine translation.
Background
Neural machine translation is a machine translation method proposed in recent years; its translation quality has surpassed statistical machine translation on many language pairs, and it has become the mainstream translation method. However, neural machine translation requires large-scale parallel corpora to perform well, and when training data are insufficient the translation quality is poor. Parallel corpora between Chinese and Vietnamese are scarce and not readily available, so Chinese-Vietnamese machine translation is a typical low-resource machine translation task. Nevertheless, large amounts of Chinese and Vietnamese monolingual data exist, and Chinese-Vietnamese unsupervised neural machine translation using only monolingual corpora is explored herein. This is of great value for promoting exchange and cooperation between the two countries, and of important theoretical and practical value for research on machine translation of low-resource languages.
At present, research on unsupervised machine translation mainly follows two lines: unsupervised machine translation based on adversarial learning, and unsupervised machine translation based on a shared encoder (shared space). Lample et al. proposed mapping sentences from two different monolingual corpora into the same space, learning to reconstruct in both languages from this shared feature space, and thereby realizing unsupervised neural machine translation with monolingual corpora only. Artetxe et al. modified the model by pre-training unsupervised bilingual word embeddings and using a shared encoder with separate decoders, providing unsupervised neural machine translation that uses only monolingual corpora. The weight-sharing unsupervised machine translation model proposed by Yang et al. better preserves the characteristics and internal features of each language compared with the shared-encoder model, improving translation quality, and Lample et al. obtained further improvements by combining neural machine translation with phrase-based statistical machine translation. Lample et al. also proposed cross-lingual language model pre-training for initializing the lookup table, improving the quality of the pre-trained cross-lingual word embeddings and significantly improving the performance of unsupervised machine translation models. These works use cognate words between similar languages, or digit alignment from the monolingual corpora, as the initial cross-lingual signal, and then extend it by learning to realize unsupervised neural machine translation. Chinese and Vietnamese differ greatly and share no usable cognates, so methods relying on cognate words are not feasible for the Chinese-Vietnamese language pair; the shared-encoder unsupervised neural machine translation of Artetxe et al., which is built on unsupervised bilingual word vectors, fits the characteristics of such a distant language pair. The invention therefore chooses to extend the work of Artetxe et al.; however, the quality of bilingual word embeddings learned only from Arabic numerals shared between the languages is limited, so the idea of the invention is to improve the quality of the unsupervised bilingual word embeddings in order to improve the quality of Chinese-Vietnamese unsupervised neural machine translation.
In unsupervised machine translation using only Chinese and Vietnamese monolingual corpora, machine translation is difficult to realize directly, but acquiring a bilingual dictionary is comparatively easy. The invention therefore first trains a Chinese-Vietnamese bilingual dictionary on the Chinese and Vietnamese monolingual corpora, and then uses this dictionary as a seed dictionary to guide the training of higher-quality bilingual word embeddings, thereby improving the quality of Chinese-Vietnamese unsupervised neural machine translation. Zhang et al. proposed exploiting the similarity of the word vector space distributions of two languages and training a bilingual dictionary by EMD minimization; the whole process uses only monolingual corpora in an unsupervised manner, yet achieves quality comparable to supervised methods, which suits the large differences between Chinese and Vietnamese. Chinese-Vietnamese unsupervised neural machine translation fusing an EMD-minimized bilingual dictionary is therefore proposed herein.
The method first regards the monolingual word embeddings of Chinese and Vietnamese as two probability distributions and obtains a Chinese-Vietnamese bilingual dictionary by training to minimize the EMD between the two embedding distributions; it then uses this dictionary as a seed dictionary and trains Chinese-Vietnamese bilingual word embeddings with a self-learning method; finally, it realizes Chinese-Vietnamese unsupervised neural machine translation on a shared-encoder model.
Disclosure of Invention
The invention provides a Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary, which is used in unsupervised translation systems for low-resource languages and improves Chinese-Vietnamese unsupervised neural machine translation performance.
The technical scheme of the invention is as follows: the Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary comprises the following specific steps:
step1, corpus collection: crawling Chinese and Vietnamese monolingual corpora by using a web crawler; the monolingual corpus is mainly from Chinese and Vietnamese monolingual news websites;
step2, corpus preprocessing: on the basis of Step1, word segmentation and part-of-speech tagging are performed on the Chinese and Vietnamese monolingual sentences with segmentation and tagging tools, and Chinese and Vietnamese monolingual word embeddings are obtained with a word vector training tool, yielding monolingual word vectors. When the separately trained Chinese and Vietnamese monolingual word vectors are mapped into a vector space, the monolingual word vector spaces of the two languages appear approximately isomorphic, meaning that a linear mapping exists that can approximately connect the two spaces;
step3, unsupervised bilingual dictionary based on EMD minimization: on the basis of Step2, an unsupervised Chinese-Vietnamese bilingual dictionary is trained from the Chinese and Vietnamese monolingual word vectors using an EMD (Earth Mover's Distance) minimization method;
as a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
step3, the EMD minimization method is used between the Chinese word vector distribution and the Vietnamese word vector distribution: the word vectors are regarded as probability distributions, the distance between the distributions serves as the lexicon-level criterion, and unsupervised training, without any seed dictionary, finds the minimum EMD between the two distributions, yielding a Chinese-Vietnamese bilingual dictionary.
The circles in fig. 3 are regarded as mounds of earth and the squares as holes, with their sizes representing the volume of each mound and hole, i.e., the corresponding weights. In the example of fig. 3 all weights are equal. In this setting, the aim is to fill the holes at the minimum overall cost of moving earth, measured by the product of the distance moved and the volume moved. The arrows in fig. 3(b) represent the optimal moving scheme in this example, and this scheme can be read directly as a lexicon translation result. From a microscopic view, because the earth of the mound assigned to the "music" hole is used up entirely in filling it, that mound no longer competes for the "dance" hole, and the remaining mound is left responsible for filling the "dance" hole. From a macroscopic view, minimizing the overall moving cost takes global information into account, overcoming the locality of nearest-neighbor search and thus alleviating the hubness problem. The notion of global weighted matching in this metaphor can be implemented mathematically with EMD, whose very name derives from the metaphor.
\min_{W} \sum_{i=1}^{V_s} \sum_{j=1}^{V_t} C_{ij} W_{ij}
\text{s.t. } W_{ij} \ge 0, \quad \sum_{j=1}^{V_t} W_{ij} = t_i, \quad \sum_{i=1}^{V_s} W_{ij} = s_j
wherein V_s represents the size of the source-language vocabulary, V_t the size of the target-language vocabulary, C_{ij} the distance between the i-th mound and the j-th hole, t_i the volume of the i-th mound, and s_j the volume of the j-th hole. The decision variables W_{ij} of the optimization problem represent the volume of earth moved from the i-th mound to the j-th hole, so the objective function minimizes the overall moving cost. After solving, the nonzero W_{ij} indicate translation pairs; experiments show that lexicon translation using EMD can perform better than nearest-neighbor retrieval.
In order to better exploit EMD's ability to handle the phenomenon of one word having multiple translations, it is proposed herein to introduce EMD into the training process of the bilingual word vectors. In the training objective, EMD participates as a regularization term, so that the bilingual word vectors obtained by training better capture one-to-many translation. Its effect is verified experimentally.
The adversarial learning approach can also be viewed in this framework, since adversarial learning implicitly optimizes the Jensen-Shannon divergence; but for the lexicon translation task, other, better-suited distribution distances are available. Since EMD is a distance between distributions well suited to lexicon translation, it is used as the lexicon-level criterion to guide the learning of the linear mapping, i.e., a mapping G is sought that minimizes the EMD between the mapped source-language word vector distribution and the target-language word vector distribution, as shown in FIG. 4. In mathematical form,
\min_{G} \mathrm{EMD}\left(p_{G(x)}, p_{y}\right)
wherein p_{G(x)} represents the distribution of the source-language word vectors after mapping by G, and p_y represents the target-language word vector distribution.
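For illustration only, this objective can be evaluated for one candidate mapping G with an off-the-shelf optimal transport solver; the toy embeddings, uniform word weights, and use of the POT package below are assumptions, not the training procedure of Zhang et al.:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))        # toy Chinese word vectors
Y = rng.normal(size=(200, 50))        # toy Vietnamese word vectors
G = np.eye(50)                        # one candidate linear mapping G

a = np.full(len(X), 1.0 / len(X))     # uniform mass per source word
b = np.full(len(Y), 1.0 / len(Y))     # uniform mass per target word
M = ot.dist(X @ G, Y)                 # pairwise squared Euclidean costs
print("EMD(p_G(x), p_y) =", ot.emd2(a, b, M))
# Training would search over G (Zhang et al. use adversarial updates)
# to drive this quantity down.
```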
Step4, obtaining the bilingual word embedding: on the basis of the steps of Step2 and Step3, an unsupervised bilingual dictionary based on EMD minimization is used as a seed dictionary to guide the learning of bilingual word embedding by using a self-learning model; generating Chinese-more bilingual word embedding;
word embedding mapping: assuming that the word embedding matrices for the languages chinese and vietnamese are X and Y respectively,
Figure RE-GDA0002477274680000046
a vector for the ith word of the source language,
Figure RE-GDA0002477274680000047
a vector for the jth word of the target language; the dictionary D is a binary matrix, D is when the ith word of the source language is aligned with the jth word of the target languageij1. The goal of word mapping is to find a mapping matrix W such that the mapped word is
Figure RE-GDA0002477274680000048
And
Figure RE-GDA0002477274680000049
is closest to the Euclidean distance of (i.e. is
Figure RE-GDA0002477274680000042
After normalizing and centering the matrices X and Y and setting W as an orthogonal matrix, the above problem of solving euclidean distances is equivalent to maximizing the dot product:
Figure RE-GDA0002477274680000043
tr represents trace operation of the matrix, and the optimal solution can be obtained by solving W ═ UVT(U, V denotes two orthogonal matrices), subjected to singular value decomposition, XTDY=U∑VT. Given that the matrix D is sparse, a solution can be obtained in linear time.
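A minimal numpy sketch of this closed-form solution follows; the toy embedding matrices and seed dictionary are illustrative assumptions:

```python
import numpy as np

def normalize(M):
    M = M / np.linalg.norm(M, axis=1, keepdims=True)  # unit length
    return M - M.mean(axis=0)                         # mean centering

def procrustes_mapping(X, Y, D):
    # Optimal orthogonal W maximizing tr(X W Y^T D^T): W = U V^T,
    # where X^T D Y = U S V^T.
    U, _, Vt = np.linalg.svd(X.T @ D @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = normalize(rng.normal(size=(1000, 300)))   # toy Chinese embeddings
Y = normalize(rng.normal(size=(1000, 300)))   # toy Vietnamese embeddings
D = np.zeros((1000, 1000))
D[np.arange(50), np.arange(50)] = 1           # toy seed dictionary pairs
W = procrustes_mapping(X, Y, D)               # mapped source space: X @ W
```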
Dictionary self-learning: according to nearest-neighbor retrieval, each source-language word is assigned the closest target-language word, the aligned word pairs are added to the dictionary, and the process iterates until convergence.
Taking FIG. 5 as an example, the word pairs aligned in the initial dictionary are (horse-ngựa, dog-chó). One mapping is computed from this dictionary, such that the mapped "horse" is close to "ngựa" and "dog" is close to "chó" in Euclidean distance. Then, in the mapped space, the closest corresponding word is found for each remaining word; "cat" is found to be closest to "mèo", so it is also added to the dictionary. With (horse-ngựa, dog-chó, cat-mèo) as the new seed dictionary, re-minimizing the Euclidean distance yields a new mapping matrix W, and thus a new alignment result.
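The loop can be sketched as follows, reusing the Procrustes solution above; the convergence test and toy inputs (assumed already normalized and centered) are illustrative assumptions rather than the patent's exact procedure:

```python
import numpy as np

def self_learning(X, Y, D, iters=10):
    """Alternate mapping induction (Procrustes) and dictionary induction
    (nearest neighbor) until the dictionary stops changing."""
    for _ in range(iters):
        U, _, Vt = np.linalg.svd(X.T @ D @ Y)   # solve mapping from dictionary
        W = U @ Vt
        sims = (X @ W) @ Y.T                    # cosine sims for unit vectors
        nn = sims.argmax(axis=1)                # closest target for each source
        D_new = np.zeros_like(D)
        D_new[np.arange(X.shape[0]), nn] = 1    # re-induced dictionary
        if np.array_equal(D_new, D):            # converged: unchanged
            break
        D = D_new
    return W, D
```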
After training, translation is performed using beam search; the beam size must be chosen to balance translation time against search accuracy.
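For illustration, a compact beam search over a generic next-token scoring function might look as follows; the `step` interface and the length normalization are assumptions, not the decoder API actually used:

```python
def beam_search(step, bos, eos, beam_size=5, max_len=50):
    # Each hypothesis: (token sequence, cumulative log-probability).
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                finished.append((seq, score))   # completed hypothesis
                continue
            for tok, logp in step(seq):         # top next-token extensions
                candidates.append((seq + [tok], score + logp))
        if not candidates:
            break
        # Keep only the beam_size best partial translations.
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
    finished.extend(beams)
    # Length-normalize so short outputs are not unfairly favored.
    return max(finished, key=lambda h: h[1] / len(h[0]))[0]
```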
Fusing the unsupervised bilingual dictionary trained by EMD minimization, and using it as the seed dictionary, improves the effect of dictionary self-learning and further improves the quality of the bilingual word vectors.
And Step5, on the basis of Step4, the bilingual word vectors are applied in the shared-encoder unsupervised neural machine translation model, and a Chinese-Vietnamese unsupervised neural machine translation model fusing the EMD-minimized bilingual dictionary is obtained by training.
The method provided by the invention fuses an unsupervised EMD-minimized bilingual dictionary on the basis of the shared-encoder model of Artetxe et al., and has a stronger ability than the original model to mine cross-lingual information from the Chinese and Vietnamese monolingual corpora. The model structure is shown in fig. 6; the model follows the standard attention-based encoder-decoder architecture proposed by Bahdanau et al. It consists of one shared encoder and two decoders, corresponding to the source language and the target language respectively. The encoder is a two-layer bidirectional recurrent neural network (BiGRU), and each decoder is a two-layer unidirectional recurrent neural network (GRU). For the attention mechanism, the global attention method with the general alignment function proposed by Luong et al. is used herein. On the encoder side, the pre-trained Chinese-Vietnamese bilingual dictionary and bilingual word vectors are used to accept the input sequence and generate language-independent representations. The word vectors on the decoder side are continuously updated during training, and training and translation are carried out through the two decoders.
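A hedged PyTorch sketch of this architecture is given below; the hidden sizes, module names, and the "general" attention implementation are illustrative assumptions, not the patent's exact implementation:

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Two-layer bidirectional GRU shared by Chinese and Vietnamese."""
    def __init__(self, embed, hidden=512):
        super().__init__()
        self.embed = embed  # fixed, pre-trained bilingual embeddings
        self.rnn = nn.GRU(embed.embedding_dim, hidden, num_layers=2,
                          bidirectional=True, batch_first=True)

    def forward(self, tokens):                   # tokens: (batch, src_len)
        outs, _ = self.rnn(self.embed(tokens))   # (batch, src_len, 2*hidden)
        return outs

class Decoder(nn.Module):
    """Two-layer GRU decoder with Luong-style 'general' global attention."""
    def __init__(self, vocab, embed_dim=300, hidden=1024):
        super().__init__()
        # Decoder embeddings are trainable, unlike the encoder's.
        self.embed = nn.Embedding(vocab, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, num_layers=2, batch_first=True)
        # 'general' score h_t^T W_a h_s; assumes encoder dim == hidden here.
        self.attn_general = nn.Linear(hidden, hidden, bias=False)
        self.attn_out = nn.Linear(2 * hidden, hidden)
        self.proj = nn.Linear(hidden, vocab)

    def forward(self, tokens, enc_outs, state=None):
        out, state = self.rnn(self.embed(tokens), state)
        scores = torch.bmm(self.attn_general(out), enc_outs.transpose(1, 2))
        ctx = torch.bmm(scores.softmax(dim=-1), enc_outs)  # weighted context
        mixed = torch.tanh(self.attn_out(torch.cat([out, ctx], dim=-1)))
        return self.proj(mixed), state

# One shared encoder, two language-specific decoders (zh and vi).
embed = nn.Embedding.from_pretrained(torch.randn(30000, 300), freeze=True)
encoder = SharedEncoder(embed)
dec_zh, dec_vi = Decoder(vocab=30000), Decoder(vocab=25000)
```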
For each sentence in Chinese (L1), the model alternates between two training steps: denoising, which optimizes the probability of encoding a noised version of the sentence with the shared encoder and reconstructing it with the L1 decoder; and on-the-fly back-translation, which translates the sentence in inference mode (encoding it with the shared encoder and decoding it with the Vietnamese (L2) decoder), and then optimizes the probability of encoding this translated sentence with the shared encoder and recovering the original sentence with the L1 decoder. Training alternates between sentences in L1 and L2, with analogous steps for L2.
Dual structure: while NMT systems are typically built for a particular translation direction (e.g., Chinese->Vietnamese or Vietnamese->Chinese), the duality of machine translation is exploited herein to handle both directions simultaneously (e.g., Chinese<->Vietnamese).
Shared encoder: similar to Ha et al., Lee et al., and Johnson et al., the system herein uses a single encoder shared by both languages, i.e., both Chinese and Vietnamese are encoded with the same encoder. The shared encoder is intended to produce language-independent representations of both languages, from which each decoder decodes into its corresponding language.
Pre-trained fixed bilingual word embeddings: while most neural machine translation systems randomly initialize their word vectors and update them during training, pre-trained cross-lingual word vectors are used in the encoder here and remain fixed throughout training. The encoder thus receives language-independent word-level representations and only needs to learn how to combine them to build representations of larger phrases.
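In PyTorch, for example, such fixed encoder embeddings can be expressed as follows (the random stand-in weights are an assumption; in practice they would be the pre-trained Chinese-Vietnamese bilingual vectors):

```python
import torch
import torch.nn as nn

weights = torch.randn(30000, 300)   # stand-in for pre-trained bilingual vectors
embed = nn.Embedding.from_pretrained(
    weights,
    freeze=True,  # encoder embeddings stay fixed during training
)
# Decoder embeddings, by contrast, are ordinary trainable nn.Embedding layers.
```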
Experiments in Artetxe et al. have shown that adding denoising and back-translation to the system helps improve translation quality, and the invention uses a shared-encoder system with denoising and back-translation.
For each sentence in Chinese (L1), the system is trained in two steps. Denoising: the probability of encoding a noised version of the sentence with the shared encoder and reconstructing it with the L1 decoder is optimized, as in fig. 7(a). Back-translation: the sentence is translated in inference mode (encoded with the shared encoder and decoded with the Vietnamese (L2) decoder, as in fig. 7(b)), and then the probability of encoding this translated sentence with the shared encoder and recovering the source sentence with the L1 decoder is optimized. These two steps alternate between L1 and L2; the training steps for L2 are analogous, as in fig. 7(c) and (d). Neural machine translation systems are usually trained on parallel corpora; since only monolingual corpora are available, supervised training cannot be used in the present scenario. However, with the model architecture of fig. 6, the entire system can be trained unsupervised by combining two methods, denoising and back-translation:
denoising: due to the use of a shared encoder and the use of the dual structure of machine translation, the system herein can be trained directly to reconstruct the input sentence. In particular, the system encodes an input sentence in a given language using a shared encoder, and then reconstructs the source sentence using a decoder for that language. Given that pre-trained cross-language word vectors are used in a shared encoder, which learns to combine the embedding of two languages into a language-independent characterization, each decoder should learn to decode such characterization into the corresponding language. In inference mode, the source language decoder is replaced by the target language decoder only, so that the system can generate a translation of the input text using the language independent tokens generated by the encoder.
Random noise is introduced into the input sentence herein. The idea, following the denoising autoencoder principle, is to train the system to reconstruct the original version of a corrupted input sentence. To this end, the word order of the input sentence is altered by random swaps of consecutive words; for a sequence of N elements, N/2 such random swaps are performed. The system thus has to learn the internal structure of the language to recover the correct word order; at the same time, it is prevented from relying excessively on the word order of the input sequence, so as to better account for actual word-order differences across languages.
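A minimal sketch of this noise model, assuming a sentence given as a list of words:

```python
import random

def add_swap_noise(words, rng=random):
    """Corrupt a sentence by n//2 random swaps of adjacent words."""
    words = list(words)
    n = len(words)
    if n < 2:
        return words
    for _ in range(n // 2):
        i = rng.randrange(n - 1)              # random adjacent position
        words[i], words[i + 1] = words[i + 1], words[i]
    return words

print(add_swap_noise("the system learns word order".split()))
```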
Back-translation: despite the denoising strategy, training so far is still a copying task with some synthetic perturbations and, most importantly, involves a single language at a time, disregarding the final goal of translating between the two languages. In order to train the system in a true translation setting without violating the constraint of using only monolingual corpora, the back-translation method proposed by Sennrich et al. is added to the system. Specifically, given an input sentence in one language, the system translates it into the other language in inference mode using greedy decoding (i.e., using the shared encoder and the other language's decoder). In this way, pseudo-parallel sentence pairs are obtained, and the system is trained to predict the original sentence from the synthetic translation.
It is noted that, in contrast to standard back-translation, which uses an independent model to back-translate the entire corpus at once, here each mini-batch of sentences is back-translated on the fly using the model being trained, exploiting the dual structure of the proposed architecture. Thus, as training progresses and the model improves, it produces better synthetic sentence pairs through back-translation, which in turn help to further improve the model in subsequent iterations.
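One such on-the-fly update for the Chinese (L1) direction can be sketched as below; `translate_greedy` and `train_step` are assumed stand-ins for inference with the current model and one cross-entropy update, not the actual training API:

```python
def backtranslation_update(batch_zh, model, translate_greedy, train_step):
    # 1) Inference mode: translate the Chinese batch into Vietnamese with
    #    the CURRENT model (shared encoder + Vietnamese decoder, greedy).
    pseudo_vi = [translate_greedy(model, sent, src="zh", tgt="vi")
                 for sent in batch_zh]
    # 2) Training mode: encode the pseudo-Vietnamese with the shared encoder
    #    and train the Chinese decoder to recover the original sentences.
    return train_step(model, src_batch=pseudo_vi, tgt_batch=batch_zh,
                      src_lang="vi", tgt_lang="zh")
```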
Because Chinese and Vietnamese differ greatly and share no cognate words, and in accordance with this characteristic of the language pair, the method of minimizing the EMD between word vector distributions is introduced to learn a Chinese-Vietnamese dictionary from the Chinese and Vietnamese monolingual corpora; a Chinese-Vietnamese unsupervised neural machine translation method fusing the EMD-minimized bilingual dictionary is proposed, improving the performance of unsupervised neural machine translation.
The invention has the beneficial effects that:
the invention realizes the unsupervised neural machine translation system of the languages with larger Chinese cross-distance difference, improves the capability of the shared encoder model unsupervised neural machine translation model to acquire cross-language information of the languages with larger difference, and further improves the unsupervised neural machine translation quality of the Chinese cross-distance. The unsupervised Chinese-Vietnamese language translation model has the advantages that unsupervised operation is expanded from similar languages containing homologous words to Chinese-Vietnamese language tasks with large differences, and the performance of the unsupervised neural machine translation model of the shared encoder is improved.
Drawings
FIG. 1 is a flow chart of the Chinese-Vietnamese unsupervised neural machine translation method proposed by the present invention;
FIG. 2 shows the monolingual word vector spaces of Chinese and Vietnamese of the present invention;
FIG. 3 illustrates the hubness problem of the present invention;
FIG. 4 is an Earth Mover's Distance minimization learning diagram of the present invention;
FIG. 5 is a schematic diagram of the word mapping process using number alignment of the present invention;
FIG. 6 is the Chinese-Vietnamese unsupervised NMT model fusing the EMD-minimized bilingual dictionary of the present invention;
FIG. 7 shows the 4 processes of training the Chinese-Vietnamese unsupervised NMT model fusing the EMD-minimized bilingual dictionary.
Detailed Description
Example 1: as shown in figs. 1-7, in the Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary, Step1 first collects monolingual corpora: 58 million Chinese monolingual sentences and 30 million Vietnamese monolingual sentences are crawled from the Internet.
Step2, corpus preprocessing: on the basis of Step1, the Chinese and Vietnamese monolingual sentences are segmented and part-of-speech tagged, and monolingual word vectors are obtained by training. The underthesea Vietnamese word segmentation tool is used for Vietnamese segmentation and part-of-speech tagging, and the jieba segmentation tool is used for Chinese. word2vec is used to train monolingual word vectors for Chinese and Vietnamese, 300-dimensional for each language, with the skip-gram model. These vectors are later used, once the seed dictionary has been added, to train the bilingual word vectors.
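For illustration, the preprocessing pipeline with the tools named above might look as follows; the sample sentences, tool options, and output file name are assumptions:

```python
import jieba.posseg as pseg
from underthesea import pos_tag
from gensim.models import Word2Vec

# Chinese: segmentation + POS tagging with jieba.
zh = [(p.word, p.flag) for p in pseg.cut("神经机器翻译需要大规模平行语料。")]
# Vietnamese: POS tagging (which also tokenizes) with underthesea.
vi = pos_tag("dịch máy nơ-ron cần ngữ liệu song song quy mô lớn")

# Skip-gram word2vec (sg=1), 300 dimensions, on the segmented corpus;
# the real corpus would be millions of segmented sentences.
zh_sentences = [[w for w, _ in zh]]
model = Word2Vec(zh_sentences, vector_size=300, sg=1, window=5,
                 min_count=1, workers=4)
model.wv.save_word2vec_format("zh.vec")   # hypothetical output file name
```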
When the separately trained Chinese and Vietnamese monolingual word vectors are mapped into a vector space as shown in FIG. 2, the monolingual word vector spaces of the two languages appear approximately isomorphic, meaning that a linear mapping exists that can approximately connect the two spaces.
Step3, unsupervised bilingual dictionary based on EMD minimization: on the basis of Step2, an unsupervised Chinese-Vietnamese bilingual dictionary is trained from the Chinese and Vietnamese monolingual word vectors using the EMD minimization method.
Further, the Step3 includes the specific steps of:
step3, the EMD minimization method is used between the Chinese word vector distribution and the Vietnamese word vector distribution: the word vectors are regarded as probability distributions, and the distance between the distributions serves as the lexicon-level criterion; unsupervised training without any seed dictionary finds the minimum EMD between the Chinese and Vietnamese word vector distributions, and a Chinese-Vietnamese bilingual dictionary is obtained;
The bilingual dictionary is trained with the method proposed by Zhang et al., and 50-dimensional word vectors are trained for Chinese and Vietnamese with word2vec. The 50-dimensional word vectors are trained with the CBOW model under default hyper-parameters, the vocabulary is restricted to nouns occurring no fewer than 1,000 times, and the experimental results are shown in Table 1.
TABLE 1 Number of Chinese-Vietnamese dictionary entries generated based on EMD minimization
[Table 1 is provided as an image in the original publication and is not recoverable here.]
Step4, obtaining Chinese-Vietnamese bilingual word embeddings: on the basis of Step2 and Step3, the unsupervised EMD-minimized bilingual dictionary is used as a seed dictionary to guide the learning of bilingual word embeddings with a self-learning model, generating Chinese-Vietnamese bilingual word embeddings;
In Step4, word embedding mapping is performed: let the word embedding matrices of Chinese and Vietnamese be X and Y respectively, with X_{i*} the vector of the i-th source-language word and Y_{j*} the vector of the j-th target-language word. The dictionary D is a binary matrix, with D_{ij}=1 when the i-th source-language word is aligned with the j-th target-language word. The goal of word mapping is to find a mapping matrix W such that the mapped X_{i*}W is closest in Euclidean distance to Y_{j*}, i.e.
W^{*} = \arg\min_{W} \sum_{i} \sum_{j} D_{ij} \lVert X_{i*}W - Y_{j*} \rVert^{2}
After normalizing and mean-centering the matrices X and Y and constraining W to be an orthogonal matrix, the above Euclidean-distance problem is equivalent to maximizing the dot product:
W^{*} = \arg\max_{W} \operatorname{tr}\left(X W Y^{T} D^{T}\right)
where tr denotes the trace operation of the matrix. The optimal solution W^{*} = U V^{T} (U and V are two orthogonal matrices) is obtained by the singular value decomposition X^{T} D Y = U \Sigma V^{T}; given that the matrix D is sparse, a solution is obtained in linear time;
dictionary self-learning: according to nearest-neighbor retrieval, each source-language word is assigned the closest target-language word, the aligned word pairs are added to the dictionary, and the process iterates until convergence.
And Step5, on the basis of Step4, the bilingual word vectors are applied in the shared-encoder unsupervised neural machine translation model, and a Chinese-Vietnamese unsupervised neural machine translation model fusing the EMD-minimized bilingual dictionary is obtained by training.
Further, in Step 5:
the experiment is mainly divided into the following five parts: unsupervised baseline model translation on Chinese-cross, UNMT fusing EMD minimized bilingual dictionary, adding 1 ten thousand and 10 ten thousand parallel corpuses on the basis of the method model, and directly using 1 ten thousand and 10 ten thousand parallel corpuses to train on GNMT and Transform with supervised models.
Unsupervised model training: the translation system is trained only on monolingual corpora. The first benchmark experiment applies the baseline model to train a Chinese-Vietnamese unsupervised translation model. The second is the method herein: Chinese-Vietnamese UNMT fusing the EMD-minimized bilingual dictionary on top of the baseline.
Semi-supervised model training: in most cases, the languages under study have a small amount of parallel corpora that can be used to improve model performance, but the corpus scale is not large enough to train a complete conventional NMT system directly. Therefore, in addition to the monolingual corpora, a small amount of parallel data is added: experiments are performed with 10,000 and 100,000 parallel sentence pairs on top of the method presented herein.
Supervised model training: conventional supervised neural machine translation models are trained with the same 10,000 and 100,000 parallel sentence pairs added in the semi-supervised experiments above, for comparison with the semi-supervised experiments.
TABLE 2 Comparison of Chinese-Vietnamese machine translation experiments under different methods
[Table 2 is provided as an image in the original publication and is not recoverable here.]
As can be seen from comparing rows 1 and 2 of the experimental results in Table 2, fusing the unsupervised bilingual dictionary into the unsupervised model gains about 2.5 BLEU over the baseline system, indicating that the proposed model captures more cross-lingual information from the monolingual corpora, improves the quality of the bilingual word vectors, and thereby improves translation quality. From row 3, the semi-supervised system with 10,000 added parallel sentence pairs reaches 10.02 BLEU for Chinese->Vietnamese and 13.91 BLEU for Vietnamese->Chinese; comparison among rows 5, 6, 7 and 8 shows that, relative to models trained directly on the parallel corpora, the method provided herein achieves a better effect. From comparing rows 4 and 8, when 100,000 parallel sentence pairs are added, both the Chinese->Vietnamese and Vietnamese->Chinese directions exceed the Transformer model.
TABLE 3 Example analysis of Chinese-Vietnamese unsupervised machine translation under different methods
[Table 3 is provided as an image in the original publication and is not recoverable here.]
From the experimental translations in Table 3, although the model still suffers from inaccurate translations caused by learning bias, the translation quality of the method is clearly improved compared with the baseline system.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (6)

1. The Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary, characterized in that:
the method comprises the following specific steps:
step1, corpus collection: crawling Chinese and Vietnamese monolingual corpora by using a web crawler;
step2, corpus preprocessing: on the basis of Step1, segmenting Chinese and Vietnamese monolingual sentences and marking parts of speech to obtain monolingual word vectors through training;
step3, unsupervised bilingual dictionary based on EMD minimization: on the basis of Step2, training an unsupervised Chinese-Vietnamese bilingual dictionary from the Chinese and Vietnamese monolingual word vectors by using an EMD (Earth Mover's Distance) minimization method;
step4, obtaining Chinese-Vietnamese bilingual word embeddings: on the basis of Step2 and Step3, using the unsupervised EMD-minimized bilingual dictionary as a seed dictionary to guide the learning of bilingual word embeddings, generating Chinese-Vietnamese bilingual word embeddings;
and Step5, on the basis of the Step4, applying the bilingual word vectors to an unsupervised neural machine translation model of the shared encoder, and training to obtain a Chinese-to-more unsupervised neural machine translation model fused with the EMD minimized bilingual dictionary.
2. The Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary according to claim 1, wherein the specific steps of Step2 are as follows:
step2, segmenting the Chinese and Vietnamese monolingual sentences and tagging parts of speech: the segmentation and part-of-speech tagging of the Chinese and Vietnamese monolingual corpora are performed with segmentation and tagging tools, and Chinese and Vietnamese monolingual word embeddings are obtained with a word vector training tool.
3. The Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary according to claim 1, wherein the specific steps of Step3 are as follows:
step3, the EMD minimization method is used between the Chinese word vector distribution and the Vietnamese word vector distribution: the word vectors are regarded as probability distributions, the distance between the distributions serves as the lexicon-level criterion, and unsupervised training, without any seed dictionary, finds the minimum EMD between the two distributions, yielding a Chinese-Vietnamese bilingual dictionary.
4. The Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary according to claim 1, wherein the specific steps of Step4 are as follows:
using the Chinese-Vietnamese bilingual dictionary obtained in Step3 as a seed dictionary; guiding the training of Chinese-Vietnamese word embeddings with a self-learning model; and obtaining Chinese-Vietnamese bilingual word embeddings by training.
5. The Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary according to claim 1, wherein in Step 5:
and embedding and applying the trained bilingual words fused with the EMD bilingual dictionary in the model of the shared encoder by using a shared encoder model, so as to realize word-level correspondence between the Chinese language and the more bilingual language and train an unsupervised neural machine translation model of the Chinese language.
6. The Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary according to claim 1, wherein in Step4, word embedding mapping is performed: let the word embedding matrices of Chinese and Vietnamese be X and Y respectively, with X_{i*} the vector of the i-th source-language word and Y_{j*} the vector of the j-th target-language word; the dictionary D is a binary matrix, with D_{ij}=1 when the i-th source-language word is aligned with the j-th target-language word; the goal of word mapping is to find a mapping matrix W such that the mapped X_{i*}W is closest in Euclidean distance to Y_{j*}, i.e.
W^{*} = \arg\min_{W} \sum_{i} \sum_{j} D_{ij} \lVert X_{i*}W - Y_{j*} \rVert^{2}
after normalizing and mean-centering the matrices X and Y and constraining W to be an orthogonal matrix, the above Euclidean-distance problem is equivalent to maximizing the dot product:
W^{*} = \arg\max_{W} \operatorname{tr}\left(X W Y^{T} D^{T}\right)
where tr denotes the trace operation of the matrix; the optimal solution W^{*} = U V^{T} (U, V are two orthogonal matrices) is obtained by the singular value decomposition X^{T} D Y = U \Sigma V^{T}; given that the matrix D is sparse, a solution is obtained in linear time;
the dictionary self-learns as follows: according to nearest-neighbor retrieval, each source-language word is assigned the closest target-language word, the aligned word pairs are added to the dictionary, and the process iterates until convergence.
CN202010096013.9A 2020-02-17 2020-02-17 Chinese-Vietnamese unsupervised neural machine translation method fusing EMD-minimized bilingual dictionary Active CN111753557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010096013.9A CN111753557B (en) 2020-02-17 2020-02-17 Chinese-Vietnamese unsupervised neural machine translation method fusing EMD-minimized bilingual dictionary

Publications (2)

Publication Number Publication Date
CN111753557A true CN111753557A (en) 2020-10-09
CN111753557B CN111753557B (en) 2022-12-20

Family

ID=72673087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010096013.9A Active CN111753557B (en) Chinese-Vietnamese unsupervised neural machine translation method fusing EMD-minimized bilingual dictionary

Country Status (1)

Country Link
CN (1) CN111753557B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018110369A1 (en) * 2017-04-28 2018-10-31 Intel Corporation IMPROVEMENT OF AUTONOMOUS MACHINES BY CLOUD, ERROR CORRECTION AND PREDICTION
US20190251449A1 (en) * 2018-02-09 2019-08-15 Google Llc Learning longer-term dependencies in neural network using auxiliary losses
CN108897797A (en) * 2018-06-12 2018-11-27 腾讯科技(深圳)有限公司 Update training method, device, storage medium and the electronic equipment of dialog model
CN110334881A (en) * 2019-07-17 2019-10-15 深圳大学 A kind of Financial Time Series Forecasting method based on length memory network and depth data cleaning, device and server

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JASON LEE et al.: "Fully Character-Level Neural Machine Translation without Explicit Segmentation", Transactions of the Association for Computational Linguistics *
WANG Kun et al.: "Neural Machine Translation with Nearest-Neighbor Association Tendency", Computer Science *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287694A (en) * 2020-09-18 2021-01-29 昆明理工大学 Shared encoder-based Chinese-crossing unsupervised neural machine translation method
CN112633018A (en) * 2020-12-28 2021-04-09 内蒙古工业大学 Mongolian Chinese neural machine translation method based on data enhancement
CN112836527A (en) * 2021-01-31 2021-05-25 云知声智能科技股份有限公司 Training method, system, equipment and storage medium of machine translation model
CN112836527B (en) * 2021-01-31 2023-11-21 云知声智能科技股份有限公司 Training method, system, equipment and storage medium of machine translation model
CN112926324B (en) * 2021-02-05 2022-07-29 昆明理工大学 Vietnamese event entity recognition method integrating dictionary and anti-migration
CN112926324A (en) * 2021-02-05 2021-06-08 昆明理工大学 Vietnamese event entity recognition method integrating dictionary and anti-migration
CN113076398A (en) * 2021-03-30 2021-07-06 昆明理工大学 Cross-language information retrieval method based on bilingual dictionary mapping guidance
CN113343672B (en) * 2021-06-21 2022-12-16 哈尔滨工业大学 Unsupervised bilingual dictionary construction method based on corpus merging
CN113343672A (en) * 2021-06-21 2021-09-03 哈尔滨工业大学 Unsupervised bilingual dictionary construction method based on corpus merging
CN113591496A (en) * 2021-07-15 2021-11-02 清华大学 Bilingual word alignment method and system
CN113591490A (en) * 2021-07-29 2021-11-02 北京有竹居网络技术有限公司 Information processing method and device and electronic equipment
CN113591490B (en) * 2021-07-29 2023-05-26 北京有竹居网络技术有限公司 Information processing method and device and electronic equipment
CN114595688A (en) * 2022-01-06 2022-06-07 昆明理工大学 Chinese cross-language word embedding method fusing word cluster constraint
CN114595688B (en) * 2022-01-06 2023-03-10 昆明理工大学 Chinese cross-language word embedding method fusing word cluster constraint
CN114078468A (en) * 2022-01-19 2022-02-22 广州小鹏汽车科技有限公司 Voice multi-language recognition method, device, terminal and storage medium
CN114078468B (en) * 2022-01-19 2022-05-13 广州小鹏汽车科技有限公司 Voice multi-language recognition method, device, terminal and storage medium
CN114492476A (en) * 2022-01-30 2022-05-13 天津大学 Language code conversion vocabulary overlapping enhancement method for unsupervised neural machine translation
CN115618885A (en) * 2022-09-22 2023-01-17 无锡捷通数智科技有限公司 Statement translation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111753557B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
CN111753557B (en) Chinese-Vietnamese unsupervised neural machine translation method fusing EMD-minimized bilingual dictionary
CN109582789B (en) Text multi-label classification method based on semantic unit information
CN108416058B (en) Bi-LSTM input information enhancement-based relation extraction method
Lee et al. Learning dense representations of phrases at scale
US20210390271A1 (en) Neural machine translation systems
CN109635124B (en) Remote supervision relation extraction method combined with background knowledge
Gu et al. Unpaired image captioning by language pivoting
Liang et al. An end-to-end discriminative approach to machine translation
CN109933808B (en) Neural machine translation method based on dynamic configuration decoding
CN105068998A (en) Translation method and translation device based on neural network model
WO2022020467A1 (en) System and method for training multilingual machine translation evaluation models
CN112926344A (en) Word vector replacement data enhancement-based machine translation model training method and device, electronic equipment and storage medium
CN116663578A (en) Neural machine translation method based on strategy gradient method improvement
Yassine et al. Leveraging subword embeddings for multinational address parsing
CN112287694A (en) Shared encoder-based Chinese-crossing unsupervised neural machine translation method
Ahmadnia et al. Neural machine translation advised by statistical machine translation: The case of farsi-spanish bilingually low-resource scenario
Espla-Gomis et al. Using machine translation to provide target-language edit hints in computer aided translation based on translation memories
US11586833B2 (en) System and method for bi-directional translation using sum-product networks
Zhu et al. Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment
Meyer et al. Subword segmental machine translation: Unifying segmentation and target sentence generation
CN110321568B (en) Chinese-Yue convolution neural machine translation method based on fusion of part of speech and position information
Farajian et al. FBK’s neural machine translation systems for IWSLT 2016
CN113591493B (en) Translation model training method and translation model device
Narzary et al. Attention based English-Bodo neural machine translation system for tourism domain
Adjeisah et al. Twi corpus: a massively Twi-to-handful languages parallel bible corpus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant