CN110598221B - Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus using a generative adversarial network - Google Patents

Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus using a generative adversarial network

Info

Publication number: CN110598221B
Authority: CN (China)
Prior art keywords: Mongolian, sentence, layer, Chinese, capsule
Legal status: Active
Application number: CN201910807617.7A
Other languages: Chinese (zh)
Other versions: CN110598221A
Inventors: 苏依拉, 孙晓骞, 王宇飞, 赵亚平, 张振, 高芬, 贺玉玺, 王昊
Current Assignee: Inner Mongolia University of Technology
Original Assignee: Inner Mongolia University of Technology
Application filed by Inner Mongolia University of Technology; priority to CN201910807617.7A, granted as CN110598221B

Classifications

    • G Physics → G06 Computing; Calculating or Counting → G06N Computing arrangements based on specific computational models → G06N3/00 Computing arrangements based on biological models → G06N3/02 Neural networks
        • G06N3/04 Architecture, e.g. interconnection topology
            • G06N3/044 Recurrent networks, e.g. Hopfield networks
            • G06N3/045 Combinations of networks
        • G06N3/08 Learning methods


Abstract

A method for improving Mongolian-Chinese translation quality constructs a Mongolian-Chinese parallel corpus with a generative adversarial network comprising a generator and a discriminator. The generator uses a hybrid encoder to encode the source-language Mongolian sentence into a vector representation, and a decoder based on a bidirectional Transformer, combined with a sparse attention mechanism, converts that representation into the target-language Chinese sentence, thereby generating Chinese sentences closer to human translation and more Mongolian-Chinese parallel corpus data. The method addresses the severe shortage of Mongolian-Chinese parallel data and the inability of NMT to guarantee the naturalness, adequacy, and accuracy of translation results.

Description

Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus using a generative adversarial network
Technical Field
The invention belongs to the technical field of machine translation, and particularly relates to a method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus using a generative adversarial network.
Background
Machine translation, which uses a computer to automatically translate one language into another, is one of the most powerful means of overcoming language barriers. In recent years, large search and service companies such as Google and Baidu have conducted extensive research on machine translation and contributed substantially to obtaining high-quality machine translations, so that translation between major languages has approached human level, and millions of people communicate across language barriers using online translation systems and mobile applications. In the recent wave of deep learning, machine translation has become an important component in promoting global communication.
The Seq2Seq-based neural machine translation (NMT) framework consists of an encoder, which reads an input sequence and outputs a vector, and a decoder, which reads the vector and produces an output sequence. Since 2013 the framework has evolved rapidly, achieving significant improvements in translation quality over statistical machine translation. Sentence-level maximum likelihood estimation, the gating units of LSTM and GRU, and attention mechanisms have improved NMT's ability to translate long sentences. Ashish Vaswani et al. proposed the Transformer architecture in 2017, which relies entirely on attention to draw global dependencies between inputs and outputs. It enables parallel computation, effectively reduces model training time, and improves machine translation quality to a certain extent, avoiding defects of RNNs and their derivative networks such as slowness and lack of parallelism.
At present, neural machine translation has been successful, but even the best NMT systems fall far short of human expectations, and translation quality still needs to improve. NMT usually trains the model by maximum likelihood estimation, i.e., maximizing the probability of the true target sentence conditioned on the source sentence. The model thus generates the best candidate word for the current time step, but in the long run the translation of the entire sentence may not be the best translation, which leaves a hidden danger for NMT; even the powerful Transformer is no exception. Compared with real human translation, this objective cannot guarantee the naturalness, adequacy, and accuracy of the translation result.
In addition, translation between major languages is relatively mature, but machine translation for low-resource languages faces many challenges, above all a severe shortage of corpora, since constructing parallel corpora manually is very expensive; the translation quality is therefore still unsatisfactory.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network. It mainly addresses the severe shortage of Mongolian-Chinese parallel data and NMT's inability to guarantee the naturalness, adequacy, and accuracy of translation results, by applying a generative adversarial network to Mongolian-Chinese neural machine translation.
To achieve this purpose, the invention adopts the following technical scheme:
the method for improving the translation quality of the Mongolian Chinese by constructing the Mongolian Chinese parallel corpus through the generation countermeasure network is characterized in that the generation countermeasure network is used in the Mongolian Chinese machine translation to relieve the problem of low translation quality of the Mongolian Chinese machine caused by the shortage of the Mongolian Chinese parallel corpus and minimize the difference between human translation and translation given by an NMT model, the generation countermeasure network mainly comprises a generator and a discriminator, and the generator can effectively utilize the Mongolian Chinese monolingual data to relieve the problem of the shortage of the Mongolian Chinese parallel corpus in a machine translation task. In the generator, in order to relieve the UNK phenomenon in Mongolian Chinese machine translation, a hybrid encoder is used for encoding Mongolian of a source language sentence into vector representation, a bidirectional Transformer-based decoder is combined with a sparse attention mechanism to convert the vector representation into Chinese of a target language sentence, and therefore Mongolian sentences closer to human translation and more Mongolian parallel linguistic data are generated, and the quality and the efficiency of Mongolian Chinese machine translation are improved. In the discriminator, the difference between the Chinese sentence generated by the generator and the human translation is judged, the generator mainly aims to generate Mongolian sentences which are closer to the human translation and effectively generate more Mongolian parallel linguistic data by using Mongolian single-language data, and the discriminator aims to calculate the difference between the Chinese sentence generated by the generator and the Chinese sentence translated by the human translation. And (3) performing countertraining on the generator and the discriminator until the discriminator considers that the Chinese sentence generated by the generator is very similar to the human translation, namely the generator and the discriminator realize Nash balance, obtaining a high-quality Mongolian Chinese machine translation system and a large number of Mongolian Chinese parallel data sets, and performing Mongolian Chinese translation by using the Mongolian Chinese machine translation system.
The hybrid encoder is composed of a sentence encoder and a word encoder. To capture semantic information between sentences while keeping the encoder efficient, the sentence encoder is a bidirectional Transformer and the word encoder uses a bidirectional LSTM, which improves the word encoder's efficiency while preserving encoding quality. The bidirectional Transformer is the optimized Transformer1: first, a gated linear unit is added to the original Transformer to effectively extract important information from the source-language sentence and discard redundant information; second, a branch structure is added to effectively capture diverse semantic information in source-language sentences; finally, a capsule network is added on the branch structure and after the third layer normalization, so the encoder can capture the exact position of a word in the source-language sentence, further enhancing encoder accuracy and encoding quality. In the decoder, the bidirectional Transformer is the optimized Transformer2: a branch structure is first added to the original Transformer; second, a capsule network is added; finally, a Swish activation function is added to effectively improve the decoder's decoding accuracy.
The word encoder and the sentence encoder encode the source-language sentence in sequence, and the results are then fused through a fusion function to obtain a vector representation with context information. The word encoder represents each word in vector form, constructing a vector representation of the Mongolian sentence with the word as the basic unit; the model formula is:

h1_i = \Phi(h1_{i-1}, W_i)

where \Phi is the activation function, W_i is the weight, and h1_{i-1} is the hidden-layer state of the (i-1)-th word.

The sentence encoder represents the whole Mongolian sentence in vector form, constructing a vector representation with the sentence as the basic unit; the model formula is:

h2_i = \sum_j \hat{\alpha}_{i,j} v_j

where v_j is the value of the j-th word and the normalized attention weight \hat{\alpha}_{i,j} is calculated as:

\hat{\alpha}_{i,j} = \exp(\alpha_{i,j}) / \sum_k \exp(\alpha_{i,k})

where \alpha_{i,j} is calculated as:

\alpha_{i,j} = (q_i \cdot k_j) / \sqrt{d}

where q_i is the query of the i-th word, k_j is the key of the j-th word, \cdot denotes the dot-product operation, and d is the dimension of q and k.

The fusion function is:

\psi(h1_i, h2_i) = a_1 h1_i + a_2 h2_i

where \psi is the fusion function and a_1, a_2 are randomly initialized corresponding weights; through the two encodings, the encoder fuses vector information containing both sentences and words.
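As an illustration of this hybrid encoding, the following is a minimal PyTorch sketch, assuming a bidirectional LSTM word encoder (h1), non-causal Transformer encoder layers as the sentence encoder (h2), and the fusion \psi(h1, h2) = a_1 h1 + a_2 h2 with randomly initialized weights. All module names and sizes are illustrative; the patent's optimized Transformer1 with gated linear units, branches, and capsule networks is not reproduced here.

import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # word encoder: bidirectional LSTM producing h1
        self.word_enc = nn.LSTM(d_model, d_model // 2,
                                bidirectional=True, batch_first=True)
        # sentence encoder: bidirectional (non-causal) Transformer producing h2
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.sent_enc = nn.TransformerEncoder(layer, num_layers=6)
        # randomly initialized fusion weights a1, a2
        self.a = nn.Parameter(torch.rand(2))

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens)
        h1, _ = self.word_enc(x)                # word-level states, (batch, seq, d)
        h2 = self.sent_enc(x)                   # sentence-level states, (batch, seq, d)
        return self.a[0] * h1 + self.a[1] * h2  # psi(h1, h2) = a1*h1 + a2*h2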
In the sentence encoder, "bidirectional Transformer" means the whole text sequence is read at once, i.e., learning is based on both sides of the sentence rather than reading sequentially from left to right or right to left, so the contextual relationships between words in the text can be learned.
In the decoder, the bidirectional Transformer reads the vector representation of the source-language sentence at once, i.e., decoding is based on both sides of the whole sentence's vector representation, further improving the decoder's decoding accuracy.
To enhance its discrimination capability, the discriminator is a multi-scale discriminator, able to discriminate both the general sentence meaning and the detail information (such as phrases and words) of the Chinese sentences generated by the generator, assisting the generator to produce sentences closer to the real translation. Meanwhile, to overcome the translational invariance of convolutional neural networks, the multi-scale discriminator is implemented with a capsule network, which effectively improves discrimination without reducing training efficiency. Translational invariance here means, for example, that in face recognition a convolutional neural network considers anything with eyes, a mouth, and similar features to be a face, ignoring the exact positions of the facial features. If such a network were used as the discriminator in a generative adversarial network, its translational invariance would lead it to treat any generated Chinese sentence containing all the words of a human-translated sentence as a human translation, ignoring the words' position information and thus misjudging. The capsule network comprises a convolutional layer, a primary capsule layer, a convolutional capsule layer, and a fully connected capsule layer. To represent multiple discriminators with one network and improve training efficiency, the activation values of different sub-layers of the convolutional layer represent activations of sentences at different granularities: lower layers represent activations of words, higher layers represent activations of the whole sentence, and finally the feature maps of the different layers are transformed into feature maps of the same scale with channel number 1.
Given a sentence pair (x, y), each capsule network first constructs a 2D-image-like representation by concatenating the embedding vectors of the words in x and y, i.e., for the i-th word x_i in the source-language sentence x and the j-th word y_j in the target-language sentence y:

M^{(i,j)} = [x_i^T; y_j^T]

where x_i^T denotes the transpose of x_i, y_j^T denotes the transpose of y_j, and M^{(i,j)} is the matrix constructed from the i-th source-language word x_i and the j-th target-language word y_j, i.e., the virtual 2D image representation;
based on the virtual 2D image representation, the similarity between the sentence y' translated by the generator and the sentence y translated by the human under the condition of the source language sentence x is captured through the convolution layer, the main capsule layer, the convolution capsule layer and the fully-connected capsule layer of the capsule network in sequence.
The specific process by which the virtual 2D image representation passes in turn through the convolutional layer, primary capsule layer, convolutional capsule layer, and fully connected capsule layer of the capsule network is as follows:
(1) In the convolutional layer, a convolution with a 9 × 9 kernel and stride 1 is first performed, capturing the correspondence between sentences in x and y through the feature mapping:

c^{(1,f)}_{i,j} = f(W^{(1,f)} * M^{(i,j)} + b^{(1,f)})

where f is the activation function of the first convolution operation, W^{(1,f)} is the weight of the first convolution operation, M^{(i,j)} is the matrix formed from the i-th source-language word x_i and the j-th target-language word y_j, and b^{(1,f)} is the bias of the first convolution operation.

Then a convolution with stride 1 and a 3 × 3 kernel is performed, capturing the correspondence between words in x and y through the feature mapping:

c^{(2,f)}_{i,j} = f(W^{(2,f)} * M^{(i,j)} + b^{(1,f)})

where W^{(2,f)} is the weight of the second convolution operation, and f, M^{(i,j)}, and b^{(1,f)} are the same as in the first convolution operation.

The two convolution operations yield two feature maps of different sizes, c^{(1,f)} and c^{(2,f)}. The smaller of the two feature maps is then padded so both have the same size, and the final feature map is obtained by averaging the two equally sized feature maps:

c_{i,j} = (c^{(1,f)}_{i,j} + c^{(2,f)}_{i,j}) / 2
(2) Entering the primary capsule layer, the output of the convolutional layer is calculated as:

p = g(W_b M + b_1)

where g is the nonlinear squash function applied over the entire vector, M is the input to the capsule, b_1 is the capsule bias, and W_b is a weight; in the primary capsule layer, the capsule replaces the scalar output of the convolution operation with a vector output;

(3) a dynamic routing algorithm replaces max pooling, dynamically strengthening or weakening weights to obtain effective features (see the sketch after this list);

(4) entering the convolutional capsule layer: after this layer, all capsules in the previous layer become a series of capsules, and the weights are further dynamically strengthened or weakened through the dynamic routing algorithm, obtaining more effective features;

(5) entering the fully connected capsule layer, which connects all extracted features;

(6) all features are input into the multilayer perceptron, and the activation function is used to obtain the probability that the data pair (x, y') generated by the generator is real data (x, y), i.e., its degree of similarity to a real sentence.
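The squash nonlinearity g(·) of step (2) and the dynamic routing of steps (3)-(4) can be sketched as follows, in the standard formulation of Sabour et al. (2017); tensor shapes and the routing-iteration count are illustrative assumptions rather than the patent's exact configuration.

import torch

def squash(v, dim=-1, eps=1e-8):
    # g(v) = (|v|^2 / (1 + |v|^2)) * v / |v|
    sq = (v ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * v / torch.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: (n_in, n_out, d_out) prediction vectors from the lower-level capsules
    b = torch.zeros(u_hat.shape[0], u_hat.shape[1])  # routing logits
    for _ in range(n_iters):
        c = torch.softmax(b, dim=1)                  # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=0)     # weighted sum per output capsule
        v = squash(s)                                # (n_out, d_out) output capsules
        b = b + (u_hat * v.unsqueeze(0)).sum(-1)     # strengthen agreeing routes
    return v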
The final training objective of the generative adversarial network is:

\min_G \max_D V(D, G) = E_{(x,y) \sim P_{data}(x,y)}[\log D(x,y)] + E_{y' \sim G(y'|x)}[\log(1 - D(x,y'))]

where G denotes the generator, D denotes the discriminator, and V(D, G) denotes the loss function of generator G and discriminator D; \min_G \max_D means that D maximizes the loss function V(D, G) while G minimizes it; E denotes expectation; P_{data}(x,y) denotes the distribution of the parallel corpus, and D(x, y) the probability that the discriminator considers the pair to be human-translated when the source sentence x and target sentence y of the parallel corpus are input to discriminator D; G(y'|x) denotes the generator's distribution over target sentences y' generated from the source sentence x, and D(x, y') the probability that the discriminator considers (x, y') to be human-translated; x denotes the Mongolian sentence in the parallel corpus, i.e., the source-language sentence; y denotes the Chinese sentence in the parallel corpus, i.e., the human translation result; and y' denotes the Chinese sentence generated by the generator, i.e., the generator's translation result.
In the process:

the additional components of the lattice in the Mongolian corpus are processed, by removing the control symbols in Mongolian sentences together with the additional components of the lattice, leaving only the stem parts;

the Mongolian corpus is segmented at different granularities and the Chinese is word-segmented, which alleviates the UNK phenomenon in Mongolian-Chinese machine translation; processing the additional components of the Mongolian lattice further improves Mongolian-Chinese machine translation quality.
The segmentation method, sketched in code after these steps, is:

(1) the corpus to be preprocessed is first cut into the smallest units, which for Mongolian are the Mongolian letters;

(2) then the occurrences of all adjacent smallest-unit combinations in the corpus are counted and ranked, the most frequent combination is found and added to the dictionary, while the lowest-frequency word in the dictionary is deleted, keeping the dictionary size unchanged;

(3) steps (1) and (2) are repeated until the frequency of the dictionary's words in the corpus is higher than a set value.
neural Machine Translation (NMT) achieves better translation in an end-to-end framework, but the best NMT systems have a larger gap from human expectations. Compared with the method, the method can minimize the difference between human translation and the translation given by the NMT model, relieve the data sparsity problem in Mongolian Chinese machine translation and relieve the UNK phenomenon in the Mongolian Chinese machine translation, thereby not only obtaining a high-quality Mongolian Chinese machine translation system, but also obtaining a large number of Mongolian Chinese parallel data sets.
Drawings
FIG. 1 is an optimized Transformer1 architecture.
FIG. 2 is an optimized Transformer2 architecture.
Fig. 3 is a generator frame.
Fig. 4 is a discriminator framework.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention, a method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network, mainly comprises constructing the encoder and decoder and constructing the discriminator model.
FIG. 1 shows the optimized Transformer1 architecture. First, a gated linear unit is added to the original Transformer to effectively acquire important information in the source-language sentence and discard redundant information; second, a branch structure is added, which can effectively capture diverse semantic information in source-language sentences; finally, a capsule network is added on the branch structure and after the third layer normalization, so the encoder can capture the exact position of a word in the source-language sentence, further enhancing encoder accuracy.
FIG. 2 shows the optimized Transformer2 architecture. A branch structure is added to the original Transformer; second, a capsule network is added; finally, a Swish activation function is added, which effectively improves the decoder's decoding accuracy.
Fig. 3 shows the generator framework, composed mainly of three parts: a hybrid encoder, sparse attention, and a decoder. The encoder accepts an input Mongolian sentence (rendered in Mongolian script in the original figure): the whole sentence is first encoded bidirectionally by the bidirectional optimized Transformer while a bidirectional-LSTM-based encoder encodes the words in the sentence; the two encoders' representations are then fused with the fusion function to produce the source-language encoded representation. The decoder, combined with a sparse attention mechanism, then decodes the source-language encoded representation into the target-language Chinese sentence meaning "it will rain tomorrow".
Fig. 4 shows the capsule network framework, including the convolutional layer, primary capsule layer, convolutional capsule layer, fully connected capsule layer, etc. The convolutional layer comprises two sub-layers, one capturing sentence-level features and the other word-level features, realizing the multi-scale discrimination function.
Alleviating the data sparsity problem in Mongolian-Chinese machine translation:
The generator in the generative adversarial network can effectively address the existing data sparsity in Mongolian-Chinese machine translation. Specifically, the generator is pre-trained on the aligned Mongolian-Chinese corpus to obtain a pre-trained model; with the model's help, Mongolian monolingual data is used to generate Mongolian-Chinese pseudo-bilingual data, and then, with the discriminator's help, Chinese sentences closer to human translation are generated to form an aligned Mongolian-Chinese corpus.
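A sketch of this pseudo-parallel construction, under stated assumptions: generator.translate and discriminator.score are hypothetical interfaces, and the filtering threshold is an illustrative choice, since the patent describes the discriminator assisting selection without fixing a cutoff.

def build_pseudo_corpus(generator, discriminator, mongolian_monolingual,
                        keep_threshold=0.5):
    # Translate monolingual Mongolian and keep pairs the discriminator
    # scores as close to human translation.
    pseudo_pairs = []
    for mn in mongolian_monolingual:
        zh = generator.translate(mn)          # pseudo target sentence y'
        score = discriminator.score(mn, zh)   # similarity to human translation
        if score >= keep_threshold:
            pseudo_pairs.append((mn, zh))
    return pseudo_pairs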
Improving the accuracy and naturalness of Mongolian-Chinese machine translation:
The Chinese sentences generated by the generator are often stiff and unnatural. In the invention, the discriminator acts as a teacher for the generator, assisting it to generate more natural and accurate Chinese sentences. The multi-scale discriminator means this teacher can judge, from multiple aspects, whether a sentence generated by the generator resembles the human translation.
Decision variables: the Mongolian sentence x is input at the encoder end of the generator, and the corresponding machine-translated Chinese sentence y' is output at the decoder end of the generator. The Mongolian sentence x, the corresponding human-translated Chinese sentence y, and the corresponding generator-translated Chinese sentence y' are input at the input of the discriminator.
The invention comprises the following parts:
1. The Mongolian-Chinese neural machine translation system model based on the generative adversarial network comprises the following parts:
A. Description of the hybrid encoder in the generator of the GAN-based Mongolian-Chinese neural machine translation system: the hybrid encoder is formed by fusing a sentence encoder and a word encoder, which encode the source-language sentence in sequence. The word encoder represents each word in vector form, constructing a vector representation of the Mongolian sentence with the word as the basic unit; the model formula is:

h1_i = \Phi(h1_{i-1}, W_i)

where \Phi is the activation function, W_i is the weight, and h1_{i-1} is the hidden-layer state of the (i-1)-th word.

The sentence encoder represents the entire Mongolian sentence in vector form, constructing a vector representation with the sentence as the basic unit. In the sentence encoder, the bidirectional optimized Transformer1 reads the whole text sequence at once, i.e., it learns from both sides of the sentence, so the contextual relationships within sentences can be learned and parallelization realized. The model formula is:

h2_i = \sum_j \hat{\alpha}_{i,j} v_j

where v_j is the value of the j-th word and \hat{\alpha}_{i,j} is calculated as:

\hat{\alpha}_{i,j} = \exp(\alpha_{i,j}) / \sum_k \exp(\alpha_{i,k})

where \alpha_{i,j} is calculated as:

\alpha_{i,j} = (q_i \cdot k_j) / \sqrt{d}

where q_i is the query of the i-th word, k_j is the key of the j-th word, \cdot denotes the dot-product operation, and d is the dimension of q and k.

Finally, fusion through the fusion function yields a vector representation with context information. The fusion function is:

\psi(h1_i, h2_i) = a_1 h1_i + a_2 h2_i

where \psi is the fusion function and a_1, a_2 are randomly initialized corresponding weights; through the two granularities of encoding, the encoder fuses vector information containing both sentences and words.
B. Description of the decoder in the generator of the GAN-based Mongolian-Chinese neural machine translation system: the decoder consists of a bidirectional optimized Transformer2, which is basically similar to the optimized Transformer1 in the encoder, except that a Swish activation function is added. During decoding, the decoder combines a sparse attention mechanism to decode the vector representation of the source-language sentence into the target-language sentence.
C. Description of the discriminator of the GAN-based Mongolian-Chinese neural machine translation system: the discriminator is a multi-scale discriminator, which can discriminate not only the general sentence meaning of the Chinese sentences generated by the generator but also their detail information (e.g., phrases and words), assisting the generator to generate sentences closer to the real translation. The multi-scale discriminator is implemented with a capsule network comprising a convolutional layer, a primary capsule layer, a convolutional capsule layer, and a fully connected capsule layer. To represent multiple discriminators with one network and improve training efficiency, the activation values of different sub-layers of the convolutional layer represent activations of sentences at different granularities: lower layers represent word activations, higher layers represent whole-sentence activations, and finally the feature maps of different layers are transformed into feature maps of the same scale with channel number 1. Specifically, given a sentence pair (x, y), each capsule network first constructs a 2D-image-like representation by concatenating the embedding vectors of the words in x and y, i.e., for the i-th word x_i in the source-language sentence x and the j-th word y_j in the target-language sentence y:

M^{(i,j)} = [x_i^T; y_j^T]

where x_i^T denotes the transpose of x_i, y_j^T denotes the transpose of y_j, and M^{(i,j)} is the matrix constructed from the i-th source-language word x_i and the j-th target-language word y_j, i.e., the virtual 2D image representation.
Based on this virtual 2D image representation, the similarity between the sentence y' translated by the generator and the human-translated sentence y, conditioned on the source-language sentence x, is captured sequentially through the convolutional layer, primary capsule layer, convolutional capsule layer, and fully connected capsule layer of the capsule network. The specific process is as follows:
(1) In the convolutional layer, a convolution with a 9 × 9 kernel and stride 1 is first performed, capturing the correspondence between sentences in x and y through the feature mapping:

c^{(1,f)}_{i,j} = f(W^{(1,f)} * M^{(i,j)} + b^{(1,f)})

where f is the activation function of the first convolution operation, W^{(1,f)} is the weight of the first convolution operation, M^{(i,j)} is the matrix formed from the i-th source-language word x_i and the j-th target-language word y_j, and b^{(1,f)} is the bias of the first convolution operation.

Then a convolution with stride 1 and a 3 × 3 kernel is performed, capturing the correspondence between words in x and y through the feature mapping:

c^{(2,f)}_{i,j} = f(W^{(2,f)} * M^{(i,j)} + b^{(1,f)})

where W^{(2,f)} is the weight of the second convolution operation, and f, M^{(i,j)}, and b^{(1,f)} are the same as in the first convolution operation.

The two convolution operations yield two feature maps of different sizes, c^{(1,f)} and c^{(2,f)}. The smaller of the two feature maps is then padded so both have the same size, and the final feature map is obtained by averaging the two equally sized feature maps:

c_{i,j} = (c^{(1,f)}_{i,j} + c^{(2,f)}_{i,j}) / 2
(2) The output of the convolutional layer is calculated as:

p = g(W_b M + b_1)

where g is the nonlinear squash function applied over the entire vector, M is the input to the capsule, b_1 is the capsule bias, and W_b is a weight. This is the primary capsule layer, in which the capsule replaces the scalar output of the convolution operation with a vector output.
(3) A dynamic routing algorithm replaces max pooling, dynamically strengthening or weakening weights to obtain effective features.

(4) Entering the convolutional capsule layer: after this layer, all capsules in the previous layer become a series of capsules, and the weights are further dynamically strengthened or weakened through the dynamic routing algorithm, obtaining more effective features.

(5) Entering the fully connected capsule layer, which connects all extracted features.

(6) All features are input into the multilayer perceptron, and the activation function is used to obtain the probability that the data pair (x, y') generated by the generator is the true data (x, y).
2. An optimized Mongolian-Chinese machine translation model, comprising the following parts:

A. BPE processing of Mongolian

Mongolian is a purely phonetic (alphabetic) script, no different in its spelling principle from the major alphabetic scripts of Western Europe and the world. The BPE technique is an algorithm that segments phonetic script by counting the frequency of adjacent characters: consecutive characters with high co-occurrence frequency are treated as one combination. Since the various roots and affixes of Mongolian are generally high-frequency Mongolian character combinations, the BPE algorithm is applied to Mongolian segmentation. The specific algorithm is described as follows:
(1) The corpus to be preprocessed is first cut into the smallest units, which for Mongolian are the Mongolian letters.

(2) Then the occurrences of all adjacent smallest-unit combinations in the corpus are counted and ranked, the most frequent combination is found and added to the dictionary, while the lowest-frequency word in the dictionary is deleted, keeping the dictionary size unchanged.

(3) Steps (1) and (2) are repeated until the frequency of the dictionary's words in the corpus is higher than a set value.
B. Removing the additional components of the lattice in Mongolian

In the Mongolian corpus, the components between Mongolian spaces and ordinary spaces are labeled as additional components of the lattice (case suffixes). These components are needed between Mongolian words, but they carry only grammatical meaning and no semantics; adding the lattice's additional components makes a Mongolian sentence fluent. In Mongolian-Chinese machine translation, if these components are not processed, the machine translation model treats Mongolian spaces as ordinary spaces, so a single Mongolian word is easily split in the middle and recognized as two or even several words. This significantly increases Mongolian sentence length, which hurts translation quality and the final BLEU evaluation. The invention therefore removes the control symbols from Mongolian sentences together with the lattice's additional components, leaving only the stem part.
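A sketch of this cleanup step: in common encodings of traditional Mongolian, the suffix is attached to its stem with the narrow no-break space U+202F (the "Mongolian space"), and U+180B-U+180E are Mongolian control characters (free variation selectors and the vowel separator). Treating exactly these code points as the markers is an assumption about the corpus encoding.

import re

CONTROL = re.compile(r"[\u180b-\u180e]")   # FVS1-FVS3, Mongolian vowel separator
SUFFIX = re.compile(r"\u202f\S+")          # NNBSP plus the attached case suffix

def strip_lattice_suffixes(sentence: str) -> str:
    sentence = CONTROL.sub("", sentence)   # drop control symbols
    sentence = SUFFIX.sub("", sentence)    # drop case suffixes, keep the stems
    return sentence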
C. Segmenting Chinese words

Chinese belongs to the Sino-Tibetan language family; each sentence consists only of individual characters and punctuation marks, so a computer can only treat the whole sentence as one unit, which is inconvenient for computation and processing. The Chinese corpus therefore needs to be segmented into words before training the Mongolian-Chinese machine translation model.
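For the Chinese side, a common choice is the jieba segmenter; the patent does not name a specific tool, so this is an illustrative sketch.

import jieba

def segment_chinese(sentence: str) -> str:
    # e.g. "明天下雨" -> "明天 下雨": insert spaces so words become model units
    return " ".join(jieba.cut(sentence))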
The whole process of the invention is as follows:

(1) Build the hybrid encoder of the generator in the generative adversarial network.
(2) Build the decoder of the generator in the generative adversarial network.
(3) Build the discriminator in the generative adversarial network.
(4) Process the lattice components of the Mongolian corpus.
(5) Segment the Mongolian corpus at different granularities.
(6) Segment the Chinese into words.
(7) Train the generator.
(8) Generate negative data with the trained generator model.
(9) Train the discriminator.
(10) Perform adversarial training.
(11) Test the BLEU value of the resulting Mongolian-Chinese machine translation model (a scoring sketch follows below).
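Step (11) might be scored as in the following sketch, using sacreBLEU as an assumed (not patent-specified) tool; model.translate is a hypothetical interface.

import sacrebleu

def evaluate_bleu(model, test_mongolian, reference_chinese):
    # reference_chinese: one human Chinese reference per test sentence
    hypotheses = [model.translate(mn) for mn in test_mongolian]
    bleu = sacrebleu.corpus_bleu(hypotheses, [reference_chinese])
    return bleu.score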

Claims (9)

1. A method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network, the network mainly comprising a generator and a discriminator, wherein in the generator a hybrid encoder encodes the source-language Mongolian sentence into a vector representation and a decoder based on a bidirectional Transformer, combined with a sparse attention mechanism, converts the vector representation into the target-language Chinese sentence, thereby generating Chinese sentences close to human translation and Mongolian-Chinese parallel corpus data; in the discriminator the difference between the Chinese sentences generated by the generator and the human translation is judged; the generator and the discriminator are trained adversarially until the discriminator judges the Chinese sentences generated by the generator to be very similar to the human translation, i.e., the generator and discriminator reach a Nash equilibrium, obtaining a Mongolian-Chinese machine translation system and a Mongolian-Chinese parallel data set, the Mongolian-Chinese machine translation system being used for Mongolian-Chinese translation; characterized in that the discriminator is a multi-scale discriminator able to discriminate the general sentence meaning and the detail information of the Chinese sentences generated by the generator, assisting the generator to generate sentences close to the real translation; the multi-scale discriminator is implemented with a capsule network comprising a convolutional layer, a primary capsule layer, a convolutional capsule layer, and a fully connected capsule layer; in the convolutional layer, the activation values of different sub-layers represent activations of sentences at different granularities, lower layers representing word activations and higher layers representing whole-sentence activations, and finally the feature maps of different layers are transformed into feature maps of the same scale with channel number 1.
2. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 1, wherein the hybrid encoder comprises a sentence encoder and a word encoder, the sentence encoder comprising a bidirectional Transformer and the word encoder using a bidirectional LSTM, the bidirectional Transformer being the optimized Transformer1: first, a gated linear unit is added to the original Transformer to obtain important information in the source-language sentence and discard redundant information; second, a branch structure is added to capture diverse semantic information in source-language sentences, the branch structure comprising 2 capsule networks and 2 activation functions, the output of each capsule network being connected to 1 activation function; finally, a capsule network is added on the branch structure and after the third layer normalization, so the encoder can capture the exact position of a word in the source-language sentence; in the decoder, the bidirectional Transformer is the optimized Transformer2: a branch structure is first added to the original Transformer, comprising 2 multi-head attention mechanisms, 2 capsule networks, and 1 activation function, the 2 multi-head attention mechanisms being located between the first and second layer normalizations, and the 2 capsule networks and 1 activation function between the second and third layer normalizations, wherein the output of 1 capsule network is connected to the 1 activation function and 1 capsule network sits directly between the second and third layer normalizations; second, a capsule network connecting the output of the third layer normalization to the input of the fourth layer normalization is added; finally, the Swish activation function is added.
3. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 2, wherein the word encoder and the sentence encoder encode the source-language sentence in sequence, and the results are then fused through a fusion function to obtain a vector representation with context information, wherein the word encoder represents each word in vector form to construct the vector representation of the Mongolian sentence with the word as the basic unit, the model formula being:

h1_i = \Phi(h1_{i-1}, W_i)

where \Phi is the activation function, W_i is the weight, and h1_{i-1} is the hidden-layer state of the (i-1)-th word;

the sentence encoder represents the whole Mongolian sentence in vector form, constructing a vector representation with the sentence as the basic unit, the model formula being:

h2_i = \sum_j \hat{\alpha}_{i,j} v_j

where v_j is the value of the j-th word and \hat{\alpha}_{i,j} is calculated as:

\hat{\alpha}_{i,j} = \exp(\alpha_{i,j}) / \sum_k \exp(\alpha_{i,k})

where \alpha_{i,j} is calculated as:

\alpha_{i,j} = (q_i \cdot k_j) / \sqrt{d}

where q_i is the query of the i-th word, k_j is the key of the j-th word, \cdot denotes the dot-product operation, and d is the dimension of q and k;

the fusion function is:

\psi(h1_i, h2_i) = a_1 h1_i + a_2 h2_i

where \psi is the fusion function and a_1, a_2 are randomly initialized corresponding weights; through the two encodings, the encoder fuses vector information containing both sentences and words.
4. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 2, wherein in the sentence encoder the bidirectional Transformer reads the whole text sequence at once, i.e., learns from both sides of the sentence, so as to learn the contextual relationships between words in the text.
5. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 2, wherein in the decoder the bidirectional Transformer reads the vector representation of the source-language sentence at once, i.e., decodes based on both sides of the whole sentence's vector representation.
6. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 1, wherein given a sentence pair (x, y), each capsule network first constructs a 2D-image-like representation by concatenating the embedding vectors of the words in x and y, i.e., for the i-th word x_i in the source-language sentence x and the j-th word y_j in the target-language sentence y:

M^{(i,j)} = [x_i^T; y_j^T]

where x_i^T denotes the transpose of x_i, y_j^T denotes the transpose of y_j, and M^{(i,j)} is the matrix constructed from the i-th source-language word x_i and the j-th target-language word y_j, i.e., the virtual 2D image representation;

based on the virtual 2D image representation, the similarity between the sentence y' translated by the generator and the human-translated sentence y, conditioned on the source-language sentence x, is captured in turn through the convolutional layer, primary capsule layer, convolutional capsule layer, and fully connected capsule layer of the capsule network.
7. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 6, wherein the virtual 2D image representation passes in turn through the convolutional layer, primary capsule layer, convolutional capsule layer, and fully connected capsule layer of the capsule network as follows:

(1) in the convolutional layer, a convolution with stride 1 and a 9 × 9 kernel is first performed, capturing the correspondence between sentences in x and y through the feature mapping:

c^{(1,f)}_{i,j} = f(W^{(1,f)} * M^{(i,j)} + b^{(1,f)})

where f is the activation function of the first convolution operation, W^{(1,f)} is the weight of the first convolution operation, M^{(i,j)} is the matrix formed from the i-th source-language word x_i and the j-th target-language word y_j, and b^{(1,f)} is the bias of the first convolution operation;

then a convolution with stride 1 and a 3 × 3 kernel is performed, capturing the correspondence between words in x and y through the feature mapping:

c^{(2,f)}_{i,j} = f(W^{(2,f)} * M^{(i,j)} + b^{(1,f)})

where W^{(2,f)} is the weight of the second convolution operation, and f, M^{(i,j)}, and b^{(1,f)} are the same as in the first convolution operation;

the two convolution operations yield two feature maps of different sizes, c^{(1,f)} and c^{(2,f)}; the smaller of the two feature maps is then padded so both have the same size, and the final feature map is obtained by averaging the two equally sized feature maps:

c_{i,j} = (c^{(1,f)}_{i,j} + c^{(2,f)}_{i,j}) / 2

(2) entering the primary capsule layer, the output of the convolutional layer is calculated as:

p = g(W_b M + b_1)

where g is the nonlinear squash function applied over the entire vector, M is the input to the capsule, which is also the output of the convolutional layer, b_1 is the capsule bias, and W_b is a weight; in the primary capsule layer, the capsule replaces the scalar output of the convolution operation with a vector output;

(3) a dynamic routing algorithm replaces max pooling, dynamically strengthening or weakening weights to obtain effective features;

(4) entering the convolutional capsule layer: after this layer, all capsules in the previous layer become a series of capsules, and the weights are further dynamically strengthened or weakened through the dynamic routing algorithm, obtaining effective features;

(5) entering the fully connected capsule layer, which connects all extracted features;

(6) all features are input into the multilayer perceptron, and the activation function is used to obtain the probability that the data pair (x, y') generated by the generator is real data (x, y), i.e., the degree of similarity to a real sentence.
8. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 1, wherein the final training objective of the generative adversarial network is:

\min_G \max_D V(D, G) = E_{(x,y) \sim P_{data}(x,y)}[\log D(x,y)] + E_{y' \sim G(y'|x)}[\log(1 - D(x,y'))]

where G denotes the generator, D denotes the discriminator, and V(D, G) denotes the loss function of generator G and discriminator D; \min_G \max_D means that D maximizes the loss function V(D, G) while G minimizes it; E denotes expectation; P_{data}(x,y) denotes the distribution of the parallel corpus, and D(x, y) the probability that the discriminator considers the pair to be human-translated when the source sentence x and target sentence y of the parallel corpus are input to discriminator D; G(y'|x) denotes the generator's distribution over target sentences y' generated from the source sentence x, and D(x, y') the probability that the discriminator considers (x, y') to be human-translated; x denotes the Mongolian sentence in the parallel corpus, i.e., the source-language sentence; y denotes the Chinese sentence in the parallel corpus, i.e., the human translation result; and y' denotes the Chinese sentence generated by the generator, i.e., the generator's translation result.
9. The method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network according to claim 1, wherein the process comprises:

processing the additional components of the lattice in the Mongolian corpus, by removing the control symbols in Mongolian sentences together with the additional components of the lattice, leaving only the stem parts;

segmenting the Mongolian corpus at different granularities, by:

(1) first cutting the corpus to be preprocessed into the smallest units, which for Mongolian are the Mongolian letters;

(2) then counting and ranking the occurrences of all adjacent smallest-unit combinations in the corpus, finding the most frequent combination and adding it to the dictionary, while deleting the lowest-frequency word in the dictionary to keep the dictionary size unchanged;

(3) repeating steps (1) and (2) until the frequency of the dictionary's words in the corpus is higher than a set value;

and segmenting the Chinese into words.
CN201910807617.7A, filed 2019-08-29 (priority date 2019-08-29): Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus using a generative adversarial network. Active, granted as CN110598221B.

Priority Application

CN201910807617.7A, priority and filing date 2019-08-29; granted as CN110598221B.

Publications

CN110598221A (application publication): 2019-12-20
CN110598221B (granted patent): 2020-07-07

Family ID: 68856234; Country: CN (China)




Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant