CN112836526B - Multi-language neural machine translation method and device based on gating mechanism - Google Patents
- Publication number
- CN112836526B (application CN202110132050.5A)
- Authority
- CN
- China
- Prior art keywords
- language
- data
- target language
- word
- decoding
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
Abstract
The invention relates to a multilingual neural machine translation method and device based on a gating mechanism. The method comprises: obtaining bilingual parallel data of all language pairs and preprocessing it, wherein the bilingual parallel data comprises a source language and a target language; encoding the preprocessed source language to obtain encoded data; decoding the target language and the encoded data by using a gating mechanism to obtain target language decoded data; and performing function calculation on the target language decoded data to obtain a target language translation result corresponding to the source language.
Description
Technical Field
The invention relates to the field of translation, in particular to a multilingual neural machine translation method and device based on a gating mechanism.
Background
Multilingual neural machine translation refers to translating each source language into multiple target languages using a single neural network model. The prevailing multilingual method follows the end-to-end approach proposed by Google in 2017, which consists of an encoder-decoder framework based on the self-attention mechanism: the encoder encodes sentences in different source languages into hidden vectors that fuse information from multiple languages, and the decoder generates multiple target languages from these vectors.
Because this approach uses one decoder to generate several different target languages, while the grammatical structure and semantics of different languages generally differ considerably, it is difficult for a single set of decoder network parameters to characterize all target languages. As a result, current multilingual translation systems often mix words from different languages within one translation result, which degrades translation quality.
Disclosure of Invention
The invention provides a multi-language neural machine translation method and device based on a gating mechanism, which can solve the technical problem that vocabulary from different languages is mixed in a single translation result.
The technical scheme for solving the technical problems is as follows:
provided is a multi-language neural machine translation method based on a gating mechanism, the method comprising: obtaining bilingual parallel data of all language pairs, and preprocessing, wherein the bilingual parallel data comprises a source language and a target language; encoding by using the preprocessed source language to obtain encoded data; decoding the target language and the coded data by using a gating mechanism to obtain target language decoded data; and performing function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language.
Further, the obtaining bilingual parallel data of all language pairs and preprocessing includes: performing sub-word segmentation on the training corpus in the bilingual parallel data to generate a sub-word sequence; and adding a translation direction label at the tail end of the sub-word sequence to obtain a sub-word sequence vector matrix, wherein the sub-word sequence vector matrix comprises a source language sub-word vector matrix and a target language sub-word vector matrix.
Further, the encoding with the preprocessed source language to obtain encoded data includes: acquiring a word sequence matrix of a source language; and coding according to the word sequence matrix of the source language and the sub-word sequence vector matrix of the source language to obtain coded data.
Further, the decoding the target language and the encoded data by using a gating mechanism to obtain target language decoded data includes: acquiring a word sequence matrix of the target language; decoding according to the encoded data and the target language sub-word vector matrix to obtain hidden state data of a decoder; and fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoded data.
Further, the fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoded data includes: acquiring a fusion factor coefficient and a word vector of the information to be translated; and obtaining target language decoded data according to the fusion factor coefficient, the hidden state data of the decoder, and the word vector of the information to be translated.
Further, performing function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language, including: after carrying out linear transformation on the target language decoding data, carrying out softmax function calculation to obtain probability distribution of words in a target language vocabulary; and determining that the word corresponding to the maximum probability forms a target language translation result.
There is provided a gating mechanism-based multilingual neural machine translation device, the device comprising: the preprocessing module is used for acquiring bilingual parallel data of all language pairs and preprocessing, wherein the bilingual parallel data comprises a source language and a target language; the encoding module is used for encoding by utilizing the preprocessed source language to obtain encoded data; the decoding module is used for decoding the target language and the coded data by utilizing a gating mechanism to obtain target language decoded data; and the translation module is used for carrying out function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language.
Further, the decoding module is further used for obtaining a word sequence matrix of the target language; decoding according to the coded data and the word sequence matrix of the target language to obtain hidden state data of a decoder; and fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoding data.
There is provided an electronic device including: at least one processor and at least one memory; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform a gating mechanism-based multilingual neural machine translation method.
A computer readable storage medium is provided having one or more program instructions embodied therein for performing a gating mechanism based multilingual neural machine translation method by a system.
The beneficial effects of the invention are as follows: by introducing a gating mechanism into the multilingual neural machine translation, different target languages can be distinguished in a decoding end, and the method can integrate general information and vector information of all the target languages in a decoder network, so that the method can generate a plurality of different target languages by using one set of decoder network parameters, can effectively alleviate the defect that a translation result easily contains a plurality of different language vocabularies in the current multilingual translation method, and effectively improves the quality of multilingual translation.
Drawings
FIG. 1 is a flowchart of a multi-language neural machine translation method based on a gating mechanism according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a multilingual neural machine translation device based on a gating mechanism according to an embodiment of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings; the examples are provided only to illustrate the invention and are not to be construed as limiting its scope.
The embodiment of the invention provides a multilingual neural machine translation method based on a gating mechanism, and referring to fig. 1, the method comprises the following steps:
step S1: obtaining bilingual parallel data of all language pairs, and preprocessing, wherein the bilingual parallel data comprises a source language and a target language;
In this embodiment, in order to reduce the influence of out-of-vocabulary words on translation performance, the BPE method is first used to segment the sentences in all training corpora into subword sequences, so that the inputs of both the encoder and decoder networks are subword sequences: the encoder input is the subword sequence of the source language, and the decoder input is the subword sequence of the target language. Specifically, bilingual parallel data of all language pairs are obtained first; for translation, a label representing the translation direction is appended to the subword sequence of each language pair's bilingual parallel data, and a subword sequence vector matrix is obtained, comprising a source language subword vector matrix and a target language subword vector matrix. For example, when the source language is English and the target language is German, <E2D> is added, indicating translation from English into German; when the source language is Chinese and the target language is Japanese, <Z2J> is added, indicating translation from Chinese into Japanese. This preprocessing makes the bilingual parallel data contain not only the specific words but also a vector representing the translation direction, thereby guiding the translation direction and avoiding the mixing of multiple target languages.
BPE (byte pair encoding) is an algorithm that encodes data by byte pairs, originally designed for data compression. It is an iterative process in which the most frequent pair of adjacent symbols in a string is repeatedly replaced by a new symbol that does not appear in the string. The method splits the training corpus into characters, combines adjacent character pairs, and sorts all combined results by frequency of occurrence; the more frequent a pair, the higher it ranks, and the first-ranked pair becomes the subword with the highest frequency of occurrence.
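The pair-merging procedure described above can be sketched as follows. This is a toy illustration of the merge loop (the function name and the direction-tag usage are illustrative, not the production BPE implementation):

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merge rules: repeatedly replace the most frequent
    adjacent symbol pair with a new merged symbol."""
    # Start at the character level; each word is a tuple of symbols.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # highest-frequency pair ranks first
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # merge the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = bpe_merges(["lower", "lowest", "low", "low"], num_merges=2)
# After segmentation, the translation-direction tag is appended at the
# tail end of the subword sequence, e.g. English-to-German:
tagged = ["low", "er", "<E2D>"]
```

On this toy corpus the first merge is the most frequent character pair ('l', 'o'), and the second merges the resulting symbol with 'w'.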
Step S2: encoding by using the preprocessed source language to obtain encoded data;
In this embodiment, a word sequence x = [x_0, x_1, …, x_n] of the source language is obtained, and X = [v_0, v_1, …, v_n] denotes the matrix obtained by word-vector preprocessing of the input source-language word sequence, i.e., the source language subword vector matrix, where v_0 is the vector of the translation direction tag and v_n (n > 0) is the vector of the nth word. Encoding is performed according to the word sequence of the source language and the source language subword vector matrix to obtain the encoded data. Defining Self_enc(·) as the encoder calculation unit based on the self-attention mechanism, the encoded representation of each word through the encoder can be calculated from the following formula:

h_t^n = Self_enc(h^{n-1}),  with h^0 = X

where h_t^n denotes the encoded representation of the t-th word in the n-th layer. Using the encoder, the top-most encoded representation h^N is obtained, and each source language shares the same set of encoder parameters.

That is, the encoded data of this embodiment is h^N = [h_1^N, …, h_n^N], which characterizes the hidden state of the encoder so that decoding can proceed from the encoder's hidden state.
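As a minimal sketch of layer-by-layer encoding, the following single-head self-attention stack (NumPy, with tied query/key/value for brevity) shows how each position's representation mixes information from all positions; a real Transformer encoder adds learned projections, multi-head attention, feed-forward sublayers, and residual connections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Single-head self-attention with Q = K = V = X, standing in for
    the Self_enc() calculation unit."""
    d = X.shape[-1]
    A = softmax(X @ X.T / np.sqrt(d))  # attention weights over positions
    return A @ X                       # each position mixes all positions

def encode(X, num_layers=2):
    """Stack the layers: h^n = Self_enc(h^{n-1}), h^0 = X."""
    H = X
    for _ in range(num_layers):
        H = self_attention(H)
    return H                           # top-most representation h^N

X = np.random.default_rng(0).standard_normal((5, 8))  # 5 subwords, d = 8
H = encode(X)                                          # encoded data h^N
```

The encoded output keeps the input shape: one d-dimensional hidden vector per subword position.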
Step S3: decoding the target language and the coded data by using a gating mechanism to obtain target language decoded data;
in this embodiment, first, a word sequence matrix of a target language is obtained; decoding according to the coded data and the target language subword vector matrix to obtain hidden state data of the decoder; and finally, fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoding data.
Specifically, the decoder network relies on the h^N obtained in step S2 and an attention mechanism module to obtain the word-sequence decoded representation of the multilingual information. A word sequence y = [y_1, …, y_n] of the target language and a target language subword vector matrix U = [u_1, …, u_n] are acquired, that is, the matrix obtained by word-vector preprocessing of the input target-language word sequence, where u_n is the vector of the nth word.

Further, defining Self_dec(·) as the decoder calculation unit based on self-attention, the hidden state s_t^N output by the decoder at time t is calculated from the following formula:

s_t^n = Self_dec(s^{n-1}, h^N),  with s^0 = U

where h^N is the hidden state obtained from the encoder, s_t^n is the decoded data of the t-th word in the n-th layer of the target side, and u_t is the input of the decoder at time t, carrying the information of one target language. At the same time, s_t^N is the hidden-layer unit representation shared by all target languages.
In order to distinguish different target languages, this embodiment adopts a gating mechanism that fuses the information to be translated with the hidden state data of the decoder, that is, it fuses the shared decoder hidden-layer unit with the input of one target language. The fusion factor coefficient and the word vector of the information to be translated are acquired, and the target language decoded data are obtained from the fusion factor coefficient, the hidden state data of the decoder, and the word vector of the information to be translated.
Specifically, the expression of the target language decoded data is:

z_t = λ · s_t^N + (1 − λ) · u_t

where the fusion factor coefficient λ fuses the hidden state data s_t^N of the decoder with the word vector u_t of the information to be translated to obtain the target language decoded data z_t. The fusion factor coefficient is computed as λ = ReLU(W_λ [s_t^N; u_t]), where ReLU is the linear rectification function and W_λ is a learned gate parameter.
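The gated fusion of the shared decoder state with the target-language word vector can be sketched as below; `W_g` and `b_g` are hypothetical names for the learned gate parameters inside the ReLU, which the description does not name explicitly:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gated_fusion(s_t, u_t, W_g, b_g):
    """Fuse the shared decoder hidden state s_t with the target-language
    word vector u_t via a gating coefficient lambda:
    z_t = lambda * s_t + (1 - lambda) * u_t."""
    lam = relu(W_g @ np.concatenate([s_t, u_t]) + b_g)  # fusion factor
    return lam * s_t + (1.0 - lam) * u_t                # gated mix

rng = np.random.default_rng(1)
d = 4
s_t = rng.standard_normal(d)          # shared decoder hidden state
u_t = rng.standard_normal(d)          # word vector of one target language
W_g = rng.standard_normal((d, 2 * d)) # hypothetical gate weights
b_g = np.zeros(d)
z_t = gated_fusion(s_t, u_t, W_g, b_g)
```

When the gate outputs zero the decoded data falls back entirely to the target-language word vector, which is how the mechanism can steer the shared decoder toward one language.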
Step S4: and performing function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language.
In this embodiment, after performing linear transformation on the target language decoded data, performing softmax function calculation to obtain probability distribution of words in the target language vocabulary; and determining that the word corresponding to the maximum probability forms a target language translation result.
For example, the top-most hidden state z_t output by the decoder first passes through a layer of linear transformation:

o_t^i = W_o z_t

where o_t^i is the input representation of the softmax layer for target language i.

The o_t^i obtained by the linear transformation is passed through a softmax function to output the probability distribution over the target language vocabulary at each time t:

P(y_t) = softmax(W_i o_t^i + b_i)

where W_i and b_i are training parameters of the model, and the dimension of W_i is the same as the vocabulary dimension.
Finally, the word corresponding to the maximum probability is selected as the generated result at time t:

y_t = argmax P(y_t)

Following the above steps, decoding proceeds in turn to generate the final translation result Y = [y_1, …, y_n] of the language, where Y comprises L target languages.
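Step S4 (linear transformation, softmax, then argmax selection) can be sketched as follows; the parameter names `W_o`, `W_i`, `b_i` follow the description above, and the tiny vocabulary is purely illustrative:

```python
import numpy as np

def predict_word(z_t, W_o, W_i, b_i, vocab):
    """Linearly transform the fused decoder state, apply softmax over
    the target vocabulary, and pick the highest-probability word."""
    o_t = W_o @ z_t                  # linear transformation of z_t
    logits = W_i @ o_t + b_i         # project to vocabulary dimension
    e = np.exp(logits - logits.max())
    p = e / e.sum()                  # softmax probability distribution
    return vocab[int(np.argmax(p))], p

vocab = ["<eos>", "haus", "katze"]   # illustrative target vocabulary
rng = np.random.default_rng(2)
d = 4
z_t = rng.standard_normal(d)                    # fused decoder state
W_o = rng.standard_normal((d, d))               # linear-transform weights
W_i = rng.standard_normal((len(vocab), d))      # output projection
b_i = np.zeros(len(vocab))
word, p = predict_word(z_t, W_o, W_i, b_i, vocab)
```

Greedy decoding repeats this per time step until an end-of-sequence token is produced; beam search would keep several candidates instead of only the argmax.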
According to the method, a gating mechanism is introduced into the multilingual neural machine translation, different target languages can be distinguished in a decoding end, and universal information and vector information of all the target languages can be fused in a decoder network, so that a set of decoder network parameters are used for generating a plurality of different target languages, the defect that a translation result easily contains a plurality of different language words in the current multilingual translation method can be effectively relieved, and the quality of multilingual translation is effectively improved.
The embodiment of the invention also provides a multilingual neural machine translation device based on a gating mechanism, and referring to fig. 2, the device comprises:
the preprocessing module 01 is used for acquiring bilingual parallel data of all language pairs and preprocessing, wherein the bilingual parallel data comprises a source language and a target language; the functions corresponding to the modules are described in the above step S1, and in order to avoid repetition, they are not described herein again.
The encoding module 02 is used for encoding by utilizing the preprocessed source language to obtain encoded data; the functions corresponding to the modules are described in the above step S2, and in order to avoid repetition, they are not described herein again.
The decoding module 03 is configured to decode the target language and the encoded data by using a gating mechanism to obtain target language decoded data; the decoding module is also used for obtaining a word sequence matrix of the target language; decoding according to the coded data and the word sequence matrix of the target language to obtain hidden state data of a decoder; and fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoding data. The functions corresponding to the modules are described in the above step S3, and in order to avoid repetition, they are not described herein again.
And the translation module 04 is used for carrying out function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language. The functions corresponding to the modules are described in the above step S4, and in order to avoid repetition, they are not described herein again.
According to the method, a gating mechanism is introduced into the multilingual neural machine translation, different target languages can be distinguished in a decoding end, and universal information and vector information of all the target languages can be fused in a decoder network, so that a set of decoder network parameters are used for generating a plurality of different target languages, the defect that a translation result easily contains a plurality of different language words in the current multilingual translation method can be effectively relieved, and the quality of multilingual translation is effectively improved.
The present embodiment provides an electronic device including: at least one processor and at least one memory; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform a gating mechanism-based multilingual neural machine translation method.
The present embodiment provides a computer readable storage medium having one or more program instructions embodied therein for performing a gating mechanism-based multilingual neural machine translation method by a system.
In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The processor reads the information in the storage medium and, in combination with its hardware, performs the steps of the above method.
The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable ROM (Electrically EPROM, EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in a combination of hardware and software. When the software is applied, the corresponding functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The present invention is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and these modifications and substitutions are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (6)
1. A method for multilingual neural machine translation based on a gating mechanism, the method comprising:
obtaining bilingual parallel data of all language pairs, and preprocessing, wherein the bilingual parallel data comprises a source language and a target language;
encoding by using the preprocessed source language to obtain encoded data;
decoding the target language and the coded data by using a gating mechanism to obtain target language decoded data;
performing function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language;
wherein the decoding the target language and the encoded data by using the gating mechanism to obtain target language decoded data includes: decoding according to the coded data and the target language subword vector matrix to obtain hidden state data of a decoder; fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoding data;
wherein the fusing the information to be translated and the hidden state data of the decoder by using a gating mechanism to obtain target language decoded data comprises the following steps: acquiring a fusion factor coefficient and a word vector of the information to be translated; and fusing, by using the gating mechanism, according to the fusion factor coefficient, the hidden state data of the decoder, and the word vector of the information to be translated, to obtain the target language decoded data;
the obtaining bilingual parallel data of all language pairs and preprocessing comprises the following steps:
performing subword segmentation on the bilingual parallel data by using a BPE algorithm to generate a subword sequence;
adding a translation direction label at the tail end of the sub-word sequence of the bilingual parallel data of a language pair, and obtaining a sub-word sequence vector matrix through word-vector processing, wherein the sub-word sequence vector matrix comprises a source language subword vector matrix and a target language subword vector matrix; X = [v_0, v_1, …, v_n] represents the source language subword vector matrix, where v_0 is the vector of the translation direction tag and v_n is the vector of the nth word; and U = [u_1, …, u_n] represents the target language subword vector matrix, where u_n is the vector of the nth word.
2. The method for multiple language neural machine translation based on gating mechanism of claim 1, wherein the encoding with the preprocessed source language to obtain encoded data comprises:
acquiring a word sequence matrix of a source language;
and coding according to the word sequence matrix of the source language and the sub-word sequence vector matrix of the source language to obtain coded data.
3. The method for multi-language neural machine translation based on a gating mechanism according to claim 1, wherein performing a function calculation on the target language decoded data to obtain a target language translation result corresponding to a source language comprises:
after carrying out linear transformation on the target language decoding data, carrying out softmax function calculation to obtain probability distribution of words in a target language vocabulary;
and determining that the word corresponding to the maximum probability forms a target language translation result.
4. A gating mechanism-based multilingual neural machine translation device, the device comprising:
the preprocessing module is used for acquiring bilingual parallel data of all language pairs and preprocessing, wherein the bilingual parallel data comprises a source language and a target language;
the encoding module is used for encoding by utilizing the preprocessed source language to obtain encoded data;
the decoding module is used for decoding the target language and the coded data by utilizing a gating mechanism to obtain target language decoded data;
the translation module is used for carrying out function calculation on the target language decoding data to obtain a target language translation result corresponding to the source language;
wherein, the decoding module is further configured to perform the following operations: decoding according to the coded data and the target language subword vector matrix to obtain hidden state data of a decoder; acquiring a fusion factor coefficient and a word vector of information to be translated; according to the fusion factor coefficient, the hidden state data of the decoder and the word vector of the information to be translated, utilizing a gating mechanism to fuse, and obtaining target language decoding data;
the preprocessing module is further used for performing subword segmentation on the bilingual parallel data by using a BPE algorithm to generate a subword sequence;
adding a translation direction label at the tail end of the sub-word sequence of the bilingual parallel data of a language pair, and obtaining a sub-word sequence vector matrix through word-vector processing, wherein the sub-word sequence vector matrix comprises a source language subword vector matrix and a target language subword vector matrix; X = [v_0, v_1, …, v_n] represents the source language subword vector matrix, where v_0 is the vector of the translation direction tag and v_n is the vector of the nth word; and U = [u_1, …, u_n] represents the target language subword vector matrix, where u_n is the vector of the nth word.
5. An electronic device, the electronic device comprising: at least one processor and at least one memory;
the memory is used for storing one or more program instructions;
the processor is used for executing the one or more program instructions to perform the method according to any one of claims 1 to 3.
6. A computer readable storage medium, characterized in that the computer storage medium contains one or more program instructions which, when executed by a system, perform the method according to any one of claims 1 to 3.
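The preprocessing described in the claims above, in which a translation direction label is attached to the subword sequence and the sequence is mapped to the vector matrix X = [v_0, v_1, …, v_n], can be sketched as below. The tag token `"<2en>"`, the toy embedding table, and the placement of the tag so that its vector becomes v_0 are illustrative assumptions (the claim text attaches the tag at the tail of the sequence, while the matrix definition lists the tag vector first; this sketch follows the matrix definition).

```python
def build_subword_matrix(subwords, direction_tag, embed):
    # Place the translation-direction tag so that v_0 in
    # X = [v_0, v_1, ..., v_n] is the tag vector, then look up the
    # vector of each subword in the (assumed) embedding table `embed`.
    sequence = [direction_tag] + list(subwords)
    return [embed[token] for token in sequence]
```

For example, with a hypothetical BPE-segmented input (the `@@` suffix is the usual BPE continuation marker) and a two-dimensional toy embedding table, `build_subword_matrix(["ni@@", "hao"], "<2en>", embed)` yields a three-row matrix whose first row is the direction-tag vector.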
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110132050.5A CN112836526B (en) | 2021-01-31 | 2021-01-31 | Multi-language neural machine translation method and device based on gating mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112836526A CN112836526A (en) | 2021-05-25 |
CN112836526B true CN112836526B (en) | 2024-01-30 |
Family
ID=75932451
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110132050.5A Active CN112836526B (en) | 2021-01-31 | 2021-01-31 | Multi-language neural machine translation method and device based on gating mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836526B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115130479A (en) * | 2022-04-13 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Machine translation method, target translation model training method, and related program and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299479A (en) * | 2018-08-21 | 2019-02-01 | 苏州大学 | Translation memory is incorporated to the method for neural machine translation by door control mechanism |
CN109543824A (en) * | 2018-11-30 | 2019-03-29 | 腾讯科技(深圳)有限公司 | A kind for the treatment of method and apparatus of series model |
CN110674646A (en) * | 2019-09-06 | 2020-01-10 | 内蒙古工业大学 | Mongolian Chinese machine translation system based on byte pair encoding technology |
CN110781690A (en) * | 2019-10-31 | 2020-02-11 | 北京理工大学 | Fusion and compression method of multi-source neural machine translation model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10867595B2 (en) * | 2017-05-19 | 2020-12-15 | Baidu Usa Llc | Cold fusing sequence-to-sequence models with language models |
2021-01-31: Application CN202110132050.5A filed in China; patent CN112836526B granted, status Active.
Non-Patent Citations (1)
Title |
---|
Fusion method of translation memory and neural machine translation based on data augmentation; Cao Qian; Xiong Deyi; Journal of Chinese Information Processing (Issue 05); 40-47 *
Also Published As
Publication number | Publication date |
---|---|
CN112836526A (en) | 2021-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108416058B (en) | Bi-LSTM input information enhancement-based relation extraction method | |
CN111247581B (en) | Multi-language text voice synthesizing method, device, equipment and storage medium | |
CN105068998A (en) | Translation method and translation device based on neural network model | |
KR102254612B1 (en) | method and device for retelling text, server and storage medium | |
CN112699690B (en) | Translation model training method, translation method, electronic device and storage medium | |
CN112800757B (en) | Keyword generation method, device, equipment and medium | |
JP2021033995A (en) | Text processing apparatus, method, device, and computer-readable storage medium | |
CN113536795B (en) | Method, system, electronic device and storage medium for entity relation extraction | |
CN113590784A (en) | Triple information extraction method and device, electronic equipment and storage medium | |
CN112016271A (en) | Language style conversion model training method, text processing method and device | |
CN112732902A (en) | Cross-language abstract generation method and device, electronic equipment and computer readable medium | |
CN112836526B (en) | Multi-language neural machine translation method and device based on gating mechanism | |
CN111563380A (en) | Named entity identification method and device | |
CN115759062A (en) | Knowledge injection-based text and image pre-training model processing method and text and image retrieval system | |
CN110913229B (en) | RNN-based decoder hidden state determination method, device and storage medium | |
CN112836040B (en) | Method and device for generating multilingual abstract, electronic equipment and computer readable medium | |
CN112926344B (en) | Word vector replacement data enhancement-based machine translation model training method and device, electronic equipment and storage medium | |
CN113449081A (en) | Text feature extraction method and device, computer equipment and storage medium | |
CN112052329A (en) | Text abstract generation method and device, computer equipment and readable storage medium | |
Chimalamarri et al. | Linguistically enhanced word segmentation for better neural machine translation of low resource agglutinative languages | |
CN113704466B (en) | Text multi-label classification method and device based on iterative network and electronic equipment | |
US20220083745A1 (en) | Method, apparatus and electronic device for determining word representation vector | |
CN109325110B (en) | Indonesia document abstract generation method and device, storage medium and terminal equipment | |
CN115577680B (en) | Ancient book text sentence-breaking method and device and ancient book text sentence-breaking model training method | |
Watve et al. | English to hindi translation using transformer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||