CN110321567B - Neural machine translation method, device and equipment based on attention mechanism

Info

Publication number
CN110321567B
Authority
CN
China
Prior art keywords
distance
tensor
alignment
language
calculating
Prior art date
Legal status
Active
Application number
CN201910539986.2A
Other languages
Chinese (zh)
Other versions
CN110321567A (en)
Inventor
朱宪超
Current Assignee
Sichuan Lan Bridge Information Technology Co ltd
Original Assignee
Sichuan Lan Bridge Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Lan Bridge Information Technology Co ltd
Priority to CN201910539986.2A
Publication of CN110321567A
Application granted
Publication of CN110321567B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a neural machine translation method, device and equipment based on an attention mechanism. The method comprises: obtaining the source language and the target language during translation, wherein the source language refers to the language information to be translated and the target language refers to the translated language information; calculating the distance tensor of the source language and the target language, wherein the distance tensor refers to a distance weight; and using the distance tensor when calculating the alignment tensor with an alignment function, so that the neural machine translation result meets expectations. The application solves the technical problem of poor translation quality: it can effectively improve the alignment performance of the attention function and improve both the translation quality and its score.

Description

Neural machine translation method, device and equipment based on attention mechanism
Technical Field
The application relates to the field of neural machine translation, in particular to a neural machine translation method, device and equipment based on an attention mechanism.
Background
Neural machine translation is a machine translation method. It is built on an encoding-decoding system: the encoder encodes the source-language sequence and extracts the information in the source language, and the decoder converts that information into another language, the target language, thereby completing the translation.
The inventors found that in neural machine translation, the influence of similar words and the like leads to poor translation quality.
No effective solution has yet been proposed for the problem of poor translation quality in the related art.
Disclosure of Invention
The application mainly aims to provide a neural machine translation method, device and equipment based on an attention mechanism, so as to solve the problem of poor translation quality.
To achieve the above object, according to one aspect of the present application, there is provided a neural machine translation method based on an attention mechanism.
The neural machine translation method based on the attention mechanism according to the present application comprises the following steps: acquiring the source language and the target language during translation, wherein the source language refers to the language information to be translated, and the target language refers to the translated language information; calculating the distance tensor of the source language and the target language, wherein the distance tensor refers to a distance weight; and using the distance tensor in the process of calculating the alignment tensor with the attention mechanism so that the neural machine translation result meets expectations, wherein the alignment tensor is calculated with an alignment function.
Further, using the distance tensor in the process of calculating the alignment tensor with the attention mechanism includes: introducing the distance tensor into the attention mechanism for calculation, and subtracting part of the distance tensor from the alignment tensor output by the attention mechanism.
Further, calculating the distance tensor of the source language and the target language includes: calculating a distance parameter and substituting it into the distance tensor calculation.
Further, calculating the distance parameter and substituting it into the distance tensor calculation includes: taking the source-language word vectors and the target-language word vectors from the attention function's input tensors as the initial quantities of the calculation; calculating the Euclidean distance between the source-language word vectors and the target-language word vectors to obtain a distance tensor; and normalizing the distance tensor to obtain a new distance tensor.
Further, the method is applied to a seq2seq framework model based on the attention mechanism.
To achieve the above object, according to another aspect of the present application, there is provided a neural machine translation device based on an attention mechanism.
The neural machine translation device based on the attention mechanism according to the present application includes: an acquisition module, configured to acquire the source language and the target language during translation, wherein the source language refers to the language information to be translated and the target language refers to the translated language information; a calculation module, configured to calculate the distance tensor of the source language and the target language, wherein the distance tensor refers to a distance weight; and a substitution module, configured to use the distance tensor in the process of calculating the alignment tensor with an attention mechanism so that the neural machine translation result meets expectations, wherein the alignment tensor is calculated with an alignment function.
Further, the substitution module is configured to introduce the distance tensor into the attention mechanism for calculation, and to subtract part of the distance tensor from the alignment tensor output by the attention mechanism.
Further, the calculation module is configured to calculate the distance parameter and substitute it into the distance tensor calculation.
Further, the calculation module is further configured to take the source-language word vectors and the target-language word vectors from the attention function's input tensors as the initial quantities of the calculation; to calculate the Euclidean distance between the source-language word vectors and the target-language word vectors to obtain a distance tensor; and to normalize the distance tensor to obtain a new distance tensor.
To achieve the above object, according to still another aspect of the present application, there is provided a processing apparatus including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the neural machine translation method when executing the program.
According to the neural machine translation method, device and equipment based on the attention mechanism of the present application, the source language and the target language are acquired during translation and the distance tensor of the two is calculated, so that the distance tensor can be used in the process of calculating the alignment tensor with the attention mechanism and the neural machine translation result meets expectations. This achieves the technical effect of improving the alignment of the attention function and thereby solves the technical problem of poor translation quality.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, are incorporated in and constitute a part of this specification. The drawings and their description are illustrative of the application and are not to be construed as unduly limiting the application. In the drawings:
FIG. 1 is a schematic flow diagram of a neural machine translation method based on an attention mechanism according to an embodiment of the present application;
FIG. 2 is a schematic flow diagram of a neural machine translation method based on an attention mechanism, according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a neural machine translation device based on an attention mechanism according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present application, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the present application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal" and the like indicate an azimuth or a positional relationship based on that shown in the drawings. These terms are only used to better describe the present application and its embodiments and are not intended to limit the scope of the indicated devices, elements or components to the particular orientations or to configure and operate in the particular orientations.
Also, some of the terms described above may be used to indicate other meanings in addition to orientation or positional relationships, for example, the term "upper" may also be used to indicate some sort of attachment or connection in some cases. The specific meaning of these terms in the present application will be understood by those of ordinary skill in the art according to the specific circumstances.
Furthermore, the terms "mounted," "configured," "provided," "connected," "coupled," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; may be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements, or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
As shown in fig. 1, the method includes steps S102 to S106 as follows:
step S102, acquiring a source language and a target language during translation,
the source language refers to language information to be translated, and the target language refers to translated language information.
When translating, the method determines the language information to be translated and can acquire both the untranslated and the translated language information.
Step S104, calculating distance tensors of the source language and the target language,
the distance tensor refers to the distance weight.
Specifically, the tensors Q and K formed by the word vectors of the source language and the target language are taken as the initial quantities of the calculation, and the Euclidean distance between Q and K is calculated to obtain the distance tensor.
By calculating the distance tensor of the source language and the target language, the word-vector distance is introduced, which widens the differences in alignment: similar words receive a higher degree of correspondence, dissimilar words a lower one, and the translation improves.
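As a concrete illustration of this step, the sketch below computes the pairwise Euclidean distances between two sets of word vectors and softmax-normalizes them row-wise. This is a minimal NumPy sketch; the function name, shapes, and the row-wise normalization axis are assumptions, since the patent does not fix them:

import numpy as np

def distance_tensor(Q, K):
    # Q: word vectors of one sentence, shape [m, d] (hypothetical shapes)
    # K: word vectors of the other sentence, shape [n, d]
    # Pairwise Euclidean distances ||q_i - k_j||, shape [m, n]
    dist = np.linalg.norm(Q[:, None, :] - K[None, :, :], axis=-1)
    # Softmax normalization turns raw distances into distance weights
    e = np.exp(dist - dist.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Under this sketch, pairs of similar words yield near-zero raw distances and therefore small distance weights, while dissimilar pairs yield large ones, which is what allows the later subtraction to widen the alignment gap.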
Step S106, using the distance tensor in the process of calculating the alignment tensor by using the attention mechanism to make the neural machine translation result accord with the expectation,
the alignment tensor is calculated using an alignment function.
When applied to a neural machine translation model based on an attention mechanism, the distance tensor is introduced into the process in which the attention mechanism calculates the alignment tensor. Because the word-vector distance between aligned sentences represents the degree of difference between the two sentences, adding the distance parameter to the alignment-function calculation effectively widens the gap between the alignment probabilities of different words, making the alignment more effective.
It should be noted that using the distance tensor in the attention-based calculation of the alignment tensor is applicable to all neural machine translation models that contain an attention mechanism, without modifying the model framework.
It should be noted that the neural machine translation result meets expectations when the quality and the score of the translation result satisfy the preset translation requirement.
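The patent does not name the metric behind this score. As one hedged possibility, the check could compare a BLEU score, computed here with the sacrebleu package, against a preset requirement; the 30-point threshold below is purely an illustrative assumption:

import sacrebleu

hypotheses = ["the cat sat on the mat"]        # system translations
references = [["the cat sat on the mat"]]      # one stream of references
bleu = sacrebleu.corpus_bleu(hypotheses, references)
meets_expectations = bleu.score >= 30.0        # assumed preset requirement
print(round(bleu.score, 2), meets_expectations)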
From the above description, it can be seen that the following technical effects are achieved:
By acquiring the source language and the target language during translation and calculating the distance tensor of the two, the distance tensor is used in the process of calculating the alignment tensor with the attention mechanism, and the neural machine translation result meets the expected aim; this realizes the technical effect of improving the alignment of the attention function and thus solves the technical problem of poor translation quality.
According to an embodiment of the present application, as a preference in the present embodiment, using the distance tensor in the process of calculating the alignment tensor with the attention mechanism includes: introducing the distance tensor into the attention mechanism for calculation, and subtracting part of the distance tensor from the alignment tensor output by the attention mechanism.
Specifically, during translation the word-vector distance between the input sentences of the source language and the target language is calculated to obtain the distance tensor; the distance tensor is introduced into the attention calculation, and part of the distance tensor is subtracted from the alignment tensor output by the attention mechanism, yielding a more effective output alignment tensor.
According to an embodiment of the present application, as a preference in the present embodiment, calculating the distance tensor of the source language and the target language includes: calculating the distance parameter and substituting it into the distance tensor calculation.
In particular, the distance parameter is calculated first and then substituted into the distance tensor calculation.
Consider the existing alignment process of the attention function: the similarity of the word vectors of the two input sentences is calculated, and a series of further calculations then yields the alignment function. In this process the relative distance is never introduced into the calculation, so distance parameters such as the word-vector distance play no part in the alignment function. For example, when calculating the alignment of "drink" and "drink", the distance between the two word vectors is essentially 0, whereas when calculating the alignment of "drink" and "distance", the distance between the two word vectors is large. Once the distance parameter has been calculated, the word-vector distance can be introduced into the distance tensor calculation, which widens the differences in alignment: similar words receive a higher degree of correspondence, dissimilar words a lower one, and the translation improves.
According to an embodiment of the present application, as a preferred embodiment, as shown in fig. 2, the process of calculating the distance parameter and substituting it into the distance tensor calculation includes:
step S202, taking the source-language word vectors and the target-language word vectors from the attention function's input tensors as the initial quantities of the calculation;
step S204, calculating the Euclidean distance between the source-language word vectors and the target-language word vectors to obtain a distance tensor;
step S206, normalizing the distance tensor to obtain a new distance tensor.
Specifically, one possible calculation process is as follows:
step S1, letting the hidden-layer output vectors be k_i and performing the dot product operation S = QK^T to obtain S_i;
step S2, performing softmax normalization to obtain the alignment weights A_i, with the calculation formula:
A_i = softmax(S_i) = exp(S_i) / ∑_j exp(S_j)
step S3, calculating the Euclidean distance between the target-language word vector z_j and the source-language word vector v_i, and performing softmax normalization on the result to obtain the distance tensor h_i;
step S4, introducing the distance tensor to obtain the improved alignment weight a_i, with the calculation formula a_i = A_i - 0.5 h_i;
step S5, multiplying a_i by V_i and summing to obtain Attention(Q, K, V), with the calculation formula:
Attention(Q, K, V) = ∑_i a_i V_i
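Putting steps S1 to S5 together, the following is a minimal NumPy sketch of the modified attention computation. Only S = QK^T, the softmax normalizations, a_i = A_i - 0.5 h_i, and the final weighted sum come from the patent; the shapes, variable names, and toy data are illustrative assumptions:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distance_attention(Q, K, V, Z, coef=0.5):
    # Q: queries [m, d]; K: keys [n, d]; V: values [n, d]
    # Z: target-language word vectors [m, d] (assumed pairing with V)
    S = Q @ K.T                    # step S1: S = QK^T
    A = softmax(S)                 # step S2: alignment weights A_i
    # step S3: Euclidean distances between z_j and v_i, softmax-normalized
    h = softmax(np.linalg.norm(Z[:, None, :] - V[None, :, :], axis=-1))
    a = A - coef * h               # step S4: a_i = A_i - 0.5 h_i
    return a @ V                   # step S5: Attention = sum_i a_i V_i

# Toy usage with random embeddings (illustrative only)
rng = np.random.default_rng(0)
m, n, d = 4, 5, 8                  # target length, source length, dimension
Q, Z = rng.normal(size=(m, d)), rng.normal(size=(m, d))
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
print(distance_attention(Q, K, V, Z).shape)   # (4, 8)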
According to an embodiment of the present application, as a preference in this embodiment, the method is applied to a seq2seq framework model based on the attention mechanism. seq2seq stands for Sequence to Sequence; it is a general encoder-decoder framework that can be used in scenarios such as machine translation, text summarization, conversation modeling and image captioning.
Specifically, the basic Seq2Seq model consists of two RNNs, an Encoder and a Decoder. In its most basic form it has three parts: the Encoder, the Decoder, and an intermediate state vector connecting the two. The Encoder learns the input and encodes it into a fixed-size state vector S, which is then passed to the Decoder; the Decoder produces the output by learning from the state vector S.
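For orientation, here is a minimal sketch of such an encoder-decoder in PyTorch; the GRU cells, vocabulary size, and dimensions are illustrative assumptions, not the patent's implementation:

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, src):
        _, state = self.rnn(self.emb(src))  # fixed-size state vector S
        return state

class Decoder(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tgt, state):
        h, _ = self.rnn(self.emb(tgt), state)
        return self.out(h)                  # logits over the target vocabulary

enc, dec = Encoder(1000, 64), Decoder(1000, 64)
src = torch.randint(0, 1000, (2, 7))        # batch of 2 source sentences
tgt = torch.randint(0, 1000, (2, 5))
print(dec(tgt, enc(src)).shape)             # torch.Size([2, 5, 1000])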
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
There is also provided, in accordance with an embodiment of the present application, an attention-based neural machine translation device for implementing the above method. As shown in fig. 3, the device includes: an acquisition module 10, configured to acquire the source language and the target language during translation, where the source language refers to the language information to be translated and the target language refers to the translated language information; a calculation module 20, configured to calculate the distance tensor of the source language and the target language, where the distance tensor refers to a distance weight; and a substitution module 30, configured to use the distance tensor in the process of calculating the alignment tensor with an attention mechanism so that the neural machine translation result meets expectations, where the alignment tensor is calculated with an alignment function.
The source language in the acquisition module 10 of the embodiment of the present application refers to the language information to be translated, and the target language refers to the translated language information.
When translating, the device determines the language information to be translated and can acquire both the untranslated and the translated language information.
The distance tensor in the calculation module 20 of the embodiment of the present application refers to a distance weight.
Specifically, the tensors Q and K formed by the word vectors of the source language and the target language are taken as the initial quantities of the calculation, and the Euclidean distance between Q and K is calculated to obtain the distance tensor.
By calculating the distance tensor of the source language and the target language, the word-vector distance is introduced, which widens the differences in alignment: similar words receive a higher degree of correspondence, dissimilar words a lower one, and the translation improves.
The alignment tensor in the substitution module 30 of the embodiment of the present application is calculated using an alignment function.
When applied to a neural machine translation model based on an attention mechanism, the distance tensor is introduced into the process in which the attention mechanism calculates the alignment tensor. Because the word-vector distance between aligned sentences represents the degree of difference between the two sentences, adding the distance parameter to the alignment-function calculation effectively widens the gap between the alignment probabilities of different words, making the alignment more effective.
It should be noted that using the distance tensor in the attention-based calculation of the alignment tensor is applicable to all neural machine translation models that contain an attention mechanism, without modifying the model framework.
According to an embodiment of the present application, as a preference in this embodiment, the substitution module 30 is configured to introduce the distance tensor into the attention mechanism for calculation, and to subtract part of the distance tensor from the alignment tensor output by the attention mechanism.
In the embodiment of the application, during translation the word-vector distance between the input sentences of the source language and the target language is calculated to obtain the distance tensor; the distance tensor is introduced into the attention calculation, and part of the distance tensor is subtracted from the alignment tensor output by the attention mechanism, yielding a more effective output alignment tensor.
According to an embodiment of the present application, as a preference in this embodiment, the calculation module 20 is configured to calculate the distance parameter and substitute it into the distance tensor calculation.
In particular, the distance parameter is calculated first and then substituted into the distance tensor calculation.
Consider the existing alignment process of the attention function: the similarity of the word vectors of the two input sentences is calculated, and a series of further calculations then yields the alignment function. In this process the relative distance is never introduced into the calculation, so distance parameters such as the word-vector distance play no part in the alignment function. For example, when calculating the alignment of "drink" and "drink", the distance between the two word vectors is essentially 0, whereas when calculating the alignment of "drink" and "distance", the distance between the two word vectors is large. Once the distance parameter has been calculated, the word-vector distance can be introduced into the distance tensor calculation, which widens the differences in alignment: similar words receive a higher degree of correspondence, dissimilar words a lower one, and the translation improves.
According to an embodiment of the present application, as a preference in this embodiment, the calculation module 20 is further configured to take the source-language word vectors and the target-language word vectors from the attention function's input tensors as the initial quantities of the calculation; to calculate the Euclidean distance between the source-language word vectors and the target-language word vectors to obtain a distance tensor; and to normalize the distance tensor to obtain a new distance tensor.
Specifically, one possible calculation process is as follows:
step S1, letting the hidden-layer output vectors be k_i and performing the dot product operation S = QK^T to obtain S_i;
step S2, performing softmax normalization to obtain the alignment weights A_i, with the calculation formula:
A_i = softmax(S_i) = exp(S_i) / ∑_j exp(S_j)
step S3, calculating the Euclidean distance between the target-language word vector z_j and the source-language word vector v_i, and performing softmax normalization on the result to obtain the distance tensor h_i;
step S4, introducing the distance tensor to obtain the improved alignment weight a_i, with the calculation formula a_i = A_i - 0.5 h_i;
step S5, multiplying a_i by V_i and summing to obtain Attention(Q, K, V), with the calculation formula:
Attention(Q, K, V) = ∑_i a_i V_i
In another embodiment of the present application, there is also provided a processing device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the neural machine translation method when executing the program. The neural machine translation method comprises the following steps:
acquiring the source language and the target language during translation, wherein the source language refers to the language information to be translated, and the target language refers to the translated language information;
calculating the distance tensor of the source language and the target language, wherein the distance tensor refers to a distance weight;
using the distance tensor in the process of calculating the alignment tensor with the attention mechanism so that the neural machine translation result meets expectations, wherein the alignment tensor is calculated with an alignment function.
It will be apparent to those skilled in the art that the modules or steps of the application described above may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they can be stored in a storage device and executed by the computing devices; they may also be fabricated individually as integrated circuit modules, or multiple of their modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (4)

1. A neural machine translation method based on an attention mechanism, comprising:
acquiring a source language and a target language during translation, wherein the source language refers to the language information to be translated, and the target language refers to the translated language information;
calculating a distance tensor of the source language and the target language, wherein the distance tensor refers to a distance weight;
using the distance tensor in the process of calculating an alignment tensor with an attention mechanism so that the neural machine translation result meets expectations, wherein the alignment tensor is calculated with an alignment function;
wherein using the distance tensor in the process of calculating the alignment tensor with the attention mechanism includes:
introducing the distance tensor into the attention mechanism for calculation, and subtracting part of the distance tensor from the alignment tensor output by the attention mechanism;
wherein calculating the distance tensor of the source language and the target language includes: calculating a distance parameter and substituting it into the distance tensor calculation;
and wherein calculating the distance parameter and substituting it into the distance tensor calculation includes:
taking the source-language word vectors and the target-language word vectors from the attention function's input tensors as the initial quantities of the calculation;
calculating the Euclidean distance between the source-language word vectors and the target-language word vectors to obtain a distance tensor;
and normalizing the distance tensor to obtain a new distance tensor.
2. The neural machine translation method of claim 1, wherein the method uses a seq2seq framework model based on the attention mechanism.
3. A neural machine translation device based on an attention mechanism, comprising:
an acquisition module, configured to acquire a source language and a target language during translation, wherein the source language refers to the language information to be translated, and the target language refers to the translated language information;
a calculation module, configured to calculate a distance tensor of the source language and the target language, wherein the distance tensor refers to a distance weight;
a substitution module, configured to use the distance tensor in the process of calculating an alignment tensor with an attention mechanism so that the neural machine translation result meets expectations, wherein the alignment tensor is calculated with an alignment function;
wherein the substitution module is configured to introduce the distance tensor into the attention mechanism for calculation, and to subtract part of the distance tensor from the alignment tensor output by the attention mechanism;
the calculation module is configured to calculate the distance parameter and substitute it into the distance tensor calculation;
and the calculation module is further configured to:
take the source-language word vectors and the target-language word vectors from the attention function's input tensors as the initial quantities of the calculation;
calculate the Euclidean distance between the source-language word vectors and the target-language word vectors to obtain a distance tensor;
and normalize the distance tensor to obtain a new distance tensor.
4. A processing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the neural machine translation method of any one of claims 1 to 2 when executing the program.
CN201910539986.2A 2019-06-20 2019-06-20 Neural machine translation method, device and equipment based on attention mechanism Active CN110321567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910539986.2A CN110321567B (en) 2019-06-20 2019-06-20 Neural machine translation method, device and equipment based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910539986.2A CN110321567B (en) 2019-06-20 2019-06-20 Neural machine translation method, device and equipment based on attention mechanism

Publications (2)

Publication Number Publication Date
CN110321567A (en) 2019-10-11
CN110321567B (en) 2023-08-11

Family

ID=68119909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910539986.2A Active CN110321567B (en) 2019-06-20 2019-06-20 Neural machine translation method, device and equipment based on attention mechanism

Country Status (1)

Country Link
CN (1) CN110321567B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717342B (en) * 2019-09-27 2023-03-14 电子科技大学 Distance parameter alignment translation method based on transformer
US11875131B2 (en) * 2020-09-16 2024-01-16 International Business Machines Corporation Zero-shot cross-lingual transfer learning
CN112511172B (en) * 2020-11-11 2023-03-24 山东云海国创云计算装备产业创新中心有限公司 Decoding method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6233545B1 (en) * 1997-05-01 2001-05-15 William E. Datig Universal machine translator of arbitrary languages utilizing epistemic moments
CN108647214B (en) * 2018-03-29 2020-06-30 中国科学院自动化研究所 Decoding method based on deep neural network translation model
CN109710951B (en) * 2018-12-27 2023-10-17 北京百度网讯科技有限公司 Auxiliary translation method, device, equipment and storage medium based on translation history

Also Published As

Publication number Publication date
CN110321567A (en) 2019-10-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant