CN110717342B - Distance parameter alignment translation method based on transformer - Google Patents

Distance parameter alignment translation method based on transformer

Info

Publication number
CN110717342B
CN110717342B (application CN201910924019.8A)
Authority
CN
China
Prior art keywords
distance
tensor
alignment
calculation
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910924019.8A
Other languages
Chinese (zh)
Other versions
CN110717342A (en)
Inventor
闫明明
陈绪浩
李迅波
罗华成
赵宇
段世豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910924019.8A
Publication of CN110717342A
Application granted
Publication of CN110717342B
Active legal status
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a transformer-based distance parameter alignment translation method, applied to a transformer framework model built on the attention mechanism. The method comprises the following steps: during training, compute the word vectors of the two languages input to the attention mechanism to obtain a tensor of relative distance parameters; normalize the distance tensor to obtain a new, calculation-ready distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects the degree of difference between the words, so introducing the distance parameter into the alignment calculation effectively widens the alignment probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment behavior of the attention function and raises both translation quality and score. The algorithm can be applied to any model with an attention mechanism, without modifying the model framework.

Description

Distance parameter alignment translation method based on transformer
Technical Field
The invention relates to neural machine translation, in particular to a neural machine translation method with a distance weighting mechanism.
Background
Neural network machine translation is a machine translation method proposed in recent years. Compared with traditional statistical machine translation, it trains a single neural network that maps one sequence to another and can produce output sequences of variable length, which gives it very good performance in translation, dialogue, and text summarization. Neural network machine translation is in fact an encoding-decoding system: the encoder encodes the source-language sequence and extracts its information, and the decoder converts that information into another language, the target language, completing the translation.
Since the first neural machine translation systems were proposed in 2013, and with the rapid growth of computing power, neural machine translation has developed quickly, with the seq2seq model, the Transformer model, and others proposed in succession. In 2013, Nal Kalchbrenner and Phil Blunsom proposed a novel end-to-end encoder-decoder structure for machine translation. Their model uses a convolutional neural network (CNN) to encode a given piece of source text into a continuous vector, and then uses a recurrent neural network (RNN) as the decoder to convert the state vector into the target language. In 2017, Google released a new machine learning model, the Transformer, which performed far better than existing algorithms in machine translation and other language-understanding tasks. The framework aims to handle a wide range of tasks, of which neural machine translation is only one.
The traditional technology has the following technical problems:
In the attention function's alignment procedure, the existing framework first computes the similarity of the word vectors of the two input sentences and then performs a series of calculations to obtain the alignment function. Nowhere in this process is a relative distance introduced into the calculation; the word-vector distance plays no part in the alignment function. For example, when the alignment between a word and its direct translation (such as 'eat' and its counterpart) is computed, their word-vector distance is essentially 0, whereas the distance between 'eat' and an unrelated word is very large. Introducing the word-vector distance can therefore widen the alignment difference, so that similar words correspond more strongly and dissimilar words align more weakly, improving the translation.
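To make the intuition concrete, here is a toy sketch with made-up three-dimensional vectors (not real embeddings) showing how Euclidean distance separates an aligned word pair from an unrelated pair:

```python
import numpy as np

# Hypothetical embeddings, invented purely for illustration.
eat_src = np.array([0.90, 0.10, 0.30])    # source word, e.g. "eat"
eat_tgt = np.array([0.88, 0.12, 0.30])    # its aligned translation
far_tgt = np.array([-0.50, 0.70, -0.20])  # an unrelated target word

print(np.linalg.norm(eat_src - eat_tgt))  # ~0.03: near-zero distance, strong alignment
print(np.linalg.norm(eat_src - far_tgt))  # ~1.60: large distance, weak alignment
```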
Disclosure of Invention
Therefore, to overcome the above shortcomings, the present invention provides a transformer-based distance parameter alignment translation method.
The invention is realized in this way: a distance parameter alignment translation method based on the transformer is constructed and applied to an attention-based transformer model, characterized in that the method comprises the following steps: calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor; and calculating the distance parameter and substituting it into the calculation process;
introducing the distance tensor into the attention calculation, and subtracting a part of the distance tensor from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves the alignment and raises the translation score.
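As a minimal sketch of the distance-tensor step (assuming the word vectors sit in the rows of Q and K and that softmax serves as the normalization function; the summary above fixes neither choice):

```python
import numpy as np

def normalized_distance_tensor(Q, K):
    # Q: [len_tgt, d], K: [len_src, d] word-vector tensors.
    # Pairwise Euclidean distance between every target/source word pair.
    dist = np.linalg.norm(Q[:, None, :] - K[None, :, :], axis=-1)  # [len_tgt, len_src]
    # Row-wise softmax normalization of the raw distance tensor.
    e = np.exp(dist - dist.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

The resulting tensor is what gets partially subtracted from the attention output in the steps detailed below.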
The transformer-based distance parameter alignment translation method is implemented in the following specific manner:
the first step is as follows: generating the time semantic vector
Figure BDA0002218386520000021
Figure BDA0002218386520000022
s t =tanh(W[s t-1 ,y t-1 ])
e ti =s t W a h i
The second step: pass the hidden-layer information and make the prediction:

s̃_t = f(s_t, c_t)

p(y_t | y_1, …, y_{t-1}, x) = softmax(g(s̃_t))
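These two steps can be sketched in code as follows; only s_t = tanh(W[s_{t-1}, y_{t-1}]) and e_ti = s_t · W_a · h_i are given above, so the tensor shapes are assumptions:

```python
import numpy as np

def decoder_attention_step(s_prev, y_prev, H, W, W_a):
    # s_prev: previous decoder state [d]; y_prev: previous output embedding [d]
    # H: encoder hidden states [n, d_h]; W: [d, 2*d]; W_a: [d, d_h]
    s_t = np.tanh(W @ np.concatenate([s_prev, y_prev]))  # s_t = tanh(W[s_{t-1}, y_{t-1}])
    e_t = H @ (W_a.T @ s_t)                              # e_ti = s_t W_a h_i, for all i at once
    a_t = np.exp(e_t - e_t.max())
    a_t /= a_t.sum()                                     # alignment weights (softmax of e_ti)
    c_t = a_t @ H                                        # semantic (context) vector at time t
    return s_t, c_t                                      # passed on to the prediction step
```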
Taking the tensors Q and K of the source-language and target-language word vectors as the initial quantities of the calculation, compute the Euclidean distance between Q and K to obtain a distance tensor; apply a normalization function to the distance tensor to obtain a new distance tensor, and substitute it into the output calculation. The process is as follows:
Step 1: let the output vector of the hidden layer be K_i, and perform the dot-product operation QK^T to obtain S_i.
Step 2: perform softmax normalization to obtain the alignment weight A_i; the calculation formula is

A_i = exp(S_i) / Σ_j exp(S_j)
Step 3: then compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i;
Step 4: introduce the distance tensor into the calculation to obtain the improved alignment weight a_i; the calculation formula is a_i = A_i - 0.5h_i;
and 5: multiplying ai and Vi to obtain an attitude (Q, K, V) with the calculation formula of
Figure BDA0002218386520000031
Step 6: repeat steps 1-5 six times to obtain the final output matrix;
Step 7: finally, the output matrix participates in the subsequent operations.
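Read end to end, steps 1-6 might look like the sketch below. The √d_k scaling is taken from the Attention(Q, K, V) formula in step 5, the distance is computed between Q and K as described earlier, and treating the six repetitions as six passes over the same tensors is an assumption, since the text only says the steps repeat six times:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distance_aligned_attention(Q, K, V, n_repeats=6):
    d_k = Q.shape[-1]
    out = V
    for _ in range(n_repeats):                               # step 6: repeat steps 1-5
        S = Q @ K.T / np.sqrt(d_k)                           # step 1: scaled dot product QK^T
        A = softmax(S)                                       # step 2: alignment weights A_i
        dist = np.linalg.norm(Q[:, None] - K[None, :], axis=-1)
        h = softmax(dist)                                    # step 3: normalized distance tensor h_i
        a = A - 0.5 * h                                      # step 4: a_i = A_i - 0.5 h_i
        out = a @ out                                        # step 5: multiply a_i and V_i
    return out                                               # step 7: feeds subsequent operations
```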
The invention has the following advantages. The transformer-based distance parameter alignment translation method is applied to a transformer framework model built on the attention mechanism. The method comprises the following steps: compute the word vectors of the two languages input to the attention mechanism during training (different calculation modes yield different relative word-vector distances) to obtain a tensor of relative distance parameters; normalize the distance tensor to obtain a new, calculation-ready distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects the degree of difference between the words, so introducing the distance parameter into the alignment calculation effectively widens the alignment probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment behavior of the attention function and raises both translation quality and score. The algorithm can be applied to any model containing an attention mechanism, without modifying the model framework.
Detailed Description
The present invention is described in detail below. The technical solutions in the embodiments of the present invention are described clearly and completely; obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Through improvement, the invention provides a transformer-based distance parameter alignment translation method, applied to an attention-based transformer model, comprising the following steps:
calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor; and calculating the distance parameter and carrying it into the calculation process.
The distance tensor is introduced into the attention calculation, and a part of it is subtracted from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves the alignment and raises the translation score.
Transformer framework introduction:
encoder consisting of 6 identical layers, each layer containing two sub-layers, the first sub-layer being a multi-head attention layer and then a simple fully connected layer. Where each sub-layer is concatenated and normalized with the residual).
The Decoder also consists of 6 identical layers, but each layer differs from the encoder's in containing three sub-layers: a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are based on multi-head attention. One particular point is masking, which prevents future output words from being used during training.
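A structural sketch of the stack just described, with the attention and feed-forward bodies left as placeholder callables (the residual-plus-normalization wrapping and the six identical layers follow the text above; everything else is an assumption):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's features to zero mean and unit variance.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def encoder_layer(x, self_attn, feed_forward):
    x = layer_norm(x + self_attn(x))       # sub-layer 1: multi-head attention, residual, norm
    x = layer_norm(x + feed_forward(x))    # sub-layer 2: fully connected, residual, norm
    return x

def encoder(x, layers):
    # layers: six (self_attn, feed_forward) pairs of callables.
    for self_attn, feed_forward in layers:
        x = encoder_layer(x, self_attn, feed_forward)
    return x
```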
Attention model:
the original encoder-decoder model is very classical, but has very large limitation. A large limitation is that the link between encoding and decoding is a fixed length semantic vector C. That is, the encoder compresses the entire sequence of information into a fixed length vector. However, there are two disadvantages, one is that the semantic vector cannot completely represent the information of the whole sequence, and the information carried by the first input content is diluted by the later input information. The longer the input sequence, the more severe this phenomenon is. This results in insufficient information being initially obtained for the input sequence at the time of decoding, which can compromise accuracy.
To solve these problems, the attention model was proposed. When generating its output, the model produces an attention range indicating which parts of the input sequence to focus on when producing the next output, generates that output according to the focused region, and repeats the process. Attention has certain similarities with human behavior: when a person reads a sentence, attention usually falls only on the informative words rather than on all of them; that is, the attention weight given to each word differs. The attention model increases the training difficulty of the model, but improves the quality of the generated text.
The first step: generate the semantic vector at time t:

c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i
The second step: pass the hidden-layer information and make the prediction:

s̃_t = f(s_t, c_t)

p(y_t | y_1, …, y_{t-1}, x) = softmax(g(s̃_t))
The improvement here is a modification of the attention function.
Taking the tensors Q and K of the source-language and target-language word vectors as the initial quantities of the calculation, compute the Euclidean distance between Q and K to obtain a distance tensor; apply a normalization function to the distance tensor to obtain a new distance tensor, and substitute it into the output calculation. The process is as follows:
Step 1: take the output vector of the hidden layer as K_i, and perform the dot-product operation QK^T to obtain S_i.
Step 2: perform softmax normalization to obtain the alignment weight A_i; the calculation formula is

A_i = exp(S_i) / Σ_j exp(S_j)
Step 3: then compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i.
Step 4: introduce the distance tensor into the calculation to obtain the improved alignment weight a_i; the calculation formula is a_i = A_i - 0.5h_i;
Step 5: multiply a_i and V_i to obtain Attention(Q, K, V); the calculation formula is

Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Step 6: repeat steps 1-5 six times to obtain the final output matrix.
Step 7: finally, the output matrix participates in the subsequent operations.
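A toy invocation of the distance_aligned_attention sketch given earlier, with random tensors standing in for a five-word sentence pair at d_model = 8 (the numbers are meaningless beyond checking shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))  # stand-in word-vector tensors
out = distance_aligned_attention(Q, K, V)              # sketch defined after step 7 above
print(out.shape)                                       # (5, 8): one re-aligned vector per word
```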
The invention relates to a transformer-based distance parameter alignment translation method, applied to a transformer framework model built on the attention mechanism. The method comprises the following steps: compute the word vectors of the two languages input to the attention mechanism during training (different calculation modes yield different relative word-vector distances) to obtain a tensor of relative distance parameters; normalize the distance tensor to obtain a new, calculation-ready distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects the degree of difference between the words, so introducing the distance parameter into the alignment calculation effectively widens the alignment probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment behavior of the attention function and raises both translation quality and score. The algorithm can be applied to any model containing an attention mechanism, without modifying the model framework.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A distance parameter alignment translation method based on a transformer, applied to an attention-based transformer model, characterized in that the method comprises the following steps:
calculating the word-vector distance of the source-language and target-language input sentences in the translation process to obtain a distance tensor; calculating the distance tensor parameter and substituting it into the calculation process;
introducing the distance tensor into the attention mechanism for calculation, and subtracting a part of the distance tensor from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves the alignment effect and the translation score;
the method is implemented in the following specific manner:
the first step is as follows: generating a semantic vector of the t-th character;
c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i
the second step: passing the hidden-layer information and prediction:

s̃_t = f(s_t, c_t)

p(y_t | y_1, …, y_{t-1}, x) = softmax(g(s̃_t))
calculating the Euclidean distance between Q and K, with the tensors Q and K of the source-language and target-language word vectors as the initial quantities of the calculation, to obtain a distance tensor; normalizing the distance tensor with a normalization function to obtain a new distance tensor, and finally substituting it into the output calculation; the process is as follows:
step 1: let the word vector of the target language be K_i; perform the dot-product operation QK_i to obtain S_i;
step 2: perform softmax normalization to obtain the alignment weight A_i; the calculation formula is

A_i = exp(S_i) / Σ_j exp(S_j)
step 3: then perform vector subtraction between the target-language word vector K_i and the source-language word vector Q_i, and normalize the output vector with a softmax function to obtain the distance tensor h_i;
step 4: introduce the distance tensor into the calculation to obtain the improved alignment weight a_i; the calculation formula is a_i = A_i - 0.5h_i;
step 5: multiply a_i and V_i to obtain Attention(Q, K, V); the calculation formula is

Attention(Q, K, V) = softmax(QK^T / √d_k) · V
step 6: repeat steps 1-5 six times to obtain the final output matrix;
step 7: finally, the output matrix participates in the subsequent operations.
CN201910924019.8A 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer Active CN110717342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910924019.8A CN110717342B (en) 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910924019.8A CN110717342B (en) 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer

Publications (2)

Publication Number Publication Date
CN110717342A CN110717342A (en) 2020-01-21
CN110717342B true CN110717342B (en) 2023-03-14

Family

ID=69212001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910924019.8A Active CN110717342B (en) 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer

Country Status (1)

Country Link
CN (1) CN110717342B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11875131B2 (en) * 2020-09-16 2024-01-16 International Business Machines Corporation Zero-shot cross-lingual transfer learning
TWI814216B (en) * 2022-01-19 2023-09-01 中國信託商業銀行股份有限公司 Method and device for establishing translation model based on triple self-learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460028A (en) * 2018-04-12 2018-08-28 苏州大学 Sentence weight is incorporated to the field adaptive method of neural machine translation
CN110162799A (en) * 2018-11-28 2019-08-23 腾讯科技(深圳)有限公司 Model training method, machine translation method and relevant apparatus and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126507B (en) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 A kind of depth nerve interpretation method and system based on character code
KR102630668B1 (en) * 2016-12-06 2024-01-30 한국전자통신연구원 System and method for expanding input text automatically
CN110321567B (en) * 2019-06-20 2023-08-11 四川语言桥信息技术有限公司 Neural machine translation method, device and equipment based on attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460028A (en) * 2018-04-12 2018-08-28 苏州大学 Sentence weight is incorporated to the field adaptive method of neural machine translation
CN110162799A (en) * 2018-11-28 2019-08-23 腾讯科技(深圳)有限公司 Model training method, machine translation method and relevant apparatus and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gaussian Transformer: A Lightweight Approach for Natural Language Inference; Maosheng Guo et al.; The Thirty-Third AAAI Conference on Artificial Intelligence; 20190707; full text *
An improved Transformer model method based on word-level weights (一种基于词级权重的Transformer模型改进方法); 王明申; Journal of Chinese Computer Systems (小型微型计算机系统); 20190430; Vol. 40, No. 4; full text *

Also Published As

Publication number Publication date
CN110717342A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN108681610B (en) generating type multi-turn chatting dialogue method, system and computer readable storage medium
CN111382582B (en) Neural machine translation decoding acceleration method based on non-autoregressive
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN112633010B (en) Aspect-level emotion analysis method and system based on multi-head attention and graph convolution network
CN110135551B (en) Robot chatting method based on word vector and recurrent neural network
CN107644014A (en) A kind of name entity recognition method based on two-way LSTM and CRF
CN110059324B (en) Neural network machine translation method and device based on dependency information supervision
CN107729329A (en) A kind of neural machine translation method and device based on term vector interconnection technique
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN110929092A (en) Multi-event video description method based on dynamic attention mechanism
CN110717342B (en) Distance parameter alignment translation method based on transformer
CN111581383A (en) Chinese text classification method based on ERNIE-BiGRU
CN109299479A (en) Translation memory is incorporated to the method for neural machine translation by door control mechanism
CN110473267A (en) Social networks image based on attention feature extraction network describes generation method
CN115841119B (en) Emotion cause extraction method based on graph structure
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN110688860B (en) Weight distribution method based on multiple attention mechanisms of transformer
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
CN114691858B (en) Improved UNILM digest generation method
CN113297374B (en) Text classification method based on BERT and word feature fusion
CN114328866A (en) Strong anthropomorphic intelligent dialogue robot with smooth and accurate response
CN110717343B (en) Optimal alignment method based on transformer attention mechanism output
CN111243578A (en) Chinese mandarin character-voice conversion method based on self-attention mechanism
CN116580278A (en) Lip language identification method, equipment and storage medium based on multi-attention mechanism
CN114548090B (en) Fast relation extraction method based on convolutional neural network and improved cascade labeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant