CN110717342B - Distance parameter alignment translation method based on transformer - Google Patents

Distance parameter alignment translation method based on transformer

Info

Publication number
CN110717342B
CN110717342B (application CN201910924019.8A)
Authority
CN
China
Prior art keywords
distance
tensor
alignment
calculation
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910924019.8A
Other languages
Chinese (zh)
Other versions
CN110717342A (en)
Inventor
闫明明
陈绪浩
李迅波
罗华成
赵宇
段世豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910924019.8A
Publication of CN110717342A
Application granted
Publication of CN110717342B
Active legal status
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a transformer-based distance parameter alignment translation method, applied to a transformer framework model built on the attention mechanism. The method comprises the following steps: during training, compute the word vectors of the two languages input to the attention mechanism to obtain a tensor of relative distance parameters; normalize the distance tensor to obtain a new, calculation-ready distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects the degree of difference between the words, so introducing the distance parameter into the alignment calculation effectively widens the alignment probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment behavior of the attention function and raises both translation quality and score. The algorithm can be applied to any model with an attention mechanism, without modifying the model framework.

Description

Distance parameter alignment translation method based on transformer
Technical Field
The invention relates to neural machine translation, in particular to a neural machine translation method with a distance weighting mechanism.
Background
Neural network machine translation is a machine translation method proposed in recent years. Compared with traditional statistical machine translation, it trains a single neural network that maps one sequence to another and can produce output sequences of variable length, which gives it very good performance in translation, dialogue, and text summarization. Neural network machine translation is in fact an encoding-decoding system: the encoder encodes the source-language sequence and extracts its information, and the decoder converts that information into another language, the target language, completing the translation.
Since the first neural machine translation systems were proposed in 2013, and with the rapid growth of computing power, neural machine translation has developed quickly, with the seq2seq model, the Transformer model, and others proposed in succession. In 2013, Nal Kalchbrenner and Phil Blunsom proposed a novel end-to-end encoder-decoder structure for machine translation. Their model uses a convolutional neural network (CNN) to encode a given piece of source text into a continuous vector, and then uses a recurrent neural network (RNN) as the decoder to convert the state vector into the target language. In 2017, Google released a new machine learning model, the Transformer, which performed far better than existing algorithms in machine translation and other language-understanding tasks. The framework aims to handle a wide range of tasks, of which neural machine translation is only one.
The traditional technology has the following technical problems:
In the attention function's alignment procedure, the existing framework first computes the similarity of the word vectors of the two input sentences and then performs a series of calculations to obtain the alignment function. Nowhere in this process is a relative distance introduced into the calculation; the word-vector distance plays no part in the alignment function. For example, when the alignment between a word and its direct translation (such as 'eat' and its counterpart) is computed, their word-vector distance is essentially 0, whereas the distance between 'eat' and an unrelated word is very large. Introducing the word-vector distance can therefore widen the alignment difference, so that similar words correspond more strongly and dissimilar words align more weakly, improving the translation.
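To make the intuition concrete, here is a toy sketch with made-up three-dimensional vectors (not real embeddings) showing how Euclidean distance separates an aligned word pair from an unrelated pair:

```python
import numpy as np

# Hypothetical embeddings, invented purely for illustration.
eat_src = np.array([0.90, 0.10, 0.30])    # source word, e.g. "eat"
eat_tgt = np.array([0.88, 0.12, 0.30])    # its aligned translation
far_tgt = np.array([-0.50, 0.70, -0.20])  # an unrelated target word

print(np.linalg.norm(eat_src - eat_tgt))  # ~0.03: near-zero distance, strong alignment
print(np.linalg.norm(eat_src - far_tgt))  # ~1.60: large distance, weak alignment
```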
Disclosure of Invention
Therefore, to overcome the above shortcomings, the present invention provides a transformer-based distance parameter alignment translation method.
The invention is realized in this way: a distance parameter alignment translation method based on the transformer is constructed and applied to an attention-based transformer model, characterized in that the method comprises the following steps: calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor; and calculating the distance parameter and substituting it into the calculation process;
introducing the distance tensor into the attention calculation, and subtracting a part of the distance tensor from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves the alignment and raises the translation score.
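As a minimal sketch of the distance-tensor step (assuming the word vectors sit in the rows of Q and K and that softmax serves as the normalization function; the summary above fixes neither choice):

```python
import numpy as np

def normalized_distance_tensor(Q, K):
    # Q: [len_tgt, d], K: [len_src, d] word-vector tensors.
    # Pairwise Euclidean distance between every target/source word pair.
    dist = np.linalg.norm(Q[:, None, :] - K[None, :, :], axis=-1)  # [len_tgt, len_src]
    # Row-wise softmax normalization of the raw distance tensor.
    e = np.exp(dist - dist.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

The resulting tensor is what gets partially subtracted from the attention output in the steps detailed below.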
The transformer-based distance parameter alignment translation method is implemented in the following specific manner:
the first step is as follows: generating the time semantic vector
Figure BDA0002218386520000021
Figure BDA0002218386520000022
s t =tanh(W[s t-1 ,y t-1 ])
e ti =s t W a h i
The second step: pass the hidden-layer information and make the prediction:

s̃_t = f(s_t, c_t)

p(y_t | y_1, …, y_{t-1}, x) = softmax(g(s̃_t))
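These two steps can be sketched in code as follows; only s_t = tanh(W[s_{t-1}, y_{t-1}]) and e_ti = s_t · W_a · h_i are given above, so the tensor shapes are assumptions:

```python
import numpy as np

def decoder_attention_step(s_prev, y_prev, H, W, W_a):
    # s_prev: previous decoder state [d]; y_prev: previous output embedding [d]
    # H: encoder hidden states [n, d_h]; W: [d, 2*d]; W_a: [d, d_h]
    s_t = np.tanh(W @ np.concatenate([s_prev, y_prev]))  # s_t = tanh(W[s_{t-1}, y_{t-1}])
    e_t = H @ (W_a.T @ s_t)                              # e_ti = s_t W_a h_i, for all i at once
    a_t = np.exp(e_t - e_t.max())
    a_t /= a_t.sum()                                     # alignment weights (softmax of e_ti)
    c_t = a_t @ H                                        # semantic (context) vector at time t
    return s_t, c_t                                      # passed on to the prediction step
```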
Taking the tensors Q and K of the source-language and target-language word vectors as the initial quantities of the calculation, compute the Euclidean distance between Q and K to obtain a distance tensor; apply a normalization function to the distance tensor to obtain a new distance tensor, and substitute it into the output calculation. The process is as follows:
Step 1: let the output vector of the hidden layer be K_i, and perform the dot-product operation QK^T to obtain S_i.
Step 2: perform softmax normalization to obtain the alignment weight A_i; the calculation formula is

A_i = exp(S_i) / Σ_j exp(S_j)
Step 3: then compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i;
Step 4: introduce the distance tensor into the calculation to obtain the improved alignment weight a_i; the calculation formula is a_i = A_i - 0.5h_i;
and 5: multiplying ai and Vi to obtain an attitude (Q, K, V) with the calculation formula of
Figure BDA0002218386520000031
Step 6: repeat steps 1-5 six times to obtain the final output matrix;
Step 7: finally, the output matrix participates in the subsequent operations.
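Read end to end, steps 1-6 might look like the sketch below. The √d_k scaling is taken from the Attention(Q, K, V) formula in step 5, the distance is computed between Q and K as described earlier, and treating the six repetitions as six passes over the same tensors is an assumption, since the text only says the steps repeat six times:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distance_aligned_attention(Q, K, V, n_repeats=6):
    d_k = Q.shape[-1]
    out = V
    for _ in range(n_repeats):                               # step 6: repeat steps 1-5
        S = Q @ K.T / np.sqrt(d_k)                           # step 1: scaled dot product QK^T
        A = softmax(S)                                       # step 2: alignment weights A_i
        dist = np.linalg.norm(Q[:, None] - K[None, :], axis=-1)
        h = softmax(dist)                                    # step 3: normalized distance tensor h_i
        a = A - 0.5 * h                                      # step 4: a_i = A_i - 0.5 h_i
        out = a @ out                                        # step 5: multiply a_i and V_i
    return out                                               # step 7: feeds subsequent operations
```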
The invention has the following advantages. The transformer-based distance parameter alignment translation method is applied to a transformer framework model built on the attention mechanism. The method comprises the following steps: compute the word vectors of the two languages input to the attention mechanism during training (different calculation modes yield different relative word-vector distances) to obtain a tensor of relative distance parameters; normalize the distance tensor to obtain a new, calculation-ready distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects the degree of difference between the words, so introducing the distance parameter into the alignment calculation effectively widens the alignment probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment behavior of the attention function and raises both translation quality and score. The algorithm can be applied to any model containing an attention mechanism, without modifying the model framework.
Detailed Description
The present invention is described in detail below. The technical solutions in the embodiments of the present invention are described clearly and completely; obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Through improvement, the invention provides a transformer-based distance parameter alignment translation method, applied to an attention-based transformer model, comprising the following steps:
calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor; and calculating the distance parameter and carrying it into the calculation process.
The distance tensor is introduced into the attention calculation, and a part of it is subtracted from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves the alignment and raises the translation score.
Transformer framework introduction:
encoder consisting of 6 identical layers, each layer containing two sub-layers, the first sub-layer being a multi-head attention layer and then a simple fully connected layer. Where each sub-layer is concatenated and normalized with the residual).
The Decoder also consists of 6 identical layers, but each layer differs from the encoder's in containing three sub-layers: a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are based on multi-head attention. One particular point is masking, which prevents future output words from being used during training.
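A structural sketch of the stack just described, with the attention and feed-forward bodies left as placeholder callables (the residual-plus-normalization wrapping and the six identical layers follow the text above; everything else is an assumption):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's features to zero mean and unit variance.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def encoder_layer(x, self_attn, feed_forward):
    x = layer_norm(x + self_attn(x))       # sub-layer 1: multi-head attention, residual, norm
    x = layer_norm(x + feed_forward(x))    # sub-layer 2: fully connected, residual, norm
    return x

def encoder(x, layers):
    # layers: six (self_attn, feed_forward) pairs of callables.
    for self_attn, feed_forward in layers:
        x = encoder_layer(x, self_attn, feed_forward)
    return x
```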
Attention model:
the original encoder-decoder model is very classical, but has very large limitation. A large limitation is that the link between encoding and decoding is a fixed length semantic vector C. That is, the encoder compresses the entire sequence of information into a fixed length vector. However, there are two disadvantages, one is that the semantic vector cannot completely represent the information of the whole sequence, and the information carried by the first input content is diluted by the later input information. The longer the input sequence, the more severe this phenomenon is. This results in insufficient information being initially obtained for the input sequence at the time of decoding, which can compromise accuracy.
To solve these problems, the attention model was proposed. When generating its output, the model produces an attention range indicating which parts of the input sequence to focus on when producing the next output, generates that output according to the focused region, and repeats the process. Attention has certain similarities with human behavior: when a person reads a sentence, attention usually falls only on the informative words rather than on all of them; that is, the attention weight given to each word differs. The attention model increases the training difficulty of the model, but improves the quality of the generated text.
The first step: generate the semantic vector at time t:

c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i
The second step: pass the hidden-layer information and make the prediction:

s̃_t = f(s_t, c_t)

p(y_t | y_1, …, y_{t-1}, x) = softmax(g(s̃_t))
The improvement here is a modification of the attention function.
Taking the tensors Q and K of the source-language and target-language word vectors as the initial quantities of the calculation, compute the Euclidean distance between Q and K to obtain a distance tensor; apply a normalization function to the distance tensor to obtain a new distance tensor, and substitute it into the output calculation. The process is as follows:
Step 1: take the output vector of the hidden layer as K_i, and perform the dot-product operation QK^T to obtain S_i.
Step 2: perform softmax normalization to obtain the alignment weight A_i; the calculation formula is

A_i = exp(S_i) / Σ_j exp(S_j)
Step 3: then compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i.
Step 4: introduce the distance tensor into the calculation to obtain the improved alignment weight a_i; the calculation formula is a_i = A_i - 0.5h_i;
Step 5: multiply a_i and V_i to obtain Attention(Q, K, V); the calculation formula is

Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Step 6: repeat steps 1-5 six times to obtain the final output matrix.
Step 7: finally, the output matrix participates in the subsequent operations.
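A toy invocation of the distance_aligned_attention sketch given earlier, with random tensors standing in for a five-word sentence pair at d_model = 8 (the numbers are meaningless beyond checking shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))  # stand-in word-vector tensors
out = distance_aligned_attention(Q, K, V)              # sketch defined after step 7 above
print(out.shape)                                       # (5, 8): one re-aligned vector per word
```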
The invention relates to a transformer-based distance parameter alignment translation method, applied to a transformer framework model built on the attention mechanism. The method comprises the following steps: compute the word vectors of the two languages input to the attention mechanism during training (different calculation modes yield different relative word-vector distances) to obtain a tensor of relative distance parameters; normalize the distance tensor to obtain a new, calculation-ready distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects the degree of difference between the words, so introducing the distance parameter into the alignment calculation effectively widens the alignment probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment behavior of the attention function and raises both translation quality and score. The algorithm can be applied to any model containing an attention mechanism, without modifying the model framework.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A distance parameter alignment translation method based on a transformer, applied to an attention-based transformer model, characterized in that the method comprises the following steps:
calculating the word-vector distance of the source-language and target-language input sentences in the translation process to obtain a distance tensor; calculating the distance tensor parameter and substituting it into the calculation process;
introducing the distance tensor into the attention mechanism for calculation, and subtracting a part of the distance tensor from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves the alignment effect and the translation score;
the method is implemented in the following specific manner:
the first step is as follows: generating a semantic vector of the t-th character;
c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i
the second step: passing the hidden-layer information and prediction:

s̃_t = f(s_t, c_t)

p(y_t | y_1, …, y_{t-1}, x) = softmax(g(s̃_t))
calculating the Euclidean distance between Q and K, with the tensors Q and K of the source-language and target-language word vectors as the initial quantities of the calculation, to obtain a distance tensor; normalizing the distance tensor with a normalization function to obtain a new distance tensor, and finally substituting it into the output calculation; the process is as follows:
step 1: let the word vector of the target language be K_i; perform the dot-product operation QK_i to obtain S_i;
step 2: perform softmax normalization to obtain the alignment weight A_i; the calculation formula is

A_i = exp(S_i) / Σ_j exp(S_j)
step 3: then perform vector subtraction between the target-language word vector K_i and the source-language word vector Q_i, and normalize the output vector with a softmax function to obtain the distance tensor h_i;
step 4: introduce the distance tensor into the calculation to obtain the improved alignment weight a_i; the calculation formula is a_i = A_i - 0.5h_i;
step 5: multiply a_i and V_i to obtain Attention(Q, K, V); the calculation formula is

Attention(Q, K, V) = softmax(QK^T / √d_k) · V
step 6: repeat steps 1-5 six times to obtain the final output matrix;
step 7: finally, the output matrix participates in the subsequent operations.
CN201910924019.8A 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer Active CN110717342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910924019.8A CN110717342B (en) 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910924019.8A CN110717342B (en) 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer

Publications (2)

Publication Number Publication Date
CN110717342A CN110717342A (en) 2020-01-21
CN110717342B true CN110717342B (en) 2023-03-14

Family

ID=69212001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910924019.8A Active CN110717342B (en) 2019-09-27 2019-09-27 Distance parameter alignment translation method based on transformer

Country Status (1)

Country Link
CN (1) CN110717342B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11875131B2 (en) * 2020-09-16 2024-01-16 International Business Machines Corporation Zero-shot cross-lingual transfer learning
TWI814216B (en) * 2022-01-19 2023-09-01 中國信託商業銀行股份有限公司 Method and device for establishing translation model based on triple self-learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460028A (en) * 2018-04-12 2018-08-28 苏州大学 Sentence weight is incorporated to the field adaptive method of neural machine translation
CN110162799A (en) * 2018-11-28 2019-08-23 腾讯科技(深圳)有限公司 Model training method, machine translation method and relevant apparatus and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126507B (en) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 A kind of depth nerve interpretation method and system based on character code
KR102630668B1 (en) * 2016-12-06 2024-01-30 한국전자통신연구원 System and method for expanding input text automatically
CN110321567B (en) * 2019-06-20 2023-08-11 四川语言桥信息技术有限公司 Neural machine translation method, device and equipment based on attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460028A (en) * 2018-04-12 2018-08-28 苏州大学 Sentence weight is incorporated to the field adaptive method of neural machine translation
CN110162799A (en) * 2018-11-28 2019-08-23 腾讯科技(深圳)有限公司 Model training method, machine translation method and relevant apparatus and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gaussian Transformer: A Lightweight Approach for Natural Language Inference; Maosheng Guo et al.; The Thirty-Third AAAI Conference on Artificial Intelligence; 20190707; full text *
An improved Transformer model method based on word-level weights (一种基于词级权重的Transformer模型改进方法); 王明申; Journal of Chinese Computer Systems (小型微型计算机系统); 20190430; Vol. 40, No. 4; full text *

Also Published As

Publication number Publication date
CN110717342A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN108681610B (en) generating type multi-turn chatting dialogue method, system and computer readable storage medium
CN111382582B (en) Neural machine translation decoding acceleration method based on non-autoregressive
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN112633010B (en) Aspect-level emotion analysis method and system based on multi-head attention and graph convolution network
CN110135551B (en) Robot chatting method based on word vector and recurrent neural network
CN107644014A (en) A kind of name entity recognition method based on two-way LSTM and CRF
CN110059324B (en) Neural network machine translation method and device based on dependency information supervision
CN107729329A (en) A kind of neural machine translation method and device based on term vector interconnection technique
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN110929092A (en) Multi-event video description method based on dynamic attention mechanism
CN110717342B (en) Distance parameter alignment translation method based on transformer
CN111581383A (en) Chinese text classification method based on ERNIE-BiGRU
CN109299479A (en) Translation memory is incorporated to the method for neural machine translation by door control mechanism
CN110473267A (en) Social networks image based on attention feature extraction network describes generation method
CN115841119B (en) Emotion cause extraction method based on graph structure
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN110688860B (en) Weight distribution method based on multiple attention mechanisms of transformer
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
CN114691858B (en) Improved UNILM digest generation method
CN113297374B (en) Text classification method based on BERT and word feature fusion
CN114328866A (en) Strong anthropomorphic intelligent dialogue robot with smooth and accurate response
CN110717343B (en) Optimal alignment method based on transformer attention mechanism output
CN111243578A (en) Chinese mandarin character-voice conversion method based on self-attention mechanism
CN116580278A (en) Lip language identification method, equipment and storage medium based on multi-attention mechanism
CN114548090B (en) Fast relation extraction method based on convolutional neural network and improved cascade labeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant