CN112329483A - Multi-mechanism attention-combined multi-path neural machine translation method - Google Patents

Multi-mechanism attention-combined multi-path neural machine translation method

Info

Publication number
CN112329483A
CN112329483A (application CN202011209086.0A)
Authority
CN
China
Prior art keywords
attention
translation
training
embedding vector
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011209086.0A
Other languages
Chinese (zh)
Inventor
范洪博
郑棋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202011209086.0A priority Critical patent/CN112329483A/en
Publication of CN112329483A publication Critical patent/CN112329483A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a multi-mechanism attention-combined multi-path neural machine translation method, and belongs to the field of natural language processing. A CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism each independently generate their own attention values; the calculated attention values are weighted and accumulated, then aligned and normalized to form a new attention value, which is passed to the Dec-Enc attention layer of the decoder so that each translation mechanism completes the subsequent machine translation process and produces a decoding key-value matrix. The decoding key-value matrices generated by all mechanisms are weighted, superposed and normalized, and the target translation is generated through a linear transformation layer and a softmax layer. The process of superposing and normalizing the attention of multiple mechanisms effectively integrates the analysis capability of the different algorithms, and the resulting attention is closer to the theoretical true attention, so a better translation effect is obtained and translation accuracy is effectively improved.

Description

Multi-mechanism attention-combined multi-path neural machine translation method
Technical Field
The invention relates to a multi-mechanism attention-combined multi-path neural machine translation method, and belongs to the field of natural language processing.
Background
Machine translation is the process by which a computer translates a sentence in one language (the source language sentence) into a sentence with the same meaning in another language (the target language sentence); it has become an important research direction in the field of artificial intelligence.
In the prior art, Gehring et al. propose a CNN translation mechanism that realizes machine translation entirely with convolutional neural networks: the encoder and the decoder are each built by stacking multiple convolutional layers. At the encoding end, the input sequence is encoded using convolution operations. At the decoding end, each convolutional layer performs attention, and the result is passed on as the input of the next layer; finally, the next target word is predicted from the hidden state of the last layer.
In the prior art, Vaswani et al. propose the Transformer translation mechanism, which realizes machine translation entirely with attention. The encoding end is built by stacking 6 identical encoding layers, each consisting of a multi-head self-attention sublayer and a feedforward neural network sublayer, connected with residual connections and layer normalization. The decoding end is a stack of 6 identical decoding layers, where each decoder layer has one more masked attention sublayer than an encoder layer.
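Attention is the primitive on which all three mechanisms, and the fusion proposed below, are built. As an illustration only (not part of the patent), a minimal NumPy sketch of the standard scaled dot-product attention used by the Transformer is:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d_k)) v.

    q: (len_q, d_k), k: (len_k, d_k), v: (len_k, d_v).
    Returns the context matrix (len_q, d_v) and the attention weights (len_q, len_k).
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # similarity of every query with every key
    weights = softmax(scores, axis=-1)   # attention distribution over the keys
    return weights @ v, weights
```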
In the prior art, Wang et al. propose the Tree-Transformer translation mechanism, which can take the syntactic information in a sentence into account during translation; it adds a constituent attention module for capturing syntactic information on top of the multi-head self-attention of the conventional Transformer encoding end.
These three algorithms come from top academic conferences in the field and are among the newer, best-performing machine-learning-based automatic translation methods in the prior art, but they still leave room for improvement.
Currently, attention has become the core of most machine-learning-based automatic translation methods, and the accuracy of the attention directly determines the quality of the translation. Different attention generation mechanisms produce inconsistent results, and the attention generated by any single mechanism cannot fully and accurately reflect the theoretical attention in the language.
In practical decision-making, broadly soliciting opinions and then consolidating them usually yields a democratic decision that is better than a decision made single-handedly by one decision-maker. We speculate that introducing a similar democratic-decision mechanism into the formation of attention in automatic translation can improve translation accuracy.
Disclosure of Invention
The invention provides a multi-mechanism attention-combined multi-path neural machine translation method, which is used for effectively improving the translation quality.
The technical scheme of the invention is as follows: a multi-mechanism attention-combined multipath neural machine translation method combines a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism. Each automatic translation method independently generates its own attention value, and the calculated attention values are weighted and accumulated. Newer algorithms and algorithms with better experimental results are considered likely to produce attention values closer to the theoretical attention, so they are given higher weights during accumulation; the specific weight values are determined experimentally. After alignment and accumulation, the result is normalized to form the new attention value.
In this multi-mechanism method, a newer algorithm, or an algorithm with better experimental data, is theoretically closer to the real attention, so such algorithms are given higher weights when the weights of the method are designed.
In the process of constructing the multi-mechanism attention-combined multi-path neural machine translation model, the sum of the input word embedding vector and the position embedding vector is first fed to each of the translation mechanisms; each translation mechanism then trains on this input according to its own training procedure, forming its own training model, and calculates its own attention vector. At the encoding end of the model, the calculated attention values are weighted and superposed, then aligned and normalized to form a new attention value, which is passed to the Dec-Enc attention layer of the decoder so that each translation mechanism completes the subsequent machine translation process and obtains a decoding key-value matrix. The decoding key-value matrices generated by all mechanisms are weighted, superposed and normalized, and the target translation is generated through a linear transformation layer and a softmax layer.
The method comprises the following specific steps:
step1, collecting training corpora;
step2, preprocessing the corpus: the bilingual corpus in the training corpus is tokenized, lowercased and cleaned using the Moses toolkit, keeping only sentence pairs whose length is within 175 tokens; all preprocessed data are then segmented into subwords using the byte pair encoding (BPE) algorithm.
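As an illustration only (the patent names the Moses scripts and BPE but gives no code), a minimal Python sketch of the length-filtering and lowercasing step might look as follows; the file names are hypothetical, and Moses tokenization and BPE segmentation are assumed to be performed separately with their own tools:

```python
MAX_LEN = 175  # keep only sentence pairs whose length is within 175 tokens

def filter_and_lowercase(src_in, tgt_in, src_out, tgt_out, max_len=MAX_LEN):
    """Lowercase already-tokenized parallel text and drop over-long or empty pairs."""
    with open(src_in, encoding="utf-8") as fs, open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as gs, open(tgt_out, "w", encoding="utf-8") as gt:
        for s, t in zip(fs, ft):
            s_tok = s.strip().lower().split()
            t_tok = t.strip().lower().split()
            if 0 < len(s_tok) <= max_len and 0 < len(t_tok) <= max_len:
                gs.write(" ".join(s_tok) + "\n")
                gt.write(" ".join(t_tok) + "\n")

# usage (hypothetical file names):
# filter_and_lowercase("train.tok.de", "train.tok.en", "train.clean.de", "train.clean.en")
```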
Step3, extracting one part of the preprocessed corpus as a test set, one part as a verification set and the remainder as a training set: 160K parallel sentence pairs are randomly extracted from the processed corpus as the training set, 7K pairs are used as the verification set, and 6K pairs are used as the test set to evaluate the translation model. The training set is used to train the parameters of the neural network, the test set is used to measure the accuracy of the current translation model, and hyperparameters such as the number of iterations and the learning rate are adjusted according to the results on the verification set so that the translation model performs better.
Step4, generating source language word embedding vectors for training and position embedding vectors for training from the training set corpus, splicing them together as input, and feeding the input separately to a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism; each translation mechanism trains on the input according to its own training procedure, forming its own training model.
Step5, generating word embedding vectors and position embedding vectors for the sentence to be translated and inputting them to each of the translation mechanisms; each translation mechanism calculates the corresponding attention vector from its own trained model. The CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism are used together, i.e., the 3 models are superposed. The model first converts the input sequence into word embedding vectors and, so that the model can learn the order of the words in the sequence, adds a position embedding vector to each input word embedding vector; the position embedding vector represents the positional relation of the different words in the source sentence. The position embedding vector and the word embedding vector are denoted p = (p_1, ..., p_m) and w = (w_1, ..., w_m), respectively, and the position embedding vector is calculated with the following formula:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
wherein pos represents the position of a word in the source sentence, i represents the dimension index, and d_model represents the embedding dimension;
the word embedding vector and the position embedding vector are added and the sum is input to the model; this input is fed to each of the translation mechanisms, and each translation mechanism trains on it according to its own training procedure, forming its own training model.
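A minimal NumPy sketch of the position embedding above (assuming the standard sinusoidal form and an even embedding dimension d_model, e.g. the 256-dimensional embeddings used in the experiments below) is:

```python
import numpy as np

def positional_embedding(max_len, d_model):
    """Sinusoidal position embeddings: sine on even dimensions, cosine on odd ones."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]              # position of the word in the sentence
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe[:, 0::2] = np.sin(angle)                    # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                    # PE(pos, 2i + 1)
    return pe

# the encoder input is the element-wise sum of word embeddings and position embeddings:
# x = word_embeddings + positional_embedding(seq_len, d_model)
```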
Step6, the attention values calculated in Step5 are weighted and superposed, then aligned and normalized to form a new attention value, which serves as the new attention. Relatively speaking, an algorithm that performs better on the tested corpus is given a higher weight; the specific weights are determined experimentally.
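The core fusion step of Step6 can be illustrated with a short sketch. The weights below are placeholders (the patent only states that they are determined experimentally, with the Tree-Transformer mechanism receiving twice the weight of the other two in the claimed configuration), and normalization is shown here simply as rescaling each row of the fused attention matrix to sum to 1:

```python
import numpy as np

def fuse_attention(att_cnn, att_trans, att_tree, w=(1.0, 1.0, 2.0)):
    """Weighted superposition of three aligned attention matrices, then normalization.

    Each att_* is an aligned (tgt_len, src_len) attention-weight matrix produced by
    one translation mechanism; w holds the experiment-determined mechanism weights.
    """
    fused = w[0] * att_cnn + w[1] * att_trans + w[2] * att_tree
    return fused / fused.sum(axis=-1, keepdims=True)   # each row sums to 1 again
```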
And Step7, sending the attention value obtained by the calculation in the Step6 to a Dec-Enc attention layer of a decoder, and enabling each translation mechanism to complete the subsequent machine translation process to respectively generate a decoding key-value matrix.
And Step8, performing weighted superposition and normalization on the decoding key-value matrixes generated by the mechanisms, and sending the result to a linear transformation layer and a softmax layer to generate a target translation.
The invention has the beneficial effects that:
1. In the invention, each automatic translation method independently generates its own attention value; the calculated attention values are weighted and accumulated, then aligned and normalized to form a new attention value, so a better translation effect is obtained.
2. The process of superposing and normalizing the attention of multiple mechanisms can effectively integrate the analysis capability of the different algorithms, forming a democratic-decision advantage in the spirit of the proverb "three cobblers with their wits combined equal Zhuge Liang the mastermind"; the resulting attention is closer to the theoretical true attention, so a better translation effect is obtained.
Drawings
FIG. 1 is a block flow diagram of the present invention;
FIG. 2 is a bar graph of experimental results of the present invention;
Detailed Description
The invention is further described with reference to the following figures and specific examples.
Example 1: in this example, a German-English corpus is used as the translation corpus, and the mechanisms selected for the multi-decision scheme are the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism.
As shown in fig. 1-2, a multi-mechanism attention-combined multi-path neural machine translation method includes the following specific steps:
and (3) model construction process:
step1, downloading the German-English corpus and determining the translation mechanisms to be combined;
step2, preprocessing the corpus: the bilingual corpus is tokenized, lowercased and cleaned using the Moses toolkit, keeping only sentence pairs whose length is within 175 tokens; all preprocessed data are then segmented into subwords using the byte pair encoding (BPE) algorithm;
step3, generating a training set, a verification set and a test set: 160K parallel sentence pairs are randomly extracted from the processed corpus as the training set, 7K pairs as the verification set, and 6K pairs as the test set used to evaluate the translation model;
step4, to fully exploit the advantages of the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism, the encoding end uses a convolutional-neural-network-based encoder, a Transformer encoder and a Tree-Transformer encoder, respectively, to encode the input sequence;
so that the model can learn the order of the words in the source sentence, the position embedding vector and the word embedding vector are added element-wise as the input of the encoding end, allowing the model to capture the position information of the words in the input sequence; the position embedding vector represents the position information of the different words in the input sequence, and the position embedding vector and the word embedding vector are denoted p = (p_1, ..., p_m) and w = (w_1, ..., w_m), respectively;
Step5, training the input by all translation mechanisms in the translation model according to the training mode of the translation mechanisms, respectively forming the training models of the translation mechanisms, and calculating respective attention vectors;
step6, carrying out weighted superposition on the plurality of attention values obtained by calculation in the Step5, aligning and normalizing to form a new attention value;
step7, between the encoder and the decoder, an attention fusion module is used to automatically acquire the information required to decode the target word. At the decoding end, the decoder of each path computes attention using the three-path output of the encoder as context, so nine information streams pass from the encoder to the decoder. Specifically, the attention value calculated in step6 is sent to the Dec-Enc attention layer of the decoder, which takes the context information generated by the three encoding-end paths and the decoder output of the previous time step as the decoder input for decoding. The Dec-Enc attention module is computed as follows:
ctx_cc = Attention(q_c, k_c, v_c)
ctx_ca = Attention(q_c, k_a, v_a)
ctx_cl = Attention(q_c, k_l, v_l)
ctx_aa = Attention(q_a, k_a, v_a)
ctx_ac = Attention(q_a, k_c, v_c)
ctx_al = Attention(q_a, k_l, v_l)
ctx_ll = Attention(q_l, k_l, v_l)
ctx_lc = Attention(q_l, k_c, v_c)
ctx_la = Attention(q_l, k_a, v_a)
where the subscripts c, a and l denote the CNN path, the Transformer path and the Tree-Transformer path, respectively. ctx_cc is the attention result of the decoder-side CNN-path attention query q_c with the encoding-end CNN-path attention key k_c and value v_c; ctx_ca uses q_c with the Transformer-path key k_a and value v_a; and ctx_cl uses q_c with the Tree-Transformer-path key k_l and value v_l. Likewise, ctx_aa, ctx_ac and ctx_al are the attention results of the decoder-side Transformer-path query q_a with the Transformer-, CNN- and Tree-Transformer-path keys and values, and ctx_ll, ctx_lc and ctx_la are the attention results of the decoder-side Tree-Transformer-path query q_l with the Tree-Transformer-, CNN- and Transformer-path keys and values.
To fully exploit the information captured by the different encoder paths, we use a weighted summation mechanism to fuse them.
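A compact sketch of the nine Dec-Enc information streams and their weighted fusion, reusing the scaled dot-product attention sketched earlier, might look as follows (the fusion weights are hypothetical placeholders; the patent states only that a weighted summation mechanism is used):

```python
def dec_enc_contexts(dec_queries, enc_keys, enc_values, attention):
    """Nine information streams: each decoder-path query attends over each encoder path.

    dec_queries, enc_keys, enc_values: dicts keyed by path, where "c" is the CNN path,
    "a" the Transformer path and "l" the Tree-Transformer path.
    attention: a function (q, k, v) -> (context, weights), e.g. scaled_dot_product_attention.
    Returns {"cc": ctx_cc, "ca": ctx_ca, ..., "la": ctx_la}.
    """
    ctx = {}
    for dq, q in dec_queries.items():
        for ep in ("c", "a", "l"):
            ctx[dq + ep] = attention(q, enc_keys[ep], enc_values[ep])[0]
    return ctx

def fuse_contexts(ctx, weights=None):
    """Weighted summation of the nine context matrices (uniform weights by default)."""
    if weights is None:
        weights = {name: 1.0 / len(ctx) for name in ctx}
    return sum(weights[name] * ctx[name] for name in ctx)
```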
Step8, predicting the target word: at a decoding end, after the three decoders generate decoding information, integrating the information generated by the three decoders by adopting a weighted summation mechanism, and transmitting an integration result into a prediction target word of a softmax layer, wherein the formula is as follows:
z_o = normal(z_c + z_a + z_t)
P(y) = softmax(z_o W_s + b_s)
where z_c, z_a and z_t denote the decoded information generated by the three decoders, z_o denotes the final fused output of the three-path decoder, and P(y) is the predicted probability of the target word.
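A minimal sketch of the decoder-side fusion and prediction of the two formulas above, assuming that normal(...) denotes a layer-normalization-style rescaling of the summed decoder states (the patent does not spell out the normalization) and that W_s and b_s are the output projection parameters, is:

```python
import numpy as np

def predict_target_word(z_c, z_a, z_t, W_s, b_s, eps=1e-6):
    """Fuse the three decoder outputs and predict the target-word distribution.

    z_c, z_a, z_t: decoder states of the CNN, Transformer and Tree-Transformer paths,
    each of shape (d_model,); W_s: (d_model, vocab_size); b_s: (vocab_size,).
    """
    z_o = z_c + z_a + z_t
    z_o = (z_o - z_o.mean()) / (z_o.std() + eps)   # normal(...), taken here as layer normalization
    logits = z_o @ W_s + b_s
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                     # P(y) = softmax(z_o W_s + b_s)
```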
In order to verify the effectiveness of the method, the experiment compares a CNN-based neural machine translation model, a Transformer translation model, a Tree-Transformer translation model and a translation model combining CNN and Transformer;
The model parameters are set as follows:
The operating environment of the experiment is Python 3.6 with PyTorch 0.4.0 as the deep learning framework, and the experimental corpus is the IWSLT2014 German-English corpus. For the specific algorithms in the experiment, the parameters are: word embedding dimension 256, 2 network layers in both the encoder and the decoder, 256 hidden units per layer, dropout 0.1, filter_size 1024, kernel_size 3, learning rate 0.25, and label smoothing rate 0.1; the NAG optimizer is used to optimize the training model, and the batch size is 128.
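For reference, the hyperparameters listed above can be collected in a single configuration sketch (the key names are illustrative and not tied to any particular framework's option names):

```python
# Hyperparameters as stated in the experiment description.
CONFIG = {
    "embedding_dim": 256,
    "encoder_layers": 2,
    "decoder_layers": 2,
    "hidden_units_per_layer": 256,
    "dropout": 0.1,
    "filter_size": 1024,
    "kernel_size": 3,          # convolution kernel width of the CNN path
    "learning_rate": 0.25,
    "label_smoothing": 0.1,
    "optimizer": "NAG",        # Nesterov accelerated gradient
    "batch_size": 128,
}
```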
To demonstrate the effectiveness of our method, we compare it against four reference models: the CNN translation mechanism, the Transformer translation mechanism, the Tree-Transformer translation mechanism and the CNN+Transformer translation mechanism.
The specific parameters of each method can affect the final experimental figures, but fine-tuning these parameters is not relevant to highlighting the benefit of the present invention. To show its effectiveness and benefit in general terms, we set the operating parameters of all algorithms to be close to one another and close to the examples provided by the original authors of these algorithms.
The BLEU value is adopted to evaluate a translation model, and as can be seen from the attached figure 2 and the table 1, the multi-mechanism attention-combined multipath neural machine translation method can effectively improve the performance of neural machine translation.
To demonstrate the benefit of the present invention, we designed two configurations (algorithm 5 and algorithm 6) for comparison with the existing methods. Both generate attention by superposing the three mechanisms; algorithm 5 does not use a weighted superposition, while algorithm 6 gives twice the weight to the newer Tree-Transformer method. In this experiment, the encoding end of algorithm 5 uses a Transformer encoder, a Tree-Transformer encoder and a CNN-based encoder, and the decoding end uses only 1 Transformer decoder, with the attention generation scheme described above. The encoding end of algorithm 6 likewise uses a Transformer encoder, a Tree-Transformer encoder and a CNN-based encoder, the decoding end uses 1 Transformer decoder and 1 CNN-based decoder, and the syntactic (Tree-Transformer) information is weighted by a factor of 2.
Table 1 shows the translation results of different models
Model                                                              German-English dataset (BLEU)
Algorithm 1: CNN                                                   29.07
Algorithm 2: Transformer                                           28.65
Algorithm 3: Tree Transformer                                      29.62
Algorithm 4: CNN + Transformer                                     31.69
Algorithm 5 (invention): CNN + Transformer + Tree Transformer      32.49
Algorithm 6 (invention): CNN + Transformer + 2×Tree Transformer    32.69
Table 1 shows the German-English translation results of the translation model proposed by the invention and of the baseline models. As can be seen from Table 1, the CNN, Transformer and Tree Transformer models all produce reasonable German-English results, with the Tree Transformer performing best among them. The three-mechanism attention weighted-superposition method achieves a better BLEU value and translates more accurately than any single decision, consistent with the democratic-voting effect we predicted. In particular, when the Tree Transformer is given a weight of 2, the BLEU value increases by a further 0.2, indicating that our weighting strategy is effective. The attention generation mechanism based on weighted democratic voting thus obtains better experimental performance, fully embodying the beneficial effects of the invention.
The multi-mechanism attention-combined multi-path neural machine translation method provided by the invention performs well on the translation task, mainly for the following reasons: 1. the translation model combines the advantages of the CNN, Transformer and Tree-Transformer translation mechanisms, with the Tree-Transformer incorporating syntactic information during translation; 2. each automatic translation method independently generates its own attention value, and the calculated attention values are weighted, accumulated, aligned and normalized to form a new attention value. This process of superposing and normalizing multi-mechanism attention effectively integrates the analysis capability of the different algorithms, forming a multi-decision advantage in the spirit of the proverb "three cobblers with their wits combined equal Zhuge Liang the mastermind"; the resulting attention is closer to the theoretical true attention, so a better translation effect is obtained.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (6)

1. The multi-mechanism attention-combined multipath neural machine translation method is characterized by comprising the following steps of: the method comprises the following specific steps:
step1, collecting training corpora;
step2, preprocessing the training corpus;
step3, extracting a training set, a verification set and a test set from the preprocessed training corpus;
step4, generating source language word embedding vectors for training and position embedding vectors for training from the corpus of the training set, splicing them together as input, and inputting the input separately to a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism, wherein each translation mechanism trains on the input according to its own training procedure to form its own training model;
step5, generating word embedding vectors and position embedding vectors for the sentence to be translated and inputting them to each of the translation mechanisms, wherein each translation mechanism calculates the corresponding attention vector from its own trained model;
step6, carrying out weighted superposition on the plurality of attention values obtained by calculation in the Step5, aligning and normalizing to form a new attention value;
step7, sending the attention value obtained by the calculation in the Step6 to a Dec-Enc attention layer of a decoder, and enabling each translation mechanism to complete the subsequent machine translation process and respectively generate a decoding key-value matrix;
and Step8, performing weighted superposition and normalization on the decoding key-value matrixes generated by the mechanisms, and sending the result to a linear transformation layer and a softmax layer to generate a target translation.
2. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: the preprocessing in Step2 is to perform word segmentation, lowercase processing and data cleaning on the bilingual corpus in the corpus, and finally keep the sentence pair with the length within 175, and then perform word segmentation on all the preprocessed data by using a BPE algorithm.
3. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: extracting the training set, the verification set and the test set in Step3 means that 160K parallel sentence pairs are randomly extracted from the processed corpus as the training set, 7K pairs serve as the verification set for tuning the translation model, and 6K pairs serve as the test set for evaluating the translation model.
4. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: Step4 adopts a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism, i.e., a superposition of the 3 models. The model first converts the input sequence into word embedding vectors and, so that the model can learn the order of the words in the sequence, adds a position embedding vector to each input word embedding vector, the position embedding vector representing the positional relation of the different words in the source sentence; the position embedding vector and the word embedding vector are denoted p = (p_1, ..., p_m) and w = (w_1, ..., w_m), respectively, and the position embedding vector is calculated with the following formula:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
wherein pos represents the position of a word in the source sentence, i represents the dimension index, and d_model represents the embedding dimension;
the word embedding vector and the position embedding vector are added and the sum is input to the model; this input is fed to each of the translation mechanisms, and each translation mechanism trains on it according to its own training procedure, forming its own training model.
5. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: in Step6, the encoding end receives the input vectors and calculates the attention values respectively, which are weighted, superposed, aligned and normalized to form a new attention value; the encoding end comprises three encoders, and the weight given to the Tree-Transformer translation mechanism is twice the weight given to each of the other two translation mechanisms.
6. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: in Step7, the Dec-Enc attention layer of the decoder receives the attention key-value pairs generated by the encoding end; the query matrix q in the decoder is combined with the key matrix k by a dot-product operation, and the result is used in a weighted summation with the value matrix v, so that each translation mechanism completes the subsequent machine translation process and generates its decoding key-value matrix.
CN202011209086.0A 2020-11-03 2020-11-03 Multi-mechanism attention-combined multi-path neural machine translation method Pending CN112329483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011209086.0A CN112329483A (en) 2020-11-03 2020-11-03 Multi-mechanism attention-combined multi-path neural machine translation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011209086.0A CN112329483A (en) 2020-11-03 2020-11-03 Multi-mechanism attention-combined multi-path neural machine translation method

Publications (1)

Publication Number Publication Date
CN112329483A true CN112329483A (en) 2021-02-05

Family

ID=74322805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011209086.0A Pending CN112329483A (en) 2020-11-03 2020-11-03 Multi-mechanism attention-combined multi-path neural machine translation method

Country Status (1)

Country Link
CN (1) CN112329483A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377918A (en) * 2019-07-15 2019-10-25 昆明理工大学 Merge the more neural machine translation method of the Chinese-of syntax analytic tree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAITAO SONG et al.: "Double Path Networks for Sequence to Sequence Learning", Computation and Language *
习翔宇: "Paper interpretation: Attention is all you need", zhuanlan.zhihu.com/p/46990010 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717343A (en) * 2019-09-27 2020-01-21 电子科技大学 Optimal alignment method based on transformer attention mechanism output
CN110717343B (en) * 2019-09-27 2023-03-14 电子科技大学 Optimal alignment method based on transformer attention mechanism output
CN114118111A (en) * 2021-11-26 2022-03-01 昆明理工大学 Multi-mode machine translation method fusing text and picture characteristics
CN114118111B (en) * 2021-11-26 2024-05-24 昆明理工大学 Multi-mode machine translation method integrating text and picture features
CN114580443A (en) * 2022-03-01 2022-06-03 腾讯科技(深圳)有限公司 Text translation method, text translation device, kernel function combination method, server and medium

Similar Documents

Publication Publication Date Title
CN108052512B (en) Image description generation method based on depth attention mechanism
CN112329483A (en) Multi-mechanism attention-combined multi-path neural machine translation method
CN110490946B (en) Text image generation method based on cross-modal similarity and antagonism network generation
CN112613303B (en) Knowledge distillation-based cross-modal image aesthetic quality evaluation method
CN110472238B (en) Text summarization method based on hierarchical interaction attention
CN109492227A (en) It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations
CN112733533B (en) Multi-modal named entity recognition method based on BERT model and text-image relation propagation
CN111274398A (en) Method and system for analyzing comment emotion of aspect-level user product
WO2022020467A1 (en) System and method for training multilingual machine translation evaluation models
He et al. Improving neural relation extraction with positive and unlabeled learning
CN114398976A (en) Machine reading understanding method based on BERT and gate control type attention enhancement network
US20240119716A1 (en) Method for multimodal emotion classification based on modal space assimilation and contrastive learning
Zhao et al. RoR: Read-over-read for long document machine reading comprehension
CN113704437A (en) Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114238649B (en) Language model pre-training method with common sense concept enhancement
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112015760B (en) Automatic question-answering method and device based on candidate answer set reordering and storage medium
CN113901847A (en) Neural machine translation method based on source language syntax enhanced decoding
CN113822125A (en) Processing method and device of lip language recognition model, computer equipment and storage medium
CN114648024B (en) Method for generating cross-language abstract of Chinese crossing based on multi-type word information guidance
CN116450877A (en) Image text matching method based on semantic selection and hierarchical alignment
CN117174163A (en) Virus evolution trend prediction method and system
CN116010622A (en) BERT knowledge graph completion method and system for fusion entity type
CN117350330A (en) Semi-supervised entity alignment method based on hybrid teaching
CN115810351A (en) Controller voice recognition method and device based on audio-visual fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210205
