CN112329483A - Multi-mechanism attention-combined multi-path neural machine translation method - Google Patents
Multi-mechanism attention-combined multi-path neural machine translation method
- Publication number
- CN112329483A (application number CN202011209086.0A)
- Authority
- CN
- China
- Prior art keywords
- attention
- translation
- training
- embedding vector
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a multi-mechanism attention-combined multi-path neural machine translation method, and belongs to the field of natural language processing. In the invention, a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism each independently generate their own attention values; the calculated attention values are accumulated with weights, then aligned and normalized to form a new attention value, which is passed to the Dec-Enc attention layer of the decoder so that each translation mechanism completes the subsequent machine translation process and obtains a decoding key-value matrix. The decoding key-value matrices generated by all mechanisms are then weighted, superposed and normalized, and the target translation is generated through a linear transformation layer and a softmax layer. Superposing and normalizing the attention of multiple mechanisms effectively integrates the analytical capability of the different algorithms, and the resulting attention is closer to the theoretical true attention, so a better translation effect is obtained and translation accuracy is effectively improved.
Description
Technical Field
The invention relates to a multi-mechanism attention-combined multi-path neural machine translation method, and belongs to the field of natural language processing.
Background
Machine translation, which is a process implemented by a computer to translate a sentence in one language (a source language sentence) into a sentence in another language (a target language sentence) having the same meaning, has become an important research direction in the field of artificial intelligence.
In the prior art, Gehring et al. proposed a CNN translation mechanism that realizes machine translation entirely with convolutional neural networks: both the encoder and the decoder are built by stacking multiple convolutional layers. At the encoding end, the input sequence is encoded with convolution operations. At the decoding end, each convolutional layer computes attention and passes its result as input to the next layer; the next target word is finally predicted from the hidden state of the last layer.
In the prior art, Vaswani et al. proposed a Transformer translation mechanism that realizes machine translation entirely with attention. The encoding end is a stack of six identical encoding layers, each consisting of a multi-head self-attention sublayer and a feed-forward sublayer, with residual connections and layer normalization. The decoding end is a stack of six identical decoding layers, where each decoder layer has one additional masked attention sublayer compared with the encoder layer.
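To make the attention computation concrete, the following is an illustrative sketch in Python/PyTorch (not code from the patent) of the scaled dot-product attention that underlies the multi-head self-attention described above; tensor shapes and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); returns the context and the attention weights
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5   # similarity between queries and keys
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))    # e.g. for the decoder's masked attention
    weights = F.softmax(scores, dim=-1)                          # normalized attention matrix
    return torch.matmul(weights, v), weights
```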
In the prior art, Wang et al. proposed a Tree-Transformer translation mechanism that takes the syntactic information of a sentence into account during translation: it adds a constituent attention module for capturing syntactic information on top of the multi-head self-attention of the conventional Transformer encoding end.
These three algorithms all come from top academic conferences in the field and are among the newer, best-performing machine-learning-based automatic translation methods in the prior art, but they still leave room for improvement.
Currently, attention has become the core of most machine-learning-based automatic translation methods, and the accuracy of the attention directly determines the quality of the translation. Different attention generation mechanisms produce inconsistent calculation results, and the attention generated by any single mechanism cannot completely and accurately reflect the theoretical attention in the language.
In practical decision-making, opening the floor so that everyone can speak freely and then collating the opinions produces a democratic decision that is generally better than a single authority deciding alone. We speculate that translation accuracy can likewise be improved by introducing a similar democratic-decision mechanism into the formation of attention in automatic translation.
Disclosure of Invention
The invention provides a multi-mechanism attention-combined multi-path neural machine translation method, which is used for effectively improving the translation quality.
The technical scheme of the invention is as follows: a multi-mechanism attention-combined multi-path neural machine translation method combines a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism. Each translation mechanism independently generates its own attention value, and the calculated attention values are accumulated with weights. Newer algorithms and algorithms with better experimental results are considered more likely to produce attention values closer to the theoretical attention value, so these algorithms are given higher weights during accumulation; the specific weight values are determined by further experiments. After weighted accumulation, the values are aligned and normalized to form the new attention value.
In this multi-mechanism method, a newer algorithm or an algorithm with better experimental results is theoretically closer to the real attention, so such algorithms are assigned higher weights when the weights of the method are designed.
In constructing the multi-mechanism attention-combined multi-path neural machine translation model, the sum of the input word embedding vector and the position embedding vector is first fed to each of the translation mechanisms; each translation mechanism then trains on this input in its own training mode, forms its own training model, and calculates its own attention vector. At the encoding end of the model, the calculated attention values are weighted and superposed, then aligned and normalized to form a new attention value that is passed to the Dec-Enc attention layer of the decoder, so that each translation mechanism completes the subsequent machine translation process and obtains a decoding key-value matrix. The decoding key-value matrices generated by all mechanisms are then weighted, superposed and normalized, and the target translation is generated through a linear transformation layer and a softmax layer.
The method comprises the following specific steps:
step1, collecting training corpora;
Step2, preprocessing the corpus: perform word segmentation, lowercasing and data cleaning on the bilingual corpus in the training corpus with the MOSES toolkit, keep only sentence pairs whose length is within 175 tokens, and then segment all preprocessed data into subwords with the BPE (byte pair encoding) algorithm.
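As an illustration of this step, the sketch below shows one possible preprocessing pass; it assumes the sacremoses package as a stand-in for the MOSES scripts and leaves the BPE segmentation to the external subword-nmt toolkit, so the names and exact pipeline are assumptions rather than the patent's own tooling.

```python
from sacremoses import MosesTokenizer

tok_de, tok_en = MosesTokenizer(lang="de"), MosesTokenizer(lang="en")

def preprocess_pairs(pairs, max_len=175):
    """Tokenize, lowercase and length-filter (source, target) sentence pairs."""
    kept = []
    for src, tgt in pairs:
        src = tok_de.tokenize(src.lower(), return_str=True)
        tgt = tok_en.tokenize(tgt.lower(), return_str=True)
        if len(src.split()) <= max_len and len(tgt.split()) <= max_len:
            kept.append((src, tgt))
    return kept

# BPE (byte pair encoding) subword segmentation would then be applied to the kept pairs,
# e.g. with the subword-nmt toolkit.
```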
Step3, extracting a test set, a verification set and a training set from the preprocessed corpus: randomly extract 160K parallel sentence pairs from the processed corpus as the training set, 7K pairs as the verification set, and 6K pairs as the test set used to evaluate the translation model. The training set is used to train the parameters of the neural network, the test set is used to measure the accuracy of the current translation model, and hyperparameters such as the number of iterations and the learning rate are adjusted according to the results on the verification set so that the translation model performs better.
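A minimal sketch of the random split described above (160K train / 7K verification / 6K test); the seed and the list representation of the corpus are placeholders.

```python
import random

def split_corpus(pairs, n_train=160_000, n_valid=7_000, n_test=6_000, seed=1):
    rng = random.Random(seed)
    shuffled = list(pairs)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test
```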
Step4, generating the source-language word embedding vectors for training and the position embedding vectors for training from the training-set corpus, splicing them together as the input, and feeding this input respectively to the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism, wherein each translation mechanism trains on the input in its own training mode and forms its own training model.
Step5, generating the word embedding vectors and position embedding vectors for the sentence to be translated and feeding them respectively to the plurality of translation mechanisms, where each translation mechanism calculates the corresponding attention vector from its own trained model. With the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism superposed (i.e. three models in parallel), the model first converts the input sequence into word embedding vectors. To let the model learn the order of the words in the sequence, a position embedding vector is added to each input word embedding vector; the position embedding vector represents the positional relation of the different words in the source sentence. The position embedding vectors and the word embedding vectors are denoted as p = (p_1, …, p_m) and w = (w_1, …, w_m) respectively, where the position embedding vector is calculated with the following formula:
wherein pos represents the position of a word in the source sentence, and i represents the dimension;
The word embedding vector and the position embedding vector are added together as the model input and fed to each of the translation mechanisms; each translation mechanism trains on this input in its own training mode and forms its own training model.
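The position embedding formula itself is not reproduced in the text above; the sketch below assumes the standard sinusoidal encoding of Vaswani et al., with pos indexing the word position, i indexing the embedding dimension, and an even d_model.

```python
import torch

def positional_encoding(max_len, d_model):
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimension indices
    angle = pos / (10000.0 ** (i / d_model))                        # (max_len, d_model/2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)   # even dimensions use sine
    pe[:, 1::2] = torch.cos(angle)   # odd dimensions use cosine
    return pe                        # added element-wise to the word embeddings
```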
Step6, performing weighted superposition on the attention values calculated in Step5, then aligning and normalizing them to form a new attention value, which is used as the new attention. Comparatively, an algorithm that performs better on the tested corpus receives a higher weight, and the specific weights are determined by experiment.
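A minimal sketch of this step under stated assumptions: the three attention matrices are already aligned to a common (target length, source length) shape, the weights are the experimentally determined values (placeholders here), and "normalization" is read as renormalizing each row to sum to one.

```python
def fuse_attention(att_cnn, att_trf, att_tree, w_cnn=1.0, w_trf=1.0, w_tree=2.0):
    # weighted accumulation of the three mechanisms' attention matrices (torch tensors)
    fused = w_cnn * att_cnn + w_trf * att_trf + w_tree * att_tree
    # renormalize so each row is again a probability distribution over source positions
    return fused / fused.sum(dim=-1, keepdim=True)
```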
Step7, sending the attention value obtained in Step6 to the Dec-Enc attention layer of the decoder, so that each translation mechanism completes the subsequent machine translation process and generates its own decoding key-value matrix.
Step8, performing weighted superposition and normalization on the decoding key-value matrices generated by the mechanisms, and sending the result through a linear transformation layer and a softmax layer to generate the target translation.
The invention has the beneficial effects that:
1. In the invention, each automatic translation method independently generates its own attention value; the calculated attention values are weighted and accumulated, then aligned and normalized to form the new attention value, so a better translation effect is obtained.
2. Superposing and normalizing the attention of multiple mechanisms effectively integrates the analytical capability of the different algorithms and creates the advantage of democratic decision-making, as in the proverb "three cobblers with their wits combined equal Zhuge Liang"; the resulting attention is closer to the theoretical true attention, so a better translation effect is obtained.
Drawings
FIG. 1 is a block flow diagram of the present invention;
FIG. 2 is a bar graph of experimental results of the present invention;
Detailed Description
The invention is further described with reference to the following figures and specific examples.
Example 1: in this example, the German-English corpus is used as the translation corpus, and the mechanisms selected for the multi-mechanism decision are the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism.
As shown in fig. 1-2, a multi-mechanism attention-combined multi-path neural machine translation method includes the following specific steps:
Model construction process:
step1, downloading the German-English corpus from the website and determining the plurality of translation mechanisms;
step2, preprocessing the corpus: performing word segmentation, lowercasing and data cleaning on the bilingual corpus with the MOSES toolkit, keeping only sentence pairs whose length is within 175 tokens, and then segmenting all preprocessed data into subwords with the BPE (byte pair encoding) algorithm;
step3, generating the training set, verification set and test set: randomly extracting 160K parallel sentence pairs from the processed corpus as the training set, 7K pairs as the verification set for tuning the translation model, and 6K pairs as the test set for evaluating the translation model;
step4, to make full use of the advantages of the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism, the encoding end uses a convolutional-neural-network-based encoder, a Transformer encoder and a Tree-Transformer encoder respectively to encode the input sequence;
in order to enable the model to learn the order of the words in the source sentence and capture the positional information of the words in the input sequence, the position embedding vector is added element-wise to the word embedding vector as the input of the encoding end; the position embedding vector represents the positional information of the different words in the input sequence, and the position embedding vectors and word embedding vectors are denoted as p = (p_1, …, p_m) and w = (w_1, …, w_m) respectively;
step5, each translation mechanism in the translation model trains on the input in its own training mode, forms its own training model, and calculates its own attention vector;
step6, carrying out weighted superposition on the plurality of attention values obtained by calculation in the Step5, aligning and normalizing to form a new attention value;
step7, between the encoder and the decoder, an attention fusion module is used to automatically acquire the information needed to decode the target word. At the decoding end, the decoder of each path computes attention using the outputs of all three encoder paths as context, so nine information streams pass from the encoder to the decoder. Specifically, the attention value calculated in step6 is sent to the Dec-Enc attention layer of the decoder, which takes the context information generated by the three encoding-end paths and the decoder output of the previous time step as the decoder input for decoding. The calculation formulas of the Dec-Enc attention module are as follows:
ctx_cc = Attention(q_c, k_c, v_c)
ctx_ca = Attention(q_c, k_a, v_a)
ctx_cl = Attention(q_c, k_l, v_l)
ctx_aa = Attention(q_a, k_a, v_a)
ctx_ac = Attention(q_a, k_c, v_c)
ctx_al = Attention(q_a, k_l, v_l)
ctx_ll = Attention(q_l, k_l, v_l)
ctx_lc = Attention(q_l, k_c, v_c)
ctx_la = Attention(q_l, k_a, v_a)
Here ctx_cc denotes the attention result of the CNN-path attention query q_c in the decoder with the attention key k_c and value v_c of the CNN path at the encoding end; ctx_ca denotes the attention result of the CNN-path query q_c with the Transformer-path key k_a and value v_a at the encoding end; ctx_cl denotes the attention result of the CNN-path query q_c with the Tree-Transformer-path key k_l and value v_l at the encoding end; ctx_aa denotes the attention result of the Transformer-path query q_a in the decoder with the Transformer-path key k_a and value v_a; ctx_ac denotes the attention result of the Transformer-path query q_a with the CNN-path key k_c and value v_c; ctx_al denotes the attention result of the Transformer-path query q_a with the Tree-Transformer-path key k_l and value v_l; ctx_ll denotes the attention result of the Tree-Transformer-path query q_l in the decoder with the Tree-Transformer-path key k_l and value v_l; ctx_lc denotes the attention result of the Tree-Transformer-path query q_l with the CNN-path key k_c and value v_c; and ctx_la denotes the attention result of the Tree-Transformer-path query q_l with the Transformer-path key k_a and value v_a.
To fully exploit the information captured by the different encoder paths, we use a weighted summation mechanism to fuse them.
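The nine context computations and their weighted fusion can be sketched as below; this reuses the scaled_dot_product_attention helper from the earlier sketch, and the per-pair fusion weights are assumptions (the text only states that a weighted summation is used).

```python
def dec_enc_attention(queries, keys, values, weights):
    """queries, keys, values: dicts keyed by path id 'c' (CNN), 'a' (Transformer), 'l' (Tree-Transformer)."""
    contexts = {}
    for dec_path, q in queries.items():
        for enc_path in ("c", "a", "l"):
            ctx, _ = scaled_dot_product_attention(q, keys[enc_path], values[enc_path])
            contexts[dec_path + enc_path] = ctx            # e.g. contexts['ca'] corresponds to ctx_ca
    # weighted summation over the encoder paths for each decoder path
    fused = {d: sum(weights[d + e] * contexts[d + e] for e in ("c", "a", "l"))
             for d in queries}
    return contexts, fused
```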
step8, predicting the target word: at the decoding end, after the three decoders have generated their decoding information, the information is integrated with a weighted summation mechanism and the integrated result is passed to the softmax layer to predict the target word, according to the following formulas:
z_o = normal(z_c + z_a + z_t)
P(y) = softmax(z_o W_s + b_s)
where z_c, z_a and z_t denote the decoding information generated by the three decoders, z_o denotes the final fused output of the three decoder paths, and P(y) is the predicted probability of the target word.
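A sketch of this prediction step under assumptions: "normal" is read here as layer normalization, and W_s, b_s are the parameters of the output projection; both readings are interpretations rather than details confirmed by the text.

```python
import torch
import torch.nn.functional as F

def predict_target(z_c, z_a, z_t, layer_norm, output_proj):
    z_o = layer_norm(z_c + z_a + z_t)    # z_o = normal(z_c + z_a + z_t)
    logits = output_proj(z_o)            # z_o W_s + b_s
    return F.softmax(logits, dim=-1)     # P(y)

# Example wiring (dimensions are placeholders):
# layer_norm = torch.nn.LayerNorm(256)
# output_proj = torch.nn.Linear(256, vocab_size)
```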
In order to verify the effectiveness of the method, the experiment compares it with a CNN-based neural machine translation model, a Transformer translation model, a Tree-Transformer translation model and a translation model combining CNN and Transformer.
The model parameters are set as follows:
The operating environment of this experiment is Python 3.6 with the deep learning framework torch 0.4.0, and the experimental corpus is the IWSLT2014 German-English corpus. For the specific algorithms in the experiment, the parameters are set as follows: the word embedding dimension is 256, the number of network layers in the encoder and decoder is 2, the number of hidden units per layer is 256, dropout is set to 0.1, filter_size is set to 1024, kernel_size is set to 3, the learning rate is set to 0.25, the label smoothing rate is set to 0.1, the NAG optimizer is used to optimize the training model, and the batch size is 128.
To demonstrate the effectiveness of our method, we compare it with four reference models: the CNN translation mechanism, the Transformer translation mechanism, the Tree-Transformer translation mechanism and the CNN+Transformer translation mechanism.
The specific parameters of each method can affect the final experimental results, but fine-tuning these parameters is not needed to highlight the benefit of the present invention. To show its effectiveness and benefit in general terms, we set the operating parameters of all algorithms close to one another and close to the examples provided by the original authors of these algorithms.
The BLEU score is used to evaluate the translation models; as can be seen from FIG. 2 and Table 1, the multi-mechanism attention-combined multi-path neural machine translation method effectively improves the performance of neural machine translation.
To demonstrate the benefit of the present invention, we designed two examples (Algorithm 5 and Algorithm 6) for comparison with the existing methods; both generate attention by superposing the three mechanisms. Algorithm 5 does not use weighted superposition, while Algorithm 6 gives the newer Tree-Transformer mechanism twice the weight. In this experiment, the encoding end of Algorithm 5 uses a Transformer encoder, a Tree-Transformer encoder and a CNN-based encoder respectively, the decoding end uses only one Transformer decoder, and attention is generated in the manner described above. The encoding end of Algorithm 6 likewise uses a Transformer encoder, a Tree-Transformer encoder and a CNN-based encoder, while the decoding end uses one Transformer decoder and one CNN-based decoder, with the syntactic (Tree-Transformer) information given twice the weight.
Table 1 shows the translation results of the different models:

| | Model | De-En data set (BLEU) |
|---|---|---|
| Algorithm 1 | CNN | 29.07 |
| Algorithm 2 | Transformer | 28.65 |
| Algorithm 3 | Tree Transformer | 29.62 |
| Algorithm 4 | CNN+Transformer | 31.69 |
| Algorithm 5 (invention) | CNN+Transformer+Tree Transformer | 32.49 |
| Algorithm 6 (invention) | CNN+Transformer+2*Tree Transformer | 32.69 |
The translation results of the translation model proposed by the invention and the baseline models on the German-English corpus are shown in Table 1. As can be seen from Table 1, the CNN, Transformer and Tree-Transformer models all achieve good results on German-English, with the Tree-Transformer performing best among them. The three-mechanism attention weighted-superposition method achieves a higher BLEU score and translates more accurately; compared with a single decision, it is more accurate and consistent with the democratic-voting effect we predicted. In particular, when the Tree-Transformer is given twice the weight, the BLEU score increases by a further 0.2, indicating that our weighting strategy is effective. The attention generation mechanism based on weighted democratic voting thus obtains better experimental performance and fully embodies the beneficial effects of the invention.
The multi-mechanism attention-combined multi-path neural machine translation method provided by the invention performs well on the translation task, mainly for the following reasons: 1. The translation model combines the advantages of the CNN translation mechanism, the Transformer translation mechanism and the Tree-Transformer translation mechanism, where the Tree-Transformer can incorporate syntactic information during translation. 2. Each automatic translation method independently generates its own attention value; the calculated attention values are weighted, accumulated, aligned and normalized to form the new attention value. Superposing and normalizing the attention of multiple mechanisms effectively integrates the analytical capability of the different algorithms and creates the multi-decision advantage of the proverb "three cobblers with their wits combined equal Zhuge Liang"; the resulting attention is closer to the theoretical true attention, so a better translation effect is obtained.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (6)
1. A multi-mechanism attention-combined multi-path neural machine translation method, characterized by comprising the following specific steps:
step1, collecting training corpora;
step2, preprocessing the training corpus;
step3, extracting a training set, a verification set and a test set from the preprocessed training corpus;
step4, generating a source-language word embedding vector for training and a position embedding vector for training from the corpus of the training set, splicing them together as the input, and inputting the input respectively to a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism, wherein each translation mechanism trains on the input in its own training mode, respectively forming its own training model;
step5, generating word embedding vectors and position embedding vectors for the sentence to be translated and inputting them respectively to the plurality of translation mechanisms, wherein each translation mechanism calculates the corresponding attention vector according to its own trained model;
step6, carrying out weighted superposition on the plurality of attention values obtained by calculation in the Step5, aligning and normalizing to form a new attention value;
step7, sending the attention value obtained by the calculation in the Step6 to a Dec-Enc attention layer of a decoder, and enabling each translation mechanism to complete the subsequent machine translation process and respectively generate a decoding key-value matrix;
and Step8, performing weighted superposition and normalization on the decoding key-value matrixes generated by the mechanisms, and sending the result to a linear transformation layer and a softmax layer to generate a target translation.
2. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: the preprocessing in Step2 is to perform word segmentation, lowercasing and data cleaning on the bilingual corpus in the corpus with the MOSES toolkit, keep only sentence pairs whose length is within 175, and then perform subword segmentation on all the preprocessed data with the BPE (byte pair encoding) algorithm.
3. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: the Step of extracting the training set, the verification set and the test set in Step3 means that 160K parallel corpora are randomly extracted from the processed corpora to be used as the training set, 7K parallel corpora are used as the verification set to train the translation model, and 6K parallel corpora are used as the test set to evaluate the translation model.
4. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: in Step4, a CNN translation mechanism, a Transformer translation mechanism and a Tree-Transformer translation mechanism are adopted; the model first converts the input sequence into word embedding vectors and, in order to let the model learn the order of the words in the sequence, adds a position embedding vector to each input word embedding vector, wherein the position embedding vector represents the positional relation of the different words in the source sentence; the position embedding vectors and the word embedding vectors are denoted as p = (p_1, …, p_m) and w = (w_1, …, w_m) respectively, wherein the position embedding vector is calculated with the following formula:
wherein pos represents the position of a word in the source sentence, and i represents the dimension;
the word embedding vector and the position embedding vector are added together as the model input and fed respectively to the plurality of translation mechanisms, and each translation mechanism trains on the input in its own training mode, respectively forming its own training model.
5. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: in Step6, the encoding end receives the input vectors, calculates the attention values respectively, performs weighted superposition on them, and then aligns and normalizes them to form a new attention value, wherein the encoding end comprises three encoders and the weight given to the Tree-Transformer translation mechanism is twice the weight given to each of the other two translation mechanisms.
6. The multi-mechanism attention-merging multipath neural machine translation method of claim 1, wherein: in Step7, the Dec-Enc attention layer of the decoder receives the attention key-value pairs generated by the encoding end; the query matrix q in the decoder and the key matrix k are combined by a dot-product operation, and the result is used in a weighted summation with the value matrix v, so that each translation mechanism completes the subsequent machine translation process and respectively generates a decoding key-value matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011209086.0A CN112329483A (en) | 2020-11-03 | 2020-11-03 | Multi-mechanism attention-combined multi-path neural machine translation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011209086.0A CN112329483A (en) | 2020-11-03 | 2020-11-03 | Multi-mechanism attention-combined multi-path neural machine translation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112329483A true CN112329483A (en) | 2021-02-05 |
Family
ID=74322805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011209086.0A Pending CN112329483A (en) | 2020-11-03 | 2020-11-03 | Multi-mechanism attention-combined multi-path neural machine translation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112329483A (en) |
-
2020
- 2020-11-03 CN CN202011209086.0A patent/CN112329483A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN110377918A * | 2019-07-15 | 2019-10-25 | 昆明理工大学 | Chinese-Vietnamese neural machine translation method fusing syntactic parse trees
Non-Patent Citations (2)
Title |
---|
- KAITAO SONG et al.: "Double Path Networks for Sequence to Sequence Learning", Computation and Language *
- XI Xiangyu: "Paper interpretation: Attention is all you need", zhuanlan.zhihu.com/p/46990010 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717343A (en) * | 2019-09-27 | 2020-01-21 | 电子科技大学 | Optimal alignment method based on transformer attention mechanism output |
CN110717343B (en) * | 2019-09-27 | 2023-03-14 | 电子科技大学 | Optimal alignment method based on transformer attention mechanism output |
CN114118111A (en) * | 2021-11-26 | 2022-03-01 | 昆明理工大学 | Multi-mode machine translation method fusing text and picture characteristics |
CN114118111B (en) * | 2021-11-26 | 2024-05-24 | 昆明理工大学 | Multi-mode machine translation method integrating text and picture features |
CN114580443A (en) * | 2022-03-01 | 2022-06-03 | 腾讯科技(深圳)有限公司 | Text translation method, text translation device, kernel function combination method, server and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108052512B (en) | Image description generation method based on depth attention mechanism | |
CN112329483A (en) | Multi-mechanism attention-combined multi-path neural machine translation method | |
CN110490946B (en) | Text image generation method based on cross-modal similarity and antagonism network generation | |
CN112613303B (en) | Knowledge distillation-based cross-modal image aesthetic quality evaluation method | |
CN110472238B (en) | Text summarization method based on hierarchical interaction attention | |
CN109492227A (en) | It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations | |
CN112733533B (en) | Multi-modal named entity recognition method based on BERT model and text-image relation propagation | |
CN111274398A (en) | Method and system for analyzing comment emotion of aspect-level user product | |
WO2022020467A1 (en) | System and method for training multilingual machine translation evaluation models | |
He et al. | Improving neural relation extraction with positive and unlabeled learning | |
CN114398976A (en) | Machine reading understanding method based on BERT and gate control type attention enhancement network | |
US20240119716A1 (en) | Method for multimodal emotion classification based on modal space assimilation and contrastive learning | |
Zhao et al. | RoR: Read-over-read for long document machine reading comprehension | |
CN113704437A (en) | Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding | |
CN114238649B (en) | Language model pre-training method with common sense concept enhancement | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN112015760B (en) | Automatic question-answering method and device based on candidate answer set reordering and storage medium | |
CN113901847A (en) | Neural machine translation method based on source language syntax enhanced decoding | |
CN113822125A (en) | Processing method and device of lip language recognition model, computer equipment and storage medium | |
CN114648024B (en) | Method for generating cross-language abstract of Chinese crossing based on multi-type word information guidance | |
CN116450877A (en) | Image text matching method based on semantic selection and hierarchical alignment | |
CN117174163A (en) | Virus evolution trend prediction method and system | |
CN116010622A (en) | BERT knowledge graph completion method and system for fusion entity type | |
CN117350330A (en) | Semi-supervised entity alignment method based on hybrid teaching | |
CN115810351A (en) | Controller voice recognition method and device based on audio-visual fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210205