CN113408307B - Neural machine translation method based on translation template
- Publication number: CN113408307B
- Application number: CN202110796282.0A
- Authority: CN (China)
- Prior art keywords: template, translation, model, encoder, source
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/58 — Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G06F16/3344 — Query execution using natural language analysis
- G06F40/44 — Data-driven translation: statistical methods, e.g. probability models
- G06F40/47 — Machine-assisted translation, e.g. using translation memory
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention relates to a neural machine translation method based on translation templates, and belongs to the technical field of machine translation in natural language processing. The method guides and constrains the decoding process of the model by introducing a matched high-similarity translation template, thereby improving translation quality. First, a translation template library and a corresponding template matching algorithm are constructed. Then, a template-based neural machine translation model is constructed. Next, the translation template is introduced into the model with a two-stage training strategy, iteratively updating the constructed model's parameters and guiding the training process. Finally, the trained neural machine translation model translates the sentences that match a high-similarity translation template. Compared with the prior art, the method simplifies the construction of translation templates, and focuses on improving the translation of the subset of sentences that can be matched to a high-similarity translation template, rather than all sentences, using the matched template to improve translation quality.
Description
Technical Field
The invention relates to a technology for constructing a translation template library for neural machine translation and introducing translation templates into a neural machine translation model to improve translation performance, in particular to a neural machine translation method based on translation templates, and belongs to the technical field of machine translation in natural language processing.
Background
At present, because neural machine translation outperforms traditional statistical machine translation across various natural languages, translation services based on neural machine translation have been successfully deployed in industry by large companies such as Google, Youdao and Baidu. These convenient and fast translation services are widely used.
However, neural machine translation mainly acquires the linguistic knowledge of the source and target languages, and the correspondence between them, through training on bilingual parallel corpus data, and therefore depends heavily on the training data. When the corpus data contains little or none of some feature information, the model has difficulty learning the corresponding knowledge and cannot capture that information. When translating a sentence that requires this knowledge, neural machine translation produces a translation of poor quality.
In a computer-aided translation scenario, a human translator receives the translation generated by a machine translation model, first checks whether it contains errors, and then post-edits the errors to ensure final translation quality. Measuring review and post-editing time is the most direct and effective way to quantify the workload of a human translator. With traditional neural machine translation methods, the translator cannot know the quality of a translation in advance, which means the same reviewing effort must be spent on every translation. In this case, only improving the translation performance of the entire test set has been studied, and only the post-editing time can be reduced.
In real scenarios, much translation knowledge already exists, such as fixed translation sentence patterns, established translation collocations and bilingual dictionaries in professional fields. Translation knowledge generalized and summarized by human language experts is regarded as completely correct, and human translators can use such fixed knowledge directly to assist their work. Using external knowledge to improve the translation quality of machine translation models therefore has high research value. Overall, most research work has focused on decoding constraints or data augmentation using bilingual dictionaries and bilingual translation examples, while relatively little research has incorporated translation templates as external knowledge into neural machine translation. A translation template retains the syntactic structure information of the sentence together with part of the target words. In terms of knowledge granularity, a template lies between a translation rule and a translation example: compared with a translation example, a translation template has a higher degree of abstraction and therefore a higher matching rate; compared with a translation rule, it contains more lexical information.
In summary, if a high-quality translation template library suitable for neural machine translation can be constructed and translation template knowledge is introduced into the neural machine translation, a high-quality translation can be obtained.
However, no relatively complete machine translation system or related technical disclosure that introduces translation templates into neural machine translation has yet been seen.
Disclosure of Invention
The invention aims to solve the technical problem that the quality of a generated translation is poor due to the limitation of the scale and the quality of a corpus in the existing machine translation system, and creatively provides a neural machine translation method based on a translation template. The method guides and restricts the decoding process of the model by introducing the matched high-similarity translation template, thereby improving the quality of the translated text.
The innovation points of the invention are as follows: first, a translation template library and a corresponding template matching algorithm are constructed. Then, a template-based neural machine translation model is constructed. And then, introducing the translation template into the model by using a two-stage training strategy, continuously and iteratively updating the constructed model parameters, and guiding the training process. And finally, respectively translating the sentences matched with the high-similarity translation template by using the trained neural machine translation model.
In order to achieve the purpose, the invention adopts the following technical scheme.
A neural machine translation method based on a translation template comprises the following steps:
Step 1: construct a translation template library using the translation template construction method based on the longest noun phrase.
Step 2: construct a multi-strategy template matching algorithm and retrieve high-similarity translation templates.
Step 3: construct a template-based neural machine translation model and introduce the translation template into neural machine translation.
Step 4: train the template-based neural machine translation model with a two-stage model training strategy.
Step 5: translate the sentences matched with a high-similarity translation template using the trained neural machine translation model.
Advantageous effects
Compared with the prior art, the invention has the following beneficial effects and advantages:
1. The invention uses a self-defined translation template extraction algorithm to construct high-quality translation templates; by extracting the longest noun phrases, it dispenses with bilingual word alignment information and simplifies the construction process of translation templates.
2. Unlike existing machine translation systems, the method focuses on improving the translation of the subset of sentences that can be matched to a high-similarity translation template, rather than all sentences, and uses the matched high-similarity translation template to improve translation quality.
Drawings
FIG. 1 is a schematic diagram of a translation template construction algorithm of the present invention;
FIG. 2 is a diagram of a template-based neural machine translation model of the present invention;
FIG. 3 is a diagram of a two-stage model training strategy in accordance with the present invention.
Detailed Description
The method of the invention is further illustrated below with reference to the figures and examples.
A neural machine translation method based on a translation template comprises the following steps:
Step 1: construct a translation template library using the translation template construction method based on the longest noun phrase.
As shown in fig. 1, the specific method is as follows:
Step 1.1: construct bilingual constituency trees on the parallel sentence pairs using a constituency parsing method.
Step 1.2: identify and extract the longest noun phrases to construct translation templates.
Wherein, the longest noun phrase (MNP) refers to a noun phrase that is not nested within any other noun phrase. In a syntactic tree, the longest noun phrase is the first subtree labeled "NP" encountered starting from the root node. The longest noun phrase has a larger information granularity than a base noun phrase. The invention takes longest noun phrases containing common nouns (NN), proper nouns (NR), temporal nouns (NT) and personal pronouns (PRP) as template variables, and the remaining part as template constants, to construct the translation template.
The translation template comprises template constants and template variables. A template constant refers to the fixed words in the template and represents the sentence structure information of the source sentence; a template variable is a word or noun phrase and is the generalization information in the template. Template constants serve as the information to be retrieved during template matching and as constraint information for translation generation during the translation process; during translation, a template variable is replaced according to the source sentence information to obtain the corresponding translation. A sketch of the extraction procedure follows.
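As a purely illustrative sketch of this extraction step, the following Python fragment identifies the maximal NP subtrees of an NLTK constituency tree and replaces them with variable slots. The slot notation X1, X2, the nominal tag set, the helper name `extract_template` and the English example parse are assumptions for demonstration only; the invention applies the procedure to both sides of a bilingual parse.

```python
# Illustrative sketch: extract the longest noun phrases (MNPs) from a
# constituency parse and turn them into template variables.
from nltk import Tree

NOMINAL_TAGS = {"NN", "NR", "NT", "PRP"}  # common/proper/temporal nouns, pronouns

def extract_template(parse: Tree):
    """Replace each maximal NP (an NP not nested inside another NP) whose
    words carry nominal tags with a variable slot; the remaining words
    become the template constants."""
    variables, tokens = [], []

    def walk(node, inside_np=False):
        if isinstance(node, str):
            tokens.append(node)
            return
        if node.label() == "NP" and not inside_np:
            # This NP is maximal: no NP dominates it on this path.
            pos_tags = {tag for _, tag in node.pos()}
            if pos_tags & NOMINAL_TAGS:
                variables.append(" ".join(node.leaves()))
                tokens.append(f"X{len(variables)}")  # template variable slot
                return
        for child in node:
            walk(child, inside_np or node.label() == "NP")

    walk(parse)
    return " ".join(tokens), variables

tree = Tree.fromstring(
    "(S (NP (NN researchers)) (VP (VBD proposed) (NP (DT a) (NN method))))")
template, slots = extract_template(tree)
print(template)  # -> "X1 proposed X2"
print(slots)     # -> ['researchers', 'a method']
```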
Step 1.3: and screening the translation templates by using the length of the translation template and the template abstraction degree, and reserving the translation templates which accord with the set length threshold value and the set abstraction degree threshold value.
Specifically, step 1.3 includes the steps of:
step 1.3.1: setting a length threshold value, and discarding translation templates which do not meet the length threshold value.
Step 1.3.2: setting upper and lower thresholds of the abstraction degree, calculating the abstraction degree of the translation template, and abandoning the translation template which is not in the threshold range.
Wherein, the translation template abstraction degree Score_abs is calculated as follows:

$$\mathrm{Score}_{abs} = \frac{\mathrm{Num}_{va}}{l_t} \tag{1}$$

wherein Num_va represents the number of variables of the translation template, and l_t represents the number of words contained in the translation template.
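An illustrative sketch of the screening procedure under Eq. (1) follows; the concrete threshold values and the convention that variables start with "X" are assumptions, since the patent only requires that a length threshold and upper/lower abstraction-degree thresholds be set.

```python
# Sketch of step 1.3: screen extracted templates by length and by
# abstraction degree Score_abs = Num_va / l_t (Eq. (1)).
# All threshold values below are illustrative assumptions.
def abstraction_degree(template: str, variable_prefix: str = "X") -> float:
    words = template.split()
    num_va = sum(1 for w in words if w.startswith(variable_prefix))
    return num_va / len(words) if words else 0.0

def screen_templates(templates, min_len=4, max_len=50,
                     abs_lower=0.1, abs_upper=0.6):
    kept = []
    for tm in templates:
        lt = len(tm.split())
        if not (min_len <= lt <= max_len):
            continue  # discard templates outside the length threshold
        if not (abs_lower <= abstraction_degree(tm) <= abs_upper):
            continue  # discard templates outside the abstraction range
        kept.append(tm)
    return kept

print(screen_templates(["X1 signed X2 in X3 last week", "X1 X2 X3"]))
# -> ['X1 signed X2 in X3 last week']
```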
Step 2: construct a multi-strategy template matching algorithm for retrieving high-similarity translation templates.
Specifically, step 2 comprises the steps of:
step 2.1: and (3) processing the sentence to be translated by utilizing the translation template construction algorithm described in the step (1) to obtain a template to be matched.
Step 2.2: and (3) obtaining a candidate set from the translation template library constructed in the step (1) by using a coarse-grained matching strategy based on the word hit rate.
The coarse-grained matching strategy based on the word hit rate is defined as follows:
The coarse-grained matching strategy measures the similarity between the template to be matched and a source-side translation template in the template library by their word co-occurrence frequency. The similarity function FM is defined as follows:

$$\mathrm{FM}(X', \mathrm{Tm}_{src}) = \frac{\left|\,\mathrm{word}(X') \cap \mathrm{word}(\mathrm{Tm}_{src})\,\right|}{\mathrm{len}(X')} \tag{2}$$

wherein word(·) represents the words contained in a character string; Tm_src represents the matched source-side translation template; X' represents the template to be matched obtained in step 1 from the sentence to be translated; len(·) denotes the length of the template to be matched. To quickly retrieve the candidate translation template set, an offline search engine such as Elasticsearch may be used to perform the coarse-grained matching.
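The following sketch computes the word-hit-rate score of Eq. (2) with a plain linear scan instead of an Elasticsearch index; the normalisation by len(X') follows the reconstruction above, and the function names are assumptions for illustration only.

```python
# Sketch of step 2.2: coarse-grained candidate retrieval by word hit rate.
def fm_score(query_template: str, src_template: str) -> float:
    """Fraction of words of the template to be matched X' that also
    occur in a library template Tm_src (Eq. (2))."""
    query_words = query_template.split()
    src_words = set(src_template.split())
    if not query_words:
        return 0.0
    hits = sum(1 for w in query_words if w in src_words)
    return hits / len(query_words)  # len(X') = template length in words

def coarse_candidates(query, library, threshold=0.8):
    """Keep every library template whose word hit rate clears the
    coarse-grained threshold (0.8 in the patent's experiments)."""
    return [tm for tm in library if fm_score(query, tm) >= threshold]
```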
Step 2.3: and matching on the candidate set by using a fine-grained matching strategy based on the similarity of the character strings.
The fine-grained matching strategy based on the character string similarity is defined as follows:
and (3) a fine-grained matching strategy, namely measuring the similarity between each template in the candidate set and the searched target by adopting a Levenshtein edit Distance (Levenshtein Distance).
The levensit edit distance is the minimum number of edits to change one template into another template through add, insert, and delete operations. In linguistics, levenstein edit distance is a metric used to quantify language distance, i.e., the difference between two languages. The fine-grained matching similarity function Lev is defined as follows:
wherein,representing the minimum editing distance required by converting the template to be matched into the source end translation template matched in the template library; scoretmRepresenting a template X to be matched′With source translation template X matched from template libraryt ′ mFuzzy matching scores therebetween; i and j each represent X′And Xt ′ mThe ith and jth positions in (b).
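A minimal sketch of the fine-grained stage under Eqs. (3)-(4) follows; computing the edit distance at word level rather than character level, and the function names, are assumptions for illustration.

```python
# Sketch of step 2.3: fine-grained fuzzy matching over the candidate set.
def levenshtein(a: list, b: list) -> int:
    """Minimum number of insertions, deletions and substitutions turning
    token sequence a into token sequence b (the recursion of Eq. (3))."""
    prev = list(range(len(b) + 1))
    for i, ai in enumerate(a, 1):
        cur = [i]
        for j, bj in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ai != bj)  # substitution
                           ))
        prev = cur
    return prev[-1]

def fuzzy_match_score(query: str, candidate: str) -> float:
    """Normalised score Score_tm of Eq. (4)."""
    qs, cs = query.split(), candidate.split()
    return 1.0 - levenshtein(qs, cs) / max(len(qs), len(cs), 1)

# The candidate with the highest Score_tm above the fine-grained
# threshold (0.9 in the experiments) is the retrieved template.
best = max(["X1 signed X2 yesterday", "X1 met X2"],
           key=lambda tm: fuzzy_match_score("X1 signed X2 today", tm))
```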
And step 3: and constructing a neural machine translation model based on the template, and introducing the translation template into the neural machine translation.
Specifically, step 3 includes the steps of:
step 3.1: and adding an additional template encoder at the encoding end to encode the retrieved target end translation template.
Wherein the template encoder is as follows:
The template encoder adopts the Transformer encoder structure and is formed by stacking several identical sublayers, each comprising a self-attention sublayer and a feedforward neural network sublayer.

The template encoder has the same structure as the original Transformer encoder, which brings two advantages: (1) the Transformer has excellent ability to capture semantic information and can better represent the additional target-side knowledge; (2) using the same structure for the original encoder and the template encoder makes it easier to map the two kinds of information into the same high-dimensional semantic space.

The template encoder and the source encoder are mutually independent during encoding; the two kinds of information do not interact or fuse with each other during representation, and the encoders finally produce the vector representations of the source sentence and the target-side translation template in the high-dimensional semantic space.
The encoding of the source encoder and the target template encoder is expressed as follows:

$$H_s = \mathrm{Enc}_{src}(X, \theta_{src}) \tag{5}$$

$$H_{tm} = \mathrm{Enc}_{tm}(\mathrm{Tm}_{tgt}, \theta_{tm}) \tag{6}$$

wherein Enc_src represents the source encoder; X represents the sentence to be translated; Enc_tm represents the template encoder; θ_src and θ_tm represent the parameters of the source sentence encoder and of the template encoder respectively, which are not shared; H_s represents the vector representation containing source sentence information obtained by the source sentence encoder encoding the source sentence; H_tm represents the vector representation containing target-side translation template information obtained by the template encoder encoding the target-side translation template; Tm_tgt represents the matched target-side translation template.
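Eqs. (5)-(6) can be sketched in PyTorch as two structurally identical but parameter-independent Transformer encoders. Layer counts, model dimensions and vocabulary sizes are illustrative assumptions; the essential point is that Enc_src and Enc_tm share no parameters and never interact during encoding.

```python
# Schematic sketch of step 3.1: independent source and template encoders.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6  # illustrative dimensions

def make_encoder():
    layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, n_layers)

src_embed = nn.Embedding(32000, d_model)  # source vocabulary
tgt_embed = nn.Embedding(32000, d_model)  # target vocabulary (template side)
enc_src = make_encoder()                   # Enc_src with parameters theta_src
enc_tm = make_encoder()                    # Enc_tm with parameters theta_tm

x = torch.randint(0, 32000, (1, 12))       # source sentence ids
tm = torch.randint(0, 32000, (1, 9))       # target-side template ids
# Positional encodings are omitted for brevity.
H_s = enc_src(src_embed(x))                # Eq. (5)
H_tm = enc_tm(tgt_embed(tm))               # Eq. (6), encoded independently
```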
Step 3.2: at the decoding end, a template coding-decoding attention sublayer is added, and template knowledge is introduced into a decoder to guide and constrain the decoding process of the model, so that a high-quality translation is obtained.
Wherein the decoder is as follows:
On the basis of the Transformer decoder, a template encoding-decoding attention sublayer is added. The new decoder thus contains four sublayers: a masked multi-head attention sublayer, a template encoding-decoding attention sublayer, a source encoding-decoding attention sublayer and a feedforward neural network sublayer.
Because the knowledge merged in from the translation template is target-side knowledge and, compared with the source sentence, is closer in semantic space to the target-side translation, the template encoding-decoding attention sublayer is placed between the masked multi-head attention sublayer and the source encoding-decoding attention sublayer. This arrangement lets the generated translation sequence interact with and fuse the information of the target-side translation template earlier. In a real scenario, the scale of the translation template library is limited, a matched translation template cannot completely match the translation, and partial noise information often exists; through this earlier interaction between the target translation and the translation template, the model can selectively capture translation template knowledge and apply it better to translation generation.
The decoder generates the translation as follows:

$$H_d = \mathrm{DEC}(H_s, H_{tm}, y_{<t}; \theta) \tag{7}$$

$$P(y_t \mid x, \mathrm{Tm}_{tgt}, y_{<t}; \theta) \propto \exp(H_d W) \tag{8}$$

wherein H_d represents the vector representation containing translation information obtained by the decoder decoding the context vectors generated by the source encoder and the template encoder; DEC(·) denotes the decoder; y represents the translation sequence generated by the model; t represents the current decoding time step; y_t represents the target word generated at the current time step; θ represents the model parameters; P(·) represents the translation generation probability function; x represents the sentence to be translated; Tm_tgt represents the target-side translation template; W represents the weight of the model's fully connected layer; exp(·) denotes the exponential function used in computing the probability of generating the current word.
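The following PyTorch sketch shows one such decoder layer with the four sublayers in the order described above, so that template cross-attention precedes source cross-attention. The post-norm residual arrangement, the class name and all dimensions are assumptions for illustration; padding masks are omitted for brevity.

```python
# Sketch of step 3.2: a decoder layer with an extra template
# encoding-decoding attention sublayer between masked self-attention
# and source cross-attention.
import torch.nn as nn

class TemplateDecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tm_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, y, H_s, H_tm, causal_mask):
        # 1) masked multi-head self-attention over the partial translation
        y = self.norms[0](y + self.self_attn(y, y, y, attn_mask=causal_mask)[0])
        # 2) template encoding-decoding attention: fuse the target-side
        #    template knowledge early, before attending to the source
        y = self.norms[1](y + self.tm_attn(y, H_tm, H_tm)[0])
        # 3) source encoding-decoding attention
        y = self.norms[2](y + self.src_attn(y, H_s, H_s)[0])
        # 4) position-wise feedforward sublayer
        return self.norms[3](y + self.ffn(y))
```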
Step 4: train the template-based neural machine translation model with a two-stage model training strategy.
Specifically, step 4 includes the steps of:
step 4.1: the training data set is divided into two parts: a base training set and a fine training set. As shown in fig. 2.
Wherein, the function of basic training set contains two: firstly, a translation template library is constructed, and secondly, a model is trained. And the target end translation template corresponding to the basic data set is directly extracted from the corresponding reference and completely corresponds to the source sentence.
And (3) searching the corresponding target translation template in the fine tuning data set by the multi-strategy template matching method described in the step (2). Specifically, the coarse-grained matching policy threshold may be set to 0.8, and the fine-grained matching policy threshold may be set to 0.9.
Step 4.2: a two-stage model training strategy is used to obtain a template-based neural machine translation model, as shown in fig. 3.
The method specifically comprises the following steps:
step 4.2.1: and training the model by using the basic data set, and continuously updating the parameters of the iterative model, so that the model can capture the target translation template knowledge.
Step 4.2.2: and (3) retraining the basic model by utilizing the fine tuning data set training, updating iterative model parameters, and updating the iterative model parameters by utilizing the data after template matching and screening, thereby improving the robustness of the model.
And 5: and translating the sentences which can be matched with the translation template within a set threshold range by using a translation neural model of the training model.
Experimental examples
The present invention was tested on Chinese-English (zh-en) and German-English (de-en) translation.
(1) Experimental data set-up
To compare with the results of previous studies and to approximate a real translation scenario, experiments were conducted in the news domain on Chinese-to-English and German-to-English translation. A portion of the parallel corpus contained in the public LDC data set and a portion contained in the WMT-18 data set were used for training, validation and testing. In the proposed template-fused neural machine translation method, template information is input together with the source sentence; since the scale of the translation template library is limited and matching similarity differs across sentences in the test set and against the template library, different matching intervals are set according to the template matching similarity scores during testing. The larger the matching similarity interval, the more similar the templates that sentences in that interval can retrieve from the template library; the smaller the interval, the lower the similarity between the sentences in that interval and the template library.
LDC Chinese-English data set. LDC is a news-domain data set. As shown in Table 1, for the Chinese-English (zh-en) translation task, 564,726 sentence pairs were randomly extracted as the base training set, 37,417 as the fine-tuning set, 6,000 as the validation set and 3,000 as the test set; the NLTK tool was used for Chinese word segmentation, and Moses was used for consistent casing of English words and punctuation normalization; the Berkeley parser performed syntactic analysis on Chinese and English respectively. In total, 342,183 translation templates were obtained. The development and test sets (6,000 and 3,000 sentences) were randomly selected from the corpus, and the remaining data were used to create the training data. Specific data sizes are shown in the table below.
WMT18 German-English data set. As shown in Table 1, for the German-English translation task, 491,000 sentence pairs were randomly extracted as the base training set, 21,064 as the fine-tuning set, 6,000 as the validation set and 3,000 as the test set; Moses was used for consistent casing and punctuation normalization of German and English; the Berkeley parser parsed German and English respectively. In total, 307,968 translation templates were obtained.
Table 1. LDC Chinese-English and WMT German-English data sets

The BLEU value is used as the evaluation metric.
(2) Baseline system experimental setup:
RNNSearch: a standard attention-based neural machine translation system.
Transformer: a neural machine translation system based on the self-attention mechanism.
The above baseline systems and the proposed model were implemented on the open-source machine translation toolkit OpenNMT.
(3) Main experimental results: Table 2 shows the experimental results.

Table 2. Experimental results on the Chinese-English and German-English translation tasks
First, on the Chinese-to-English translation task, when the fuzzy matching interval is (0.9,1.0], (0.8,0.9], (0.7,0.8] or (0.6,0.7], TBMT improves the BLEU score by 12.6, 10.06, 8.41 and 6.86 respectively over RNNSearch, and by 2.79, 1.33, 0.5 and 0.01 over Transformer. This indicates that providing the model with a translation template highly similar to the source sentence as external knowledge can guide and constrain the model's decoding process, thereby improving the model's translation ability.
Second, when the fuzzy match value is below 0.6, TBMT's BLEU score falls below Transformer's, although it is still higher than RNNSearch's. This is because translation templates with low similarity to the source sentence contain too much useless information and mislead the decoding of the model. It follows that a highly similar translation template can provide target-side syntactic structure knowledge as well as reusable fragments, while a translation template with low similarity introduces too much noise information and degrades the model's translation quality.
Finally, the German-to-English experiment yielded results similar to those of the Chinese-to-English translation task. In the high fuzzy matching intervals, the proposed method outperforms RNNSearch and Transformer; in the low fuzzy matching intervals, it is slightly inferior to Transformer. This means that the more similar the retrieved translation template, the more plentiful and reliable the effective target-side information provided to the model, and the higher the quality of the generated translation.
(4) Training strategy analysis: Table 3 shows the experimental results.
This experiment analyzes the influence of the proposed two-stage training strategy on model quality. TBMT outperforms TBMT_all and TBMT_b, which shows that the training strategy designed for TBMT improves the robustness of the model, allowing it to better capture the target-side knowledge contained in the translation template and to filter noise information. Compared with TBMT_b, TBMT_all adds the 37,417-pair fine-tuning training set, yet yields only a slight improvement in BLEU. This indicates that simply merging the data into one training stage does not sufficiently improve the model's noise-screening ability.
Table 3. Training strategy analysis experimental results

Match score | TBMT_b | TBMT_all | TBMT |
---|---|---|---|
(0.9,1.0] | 72.43 | 72.76 | 73.41 |
(0.8,0.9] | 65.65 | 65.78 | 66.52 |
(0.7,0.8] | 61.46 | 61.45 | 62.44 |
(0.6,0.7] | 57.08 | 57.15 | 58.07 |
(0.5,0.6] | 52.05 | 51.99 | 52.85 |
(0.4,0.5] | 49.07 | 48.99 | 49.84 |
(5) Domain adaptation analysis: Table 4 shows the experimental results.

This experiment analyzes the ability of the proposed model to capture domain knowledge through translation templates. The results show that although the model never learned legal-domain language features during training, the proposed method captures the relevant domain knowledge through the translation template and thereby produces translations that better conform to the characteristics of the legal domain.
Table 4. Domain adaptation analysis experimental results

Match score | Transformer | TBMT |
---|---|---|
(0.9,1.0] | 7.06 | 15.5 |
(0.8,0.9] | 7.19 | 15.44 |
(0.7,0.8] | 7.41 | 15.38 |
(0.6,0.7] | 7.33 | 14.83 |
(0.5,0.6] | 7.51 | 14.32 |
(0.4,0.5] | 7.61 | 13.93 |
Claims (4)
1. A neural machine translation method based on a translation template is characterized by comprising the following steps:
step 1: constructing a translation template library based on a translation template construction method of the longest noun phrase;
step 1.1: constructing bilingual syntax trees on the parallel sentence pairs by using a component syntax tree analysis method;
step 1.2: identifying and extracting the longest noun phrase to construct a translation template;
wherein the longest noun phrase refers to a noun phrase that is not nested within any other noun phrase; longest noun phrases containing common nouns, proper nouns, temporal nouns and personal pronouns are taken as template variables, and the remaining part as template constants, to construct the translation template;
the translation template comprises a template constant and a template variable; the template constant refers to fixed words in the template and represents sentence structure information of the source sentence; the template variable is a word or noun phrase and is generalization information in the template; the template constant is used as the information to be retrieved in the template matching and is used as the constraint information generated by the translation in the translation process; in the translation process of the translation template variable, replacing the translation template variable according to the source sentence information to obtain a corresponding translation;
step 1.3: screening the translation template by using the length of the translation template and the template abstraction degree, and reserving the translation template which accords with a set length threshold value and an abstraction degree threshold value;
step 2: constructing a multi-strategy template matching algorithm and retrieving a high-similarity translation template, comprising the following steps:
step 2.1: processing the sentence to be translated by utilizing the translation template construction algorithm described in the step 1 to obtain a template to be matched;
step 2.2: obtaining a candidate set from the translation template library constructed in the step 1 by using a coarse-grained matching strategy based on the word hit rate;
the coarse-grained matching strategy based on the word hit rate is defined as follows:
the coarse-grained matching strategy measures the similarity between the template to be matched and a source-side translation template in the template library by their word co-occurrence frequency, and the similarity function FM is defined as follows:

$$\mathrm{FM}(X', \mathrm{Tm}_{src}) = \frac{\left|\,\mathrm{word}(X') \cap \mathrm{word}(\mathrm{Tm}_{src})\,\right|}{\mathrm{len}(X')} \tag{1}$$

wherein word(·) represents the words contained in a character string; Tm_src represents the matched source-side translation template; X' represents the template to be matched obtained in step 1 from the sentence to be translated; len(·) denotes the length of the template to be matched;
step 2.3: matching on the candidate set by using a fine-grained matching strategy based on the similarity of the character strings;
the fine-grained matching strategy based on the similarity of the character strings is defined as follows:
a fine-grained matching strategy, namely measuring the similarity between each template in the candidate set and the retrieval target by adopting the Levenshtein edit distance;
the Levenshtein edit distance refers to the minimum number of edit operations, namely insertion, deletion and substitution, needed to convert one template into another; the fine-grained matching similarity function Lev is defined as follows:

$$\mathrm{Lev}_{X',X'_{tm}}(i,j) = \begin{cases} \max(i,j), & \min(i,j)=0 \\ \min\!\big(\mathrm{Lev}(i{-}1,j)+1,\; \mathrm{Lev}(i,j{-}1)+1,\; \mathrm{Lev}(i{-}1,j{-}1)+\mathbb{1}[X'_i \neq X'_{tm,j}]\big), & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{Score}_{tm} = 1 - \frac{\mathrm{Lev}_{X',X'_{tm}}(|X'|,|X'_{tm}|)}{\max(|X'|,|X'_{tm}|)} \tag{3}$$

wherein Lev_{X',X'_tm}(|X'|,|X'_tm|) represents the minimum edit distance required to convert the template to be matched into the source-side translation template matched from the template library; Score_tm represents the fuzzy matching score between the template to be matched X' and the matched source-side translation template X'_tm; i and j represent the ith and jth positions in X' and X'_tm respectively;
step 3: constructing a neural machine translation model based on the template, and introducing the translation template into neural machine translation, comprising the following steps:
step 3.1: at the encoding end, an additional template encoder is added to encode the retrieved translation template at the target end;
wherein the template encoder is as follows:
the template encoder adopts the Transformer encoder structure and is formed by stacking several identical sublayers, each comprising a self-attention sublayer and a feedforward neural network sublayer; the template encoder has the same structure as the original Transformer encoder;

the template encoder and the source encoder are mutually independent during encoding, the two kinds of information do not interact or fuse with each other during representation, and the encoders finally produce the vector representations of the source sentence and the target-side translation template in the high-dimensional semantic space;
the encoding representation of the source encoder and the target template encoder is as follows:

$$H_s = \mathrm{Enc}_{src}(X, \theta_{src}) \tag{4}$$

$$H_{tm} = \mathrm{Enc}_{tm}(\mathrm{Tm}_{tgt}, \theta_{tm}) \tag{5}$$

wherein Enc_src represents the source encoder; X represents the sentence to be translated; Enc_tm represents the template encoder; θ_src and θ_tm represent the parameters of the source sentence encoder and of the template encoder respectively, which are not shared; H_s represents the vector representation containing source sentence information obtained by the source sentence encoder encoding the source sentence; H_tm represents the vector representation containing target-side translation template information obtained by the template encoder encoding the target-side translation template; Tm_tgt represents the matched target-side translation template;
step 3.2: at the decoding end, adding a template encoding-decoding attention sublayer and introducing template knowledge into the decoder to guide and constrain the decoding process of the model, thereby obtaining a high-quality translation;
wherein the decoder is as follows:
adding a template encoding-decoding attention sublayer on the basis of the Transformer decoder; the new decoder comprises four sublayers: a masked multi-head attention sublayer, a template encoding-decoding attention sublayer, a source encoding-decoding attention sublayer and a feedforward neural network sublayer;

placing the template encoding-decoding attention sublayer between the masked multi-head attention sublayer and the source encoding-decoding attention sublayer, so that the generated translation sequence interacts with and fuses the information of the target-side translation template earlier;
the decoder generates the translation as follows:

$$H_d = \mathrm{DEC}(H_s, H_{tm}, y_{<t}; \theta) \tag{6}$$

$$P(y_t \mid x, \mathrm{Tm}_{tgt}, y_{<t}; \theta) \propto \exp(H_d W) \tag{7}$$

wherein H_d represents the vector representation containing translation information obtained by the decoder decoding the context vectors generated by the source encoder and the template encoder; DEC(·) denotes the decoder; y represents the translation sequence generated by the model; t represents the current decoding time step; y_t represents the target word generated at the current time step; θ represents the model parameters; P(·) represents the translation generation probability function; x represents the sentence to be translated; Tm_tgt represents the target-side translation template; W represents the weight of the model's fully connected layer; exp(·) denotes the exponential function used in computing the probability of generating the current word;
step 4: training a template-based neural machine translation model by adopting a two-stage model training strategy;

step 4.1: dividing the training data set into two parts: a base training set and a fine-tuning training set;

wherein the base training set serves two functions: first, constructing the translation template library; second, training the model; the target-side translation template corresponding to the base data set is extracted directly from the corresponding reference and corresponds completely to the source sentence;

retrieving the corresponding target-side translation templates for the fine-tuning data set with the multi-strategy template matching method described in step 2;
step 4.2: obtaining a neural machine translation model based on a template by utilizing a two-stage model training strategy;
step 4.2.1: training the model with the base data set, continuously and iteratively updating the model parameters, so that the model can capture target-side translation template knowledge;

step 4.2.2: retraining the base model with the fine-tuning data set, iteratively updating the model parameters with the template-matched and screened data, thereby improving the robustness of the model;
step 5: translating the sentences matched with the high-similarity translation template by using the trained neural machine translation model.
2. The neural machine translation method based on translation templates as claimed in claim 1, wherein step 1.3 comprises the following steps:
step 1.3.1: setting a length threshold value, and discarding translation templates which do not meet the length threshold value;
step 1.3.2: setting upper and lower thresholds of the abstraction degree, calculating the abstraction degree of the translation template, and abandoning the translation template which is not in the threshold range;
wherein the translation template abstraction degree Score_abs is calculated as follows:

$$\mathrm{Score}_{abs} = \frac{\mathrm{Num}_{va}}{l_t}$$

wherein Num_va represents the number of variables of the translation template, and l_t represents the number of words contained in the translation template.
3. The neural machine translation method based on translation templates as claimed in claim 1, wherein in step 4.1, the coarse-grained matching strategy threshold is set to 0.8.
4. The neural machine translation method based on translation templates as claimed in claim 1, wherein in step 4.1, the threshold of the fine-grained matching strategy is set to 0.9.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110796282.0A | 2021-07-14 | 2021-07-14 | Neural machine translation method based on translation template |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113408307A (en) | 2021-09-17 |
CN113408307B (en) | 2022-06-14 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117034968B (en) * | 2023-10-10 | 2024-02-02 | 中国科学院自动化研究所 | Neural machine translation method, device, electronic equipment and medium |
CN117273027B (en) * | 2023-11-22 | 2024-04-30 | 四川语言桥信息技术有限公司 | Automatic machine translation post-verification method based on translation error correction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5442546A (en) * | 1991-11-29 | 1995-08-15 | Hitachi, Ltd. | System and method for automatically generating translation templates from a pair of bilingual sentences |
CN101206643A (en) * | 2006-12-21 | 2008-06-25 | 中国科学院计算技术研究所 | Translation method syncretizing sentential form template and statistics mechanical translation technique |
CN107562734A (en) * | 2016-06-30 | 2018-01-09 | 阿里巴巴集团控股有限公司 | Translation template determination, machine translation method and device |
CN108874791A (en) * | 2018-07-06 | 2018-11-23 | 北京联合大学 | A kind of semantic analysis based on minimum semantic chunk and Chinese-English sequence adjusting method and system |
CN111611814A (en) * | 2020-05-08 | 2020-09-01 | 北京理工大学 | Neural machine translation method based on similarity perception |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304390B (en) * | 2017-12-15 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Translation model-based training method, training device, translation method and storage medium |
Non-Patent Citations (2)
Title |
---|
基于最长名词短语分治策略的神经机器翻译 [Neural machine translation based on a longest-noun-phrase divide-and-conquer strategy]; Zhang Xueqiang et al.; Journal of Chinese Information Processing; 2018-03-31; Vol. 32, No. 3; full text *
模板驱动的神经机器翻译 [Template-driven neural machine translation]; Li Qiang et al.; Chinese Journal of Computers; 2019-03-31; Vol. 42, No. 3; full text *
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |