CN113408307B - Neural machine translation method based on translation template
- Publication number: CN113408307B
- Application number: CN202110796282.0A
- Authority: CN (China)
- Prior art keywords: template, translation, model, encoder, source
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/58 — Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G06F16/3344 — Query execution using natural language analysis
- G06F40/44 — Data-driven translation: statistical methods, e.g. probability models
- G06F40/47 — Machine-assisted translation, e.g. using translation memory
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention relates to a neural machine translation method based on translation templates, and belongs to the technical field of machine translation in natural language processing. The method guides and constrains the decoding process of the model by introducing a matched high-similarity translation template, thereby improving translation quality. First, a translation template library and a corresponding template matching algorithm are constructed. Then, a template-based neural machine translation model is constructed. Next, the translation template is introduced into the model with a two-stage training strategy, iteratively updating the constructed model's parameters and guiding the training process. Finally, the trained neural machine translation model translates the sentences that match a high-similarity translation template. Compared with the prior art, the method simplifies the construction of translation templates, and focuses on improving the translation of the subset of sentences that can be matched to a high-similarity translation template, rather than all sentences, using the matched template to improve translation quality.
Description
Technical Field
The invention relates to a technology for constructing a translation template library for neural machine translation and introducing translation templates into a neural machine translation model to improve translation performance, in particular to a neural machine translation method based on translation templates, and belongs to the technical field of machine translation in natural language processing.
Background
At present, because neural machine translation outperforms traditional statistical machine translation across various natural languages, translation services based on neural machine translation have been successfully deployed in industry by large companies such as Google, Youdao and Baidu. These convenient and fast translation services are widely used.
However, neural machine translation mainly acquires the linguistic knowledge of the source and target languages, and the correspondence between them, through training on bilingual parallel corpus data, and therefore depends heavily on the training data. When the corpus data contains little or none of some feature information, the model has difficulty learning the corresponding knowledge and cannot capture that information. When translating a sentence that requires this knowledge, neural machine translation produces a translation of poor quality.
In a computer-aided translation scenario, a human translator receives the translation generated by a machine translation model, first checks whether it contains errors, and then post-edits the errors to ensure final translation quality. Measuring review and post-editing time is the most direct and effective way to quantify the workload of a human translator. With traditional neural machine translation methods, the translator cannot know the quality of a translation in advance, which means the same reviewing effort must be spent on every translation. In this case, only improving the translation performance of the entire test set has been studied, and only the post-editing time can be reduced.
In real scenarios, much translation knowledge already exists, such as fixed translation sentence patterns, established translation collocations and bilingual dictionaries in professional fields. Translation knowledge generalized and summarized by human language experts is regarded as completely correct, and human translators can use such fixed knowledge directly to assist their work. Using external knowledge to improve the translation quality of machine translation models therefore has high research value. Overall, most research work has focused on decoding constraints or data augmentation using bilingual dictionaries and bilingual translation examples, while relatively little research has incorporated translation templates as external knowledge into neural machine translation. A translation template retains the syntactic structure information of the sentence together with part of the target words. In terms of knowledge granularity, a template lies between a translation rule and a translation example: compared with a translation example, a translation template has a higher degree of abstraction and therefore a higher matching rate; compared with a translation rule, it contains more lexical information.
In summary, if a high-quality translation template library suitable for neural machine translation can be constructed and translation template knowledge is introduced into the neural machine translation, a high-quality translation can be obtained.
However, no relatively complete machine translation system or related technical disclosure that introduces translation templates into neural machine translation has yet been seen.
Disclosure of Invention
The invention aims to solve the technical problem that the quality of a generated translation is poor due to the limitation of the scale and the quality of a corpus in the existing machine translation system, and creatively provides a neural machine translation method based on a translation template. The method guides and restricts the decoding process of the model by introducing the matched high-similarity translation template, thereby improving the quality of the translated text.
The innovation points of the invention are as follows: first, a translation template library and a corresponding template matching algorithm are constructed. Then, a template-based neural machine translation model is constructed. And then, introducing the translation template into the model by using a two-stage training strategy, continuously and iteratively updating the constructed model parameters, and guiding the training process. And finally, respectively translating the sentences matched with the high-similarity translation template by using the trained neural machine translation model.
In order to achieve the purpose, the invention adopts the following technical scheme.
A neural machine translation method based on a translation template comprises the following steps:
Step 1: construct a translation template library using the translation template construction method based on the longest noun phrase.
Step 2: construct a multi-strategy template matching algorithm and retrieve high-similarity translation templates.
Step 3: construct a template-based neural machine translation model and introduce the translation template into neural machine translation.
Step 4: train the template-based neural machine translation model with a two-stage model training strategy.
Step 5: translate the sentences matched with a high-similarity translation template using the trained neural machine translation model.
Advantageous effects
Compared with the prior art, the invention has the following beneficial effects and advantages:
1. The invention uses a self-defined translation template extraction algorithm to construct high-quality translation templates; by extracting the longest noun phrases, it dispenses with bilingual word alignment information and simplifies the construction process of translation templates.
2. Unlike existing machine translation systems, the method focuses on improving the translation of the subset of sentences that can be matched to a high-similarity translation template, rather than all sentences, and uses the matched high-similarity translation template to improve translation quality.
Drawings
FIG. 1 is a schematic diagram of a translation template construction algorithm of the present invention;
FIG. 2 is a diagram of a template-based neural machine translation model of the present invention;
FIG. 3 is a diagram of a two-stage model training strategy in accordance with the present invention.
Detailed Description
The method of the invention is further illustrated below with reference to the figures and examples.
A neural machine translation method based on a translation template comprises the following steps:
Step 1: construct a translation template library using the translation template construction method based on the longest noun phrase.
As shown in fig. 1, the specific method is as follows:
Step 1.1: construct bilingual constituency trees on the parallel sentence pairs using a constituency parsing method.
Step 1.2: identify and extract the longest noun phrases to construct translation templates.
Wherein, the longest noun phrase (MNP) refers to a noun phrase that is not nested within any other noun phrase. In a syntactic tree, the longest noun phrase is the first subtree labeled "NP" encountered starting from the root node. The longest noun phrase has a larger information granularity than a base noun phrase. The invention takes longest noun phrases containing common nouns (NN), proper nouns (NR), temporal nouns (NT) and personal pronouns (PRP) as template variables, and the remaining part as template constants, to construct the translation template.
The translation template comprises template constants and template variables. A template constant refers to the fixed words in the template and represents the sentence structure information of the source sentence; a template variable is a word or noun phrase and is the generalization information in the template. Template constants serve as the information to be retrieved during template matching and as constraint information for translation generation during the translation process; during translation, a template variable is replaced according to the source sentence information to obtain the corresponding translation. A sketch of the extraction procedure follows.
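As a purely illustrative sketch of this extraction step, the following Python fragment identifies the maximal NP subtrees of an NLTK constituency tree and replaces them with variable slots. The slot notation X1, X2, the nominal tag set, the helper name `extract_template` and the English example parse are assumptions for demonstration only; the invention applies the procedure to both sides of a bilingual parse.

```python
# Illustrative sketch: extract the longest noun phrases (MNPs) from a
# constituency parse and turn them into template variables.
from nltk import Tree

NOMINAL_TAGS = {"NN", "NR", "NT", "PRP"}  # common/proper/temporal nouns, pronouns

def extract_template(parse: Tree):
    """Replace each maximal NP (an NP not nested inside another NP) whose
    words carry nominal tags with a variable slot; the remaining words
    become the template constants."""
    variables, tokens = [], []

    def walk(node, inside_np=False):
        if isinstance(node, str):
            tokens.append(node)
            return
        if node.label() == "NP" and not inside_np:
            # This NP is maximal: no NP dominates it on this path.
            pos_tags = {tag for _, tag in node.pos()}
            if pos_tags & NOMINAL_TAGS:
                variables.append(" ".join(node.leaves()))
                tokens.append(f"X{len(variables)}")  # template variable slot
                return
        for child in node:
            walk(child, inside_np or node.label() == "NP")

    walk(parse)
    return " ".join(tokens), variables

tree = Tree.fromstring(
    "(S (NP (NN researchers)) (VP (VBD proposed) (NP (DT a) (NN method))))")
template, slots = extract_template(tree)
print(template)  # -> "X1 proposed X2"
print(slots)     # -> ['researchers', 'a method']
```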
Step 1.3: and screening the translation templates by using the length of the translation template and the template abstraction degree, and reserving the translation templates which accord with the set length threshold value and the set abstraction degree threshold value.
Specifically, step 1.3 includes the steps of:
step 1.3.1: setting a length threshold value, and discarding translation templates which do not meet the length threshold value.
Step 1.3.2: setting upper and lower thresholds of the abstraction degree, calculating the abstraction degree of the translation template, and abandoning the translation template which is not in the threshold range.
Wherein, the translation template abstraction degree Score_abs is calculated as follows:

$$\mathrm{Score}_{abs} = \frac{\mathrm{Num}_{va}}{l_t} \tag{1}$$

wherein Num_va represents the number of variables of the translation template, and l_t represents the number of words contained in the translation template.
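An illustrative sketch of the screening procedure under Eq. (1) follows; the concrete threshold values and the convention that variables start with "X" are assumptions, since the patent only requires that a length threshold and upper/lower abstraction-degree thresholds be set.

```python
# Sketch of step 1.3: screen extracted templates by length and by
# abstraction degree Score_abs = Num_va / l_t (Eq. (1)).
# All threshold values below are illustrative assumptions.
def abstraction_degree(template: str, variable_prefix: str = "X") -> float:
    words = template.split()
    num_va = sum(1 for w in words if w.startswith(variable_prefix))
    return num_va / len(words) if words else 0.0

def screen_templates(templates, min_len=4, max_len=50,
                     abs_lower=0.1, abs_upper=0.6):
    kept = []
    for tm in templates:
        lt = len(tm.split())
        if not (min_len <= lt <= max_len):
            continue  # discard templates outside the length threshold
        if not (abs_lower <= abstraction_degree(tm) <= abs_upper):
            continue  # discard templates outside the abstraction range
        kept.append(tm)
    return kept

print(screen_templates(["X1 signed X2 in X3 last week", "X1 X2 X3"]))
# -> ['X1 signed X2 in X3 last week']
```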
Step 2: construct a multi-strategy template matching algorithm for retrieving high-similarity translation templates.
Specifically, step 2 comprises the steps of:
step 2.1: and (3) processing the sentence to be translated by utilizing the translation template construction algorithm described in the step (1) to obtain a template to be matched.
Step 2.2: and (3) obtaining a candidate set from the translation template library constructed in the step (1) by using a coarse-grained matching strategy based on the word hit rate.
The coarse-grained matching strategy based on the word hit rate is defined as follows:
The coarse-grained matching strategy measures the similarity between the template to be matched and a source-side translation template in the template library by their word co-occurrence frequency. The similarity function FM is defined as follows:

$$\mathrm{FM}(X', \mathrm{Tm}_{src}) = \frac{\left|\,\mathrm{word}(X') \cap \mathrm{word}(\mathrm{Tm}_{src})\,\right|}{\mathrm{len}(X')} \tag{2}$$

wherein word(·) represents the words contained in a character string; Tm_src represents the matched source-side translation template; X' represents the template to be matched obtained in step 1 from the sentence to be translated; len(·) denotes the length of the template to be matched. To quickly retrieve the candidate translation template set, an offline search engine such as Elasticsearch may be used to perform the coarse-grained matching.
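The following sketch computes the word-hit-rate score of Eq. (2) with a plain linear scan instead of an Elasticsearch index; the normalisation by len(X') follows the reconstruction above, and the function names are assumptions for illustration only.

```python
# Sketch of step 2.2: coarse-grained candidate retrieval by word hit rate.
def fm_score(query_template: str, src_template: str) -> float:
    """Fraction of words of the template to be matched X' that also
    occur in a library template Tm_src (Eq. (2))."""
    query_words = query_template.split()
    src_words = set(src_template.split())
    if not query_words:
        return 0.0
    hits = sum(1 for w in query_words if w in src_words)
    return hits / len(query_words)  # len(X') = template length in words

def coarse_candidates(query, library, threshold=0.8):
    """Keep every library template whose word hit rate clears the
    coarse-grained threshold (0.8 in the patent's experiments)."""
    return [tm for tm in library if fm_score(query, tm) >= threshold]
```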
Step 2.3: and matching on the candidate set by using a fine-grained matching strategy based on the similarity of the character strings.
The fine-grained matching strategy based on the character string similarity is defined as follows:
and (3) a fine-grained matching strategy, namely measuring the similarity between each template in the candidate set and the searched target by adopting a Levenshtein edit Distance (Levenshtein Distance).
The levensit edit distance is the minimum number of edits to change one template into another template through add, insert, and delete operations. In linguistics, levenstein edit distance is a metric used to quantify language distance, i.e., the difference between two languages. The fine-grained matching similarity function Lev is defined as follows:
wherein,representing the minimum editing distance required by converting the template to be matched into the source end translation template matched in the template library; scoretmRepresenting a template X to be matched′With source translation template X matched from template libraryt ′ mFuzzy matching scores therebetween; i and j each represent X′And Xt ′ mThe ith and jth positions in (b).
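A minimal sketch of the fine-grained stage under Eqs. (3)-(4) follows; computing the edit distance at word level rather than character level, and the function names, are assumptions for illustration.

```python
# Sketch of step 2.3: fine-grained fuzzy matching over the candidate set.
def levenshtein(a: list, b: list) -> int:
    """Minimum number of insertions, deletions and substitutions turning
    token sequence a into token sequence b (the recursion of Eq. (3))."""
    prev = list(range(len(b) + 1))
    for i, ai in enumerate(a, 1):
        cur = [i]
        for j, bj in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ai != bj)  # substitution
                           ))
        prev = cur
    return prev[-1]

def fuzzy_match_score(query: str, candidate: str) -> float:
    """Normalised score Score_tm of Eq. (4)."""
    qs, cs = query.split(), candidate.split()
    return 1.0 - levenshtein(qs, cs) / max(len(qs), len(cs), 1)

# The candidate with the highest Score_tm above the fine-grained
# threshold (0.9 in the experiments) is the retrieved template.
best = max(["X1 signed X2 yesterday", "X1 met X2"],
           key=lambda tm: fuzzy_match_score("X1 signed X2 today", tm))
```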
And step 3: and constructing a neural machine translation model based on the template, and introducing the translation template into the neural machine translation.
Specifically, step 3 includes the steps of:
step 3.1: and adding an additional template encoder at the encoding end to encode the retrieved target end translation template.
Wherein the template encoder is as follows:
The template encoder adopts the Transformer encoder structure and is formed by stacking several identical sublayers, each comprising a self-attention sublayer and a feedforward neural network sublayer.

The template encoder has the same structure as the original Transformer encoder, which brings two advantages: (1) the Transformer has excellent ability to capture semantic information and can better represent the additional target-side knowledge; (2) using the same structure for the original encoder and the template encoder makes it easier to map the two kinds of information into the same high-dimensional semantic space.

The template encoder and the source encoder are mutually independent during encoding; the two kinds of information do not interact or fuse with each other during representation, and the encoders finally produce the vector representations of the source sentence and the target-side translation template in the high-dimensional semantic space.
The encoding of the source encoder and the target template encoder is expressed as follows:

$$H_s = \mathrm{Enc}_{src}(X, \theta_{src}) \tag{5}$$

$$H_{tm} = \mathrm{Enc}_{tm}(\mathrm{Tm}_{tgt}, \theta_{tm}) \tag{6}$$

wherein Enc_src represents the source encoder; X represents the sentence to be translated; Enc_tm represents the template encoder; θ_src and θ_tm represent the parameters of the source sentence encoder and of the template encoder respectively, which are not shared; H_s represents the vector representation containing source sentence information obtained by the source sentence encoder encoding the source sentence; H_tm represents the vector representation containing target-side translation template information obtained by the template encoder encoding the target-side translation template; Tm_tgt represents the matched target-side translation template.
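Eqs. (5)-(6) can be sketched in PyTorch as two structurally identical but parameter-independent Transformer encoders. Layer counts, model dimensions and vocabulary sizes are illustrative assumptions; the essential point is that Enc_src and Enc_tm share no parameters and never interact during encoding.

```python
# Schematic sketch of step 3.1: independent source and template encoders.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6  # illustrative dimensions

def make_encoder():
    layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, n_layers)

src_embed = nn.Embedding(32000, d_model)  # source vocabulary
tgt_embed = nn.Embedding(32000, d_model)  # target vocabulary (template side)
enc_src = make_encoder()                   # Enc_src with parameters theta_src
enc_tm = make_encoder()                    # Enc_tm with parameters theta_tm

x = torch.randint(0, 32000, (1, 12))       # source sentence ids
tm = torch.randint(0, 32000, (1, 9))       # target-side template ids
# Positional encodings are omitted for brevity.
H_s = enc_src(src_embed(x))                # Eq. (5)
H_tm = enc_tm(tgt_embed(tm))               # Eq. (6), encoded independently
```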
Step 3.2: at the decoding end, a template coding-decoding attention sublayer is added, and template knowledge is introduced into a decoder to guide and constrain the decoding process of the model, so that a high-quality translation is obtained.
Wherein the decoder is as follows:
On the basis of the Transformer decoder, a template encoding-decoding attention sublayer is added. The new decoder thus contains four sublayers: a masked multi-head attention sublayer, a template encoding-decoding attention sublayer, a source encoding-decoding attention sublayer and a feedforward neural network sublayer.
Because the knowledge merged in from the translation template is target-side knowledge and, compared with the source sentence, is closer in semantic space to the target-side translation, the template encoding-decoding attention sublayer is placed between the masked multi-head attention sublayer and the source encoding-decoding attention sublayer. This arrangement lets the generated translation sequence interact with and fuse the information of the target-side translation template earlier. In a real scenario, the scale of the translation template library is limited, a matched translation template cannot completely match the translation, and partial noise information often exists; through this earlier interaction between the target translation and the translation template, the model can selectively capture translation template knowledge and apply it better to translation generation.
The decoder generates the translation as follows:

$$H_d = \mathrm{DEC}(H_s, H_{tm}, y_{<t}; \theta) \tag{7}$$

$$P(y_t \mid x, \mathrm{Tm}_{tgt}, y_{<t}; \theta) \propto \exp(H_d W) \tag{8}$$

wherein H_d represents the vector representation containing translation information obtained by the decoder decoding the context vectors generated by the source encoder and the template encoder; DEC(·) denotes the decoder; y represents the translation sequence generated by the model; t represents the current decoding time step; y_t represents the target word generated at the current time step; θ represents the model parameters; P(·) represents the translation generation probability function; x represents the sentence to be translated; Tm_tgt represents the target-side translation template; W represents the weight of the model's fully connected layer; exp(·) denotes the exponential function used in computing the probability of generating the current word.
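The following PyTorch sketch shows one such decoder layer with the four sublayers in the order described above, so that template cross-attention precedes source cross-attention. The post-norm residual arrangement, the class name and all dimensions are assumptions for illustration; padding masks are omitted for brevity.

```python
# Sketch of step 3.2: a decoder layer with an extra template
# encoding-decoding attention sublayer between masked self-attention
# and source cross-attention.
import torch.nn as nn

class TemplateDecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tm_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, y, H_s, H_tm, causal_mask):
        # 1) masked multi-head self-attention over the partial translation
        y = self.norms[0](y + self.self_attn(y, y, y, attn_mask=causal_mask)[0])
        # 2) template encoding-decoding attention: fuse the target-side
        #    template knowledge early, before attending to the source
        y = self.norms[1](y + self.tm_attn(y, H_tm, H_tm)[0])
        # 3) source encoding-decoding attention
        y = self.norms[2](y + self.src_attn(y, H_s, H_s)[0])
        # 4) position-wise feedforward sublayer
        return self.norms[3](y + self.ffn(y))
```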
Step 4: train the template-based neural machine translation model with a two-stage model training strategy.
Specifically, step 4 includes the steps of:
step 4.1: the training data set is divided into two parts: a base training set and a fine training set. As shown in fig. 2.
Wherein, the function of basic training set contains two: firstly, a translation template library is constructed, and secondly, a model is trained. And the target end translation template corresponding to the basic data set is directly extracted from the corresponding reference and completely corresponds to the source sentence.
And (3) searching the corresponding target translation template in the fine tuning data set by the multi-strategy template matching method described in the step (2). Specifically, the coarse-grained matching policy threshold may be set to 0.8, and the fine-grained matching policy threshold may be set to 0.9.
Step 4.2: a two-stage model training strategy is used to obtain a template-based neural machine translation model, as shown in fig. 3.
The method specifically comprises the following steps:
step 4.2.1: and training the model by using the basic data set, and continuously updating the parameters of the iterative model, so that the model can capture the target translation template knowledge.
Step 4.2.2: and (3) retraining the basic model by utilizing the fine tuning data set training, updating iterative model parameters, and updating the iterative model parameters by utilizing the data after template matching and screening, thereby improving the robustness of the model.
And 5: and translating the sentences which can be matched with the translation template within a set threshold range by using a translation neural model of the training model.
Experimental examples
The present invention was tested on Chinese-English (zh-en) and German-English (de-en) translation.
(1) Experimental data set-up
To compare with the results of previous studies and to approximate a real translation scenario, experiments were conducted in the news domain on Chinese-to-English and German-to-English translation. A portion of the parallel corpus contained in the public LDC data set and a portion contained in the WMT-18 data set were used for training, validation and testing. In the proposed template-fused neural machine translation method, template information is input together with the source sentence; since the scale of the translation template library is limited and matching similarity differs across sentences in the test set and against the template library, different matching intervals are set according to the template matching similarity scores during testing. The larger the matching similarity interval, the more similar the templates that sentences in that interval can retrieve from the template library; the smaller the interval, the lower the similarity between the sentences in that interval and the template library.
LDC Chinese-English data set. LDC is a news-domain data set. As shown in Table 1, for the Chinese-English (zh-en) translation task, 564,726 sentence pairs were randomly extracted as the base training set, 37,417 as the fine-tuning set, 6,000 as the validation set and 3,000 as the test set; the NLTK tool was used for Chinese word segmentation, and Moses was used for consistent casing of English words and punctuation normalization; the Berkeley parser performed syntactic analysis on Chinese and English respectively. In total, 342,183 translation templates were obtained. The development and test sets (6,000 and 3,000 sentences) were randomly selected from the corpus, and the remaining data were used to create the training data. Specific data sizes are shown in the table below.
WMT18 German-English data set. As shown in Table 1, for the German-English translation task, 491,000 sentence pairs were randomly extracted as the base training set, 21,064 as the fine-tuning set, 6,000 as the validation set and 3,000 as the test set; Moses was used for consistent casing and punctuation normalization of German and English; the Berkeley parser parsed German and English respectively. In total, 307,968 translation templates were obtained.
Table 1. LDC Chinese-English and WMT German-English data sets

The BLEU value is used as the evaluation metric.
(2) Baseline system experimental setup:
RNNSearch: a standard attention-based neural machine translation system.
Transformer: a neural machine translation system based on the self-attention mechanism.
The above baseline systems and the proposed model were implemented on the open-source machine translation toolkit OpenNMT.
(3) Main experimental results: Table 2 shows the experimental results.

Table 2. Experimental results on the Chinese-English and German-English translation tasks
First, on the Chinese-to-English translation task, when the fuzzy matching interval is (0.9,1.0], (0.8,0.9], (0.7,0.8] or (0.6,0.7], TBMT improves the BLEU score by 12.6, 10.06, 8.41 and 6.86 respectively over RNNSearch, and by 2.79, 1.33, 0.5 and 0.01 over Transformer. This indicates that providing the model with a translation template highly similar to the source sentence as external knowledge can guide and constrain the model's decoding process, thereby improving the model's translation ability.
Second, when the fuzzy match value is below 0.6, TBMT's BLEU score falls below Transformer's, although it is still higher than RNNSearch's. This is because translation templates with low similarity to the source sentence contain too much useless information and mislead the decoding of the model. It follows that a highly similar translation template can provide target-side syntactic structure knowledge as well as reusable fragments, while a translation template with low similarity introduces too much noise information and degrades the model's translation quality.
Finally, the German-to-English experiment yielded results similar to those of the Chinese-to-English translation task. In the high fuzzy matching intervals, the proposed method outperforms RNNSearch and Transformer; in the low fuzzy matching intervals, it is slightly inferior to Transformer. This means that the more similar the retrieved translation template, the more plentiful and reliable the effective target-side information provided to the model, and the higher the quality of the generated translation.
(4) Training strategy analysis: Table 3 shows the experimental results.
This experiment analyzes the influence of the proposed two-stage training strategy on model quality. TBMT outperforms TBMT_all and TBMT_b, which shows that the training strategy designed for TBMT improves the robustness of the model, allowing it to better capture the target-side knowledge contained in the translation template and to filter noise information. Compared with TBMT_b, TBMT_all adds the 37,417-pair fine-tuning training set, yet yields only a slight improvement in BLEU. This indicates that simply merging the data into one training stage does not sufficiently improve the model's noise-screening ability.
Table 3. Training strategy analysis experimental results

Match score | TBMT_b | TBMT_all | TBMT |
---|---|---|---|
(0.9,1.0] | 72.43 | 72.76 | 73.41 |
(0.8,0.9] | 65.65 | 65.78 | 66.52 |
(0.7,0.8] | 61.46 | 61.45 | 62.44 |
(0.6,0.7] | 57.08 | 57.15 | 58.07 |
(0.5,0.6] | 52.05 | 51.99 | 52.85 |
(0.4,0.5] | 49.07 | 48.99 | 49.84 |
(5) Domain adaptation analysis: Table 4 shows the experimental results.

This experiment analyzes the ability of the proposed model to capture domain knowledge through translation templates. The results show that although the model never learned legal-domain language features during training, the proposed method captures the relevant domain knowledge through the translation template and thereby produces translations that better conform to the characteristics of the legal domain.
Table 4. Domain adaptation analysis experimental results

Match score | Transformer | TBMT |
---|---|---|
(0.9,1.0] | 7.06 | 15.5 |
(0.8,0.9] | 7.19 | 15.44 |
(0.7,0.8] | 7.41 | 15.38 |
(0.6,0.7] | 7.33 | 14.83 |
(0.5,0.6] | 7.51 | 14.32 |
(0.4,0.5] | 7.61 | 13.93 |
Claims (4)
1. A neural machine translation method based on a translation template is characterized by comprising the following steps:
step 1: constructing a translation template library based on a translation template construction method of the longest noun phrase;
step 1.1: constructing bilingual syntax trees on the parallel sentence pairs by using a component syntax tree analysis method;
step 1.2: identifying and extracting the longest noun phrase to construct a translation template;
wherein the longest noun phrase refers to a noun phrase that is not nested within any other noun phrase; longest noun phrases containing common nouns, proper nouns, temporal nouns and personal pronouns are taken as template variables, and the remaining part as template constants, to construct the translation template;
the translation template comprises a template constant and a template variable; the template constant refers to fixed words in the template and represents sentence structure information of the source sentence; the template variable is a word or noun phrase and is generalization information in the template; the template constant is used as the information to be retrieved in the template matching and is used as the constraint information generated by the translation in the translation process; in the translation process of the translation template variable, replacing the translation template variable according to the source sentence information to obtain a corresponding translation;
step 1.3: screening the translation template by using the length of the translation template and the template abstraction degree, and reserving the translation template which accords with a set length threshold value and an abstraction degree threshold value;
step 2: constructing a multi-strategy template matching algorithm and retrieving a high-similarity translation template, comprising the following steps:
step 2.1: processing the sentence to be translated by utilizing the translation template construction algorithm described in the step 1 to obtain a template to be matched;
step 2.2: obtaining a candidate set from the translation template library constructed in the step 1 by using a coarse-grained matching strategy based on the word hit rate;
the coarse-grained matching strategy based on the word hit rate is defined as follows:
the coarse-grained matching strategy measures the similarity between the template to be matched and a source-side translation template in the template library by their word co-occurrence frequency, and the similarity function FM is defined as follows:

$$\mathrm{FM}(X', \mathrm{Tm}_{src}) = \frac{\left|\,\mathrm{word}(X') \cap \mathrm{word}(\mathrm{Tm}_{src})\,\right|}{\mathrm{len}(X')} \tag{1}$$

wherein word(·) represents the words contained in a character string; Tm_src represents the matched source-side translation template; X' represents the template to be matched obtained in step 1 from the sentence to be translated; len(·) denotes the length of the template to be matched;
step 2.3: matching on the candidate set by using a fine-grained matching strategy based on the similarity of the character strings;
the fine-grained matching strategy based on the similarity of the character strings is defined as follows:
a fine-grained matching strategy, namely measuring the similarity between each template in the candidate set and the retrieval target by adopting the Levenshtein edit distance;
the Levenshtein edit distance refers to the minimum number of edit operations, namely insertion, deletion and substitution, needed to convert one template into another; the fine-grained matching similarity function Lev is defined as follows:

$$\mathrm{Lev}_{X',X'_{tm}}(i,j) = \begin{cases} \max(i,j), & \min(i,j)=0 \\ \min\!\big(\mathrm{Lev}(i{-}1,j)+1,\; \mathrm{Lev}(i,j{-}1)+1,\; \mathrm{Lev}(i{-}1,j{-}1)+\mathbb{1}[X'_i \neq X'_{tm,j}]\big), & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{Score}_{tm} = 1 - \frac{\mathrm{Lev}_{X',X'_{tm}}(|X'|,|X'_{tm}|)}{\max(|X'|,|X'_{tm}|)} \tag{3}$$

wherein Lev_{X',X'_tm}(|X'|,|X'_tm|) represents the minimum edit distance required to convert the template to be matched into the source-side translation template matched from the template library; Score_tm represents the fuzzy matching score between the template to be matched X' and the matched source-side translation template X'_tm; i and j represent the ith and jth positions in X' and X'_tm respectively;
step 3: constructing a neural machine translation model based on the template, and introducing the translation template into neural machine translation, comprising the following steps:
step 3.1: at the encoding end, an additional template encoder is added to encode the retrieved translation template at the target end;
wherein the template encoder is as follows:
the template encoder adopts the Transformer encoder structure and is formed by stacking several identical sublayers, each comprising a self-attention sublayer and a feedforward neural network sublayer; the template encoder has the same structure as the original Transformer encoder;

the template encoder and the source encoder are mutually independent during encoding, the two kinds of information do not interact or fuse with each other during representation, and the encoders finally produce the vector representations of the source sentence and the target-side translation template in the high-dimensional semantic space;
the encoding representation of the source encoder and the target template encoder is as follows:

$$H_s = \mathrm{Enc}_{src}(X, \theta_{src}) \tag{4}$$

$$H_{tm} = \mathrm{Enc}_{tm}(\mathrm{Tm}_{tgt}, \theta_{tm}) \tag{5}$$

wherein Enc_src represents the source encoder; X represents the sentence to be translated; Enc_tm represents the template encoder; θ_src and θ_tm represent the parameters of the source sentence encoder and of the template encoder respectively, which are not shared; H_s represents the vector representation containing source sentence information obtained by the source sentence encoder encoding the source sentence; H_tm represents the vector representation containing target-side translation template information obtained by the template encoder encoding the target-side translation template; Tm_tgt represents the matched target-side translation template;
step 3.2: at the decoding end, adding a template encoding-decoding attention sublayer and introducing template knowledge into the decoder to guide and constrain the decoding process of the model, thereby obtaining a high-quality translation;
wherein the decoder is as follows:
adding a template encoding-decoding attention sublayer on the basis of the Transformer decoder; the new decoder comprises four sublayers: a masked multi-head attention sublayer, a template encoding-decoding attention sublayer, a source encoding-decoding attention sublayer and a feedforward neural network sublayer;

placing the template encoding-decoding attention sublayer between the masked multi-head attention sublayer and the source encoding-decoding attention sublayer, so that the generated translation sequence interacts with and fuses the information of the target-side translation template earlier;
the decoder generates the translation as follows:

$$H_d = \mathrm{DEC}(H_s, H_{tm}, y_{<t}; \theta) \tag{6}$$

$$P(y_t \mid x, \mathrm{Tm}_{tgt}, y_{<t}; \theta) \propto \exp(H_d W) \tag{7}$$

wherein H_d represents the vector representation containing translation information obtained by the decoder decoding the context vectors generated by the source encoder and the template encoder; DEC(·) denotes the decoder; y represents the translation sequence generated by the model; t represents the current decoding time step; y_t represents the target word generated at the current time step; θ represents the model parameters; P(·) represents the translation generation probability function; x represents the sentence to be translated; Tm_tgt represents the target-side translation template; W represents the weight of the model's fully connected layer; exp(·) denotes the exponential function used in computing the probability of generating the current word;
step 4: training a template-based neural machine translation model by adopting a two-stage model training strategy;

step 4.1: dividing the training data set into two parts: a base training set and a fine-tuning training set;

wherein the base training set serves two functions: first, constructing the translation template library; second, training the model; the target-side translation template corresponding to the base data set is extracted directly from the corresponding reference and corresponds completely to the source sentence;

retrieving the corresponding target-side translation templates for the fine-tuning data set with the multi-strategy template matching method described in step 2;
step 4.2: obtaining a neural machine translation model based on a template by utilizing a two-stage model training strategy;
step 4.2.1: training the model with the base data set, continuously and iteratively updating the model parameters, so that the model can capture target-side translation template knowledge;

step 4.2.2: retraining the base model with the fine-tuning data set, iteratively updating the model parameters with the template-matched and screened data, thereby improving the robustness of the model;
step 5: translating the sentences matched with the high-similarity translation template by using the trained neural machine translation model.
2. The neural machine translation method based on translation templates as claimed in claim 1, wherein step 1.3 comprises the following steps:
step 1.3.1: setting a length threshold value, and discarding translation templates which do not meet the length threshold value;
step 1.3.2: setting upper and lower thresholds of the abstraction degree, calculating the abstraction degree of the translation template, and abandoning the translation template which is not in the threshold range;
wherein the translation template abstraction degree Score_abs is calculated as follows:

$$\mathrm{Score}_{abs} = \frac{\mathrm{Num}_{va}}{l_t}$$

wherein Num_va represents the number of variables of the translation template, and l_t represents the number of words contained in the translation template.
3. The neural machine translation method based on translation templates as claimed in claim 1, wherein in step 4.1, the coarse-grained matching strategy threshold is set to 0.8.
4. The neural machine translation method based on translation templates as claimed in claim 1, wherein in step 4.1, the threshold of the fine-grained matching strategy is set to 0.9.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110796282.0A | 2021-07-14 | 2021-07-14 | Neural machine translation method based on translation template |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113408307A (en) | 2021-09-17 |
CN113408307B (en) | 2022-06-14 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117034968B (en) * | 2023-10-10 | 2024-02-02 | 中国科学院自动化研究所 | Neural machine translation method, device, electronic equipment and medium |
CN117273027B (en) * | 2023-11-22 | 2024-04-30 | 四川语言桥信息技术有限公司 | Automatic machine translation post-verification method based on translation error correction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5442546A (en) * | 1991-11-29 | 1995-08-15 | Hitachi, Ltd. | System and method for automatically generating translation templates from a pair of bilingual sentences |
CN101206643A (en) * | 2006-12-21 | 2008-06-25 | 中国科学院计算技术研究所 | Translation method syncretizing sentential form template and statistics mechanical translation technique |
CN107562734A (en) * | 2016-06-30 | 2018-01-09 | 阿里巴巴集团控股有限公司 | Translation template determination, machine translation method and device |
CN108874791A (en) * | 2018-07-06 | 2018-11-23 | 北京联合大学 | A kind of semantic analysis based on minimum semantic chunk and Chinese-English sequence adjusting method and system |
CN111611814A (en) * | 2020-05-08 | 2020-09-01 | 北京理工大学 | Neural machine translation method based on similarity perception |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304390B (en) * | 2017-12-15 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Translation model-based training method, training device, translation method and storage medium |
Non-Patent Citations (2)
Title |
---|
基于最长名词短语分治策略的神经机器翻译 [Neural machine translation based on a longest-noun-phrase divide-and-conquer strategy]; Zhang Xueqiang et al.; Journal of Chinese Information Processing; 2018-03-31; Vol. 32, No. 3; full text *
模板驱动的神经机器翻译 [Template-driven neural machine translation]; Li Qiang et al.; Chinese Journal of Computers; 2019-03-31; Vol. 42, No. 3; full text *
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |