CN104199813A

CN104199813A - Pseudo-feedback-based personalized machine translation system and method

Info

Publication number: CN104199813A
Application number: CN201410491100.9A
Authority: CN
Inventors: 杨沐昀; 朱俊国; 赵铁军; 李生; 徐冰; 曹海龙; 朱聪慧; 郑德权
Original assignee: Harbin Institute of Technology
Current assignee: Harbin University of technology high tech Development Corporation
Priority date: 2014-09-24
Filing date: 2014-09-24
Publication date: 2014-12-10
Anticipated expiration: 2034-09-24
Also published as: CN104199813B

Abstract

The invention relates to a pseudo-feedback-based personalized machine translation system and method. Existing traditional machine translation methods cannot yield high-quality personalized translation systems and therefore cannot meet the varied translation demands of users. The system comprises a phrase table filtering module, an input module, an initial translation module, a pseudo-feedback retrieval module, a phrase table classification module and a decoder module. The method comprises the following steps: an input step, in which a user inputs a translation task S; an initial translation step, in which an initial machine translation result T' of the translation task is obtained with the initial translation module; a pseudo-feedback retrieval step, in which the pseudo-feedback retrieval module retrieves the initial translation results and standard translations R of similar translation instances; a phrase table classification step, in which a trained universal post-editing model is turned into a personalized post-editing model and then filtered to obtain an optimized post-editing model; and a decoding step, in which the optimized personalized post-editing model is used to decode the initial machine translation result T' of the translation task so as to obtain an optimal final translation result. The system and method are applicable to the field of machine translation.

Description

Pseudo-feedback-based personalized machine translation system and method

Technical field

The present invention relates to a personalized machine translation system and method, and belongs to the field of machine translation.

Background technology

With the rapid development of machine translation technology in recent years, translation quality has improved considerably, and some general-purpose online translation services can now help people overcome language barriers to read and understand commonly encountered texts in other languages. Improving machine translation quality further, however, has proved very difficult. On the one hand, the main drawback of existing statistical machine translation technology is that personalized translation requires a large amount of user feedback data, on which statistics are gathered and a model is trained in order to build a personalized machine translation model. The user feedback needed for this training is very hard to obtain, and existing methods cannot use such feedback effectively, so a high-quality personalized translation system cannot be obtained. Although user feedback can be exploited through traditional post-editing, the amount of usable user data is small, so the advantages of a statistical post-editing model are hard to realize. On the other hand, the optimization objective of traditional machine translation methods is usually defined for the open domain rather than for a specific translation task. Although there is research on domain adaptation, it still targets professional user groups; for the wide variety of machine translation users, especially online Internet users, it cannot meet their diverse translation needs. Further improving the quality of machine translation is therefore a technical problem that urgently needs to be solved.

Summary of the invention

The object of the invention is to solve the problem that traditional machine translation methods cannot yield a high-quality personalized translation system and therefore cannot meet the diverse translation needs of users. To this end it proposes a pseudo-feedback-based personalized machine translation system and translation method that can improve machine translation quality.

A pseudo-feedback-based personalized machine translation system, the translation system comprising:

a phrase table filtering module for filtering each general post-editing model phrase table of the development set data;

an input module for obtaining the translation task S input by the user;

a preliminary translation module for translating the source-language sentences in the translation instance library provided by the local system to obtain the preliminary translated sentences T of the translation instances, and for translating the translation task S, once the user has input it, to obtain the preliminary machine translation result T' of the translation task;

a pseudo-feedback retrieval module for retrieving, from the word-aligned translation instance library of the local system, the preliminary translation results and standard translations R of similar translation instances;

a phrase table classification module for classifying the phrase table of the trained post-editing model to obtain a personalized post-editing model; and

a decoder module for decoding the preliminary machine translation result obtained through the retrieval of the pseudo-feedback retrieval module to obtain the final translation result.

A pseudo-feedback-based personalized machine translation method, wherein, before the user inputs the translation task S, a general post-editing model is trained by statistical methods on the preliminary machine-translated sentences T and the standard translations R of the translation instances in the translation memory, which completes the training of the general post-editing model; the personalized machine translation method is then carried out in the following steps:

Step 1, phrase table filtering process: the phrase table filtering module is used to filter each general post-editing model phrase table of the development set data;

The filtered result is used, with default weights, to decode each sentence D_i in the development set data and produce n-best translation results; the n-best translation results are then merged; finally, the MERT tool is used to tune the parameters over the merged n-best translation results as a whole, which also accomplishes the feature parameter optimization process;
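The merge step in Step 1 can be pictured with a small sketch. It assumes each development sentence D_i has already been decoded into its own n-best list of (hypothesis, score) pairs; the function name `merge_nbest` and the data layout are hypothetical, and the actual tuning would be left to a MERT implementation that is not shown here.

```python
def merge_nbest(per_sentence_nbest):
    """Pool per-sentence n-best lists into one flat list for whole-set tuning.

    per_sentence_nbest: iterable of (sentence_id, [(hypothesis, score), ...]).
    A hypothesis seen more than once for a sentence keeps its best score.
    (Hypothetical layout, chosen only for illustration.)
    """
    merged = {}
    for sent_id, nbest in per_sentence_nbest:
        bucket = merged.setdefault(sent_id, {})
        for hyp, score in nbest:
            if hyp not in bucket or score > bucket[hyp]:
                bucket[hyp] = score

    rows = []
    for sent_id in sorted(merged):
        # Best-scoring hypotheses first within each sentence.
        ranked = sorted(merged[sent_id].items(), key=lambda kv: -kv[1])
        rows.extend((sent_id, hyp, score) for hyp, score in ranked)
    return rows
```

The flat list of (sentence_id, hypothesis, score) rows is the kind of pooled input over which a tuner could then adjust the feature weights for the whole development set at once.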

Step 2, input process: the user inputs the translation task S to the input module;

Step 3, preliminary translation process: the preliminary translation process comprises two parts, one before and one after the user inputs the translation task S;

Before the user inputs the translation task S, the translation platform built on the machine translation system of the local system is used to preliminarily translate the source-language sentences in the translation instance library provided by the local system, yielding the preliminary translated sentences T of the translation instances;

In addition, after the translation task S input by the user has been obtained through the input module, the preliminary translation module translates it to obtain the preliminary machine translation result T' of the translation task;

Step 4, pseudo-feedback retrieval process: based on the preliminary machine-translated sentences T of the translation instances obtained in Step 3, the pseudo-feedback retrieval module performs cosine-similarity retrieval over the local word-aligned translation instance library using a source-language bag-of-words model, obtains the preliminary translation results and standard translations R of similar translation instances, and selects the 900 to 1100 most similar entries from the retrieved preliminary translation results and standard translations R;

The cosine similarity CS is computed in a vector space model whose units are given by the source-language bag-of-words model, as follows:

CS(S_{input}, S_{example}) = \frac{Vec(S_{input}) \cdot Vec(S_{example})}{\| Vec(S_{input}) \| \, \| Vec(S_{example}) \|},

where Vec(S_{example}) is the source-language sentence vector of a translation instance, Vec(S_{input}) is the translation task vector, Vec(S_{input}) \cdot Vec(S_{example}) is the inner product of the two vectors, and \| \cdot \| is the vector norm;
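The retrieval in Step 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes whitespace tokenization (Chinese input would first need a word segmenter), raw term counts as the bag-of-words vector, and a hypothetical instance layout of (source sentence, preliminary translation T, standard translation R). The default `k=1000` sits inside the 900 to 1100 range named in the text.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector: token -> count (whitespace tokenization assumed)."""
    return Counter(sentence.split())


def cosine_similarity(a, b):
    """CS = (a . b) / (||a|| * ||b||); returns 0.0 if either vector is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(task, instances, k=1000):
    """Return the k instances whose source sentence is most similar to the task.

    instances: list of (source, preliminary_translation, standard_translation).
    Ties are broken by original position so the result is deterministic.
    """
    query = bow(task)
    scored = [
        (cosine_similarity(query, bow(source)), index)
        for index, (source, _, _) in enumerate(instances)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [instances[index] for _, index in scored[:k]]
```

A call such as `retrieve_similar(S, library, k=1000)` would hand the retrieved (T, R) pairs to the phrase table classification of the next step.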

Step 5, phrase table classification process: based on the preliminary translation results and standard translations R of the 900 to 1100 most similar translation instances selected in Step 4, the phrase table classification module classifies the phrase table of the trained general post-editing model into positive phrases, which help improve translation quality, and negative phrases, which introduce noise into the final translation result, so that the trained general post-editing model becomes a personalized post-editing model; the positive and negative phrases in the personalized post-editing model are then compared against the preliminary translation results and standard translations R of the similar translation instances retrieved in the pseudo-feedback retrieval process of Step 4, and the negative phrases are filtered out of the personalized post-editing model phrase table, yielding an optimized personalized post-editing model;
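One way to picture the classification in Step 5 is the sketch below. The criterion is an assumption made for illustration: a phrase pair (p_t, p_r) counts as positive when p_t occurs in the preliminary translation of some retrieved instance and p_r occurs in the standard translation of that same instance. The function names and the whitespace tokenization are likewise hypothetical.

```python
def contains_phrase(tokens, phrase):
    """True if `phrase` (a token list) occurs contiguously inside `tokens`."""
    n = len(phrase)
    return n > 0 and any(
        tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)
    )


def classify_phrase_table(phrase_table, retrieved):
    """Split a post-editing phrase table into positive and negative pairs.

    phrase_table: list of (p_t, p_r) strings, MT-side phrase and edited phrase.
    retrieved:    list of (T, R) strings from the pseudo-feedback retrieval.
    Assumed criterion: both sides must be attested in one retrieved instance.
    """
    tokenised = [(t.split(), r.split()) for t, r in retrieved]
    positive, negative = [], []
    for p_t, p_r in phrase_table:
        mt_side, edited_side = p_t.split(), p_r.split()
        supported = any(
            contains_phrase(t, mt_side) and contains_phrase(r, edited_side)
            for t, r in tokenised
        )
        (positive if supported else negative).append((p_t, p_r))
    return positive, negative
```

Keeping only the `positive` list corresponds to filtering the negative phrases out of the personalized post-editing phrase table.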

Step 6, decoding process of the decoder module: with the optimized personalized post-editing model from Step 5 as the translation model, the decoder applies a conventional machine translation decoding method to the preliminary machine translation result T' of the translation task obtained in Step 3, yielding the optimized final translation result.
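The run-time part of the method (Steps 2 to 6) can be summarized in a short orchestration sketch. Every callable passed in is a hypothetical stand-in for the corresponding module; the patent does not specify these interfaces, so the signatures below are assumptions.

```python
def personalized_translate(task, translate, retrieve, classify, decode, general_table):
    """Chain the modules for one translation task S.

    translate(task)            -> preliminary result T'           (Step 3)
    retrieve(task)             -> [(T, R), ...] similar instances (Step 4)
    classify(table, instances) -> (positive, negative) phrases    (Step 5)
    decode(t_prime, table)     -> final translation               (Step 6)
    All four interfaces are illustrative assumptions.
    """
    preliminary = translate(task)
    similar = retrieve(task)
    positive, _negative = classify(general_table, similar)
    # Only the positive phrases form the optimized personalized model.
    return decode(preliminary, positive)
```

Passing the modules in as plain functions keeps the sketch independent of any particular translation engine or decoder.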

The beneficial effects of the invention are as follows. The invention uses the pseudo-feedback retrieval module to retrieve similar translation instances from the translation instance library, then uses the phrase table classification module to classify the general post-editing phrases and filter out the negative ones; selecting post-editing rules in this way gives an optimized personalized post-editing model and thus improves machine translation quality. In addition, feature parameter optimization is applied when the post-editing model is built in the preliminary translation process: for the given development set data, each input is decoded separately and the parameters are then tuned as a whole, which optimizes the parameters effectively and improves system performance. In particular, during retrieval over the local translation instance database, the pseudo-feedback retrieval module obtains parallel sentence pairs similar to the preliminary translation result of the sentence the user wants translated and uses them in place of feedback information, which solves the problem that user feedback is hard to obtain.

Moreover, the method makes good use of feedback information and effectively builds a post-editing model on top of the initial translation model. Compared with the translation results of Google, the translation quality of the results obtained by the pseudo-feedback-based personalized machine translation system and method of the invention is improved by 19.5%; compared with the translation results of a machine translation system trained with the Moses toolkit, it is improved by 14.1%.

Brief description of the drawings

Fig. 1 is a schematic flow diagram of the translation process of the invention.

Detailed description of the embodiments

Embodiment one:

The pseudo-feedback-based personalized machine translation system of this embodiment comprises:

an input module for obtaining the translation task S input by the user;

Embodiment two:

This embodiment differs from Embodiment one in that, in the pseudo-feedback-based personalized machine translation system of this embodiment, the phrase table filtering module is contained in the phrase table classification module.

Embodiment three:

In the pseudo-feedback-based personalized machine translation method of this embodiment, before the user inputs the translation task S, a general post-editing model is trained by statistical methods on the preliminary machine-translated sentences T and the standard translations R of the translation instances in the translation memory, which completes the training of the general post-editing model; the personalized machine translation method is then carried out in the following steps:

Step 2, input process: the user inputs the translation task S to the input module;

CS(S_{input}, S_{example}) = \frac{Vec(S_{input}) \cdot Vec(S_{example})}{\| Vec(S_{input}) \| \, \| Vec(S_{example}) \|},

Embodiment four:

This embodiment differs from Embodiment three in that, in the pseudo-feedback-based personalized machine translation method of this embodiment, the decoding process of Step 6 applies a formula to the preliminary machine translation result T' of the translation task to obtain the optimized final translation result. In the formula, P(T''|T') is the translation probability of the general post-editing model, and P(S|T'', T') is the probability that, in the general post-editing model, the phrase pair (T'', T') is used to post-edit the preliminary machine-translated sentence T' of the translation task for the given input translation task S; its value is defined to be 1 or 0, and the value of P(S|T'', T') is obtained by the following two methods:

1) when the two phrases of a phrase pair (P_t, P_r) in the optimized personalized post-editing model each match at least one phrase, in the preliminary machine translation result T' of the translation task and in the standard translation R respectively, P(S|T'', T') takes the value 1; otherwise it takes the value 0;

2) when the phrase P_r of a phrase pair (P_t, P_r) in the optimized personalized post-editing model matches at least one phrase in the standard translation R, P(S|T'', T') takes the value 1; otherwise it takes the value 0.
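The two ways of setting the 0/1 value of P(S|T'', T') can be sketched as a single indicator function. The sketch assumes that "match" means a contiguous token-level occurrence and that the standard translations R come as a list of sentences from the retrieved instances; the function names and the `method` parameter are hypothetical.

```python
def _occurs(phrase, sentence):
    """True if the tokens of `phrase` appear contiguously in `sentence`."""
    p, s = phrase.split(), sentence.split()
    return bool(p) and any(
        s[i:i + len(p)] == p for i in range(len(s) - len(p) + 1)
    )


def indicator(p_t, p_r, preliminary, standards, method=1):
    """Return 1 or 0 for a phrase pair (p_t, p_r).

    method 1: p_t must occur in the preliminary result T' and
              p_r must occur in at least one standard translation R.
    method 2: only p_r is checked against the standard translations R.
    """
    match_r = any(_occurs(p_r, r) for r in standards)
    if method == 2:
        return 1 if match_r else 0
    match_t = _occurs(p_t, preliminary)
    return 1 if (match_t and match_r) else 0
```

Method 2 is the more permissive of the two, since it accepts a phrase pair as soon as its edited side is attested in a standard translation, regardless of the preliminary result.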

Embodiment five:

This embodiment differs from Embodiment three or four in that, in the pseudo-feedback-based personalized machine translation method of this embodiment, when the pseudo-feedback retrieval process of Step 4 is carried out, the 1000 most similar entries are selected from the retrieved preliminary translation results and standard translations R of similar translation instances.

The Olympic task of IWSLT 2012 is used as the translation task input by the user, and the data of this task are used to test the pseudo-feedback-based personalized machine translation system and method designed in the invention. The training data provided with the task are from the spoken-language tourism domain and cover concrete application scenarios against an Olympic Games background, such as transport, catering, stadiums and business. They comprise 52,603 Chinese-English bilingual sentence pairs, specifically 495,638 Chinese words and 527,599 English words, and serve as the user's personalized local translation instance library. A development set of 2,057 Chinese-English bilingual sentence pairs and a test set of 998 Chinese-English bilingual sentence pairs are used. The preliminary translation module uses the Google online translation system: the translations of the above corpus were crawled from the Google online translation system. BLEU-4 is adopted as the translation quality evaluation criterion, and the test results obtained are compared directly with the Google translation results. In addition, a machine translation system trained with the open-source Moses toolkit serves as a second control group for comparison.
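For readers unfamiliar with the metric, the following is a minimal sketch of a corpus-level BLEU-4 computation. It assumes one reference per hypothesis, whitespace tokenization and no smoothing; it is not the evaluation script used in the experiment, which the text does not identify.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level BLEU-4, single reference per hypothesis, unsmoothed."""
    matches = [0] * 4
    totals = [0] * 4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, 5):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the four modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty for output shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

The score lies between 0 and 1; comparing two systems then amounts to comparing their scores on the same test set.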

With the BLEU-4 score as the evaluation criterion, the translation quality of the results obtained by the pseudo-feedback-based personalized machine translation system and method designed in the invention is improved by 19.5% over the translation results of the Google online translation system, and by 14.1% over the translation results of the machine translation system trained with the Moses toolkit. The test results are shown in Table 1:

Table 1: Comparison of translation quality between the pseudo-feedback-based personalized translation results and the translation results of the other systems.

Claims

1. A pseudo-feedback-based personalized machine translation system, characterized in that the translation system comprises:

an input module for obtaining the translation task S input by the user;

2. The pseudo-feedback-based personalized machine translation system according to claim 1, characterized in that the phrase table filtering module is contained in the phrase table classification module.

3. A personalized machine translation method using the above pseudo-feedback-based personalized machine translation system, characterized in that: before the user inputs the translation task S, a general post-editing model is trained by statistical methods on the preliminary machine-translated sentences T and the standard translations R of the translation instances in the translation memory, which completes the training of the general post-editing model; the personalized machine translation method is then carried out in the following steps:

Step 1, phrase table filtering process: the phrase table filtering module is used to filter each general post-editing model phrase table of the development set data;

Step 2, input process: the user inputs the translation task S to the input module;

CS(S_{input}, S_{example}) = \frac{Vec(S_{input}) \cdot Vec(S_{example})}{\| Vec(S_{input}) \| \, \| Vec(S_{example}) \|},

4. The pseudo-feedback-based personalized machine translation method according to claim 3, characterized in that: the decoding process of Step 6 applies a formula to the preliminary machine translation result T' of the translation task to obtain the optimized final translation result; in the formula, P(T''|T') is the translation probability of the general post-editing model, and P(S|T'', T') is the probability that, in the general post-editing model, the phrase pair (T'', T') is used to post-edit the preliminary machine-translated sentence T' of the translation task for the given input translation task S; its value is defined to be 1 or 0, and the value of P(S|T'', T') is obtained by the following two methods:

5. The pseudo-feedback-based personalized machine translation method according to claim 3 or 4, characterized in that: when the pseudo-feedback retrieval process of Step 4 is carried out, the 1000 most similar entries are selected from the retrieved preliminary translation results and standard translations R of similar translation instances.