CN103646019A - Method and device for fusing multiple machine translation systems

Info

Publication number: CN103646019A
Application number: CN201310751047.7A
Authority: CN (China)
Prior art keywords: translation, machine translation, language, training, formula
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 刘宇鹏 (Liu Yupeng)
Assignee: Harbin University of Science and Technology
Filed: 2013-12-31 (priority CN201310751047.7A)
Published: 2014-03-19

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for fusing multiple machine translation systems, relates to the field of machine translation, and aims to solve two problems of traditional system combination performed as post-processing: it does not fully consider the information of the decoding process, and it cannot fully consider the search space explored during decoding. The device for fusing multiple machine translation systems comprises a preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder. The method comprises: 1, preprocessing for the machine translation systems; 2, building a translation hypergraph for each translation system; 3, fusing the two translation hypergraphs and training on the training set, where training comprises two parts: each single machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT); 4, decoding the test set to generate translation results and scoring them. The method and the device are applicable to the field of machine translation.

Description

Method and device for fusing multiple machine translation systems
Technical field
The present invention relates to a method and a device for fusing multiple machine translation systems, and belongs to the field of machine translation.
Background art
With the rapid development of computers, using a computer to translate between different languages has long been well known. Machine translation system combination merges the N-best outputs of several systems to generate a new translation result, and fused translations have been shown to be better than the output of any individual system. By the granularity of fusion, combination methods divide into sentence level, phrase level and word level; word-level system combination based on confusion networks has recently achieved significant performance gains, but all of these methods fuse the systems in a post-processing step after machine translation. Traditional system combination done in post-processing does not fully consider the information of the decoding process, and fusion in post-processing cannot fully consider the huge search space explored during decoding. The present invention instead performs the fusion inside the decoding process of the models; with the development of parallelization techniques, the time and space complexity of the algorithm is acceptable.
Hypergraphs have been applied to many modeling problems in discrete mathematics since the 1970s; they are also called directed hypergraphs (Gallo, 1993). They abstract hierarchical search spaces that can be solved with dynamic programming, i.e. a large problem is divided into subproblems and conquered. A hypergraph is a graph in the broad sense whose edges can connect any number of vertices. A directed hypergraph is a pair H = ⟨X, E⟩ together with a weight set W, where X is the set of vertices and E ⊆ P(X) × X is the set of hyperedges, P(X) being the power set of X. Each hyperedge e ∈ E is a triple e = ⟨T(e), h(e), f_e⟩, where T(e) ∈ V* is the ordered sequence of tail nodes (the tail may be the empty set, so T(e) belongs to the closure V* of the node set V), h(e) ∈ V is the head node, and f_e : R^|T(e)| → R is the weight function (R denotes the real numbers and |T(e)| the length of the tail sequence); W is the set of weights. All nodes associated with hyperedges are called hypernodes, each head node can be connected to several hyperedges, and h(e) is also called the source node. Define |T(e)| as the arity of a hyperedge; if the arity of a hyperedge is 0, its weight function f_e ∈ R is a constant. The arity of a hypergraph is the maximum arity over all its hyperedges. A hyperedge of arity 1 is a regular edge, and a hypergraph of arity 1 is a regular graph (lattice).
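The definition above maps directly onto a small data structure. The following Python sketch is our illustration only (the names are not from the patent): it encodes a hyperedge e = ⟨T(e), h(e), f_e⟩ and computes the arity of a hypergraph.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    label: str                                   # e.g. "[X, 0, 2]"
    incoming: List["Hyperedge"] = field(default_factory=list)

@dataclass
class Hyperedge:
    """A hyperedge e = <T(e), h(e), f_e>: ordered tail nodes, a head node,
    and a weight function over the tail weights."""
    tail: List[Node]                             # T(e); may be empty (arity 0)
    head: Node                                   # h(e)
    weight_fn: Callable[..., float]              # f_e : R^|T(e)| -> R

    @property
    def arity(self) -> int:
        return len(self.tail)

def hypergraph_arity(edges: List[Hyperedge]) -> int:
    """The arity of a hypergraph is the maximum arity over its hyperedges;
    a hypergraph of arity 1 is a regular graph (lattice)."""
    return max((e.arity for e in edges), default=0)
```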
A word lattice is exactly a hypergraph of arity 1 and is the most common form of hypergraph. In the machine translation field, word lattices are an important data-structure tool for representing the left-to-right decoding search space; a hypergraph is a more general word lattice that can represent not only the left-to-right but also the bottom-up decoding search space.
A translation hypergraph is built on the basis of a hypergraph: each translation rule corresponds to a hyperedge (one derivation step), and the weight of a translation rule corresponds to the weight function of its hyperedge. A translation node is a partial translation generated during the translation process, carrying various feature values. Translation hypergraphs model bilingual machine translation: they contain not only the source language but also the target language, and derivations can be performed on them; a derivation is exactly a translation process.
Summary of the invention
The present invention solves the problems that traditional system combination done in post-processing does not fully consider the information of the decoding process and that fusion in post-processing cannot fully consider the huge search space explored during decoding, and provides a method and a device for fusing multiple machine translation systems.
The device for fusing multiple machine translation systems comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder;
The monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora; the phrase extractor extracts phrases from the bilingual corpus and puts them into a phrase table; the language model generator trains a language model from the monolingual corpus; the machine translation systems before fusion are trained using the phrase table and the language model, and the parameter weights obtained in training serve as the weights of the final decoder; the decoder decodes the test data into translation results and evaluates the translation results to output a score.
The method for fusing multiple machine translation systems is realized according to the following steps:
One, preprocessing for the machine translation systems;
Two, building the translation hypergraph of each translation system;
Three, fusing the two translation hypergraphs and training on the training set;
wherein training comprises two parts: each individual machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT);
Four, decoding the test set to generate the translation results and scoring them, which completes the method for fusing multiple machine translation systems.
Effects of the present invention:
The present invention fuses several different machine translation systems so that they improve each other's performance, clearly raising the BLEU score by 7 percentage points over the single systems. The benefit of fusing inside the decoding process of the models is that the fusion is restricted neither by the machine translation models nor by the training algorithms: as long as the decoding processes are similar, the systems can be fused, which gives good extensibility.
Brief description of the drawings
Fig. 1 is the diagram of the device for fusing multiple machine translation systems;
Fig. 2 is the flowchart of the present invention;
Fig. 3 shows the result after word segmentation;
Fig. 4 shows the result after part-of-speech tagging;
Fig. 5 shows the result after syntactic analysis;
Fig. 6 is the sentence diagram containing syntactic, bilingual-alignment and phrase information;
Fig. 7 is the phrase diagram extracted by the tree-to-string machine translation system;
Fig. 8 is the sentence diagram containing bilingual-alignment and phrase information;
Fig. 9 is the phrase diagram extracted by the phrase-based machine translation system;
Fig. 10(a) is the translation hypergraph generated based on maximum entropy BTG;
Fig. 10(b) is the translation hypergraph generated based on SCFG;
Fig. 10(c) is the translation hypergraph after fusing the hypergraphs generated by the two grammars;
Fig. 11 is the training flowchart of machine translation;
Fig. 12(a) is the MERT training example measured by score;
Fig. 12(b) is the MERT training example measured by error;
Fig. 13 is an example CYK decoding table;
Fig. 14 is the translation result generated from the decoding table of Fig. 13;
Fig. 15 is the pseudocode of the main algorithm of machine translation fusion;
Fig. 16 is the pseudocode of the core function Add_Edge in decoding.
Embodiments
Embodiment one: the device for fusing multiple machine translation systems of this embodiment comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder;
The monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora; the phrase extractor extracts phrases from the bilingual corpus and puts them into a phrase table; the language model generator trains a language model from the monolingual corpus; the machine translation systems before fusion are trained using the phrase table and the language model, and the parameter weights obtained in training serve as the weights of the final decoder; the decoder decodes the test data into translation results and evaluates the translation results to output a score.
Embodiment two: the method for fusing multiple machine translation systems of this embodiment is realized according to the following steps:
One, preprocessing for the machine translation systems;
Two, building the translation hypergraph of each translation system;
Three, fusing the two translation hypergraphs and training on the training set;
wherein training comprises two parts: each individual machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT);
Four, decoding the test set to generate the translation results and scoring them, which completes the method for fusing multiple machine translation systems.
Modern machine translation technology is built on bilingual grammars. A grammar is a four-tuple
G = (V_n, V_t, P, S), where V_n is the set of non-terminals, containing the non-terminals of both the source and the target language; V_t is the set of terminals of both the source and the target language, with V_n ∩ V_t = ∅; the full symbol set is V = V_n ∪ V_t; P is the set of productions, in which the head node of a production is an element of V_n and the tail is an element of V* × V*; and S is the unique start symbol, S ∈ V_n.
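As a concrete reading of the four-tuple, here is a minimal Python encoding of a bilingual production (our illustrative sketch; the field names are not the patent's). It is used to write down lexical rule (1) of the BTG example that follows.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Production:
    """A bilingual production from P: the head is an element of V_n and the
    tail pairs a source-side string with a target-side string (V* x V*)."""
    head: str                     # element of V_n, e.g. "X" or "S"
    src: Tuple[str, ...]          # source side, element of V*
    tgt: Tuple[str, ...]          # target side, element of V*

# Lexical rule (1) of the BTG example below, written as a production:
rule1 = Production("X", ("tianshang", "de"), ("in", "the", "sky"))
```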
This embodiment fuses two classical bilingual grammars; because the fusion happens during decoding, it is not restricted by the grammars, and it can certainly be extended to the fusion of other types of bilingual grammars. The training process is also self-contained, and the classical minimum error rate training (MERT) algorithm can be adopted. The two classical bilingual grammars for machine translation are introduced below:
1. Bracketing Transduction Grammar (BTG): bilingual reordering in machine translation is realized by a machine learning algorithm; during decoding the translation result is generated following the word order of the source language, completing the reordering and generation of the target language automatically.
For example, for translating the Chinese sentence "tianshang de yuncai", two classes of BTG rules can be matched:
(a) lexical rules:
X → ⟨tianshang de, in the sky⟩    (1)
X → ⟨yuncai, cloud⟩    (2)
(b) reordering rule:
S →⟨⟩ ⟨X_{de;yuncai}, X_{clouds;in}⟩    (3)
The left side of a rule is the generated non-terminal; on the right side, the front part is the source language and the rear part is the target language. In a reordering rule, angle brackets on the arrow mean inverted order and square brackets mean straight order. In each subscript on the right side of the rule, the left part is the left context and the right part is the right context: the left context of the source language is de and its right context is yuncai, while the left context of the target language is clouds and its right context is in. The context information of the source and target language is mainly used to train the reordering model with a machine learning algorithm (this embodiment trains it with a maximum entropy toolkit).
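To illustrate how these boundary contexts become training data for the reordering model, here is a minimal Python sketch; the feature templates and the straight/inverted labels are our own illustrative choices, not taken from the patent.

```python
def btg_reorder_features(src_left, src_right, tgt_left, tgt_right):
    """Boundary-word features for one combination of two translated blocks,
    as consumed by a maximum entropy classifier."""
    return [
        f"src_left={src_left}",    # source-language left context, e.g. "de"
        f"src_right={src_right}",  # source-language right context, e.g. "yuncai"
        f"tgt_left={tgt_left}",    # target-language left context, e.g. "clouds"
        f"tgt_right={tgt_right}",  # target-language right context, e.g. "in"
    ]

# One training event for rule (3): the two blocks combine in inverted order.
event = (btg_reorder_features("de", "yuncai", "clouds", "in"), "inverted")
print(event)
```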
The process of generating the target language is a derivation process, which this embodiment represents in two ways:
The traditional method:
S ⇒_(3) ⟨X_{de;yuncai}, X_{clouds;in}⟩ ⇒_(1),(2) ⟨tianshang de yuncai, cloud in the sky⟩
The symbols on the derivation arrows ((1), (2) and (3)) indicate the rewriting rules being applied.
The method based on theorem proving differs slightly from the above and uses a bottom-up representation:
Axioms:

    -----------------------------------      ------------------------
    X → ⟨tianshang de, in the sky⟩ : w1      X → ⟨yuncai, cloud⟩ : w2        (4)

Derivation steps:

    X → ⟨tianshang de, in the sky⟩ : w1
    -----------------------------------                                     (5)
    [X, 0, 2] : w1

    X → ⟨yuncai, cloud⟩ : w2
    ------------------------                                                (6)
    [X, 2, 3] : w2

    S →⟨⟩ ⟨X_{de;yuncai}, X_{clouds;in}⟩ : w3    [X, 0, 2] : w1    [X, 2, 3] : w2
    -----------------------------------------------------------------------------   (7)
    [S, 0, 3] : w1 × w2 × w3

Derived goal: [S, 0, 3] : w1 × w2 × w3                                       (8)
Above each bar is the premise of the deduction and below it the conclusion; on the right side of each rewriting rule is the weight of that rule. To keep the proof simpler, all weights are set to 1 in this embodiment. In the proof process, the axioms (4) need no derivation, so nothing appears above their bars; derivation step (5) uses the first axiom of (4); derivation step (6) uses the second axiom of (4); derivation step (7) uses the conclusions of steps (5) and (6) together with the reordering rule (3) to obtain the final conclusion [S, 0, 3] : w1 × w2 × w3 (the derived goal (8)), which states that the cell spanning positions 0 to 3 stores the start symbol (i.e. the sentence has been reduced to the start symbol) with weight w1 × w2 × w3; since every rewriting rule weight in this example is 1, the final weight is 1. Fig. 10(a) illustrates this derivation as the translation hypergraph generated based on maximum entropy BTG.
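The weighted deduction can be replayed in a few lines of Python. The sketch below (the helper names are ours) fills a chart with items [X, i, j] whose weights multiply along derivation steps (5)-(7), and checks the goal weight of (8):

```python
from typing import Dict, Tuple

Item = Tuple[str, int, int]                    # a chart item [symbol, i, j]

def derive(chart: Dict[Item, float], conclusion: Item,
           rule_weight: float, premises: Tuple[Item, ...] = ()) -> None:
    """One deduction step: the conclusion's weight is the rule weight times
    the weights of all premises already proved in the chart."""
    w = rule_weight
    for p in premises:
        w *= chart[p]
    chart[conclusion] = w

w1 = w2 = w3 = 1.0                             # all rule weights are 1 here
chart: Dict[Item, float] = {}
derive(chart, ("X", 0, 2), w1)                                  # step (5)
derive(chart, ("X", 2, 3), w2)                                  # step (6)
derive(chart, ("S", 0, 3), w3,
       premises=(("X", 0, 2), ("X", 2, 3)))                     # step (7)
assert chart[("S", 0, 3)] == w1 * w2 * w3                       # goal (8)
```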
2. Synchronous context-free grammar (SCFG): an extension of the context-free grammar in the traditional Chomsky formalism that makes it suitable for the machine translation task; in the grammar extraction process, the reordering model is built into every rewriting rule of the translation model.
Two SCFG rewriting rules can be matched for the previous example:
X → ⟨tianshang de X₁, X₁ on the sky⟩    (1)
X → ⟨X₁ yuncai, clouds X₁⟩    (2)
Besides the rewriting rules extracted from the training corpus, there are also two special glue rules:
S → ⟨S₁ X₂, S₁ X₂⟩    (3)
S → ⟨X₁, X₁⟩    (4)
The subscripts of the non-terminal symbols indicate the correspondence between non-terminals on the two sides.
SCFG derivations also admit the two representations. The traditional method is:
S ⇒_(3),(4) ⟨X₁ X₂, X₁ X₂⟩ ⇒_(1) ⟨tianshang de X₁, X₁ on the sky⟩ ⇒_(2) ⟨tianshang de yuncai, clouds on the sky⟩
The method based on theorem proving:

Axioms:

    -----------------------------------------      -------------------------------
    X → ⟨tianshang de X₁, X₁ on the sky⟩ : w1      X → ⟨X₁ yuncai, clouds X₁⟩ : w2

Derivation steps:

    X → ⟨tianshang de X₁, X₁ on the sky⟩ : w1
    -----------------------------------------                               (5)
    [X, 0, 2] : w1

    X → ⟨X₁ yuncai, clouds X₁⟩ : w2
    -------------------------------                                         (6)
    [X, 2, 3] : w2

    S → ⟨X₁, X₁⟩ : w3    [X, 0, 2] : w1
    -----------------------------------                                     (7)
    [S, 0, 2] : w1 × w3

    S → ⟨S₁ X₂, S₁ X₂⟩ : w4    [S, 0, 2] : w1 × w3    [X, 2, 3] : w2
    ----------------------------------------------------------------        (8)
    [S, 0, 3] : w1 × w2 × w3 × w4

Derived goal: [S, 0, 3] : w1 × w2 × w3 × w4                                  (9)
The derivation is similar to the BTG derivation; Fig. 10(b) illustrates the translation hypergraph generated based on SCFG.
The two grammars have complementary strengths and weaknesses, which is exactly what makes fusion promising: BTG uses a powerful machine learning algorithm and can reorder better, but the expressive power of the model is limited, so it is constrained by the model; the SCFG model is relatively expressive and uses a comparatively simple maximum likelihood estimate for its parameters, with the reordering process built into the model itself, so it needs no separate reordering module to complete bilingual reordering.
The fusion method is illustrated in Fig. 10(c), where solid lines are BTG rules and dotted lines are SCFG rules. In Fig. 10(c), because both SCFG and BTG generate "clouds in the sky", the SCFG and BTG translation nodes (partial translations) above can be shared, and this is precisely how the results generated by the two grammars are fused.
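A minimal sketch of this node sharing, under our simplifying assumption that translation nodes are keyed by their source span and partial target string, so that hyperedges from either grammar attach to the same node (the weights below are toy values):

```python
from collections import defaultdict

nodes = {}                      # (span, partial translation) -> shared node
edges = defaultdict(list)       # shared node -> incoming hyperedges

def add_edge(grammar, span, translation, tails, weight):
    """Attach a hyperedge to the translation node for (span, translation),
    creating the node only if neither grammar has produced it yet."""
    node = nodes.setdefault((span, translation), (span, translation))
    edges[node].append((grammar, tuple(tails), weight))
    return node

# Both grammars derive "clouds in the sky" over span (0, 3), so their
# hyperedges attach to one shared node -- the fusion of Fig. 10(c).
add_edge("BTG",  (0, 3), "clouds in the sky", [((0, 2), "in the sky")], 0.9)
add_edge("SCFG", (0, 3), "clouds in the sky", [((0, 2), "in the sky")], 0.8)
assert len(nodes) == 1
assert len(edges[((0, 3), "clouds in the sky")]) == 2
```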
Effects of this embodiment:
This embodiment fuses several different machine translation systems so that they improve each other's performance, clearly raising the BLEU score by 7 percentage points over the single systems. The benefit of fusing inside the decoding process of the models is that the fusion is restricted neither by the machine translation models nor by the training algorithms: as long as the decoding processes are similar, the systems can be fused, which gives good extensibility.
Embodiment three: this embodiment differs from embodiment two in that the preprocessing for the machine translation systems is specifically:
(1) word-segmenting the source language and the target language;
(2) part-of-speech tagging the sentences that require it, and aligning the bilingual text at the same time;
(3) syntactically parsing the sentences that require it;
(4) combining the alignment information with the part-of-speech and syntactic information;
(5) extracting phrases and computing the phrase-related feature scores (a sketch of this step follows below).
To better understand the preprocessing, this embodiment uses the tree-to-string model as an illustration. Fig. 3 shows the word segmentation of the sentence "Bush and Sharon held talks"; Fig. 4 shows the segmented sentence after part-of-speech tagging; Fig. 5 shows the result of syntactic analysis after segmentation and tagging; Fig. 6 combines the bilingual alignment result with the source-language syntactic analysis, with the extracted phrases marked in black; Fig. 7 shows the phrases extracted from Fig. 6; Fig. 8 shows the phrases that can be extracted from the alignment result alone. Comparing the phrases of the two machine translation systems (i.e. comparing Fig. 7 and Fig. 9) shows that the phrase-based machine translation system has more phrases than the tree-to-string system, and the surplus can grow exponentially as the preprocessed corpus grows, but the extra phrases carry no syntactic structure. In part of the translated sentences, phrases that respect the syntactic information improve translation performance; in another part, phrases that do not respect the syntactic information improve it.
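To make step (5) concrete, here is a simplified, self-contained sketch of alignment-consistent phrase-pair extraction (the standard consistency criterion; the toy sentence pair and alignment below are invented for illustration and do not reproduce the patent's figures):

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with the word alignment: no word
    inside the target span may align to a source word outside the span."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            ts = [t for s, t in alignment if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            if all(i1 <= s <= i2 for s, t in alignment if j1 <= t <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy sentence pair and alignment (invented for illustration):
src = ["bushi", "yu", "shalong", "juxing", "le", "huitan"]
tgt = ["Bush", "held", "talks", "with", "Sharon"]
ali = [(0, 0), (1, 3), (2, 4), (3, 1), (5, 2)]
print(extract_phrases(src, tgt, ali))
```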
Other steps and parameters are identical to embodiment two.
Embodiment four: this embodiment differs from embodiment two or three in that fusing the two translation hypergraphs in step three is specifically:
The translation process is modeled by a hypergraph, so a hidden variable d is first introduced to represent each derivation, and P(e|f) can be written as:
p(e|f) = Σ_d p(e, d|f)
Each term on the right-hand side is then expanded by the chain rule of probability:
p(e, d|f) = p(d|f) p(e|d, f)
The formula above decomposes p(e, d|f) into two factors, each corresponding to a submodel. The second submodel p(e|d, f) corresponds to obtaining the target sentence from the source sentence and the derivation; since the target sentence is determined once the source sentence and the derivation are fixed, this factor can be ignored, giving the log-linear model
p(e, d|f) = p(d|f) = exp(γ·h(e, f, d)) / Σ_{e′,d′} exp(γ·h(e′, f, d′))
where h(e, f, d) is the feature vector and γ is the feature weight vector; this completes the fusion of the two translation hypergraphs.
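When the derivations of the fused hypergraph can be enumerated, the log-linear model above can be computed directly. The sketch below uses toy feature vectors and weights of our own; a real decoder sums over the hypergraph with dynamic programming instead:

```python
import math

def p_d_given_f(gamma, feature_vecs, d_index):
    """p(d|f) = exp(gamma . h(d)) / sum over d' of exp(gamma . h(d')),
    with the derivations enumerated explicitly (feasible only for toys)."""
    scores = [math.exp(sum(g * x for g, x in zip(gamma, h)))
              for h in feature_vecs]
    return scores[d_index] / sum(scores)

gamma = [0.5, 0.3, 0.2]                      # feature weight vector (toy)
hs = [[1.0, 0.2, 0.0], [0.8, 0.1, 1.0]]      # h(e, f, d) per derivation (toy)
print(p_d_given_f(gamma, hs, 0))             # probability of derivation 0
```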
The system realized in this embodiment adopts the following features:
(1) bidirectional translation probabilities: Trans(e|f) and Trans(f|e)
(2) bidirectional lexical translation probabilities: Lex_Trans(e|f) and Lex_Trans(f|e)
(3) language model probability: LM(e)
(4) number of rules used in the translation process: Num(Rule)
(5) number of phrases used in the translation process: NumPhrase(f, e)
(6) probability of the maximum entropy distortion (reordering) model: Distortion(f, e)
(7) number of words in the generated translation: NumWord(e)
This amounts to 9 features in total, of which each model uses 8. Some features are shared by the two models, such as the bidirectional translation probabilities, the bidirectional lexical translation probabilities, the language model probability and the number of generated words; some are exclusive: SCFG uses the "number of rules used in the translation process" feature, while BTG uses the "probability of the maximum entropy distortion model". Shared features need not be re-scored during fusion.
Other steps and parameters are identical to embodiment two or three.
Embodiment five: this embodiment differs from one of embodiments two to four in that a maximum entropy model is used when training the maximum-entropy BTG reordering model in step three. Maximum entropy training is a convex optimization, and the maximum entropy solution is the most uniform probability distribution satisfying the feature constraints. Its basic training formula is as follows:

p* = argmax_{p : E_p(f) = E_p̄(f)} H(Y|X)

This formula means: among the distributions satisfying the constraint E_p(f) = E_p̄(f), choose the probability p* that maximizes the conditional entropy H(Y|X), where E_p(f) is the expectation of feature f under the model and E_p̄(f) is the expectation of feature f in the sample. The conditional entropy expands as:

H(Y|X) = Σ_{(x,y)} p(y|x) p̄(x) log (1 / p(y|x))

The training algorithm adopted is the commonly used limited-memory quasi-Newton method. Its formulas are obtained as follows: first f(x) is expanded at x_k into a second-order Taylor series:

f(x) ≈ φ(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T ∇²f(x_k)(x − x_k)

To find the extreme point of φ(x), set ∇φ(x) = 0, i.e. ∇f(x_k) + ∇²f(x_k)(x − x_k) = 0; if ∇²f(x_k) is invertible, the iterative formula of Newton's method is obtained:

x_{k+1} = x_k − ∇²f(x_k)^{-1} ∇f(x_k)

where ∇²f(x_k) is the Hessian matrix. As in LBFGS, the Hessian is approximated by a correction matrix H_k:

H_{k+1} = (I − ρ_k s_k y_k^T) H_k (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T

s_k = x_{k+1} − x_k,   y_k = ∇f(x_{k+1}) − ∇f(x_k),   ρ_k = 1 / (y_k^T s_k)

The conventional choice is to initialize the second-derivative (Hessian) matrix H_0 to the identity matrix I.
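The update transcribes directly into code. The NumPy sketch below (our illustration) performs one inverse-Hessian correction starting from H_0 = I; a full L-BFGS implementation would avoid materializing H_k and keep only the last few (s_k, y_k) pairs:

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS inverse-Hessian update:
    H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T,
    with s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k)."""
    rho = 1.0 / float(y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

H = np.eye(2)                       # H_0 initialized to the identity matrix
s = np.array([0.1, -0.2])           # x_{k+1} - x_k
y = np.array([0.3, 0.1])            # gradient difference
H = bfgs_update(H, s, y)
print(H)
```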
Other steps and parameters are identical to one of embodiments one to four.
Embodiment six: this embodiment differs from one of embodiments two to five in that the classical minimum error rate training (MERT) algorithm in step three represents the training process with the following formula (the training flowchart is shown in Fig. 11):

e*(f, γ) = argmax_{e ∈ C_f} { a(e, f) + γ · b(e, f) }

where C_f is the set of all translation candidates of the source sentence f; the formula finds the best translation result e*(f, γ) as the slope γ varies. Fig. 12(a) and Fig. 12(b) use a set of six candidate translations C_f = {e1, e2, e3, e4, e5, e6} as MERT training examples measured by score and by error, respectively.
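The formula amounts to a line search over the slope γ for each candidate set. A brute-force sketch with made-up (a, b) score pairs (exact MERT sweeps γ along the upper envelope rather than a fixed grid, but the argmax per γ is the same):

```python
def mert_line_search(candidates, gammas):
    """For each slope gamma, pick e*(f, gamma) as the candidate maximizing
    a(e, f) + gamma * b(e, f) over C_f; returns the winning index per gamma."""
    return [max(range(len(candidates)),
                key=lambda i: candidates[i][0] + g * candidates[i][1])
            for g in gammas]

C_f = [(1.0, 0.0), (0.5, 0.8), (0.2, 1.5)]     # (a(e,f), b(e,f)) per candidate
print(mert_line_search(C_f, [0.0, 0.5, 1.0, 2.0]))   # best candidate per slope
```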
Other steps and parameters are identical to one of embodiments one to five.
Embodiment seven: this embodiment differs from one of embodiments two to six in that in step four the test set is decoded to generate the translation result, specifically:

The whole decoder is built on the CYK algorithm and adopts a beam search strategy. With Viterbi, the top score is chosen; with crunching, the partial feature scores are summed, the partial features being all features except the common language model and word penalty features. Both algorithms are marked at their key steps (the main algorithm is shown in Fig. 15, and the key function Add_Edge that it calls is shown in Fig. 16). The decoding formula for eliminating spurious ambiguity is:

ê = argmax_e Σ_{d ∈ D(e,f) ∩ ND(e)} P(f, d | e)

The decoding formula for generating a consensus translation is:

ê = argmin_e Σ_{e′ ∈ T(f)} Loss(e, e′) P(e′ | f)

where P(e′|f) is the probability of generating the target sentence from the source sentence; P(e, d|f) is the probability after adding the hidden derivation variable d; D(e, f) is the set of all derivations linking the source and target sentences; ND(e) restricts to derivations generating the n-best translation results; Loss(e, e′) is the loss function used to compute the minimum Bayes risk; and T(f) is the evidence (hypothesis) space.

In consensus decoding, the translation probability from source to target becomes a sum over the derivations generating the same target string:

P(e′|f) = Σ_{d ∈ D(e′,f) ∩ ND(e′)} P(f, d | e′)

The foundational work on statistical machine translation modeled the translation process with the source-channel model, which is why subsequent machine translation research refers to the whole translation process as decoding. The task of a machine translation system is to translate an input source sentence f into a target sentence e; machine translation fusion combines several systems so that they draw on each other's translation information. This embodiment adopts CYK decoding, whose process is shown in Fig. 13 and Fig. 14: in Fig. 13 each cell represents a translation unit whose content is the translation rule used to generate that partial translation, the red lines mark the rules used by the final best translation, and Fig. 14 shows the syntax-tree form of the translation result.
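The following compact Python sketch stands in for the CYK-with-beam decoder of Figs. 15 and 16; the rule table, the scores and the orientation_score stub are our illustrative inventions (in the real system the orientation probability comes from the maximum entropy reordering model):

```python
import heapq
from collections import defaultdict

def orientation_score(order):
    """Stand-in for the maximum entropy reordering model: in the real system
    this probability is predicted from boundary-word features."""
    return 0.6 if order == "inverted" else 0.4

def cyk_beam_decode(words, rules, beam=5):
    """CYK decoding over source spans with beam search: cell (i, j) keeps the
    `beam` best partial translations of words[i:j]; `rules` maps a source
    phrase to scored translations."""
    n = len(words)
    chart = defaultdict(list)                  # (i, j) -> [(score, translation)]
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j, cell = i + span, []
            for trans, sc in rules.get(" ".join(words[i:j]), []):
                cell.append((sc, trans))                        # lexical rule
            for k in range(i + 1, j):                           # BTG combination
                for s1, t1 in chart[(i, k)]:
                    for s2, t2 in chart[(k, j)]:
                        cell.append((s1 * s2 * orientation_score("straight"),
                                     t1 + " " + t2))            # straight order
                        cell.append((s1 * s2 * orientation_score("inverted"),
                                     t2 + " " + t1))            # inverted order
            chart[(i, j)] = heapq.nlargest(beam, cell)          # beam pruning
    return chart[(0, n)][0] if chart[(0, n)] else None

rules = {"tianshang de": [("in the sky", 0.9)], "yuncai": [("clouds", 0.8)]}
print(cyk_beam_decode(["tianshang", "de", "yuncai"], rules, beam=3))
# e.g. (0.432..., 'clouds in the sky')
```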
Other steps and parameters are identical to one of embodiments two to six.

Claims (7)

1. A device for fusing multiple machine translation systems, characterized in that the device comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder;
the monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora; the phrase extractor extracts phrases from the bilingual corpus and puts them into a phrase table; the language model generator trains a language model from the monolingual corpus; the machine translation systems before fusion are trained using the phrase table and the language model, and the parameter weights obtained in training serve as the weights of the final decoder; the decoder decodes the test data into translation results and evaluates the translation results to output a score.
2. A method for fusing multiple machine translation systems using the device of claim 1, characterized in that the method is realized according to the following steps:
one, preprocessing for the machine translation systems: the monolingual/bilingual preprocessor processes the monolingual and bilingual corpora, the language model generator produces the language model, and the phrase extractor extracts phrases;
two, generating a translation hypergraph for each translation system;
three, in the machine translation system trainers, fusing the two translation hypergraphs through their shared features and training on the training set;
wherein training comprises two parts: each individual machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT);
four, in the decoder, decoding the test set to generate the translation results and scoring them, which completes the method for fusing multiple machine translation systems.
3. The method for fusing multiple machine translation systems according to claim 2, characterized in that the preprocessing for the machine translation systems in step one is specifically:
(1) word-segmenting the source language and the target language;
(2) part-of-speech tagging the sentences that require it, and aligning the bilingual text at the same time;
(3) syntactically parsing the sentences that require it;
(4) combining the alignment information with the part-of-speech and syntactic information;
(5) extracting phrases and computing the phrase-related feature scores.
4. The method for fusing multiple machine translation systems according to claim 3, characterized in that fusing the two translation hypergraphs in step three is specifically:
the translation process is modeled by a hypergraph, so a hidden variable d is first introduced to represent each derivation, and P(e|f) can be written as:
p(e|f) = Σ_d p(e, d|f)
each term on the right-hand side is then expanded by the chain rule of probability:
p(e, d|f) = p(d|f) p(e|d, f)
which decomposes p(e, d|f) into two factors, each corresponding to a submodel; the second submodel p(e|d, f) corresponds to obtaining the target sentence from the source sentence and the derivation, and since the target sentence is determined once the source sentence and the derivation are fixed, this factor can be ignored, giving
p(e, d|f) = p(d|f) = exp(γ·h(e, f, d)) / Σ_{e′,d′} exp(γ·h(e′, f, d′))
where h(e, f, d) is the feature vector and γ is the feature weight vector; this completes the fusion of the two translation hypergraphs.
5. The method for fusing multiple machine translation systems according to claim 4, characterized in that a maximum entropy model is used when training the maximum-entropy BTG reordering model in step three; maximum entropy training is a convex optimization, and the maximum entropy solution is the most uniform probability distribution satisfying the feature constraints; its basic training formula is as follows:
p* = argmax_{p : E_p(f) = E_p̄(f)} H(Y|X)
meaning: among the distributions satisfying the constraint, choose the probability p* that maximizes the conditional entropy H(Y|X), where E_p(f) is the expectation of feature f under the model and E_p̄(f) is the expectation of feature f in the sample; the conditional entropy expands as:
H(Y|X) = Σ_{(x,y)} p(y|x) p̄(x) log (1 / p(y|x))
the training algorithm adopted is the commonly used limited-memory quasi-Newton method: first f(x) is expanded at x_k into a second-order Taylor series:
f(x) ≈ φ(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T ∇²f(x_k)(x − x_k)
to find the extreme point of φ(x), set ∇φ(x) = 0, i.e. ∇f(x_k) + ∇²f(x_k)(x − x_k) = 0; if ∇²f(x_k) is invertible, the iterative formula of Newton's method is obtained:
x_{k+1} = x_k − ∇²f(x_k)^{-1} ∇f(x_k)
where ∇²f(x_k) is the Hessian matrix; as in LBFGS, the Hessian is approximated by the correction matrix H_k:
H_{k+1} = (I − ρ_k s_k y_k^T) H_k (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T
s_k = x_{k+1} − x_k,   y_k = ∇f(x_{k+1}) − ∇f(x_k),   ρ_k = 1 / (y_k^T s_k)
the traditional choice is to initialize the second-derivative (Hessian) matrix H_0 to the identity matrix I.
6. The method for fusing multiple machine translation systems according to claim 5, characterized in that the classical minimum error rate training (MERT) algorithm in step three represents the training process with the formula:
e*(f, γ) = argmax_{e ∈ C_f} { a(e, f) + γ · b(e, f) }
where C_f is the set of all translation candidates of the source sentence f, and the formula finds the best translation result e*(f, γ) as the slope γ varies.
7. The method for fusing multiple machine translation systems according to claim 2, 3, 4, 5 or 6, characterized in that decoding the test set to generate the translation result in step four is specifically:
the whole decoder is built on the CYK algorithm and adopts a beam search strategy; with Viterbi, the top score is chosen; with crunching, the partial feature scores are summed, the partial features being all features except the common language model and word penalty features; both algorithms are marked at their key steps; the decoding formula for eliminating spurious ambiguity is:
ê = argmax_e Σ_{d ∈ D(e,f) ∩ ND(e)} P(f, d | e)
the decoding formula for generating a consensus translation is:
ê = argmin_e Σ_{e′ ∈ T(f)} Loss(e, e′) P(e′ | f)
where P(e′|f) is the probability of generating the target sentence from the source sentence; P(e, d|f) is the probability after adding the hidden derivation variable d; D(e, f) is the set of all derivations linking the source and target sentences; ND(e) restricts to derivations generating the n-best translation results; Loss(e, e′) is the loss function used to compute the minimum Bayes risk; and T(f) is the evidence (hypothesis) space;
in consensus decoding, the translation probability from source to target becomes a sum over the derivations generating the same target string:
P(e′|f) = Σ_{d ∈ D(e′,f) ∩ ND(e′)} P(f, d | e′).