CN103646019A - Method and device for fusing multiple machine translation systems

Info

Publication number: CN103646019A
Application number: CN201310751047.7A
Authority: CN (China)
Prior art keywords: translation, machine translation, language, training, formula
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 刘宇鹏 (Liu Yupeng)
Assignee: Harbin University of Science and Technology
Filed: 2013-12-31 (priority CN201310751047.7A)
Published: 2014-03-19

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for fusing multiple machine translation systems, relates to the field of machine translation, and aims to solve two problems of traditional system combination performed as post-processing: it does not fully consider the information of the decoding process, and it cannot fully consider the search space explored during decoding. The device for fusing multiple machine translation systems comprises a preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder. The method comprises: 1, preprocessing for the machine translation systems; 2, building a translation hypergraph for each translation system; 3, fusing the two translation hypergraphs and training on the training set, where training comprises two parts: each single machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT); 4, decoding the test set to generate translation results and scoring them. The method and the device are applicable to the field of machine translation.

Description

Method and device for fusing multiple machine translation systems
Technical field
The present invention relates to a method and a device for fusing multiple machine translation systems, and belongs to the field of machine translation.
Background art
With the rapid development of computers, using a computer to translate between different languages has long been well known. Machine translation system combination merges the N-best outputs of several systems to generate a new translation result, and fused translations have been shown to be better than the output of any individual system. By the granularity of fusion, combination methods divide into sentence level, phrase level and word level; word-level system combination based on confusion networks has recently achieved significant performance gains, but all of these methods fuse the systems in a post-processing step after machine translation. Traditional system combination done in post-processing does not fully consider the information of the decoding process, and fusion in post-processing cannot fully consider the huge search space explored during decoding. The present invention instead performs the fusion inside the decoding process of the models; with the development of parallelization techniques, the time and space complexity of the algorithm is acceptable.
Hypergraphs have been applied to many modeling problems in discrete mathematics since the 1970s; they are also called directed hypergraphs (Gallo, 1993). They abstract hierarchical search spaces that can be solved with dynamic programming, i.e. a large problem is divided into subproblems and conquered. A hypergraph is a graph in the broad sense whose edges can connect any number of vertices. A directed hypergraph is a pair H = ⟨X, E⟩ together with a weight set W, where X is the set of vertices and E ⊆ P(X) × X is the set of hyperedges, P(X) being the power set of X. Each hyperedge e ∈ E is a triple e = ⟨T(e), h(e), f_e⟩, where T(e) ∈ V* is the ordered sequence of tail nodes (the tail may be the empty set, so T(e) belongs to the closure V* of the node set V), h(e) ∈ V is the head node, and f_e : R^|T(e)| → R is the weight function (R denotes the real numbers and |T(e)| the length of the tail sequence); W is the set of weights. All nodes associated with hyperedges are called hypernodes, each head node can be connected to several hyperedges, and h(e) is also called the source node. Define |T(e)| as the arity of a hyperedge; if the arity of a hyperedge is 0, its weight function f_e ∈ R is a constant. The arity of a hypergraph is the maximum arity over all its hyperedges. A hyperedge of arity 1 is a regular edge, and a hypergraph of arity 1 is a regular graph (lattice).
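The definition above maps directly onto a small data structure. The following Python sketch is our illustration only (the names are not from the patent): it encodes a hyperedge e = ⟨T(e), h(e), f_e⟩ and computes the arity of a hypergraph.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    label: str                                   # e.g. "[X, 0, 2]"
    incoming: List["Hyperedge"] = field(default_factory=list)

@dataclass
class Hyperedge:
    """A hyperedge e = <T(e), h(e), f_e>: ordered tail nodes, a head node,
    and a weight function over the tail weights."""
    tail: List[Node]                             # T(e); may be empty (arity 0)
    head: Node                                   # h(e)
    weight_fn: Callable[..., float]              # f_e : R^|T(e)| -> R

    @property
    def arity(self) -> int:
        return len(self.tail)

def hypergraph_arity(edges: List[Hyperedge]) -> int:
    """The arity of a hypergraph is the maximum arity over its hyperedges;
    a hypergraph of arity 1 is a regular graph (lattice)."""
    return max((e.arity for e in edges), default=0)
```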
A word lattice is exactly a hypergraph of arity 1 and is the most common form of hypergraph. In the machine translation field, word lattices are an important data-structure tool for representing the left-to-right decoding search space; a hypergraph is a more general word lattice that can represent not only the left-to-right but also the bottom-up decoding search space.
A translation hypergraph is built on the basis of a hypergraph: each translation rule corresponds to a hyperedge (one derivation step), and the weight of a translation rule corresponds to the weight function of its hyperedge. A translation node is a partial translation generated during the translation process, carrying various feature values. Translation hypergraphs model bilingual machine translation: they contain not only the source language but also the target language, and derivations can be performed on them; a derivation is exactly a translation process.
Summary of the invention
The present invention solves the problems that traditional system combination done in post-processing does not fully consider the information of the decoding process and that fusion in post-processing cannot fully consider the huge search space explored during decoding, and provides a method and a device for fusing multiple machine translation systems.
The device for fusing multiple machine translation systems comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder;
The monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora; the phrase extractor extracts phrases from the bilingual corpus and puts them into a phrase table; the language model generator trains a language model from the monolingual corpus; the machine translation systems before fusion are trained using the phrase table and the language model, and the parameter weights obtained in training serve as the weights of the final decoder; the decoder decodes the test data into translation results and evaluates the translation results to output a score.
The method for fusing multiple machine translation systems is realized according to the following steps:
One, preprocessing for the machine translation systems;
Two, building the translation hypergraph of each translation system;
Three, fusing the two translation hypergraphs and training on the training set;
wherein training comprises two parts: each individual machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT);
Four, decoding the test set to generate the translation results and scoring them, which completes the method for fusing multiple machine translation systems.
Effects of the present invention:
The present invention fuses several different machine translation systems so that they improve each other's performance, clearly raising the BLEU score by 7 percentage points over the single systems. The benefit of fusing inside the decoding process of the models is that the fusion is restricted neither by the machine translation models nor by the training algorithms: as long as the decoding processes are similar, the systems can be fused, which gives good extensibility.
Brief description of the drawings
Fig. 1 is the diagram of the device for fusing multiple machine translation systems;
Fig. 2 is the flowchart of the present invention;
Fig. 3 shows the result after word segmentation;
Fig. 4 shows the result after part-of-speech tagging;
Fig. 5 shows the result after syntactic analysis;
Fig. 6 is the sentence diagram containing syntactic, bilingual-alignment and phrase information;
Fig. 7 is the phrase diagram extracted by the tree-to-string machine translation system;
Fig. 8 is the sentence diagram containing bilingual-alignment and phrase information;
Fig. 9 is the phrase diagram extracted by the phrase-based machine translation system;
Fig. 10(a) is the translation hypergraph generated based on maximum entropy BTG;
Fig. 10(b) is the translation hypergraph generated based on SCFG;
Fig. 10(c) is the translation hypergraph after fusing the hypergraphs generated by the two grammars;
Fig. 11 is the training flowchart of machine translation;
Fig. 12(a) is the MERT training example measured by score;
Fig. 12(b) is the MERT training example measured by error;
Fig. 13 is an example CYK decoding table;
Fig. 14 is the translation result generated from the decoding table of Fig. 13;
Fig. 15 is the pseudocode of the main algorithm of machine translation fusion;
Fig. 16 is the pseudocode of the core function Add_Edge in decoding.
Embodiments
Embodiment one: the device for fusing multiple machine translation systems of this embodiment comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder;
The monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora; the phrase extractor extracts phrases from the bilingual corpus and puts them into a phrase table; the language model generator trains a language model from the monolingual corpus; the machine translation systems before fusion are trained using the phrase table and the language model, and the parameter weights obtained in training serve as the weights of the final decoder; the decoder decodes the test data into translation results and evaluates the translation results to output a score.
Embodiment two: the method for fusing multiple machine translation systems of this embodiment is realized according to the following steps:
One, preprocessing for the machine translation systems;
Two, building the translation hypergraph of each translation system;
Three, fusing the two translation hypergraphs and training on the training set;
wherein training comprises two parts: each individual machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT);
Four, decoding the test set to generate the translation results and scoring them, which completes the method for fusing multiple machine translation systems.
Modern machine translation technology is built on bilingual grammars. A grammar is a four-tuple
G = (V_n, V_t, P, S), where V_n is the set of non-terminals, containing the non-terminals of both the source and the target language; V_t is the set of terminals of both the source and the target language, with V_n ∩ V_t = ∅; the full symbol set is V = V_n ∪ V_t; P is the set of productions, in which the head node of a production is an element of V_n and the tail is an element of V* × V*; and S is the unique start symbol, S ∈ V_n.
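As a concrete reading of the four-tuple, here is a minimal Python encoding of a bilingual production (our illustrative sketch; the field names are not the patent's). It is used to write down lexical rule (1) of the BTG example that follows.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Production:
    """A bilingual production from P: the head is an element of V_n and the
    tail pairs a source-side string with a target-side string (V* x V*)."""
    head: str                     # element of V_n, e.g. "X" or "S"
    src: Tuple[str, ...]          # source side, element of V*
    tgt: Tuple[str, ...]          # target side, element of V*

# Lexical rule (1) of the BTG example below, written as a production:
rule1 = Production("X", ("tianshang", "de"), ("in", "the", "sky"))
```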
This embodiment fuses two classical bilingual grammars; because the fusion happens during decoding, it is not restricted by the grammars, and it can certainly be extended to the fusion of other types of bilingual grammars. The training process is also self-contained, and the classical minimum error rate training (MERT) algorithm can be adopted. The two classical bilingual grammars for machine translation are introduced below:
1. Bracketing Transduction Grammar (BTG): bilingual reordering in machine translation is realized by a machine learning algorithm; during decoding the translation result is generated following the word order of the source language, completing the reordering and generation of the target language automatically.
For example, for translating the Chinese sentence "tianshang de yuncai", two classes of BTG rules can be matched:
(a) lexical rules:
X → ⟨tianshang de, in the sky⟩    (1)
X → ⟨yuncai, cloud⟩    (2)
(b) reordering rule:
S →⟨⟩ ⟨X_{de;yuncai}, X_{clouds;in}⟩    (3)
The left side of a rule is the generated non-terminal; on the right side, the front part is the source language and the rear part is the target language. In a reordering rule, angle brackets on the arrow mean inverted order and square brackets mean straight order. In each subscript on the right side of the rule, the left part is the left context and the right part is the right context: the left context of the source language is de and its right context is yuncai, while the left context of the target language is clouds and its right context is in. The context information of the source and target language is mainly used to train the reordering model with a machine learning algorithm (this embodiment trains it with a maximum entropy toolkit).
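To illustrate how these boundary contexts become training data for the reordering model, here is a minimal Python sketch; the feature templates and the straight/inverted labels are our own illustrative choices, not taken from the patent.

```python
def btg_reorder_features(src_left, src_right, tgt_left, tgt_right):
    """Boundary-word features for one combination of two translated blocks,
    as consumed by a maximum entropy classifier."""
    return [
        f"src_left={src_left}",    # source-language left context, e.g. "de"
        f"src_right={src_right}",  # source-language right context, e.g. "yuncai"
        f"tgt_left={tgt_left}",    # target-language left context, e.g. "clouds"
        f"tgt_right={tgt_right}",  # target-language right context, e.g. "in"
    ]

# One training event for rule (3): the two blocks combine in inverted order.
event = (btg_reorder_features("de", "yuncai", "clouds", "in"), "inverted")
print(event)
```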
The process of generating the target language is a derivation process, which this embodiment represents in two ways:
The traditional method:
S ⇒_(3) ⟨X_{de;yuncai}, X_{clouds;in}⟩ ⇒_(1),(2) ⟨tianshang de yuncai, cloud in the sky⟩
The symbols on the derivation arrows ((1), (2) and (3)) indicate the rewriting rules being applied.
The method based on theorem proving differs slightly from the above and uses a bottom-up representation:
Axioms:

    -----------------------------------      ------------------------
    X → ⟨tianshang de, in the sky⟩ : w1      X → ⟨yuncai, cloud⟩ : w2        (4)

Derivation steps:

    X → ⟨tianshang de, in the sky⟩ : w1
    -----------------------------------                                     (5)
    [X, 0, 2] : w1

    X → ⟨yuncai, cloud⟩ : w2
    ------------------------                                                (6)
    [X, 2, 3] : w2

    S →⟨⟩ ⟨X_{de;yuncai}, X_{clouds;in}⟩ : w3    [X, 0, 2] : w1    [X, 2, 3] : w2
    -----------------------------------------------------------------------------   (7)
    [S, 0, 3] : w1 × w2 × w3

Derived goal: [S, 0, 3] : w1 × w2 × w3                                       (8)
Above each bar is the premise of the deduction and below it the conclusion; on the right side of each rewriting rule is the weight of that rule. To keep the proof simpler, all weights are set to 1 in this embodiment. In the proof process, the axioms (4) need no derivation, so nothing appears above their bars; derivation step (5) uses the first axiom of (4); derivation step (6) uses the second axiom of (4); derivation step (7) uses the conclusions of steps (5) and (6) together with the reordering rule (3) to obtain the final conclusion [S, 0, 3] : w1 × w2 × w3 (the derived goal (8)), which states that the cell spanning positions 0 to 3 stores the start symbol (i.e. the sentence has been reduced to the start symbol) with weight w1 × w2 × w3; since every rewriting rule weight in this example is 1, the final weight is 1. Fig. 10(a) illustrates this derivation as the translation hypergraph generated based on maximum entropy BTG.
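The weighted deduction can be replayed in a few lines of Python. The sketch below (the helper names are ours) fills a chart with items [X, i, j] whose weights multiply along derivation steps (5)-(7), and checks the goal weight of (8):

```python
from typing import Dict, Tuple

Item = Tuple[str, int, int]                    # a chart item [symbol, i, j]

def derive(chart: Dict[Item, float], conclusion: Item,
           rule_weight: float, premises: Tuple[Item, ...] = ()) -> None:
    """One deduction step: the conclusion's weight is the rule weight times
    the weights of all premises already proved in the chart."""
    w = rule_weight
    for p in premises:
        w *= chart[p]
    chart[conclusion] = w

w1 = w2 = w3 = 1.0                             # all rule weights are 1 here
chart: Dict[Item, float] = {}
derive(chart, ("X", 0, 2), w1)                                  # step (5)
derive(chart, ("X", 2, 3), w2)                                  # step (6)
derive(chart, ("S", 0, 3), w3,
       premises=(("X", 0, 2), ("X", 2, 3)))                     # step (7)
assert chart[("S", 0, 3)] == w1 * w2 * w3                       # goal (8)
```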
2. Synchronous context-free grammar (SCFG): an extension of the context-free grammar in the traditional Chomsky formalism that makes it suitable for the machine translation task; in the grammar extraction process, the reordering model is built into every rewriting rule of the translation model.
Two SCFG rewriting rules can be matched for the previous example:
X → ⟨tianshang de X₁, X₁ on the sky⟩    (1)
X → ⟨X₁ yuncai, clouds X₁⟩    (2)
Besides the rewriting rules extracted from the training corpus, there are also two special glue rules:
S → ⟨S₁ X₂, S₁ X₂⟩    (3)
S → ⟨X₁, X₁⟩    (4)
The subscripts of the non-terminal symbols indicate the correspondence between non-terminals on the two sides.
SCFG derivations also admit the two representations. The traditional method is:
S ⇒_(3),(4) ⟨X₁ X₂, X₁ X₂⟩ ⇒_(1) ⟨tianshang de X₁, X₁ on the sky⟩ ⇒_(2) ⟨tianshang de yuncai, clouds on the sky⟩
The method based on theorem proving:

Axioms:

    -----------------------------------------      -------------------------------
    X → ⟨tianshang de X₁, X₁ on the sky⟩ : w1      X → ⟨X₁ yuncai, clouds X₁⟩ : w2

Derivation steps:

    X → ⟨tianshang de X₁, X₁ on the sky⟩ : w1
    -----------------------------------------                               (5)
    [X, 0, 2] : w1

    X → ⟨X₁ yuncai, clouds X₁⟩ : w2
    -------------------------------                                         (6)
    [X, 2, 3] : w2

    S → ⟨X₁, X₁⟩ : w3    [X, 0, 2] : w1
    -----------------------------------                                     (7)
    [S, 0, 2] : w1 × w3

    S → ⟨S₁ X₂, S₁ X₂⟩ : w4    [S, 0, 2] : w1 × w3    [X, 2, 3] : w2
    ----------------------------------------------------------------        (8)
    [S, 0, 3] : w1 × w2 × w3 × w4

Derived goal: [S, 0, 3] : w1 × w2 × w3 × w4                                  (9)
The derivation is similar to the BTG derivation; Fig. 10(b) illustrates the translation hypergraph generated based on SCFG.
The two grammars have complementary strengths and weaknesses, which is exactly what makes fusion promising: BTG uses a powerful machine learning algorithm and can reorder better, but the expressive power of the model is limited, so it is constrained by the model; the SCFG model is relatively expressive and uses a comparatively simple maximum likelihood estimate for its parameters, with the reordering process built into the model itself, so it needs no separate reordering module to complete bilingual reordering.
The fusion method is illustrated in Fig. 10(c), where solid lines are BTG rules and dotted lines are SCFG rules. In Fig. 10(c), because both SCFG and BTG generate "clouds in the sky", the SCFG and BTG translation nodes (partial translations) above can be shared, and this is precisely how the results generated by the two grammars are fused.
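A minimal sketch of this node sharing, under our simplifying assumption that translation nodes are keyed by their source span and partial target string, so that hyperedges from either grammar attach to the same node (the weights below are toy values):

```python
from collections import defaultdict

nodes = {}                      # (span, partial translation) -> shared node
edges = defaultdict(list)       # shared node -> incoming hyperedges

def add_edge(grammar, span, translation, tails, weight):
    """Attach a hyperedge to the translation node for (span, translation),
    creating the node only if neither grammar has produced it yet."""
    node = nodes.setdefault((span, translation), (span, translation))
    edges[node].append((grammar, tuple(tails), weight))
    return node

# Both grammars derive "clouds in the sky" over span (0, 3), so their
# hyperedges attach to one shared node -- the fusion of Fig. 10(c).
add_edge("BTG",  (0, 3), "clouds in the sky", [((0, 2), "in the sky")], 0.9)
add_edge("SCFG", (0, 3), "clouds in the sky", [((0, 2), "in the sky")], 0.8)
assert len(nodes) == 1
assert len(edges[((0, 3), "clouds in the sky")]) == 2
```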
Effects of this embodiment:
This embodiment fuses several different machine translation systems so that they improve each other's performance, clearly raising the BLEU score by 7 percentage points over the single systems. The benefit of fusing inside the decoding process of the models is that the fusion is restricted neither by the machine translation models nor by the training algorithms: as long as the decoding processes are similar, the systems can be fused, which gives good extensibility.
Embodiment three: this embodiment differs from embodiment two in that the preprocessing for the machine translation systems is specifically:
(1) word-segmenting the source language and the target language;
(2) part-of-speech tagging the sentences that require it, and aligning the bilingual text at the same time;
(3) syntactically parsing the sentences that require it;
(4) combining the alignment information with the part-of-speech and syntactic information;
(5) extracting phrases and computing the phrase-related feature scores (a sketch of this step follows below).
To better understand the preprocessing, this embodiment uses the tree-to-string model as an illustration. Fig. 3 shows the word segmentation of the sentence "Bush and Sharon held talks"; Fig. 4 shows the segmented sentence after part-of-speech tagging; Fig. 5 shows the result of syntactic analysis after segmentation and tagging; Fig. 6 combines the bilingual alignment result with the source-language syntactic analysis, with the extracted phrases marked in black; Fig. 7 shows the phrases extracted from Fig. 6; Fig. 8 shows the phrases that can be extracted from the alignment result alone. Comparing the phrases of the two machine translation systems (i.e. comparing Fig. 7 and Fig. 9) shows that the phrase-based machine translation system has more phrases than the tree-to-string system, and the surplus can grow exponentially as the preprocessed corpus grows, but the extra phrases carry no syntactic structure. In part of the translated sentences, phrases that respect the syntactic information improve translation performance; in another part, phrases that do not respect the syntactic information improve it.
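To make step (5) concrete, here is a simplified, self-contained sketch of alignment-consistent phrase-pair extraction (the standard consistency criterion; the toy sentence pair and alignment below are invented for illustration and do not reproduce the patent's figures):

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with the word alignment: no word
    inside the target span may align to a source word outside the span."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            ts = [t for s, t in alignment if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            if all(i1 <= s <= i2 for s, t in alignment if j1 <= t <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy sentence pair and alignment (invented for illustration):
src = ["bushi", "yu", "shalong", "juxing", "le", "huitan"]
tgt = ["Bush", "held", "talks", "with", "Sharon"]
ali = [(0, 0), (1, 3), (2, 4), (3, 1), (5, 2)]
print(extract_phrases(src, tgt, ali))
```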
Other steps and parameters are identical to embodiment two.
Embodiment four: this embodiment differs from embodiment two or three in that fusing the two translation hypergraphs in step three is specifically:
The translation process is modeled by a hypergraph, so a hidden variable d is first introduced to represent each derivation, and P(e|f) can be written as:
p(e|f) = Σ_d p(e, d|f)
Each term on the right-hand side is then expanded by the chain rule of probability:
p(e, d|f) = p(d|f) p(e|d, f)
The formula above decomposes p(e, d|f) into two factors, each corresponding to a submodel. The second submodel p(e|d, f) corresponds to obtaining the target sentence from the source sentence and the derivation; since the target sentence is determined once the source sentence and the derivation are fixed, this factor can be ignored, giving the log-linear model
p(e, d|f) = p(d|f) = exp(γ·h(e, f, d)) / Σ_{e′,d′} exp(γ·h(e′, f, d′))
where h(e, f, d) is the feature vector and γ is the feature weight vector; this completes the fusion of the two translation hypergraphs.
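When the derivations of the fused hypergraph can be enumerated, the log-linear model above can be computed directly. The sketch below uses toy feature vectors and weights of our own; a real decoder sums over the hypergraph with dynamic programming instead:

```python
import math

def p_d_given_f(gamma, feature_vecs, d_index):
    """p(d|f) = exp(gamma . h(d)) / sum over d' of exp(gamma . h(d')),
    with the derivations enumerated explicitly (feasible only for toys)."""
    scores = [math.exp(sum(g * x for g, x in zip(gamma, h)))
              for h in feature_vecs]
    return scores[d_index] / sum(scores)

gamma = [0.5, 0.3, 0.2]                      # feature weight vector (toy)
hs = [[1.0, 0.2, 0.0], [0.8, 0.1, 1.0]]      # h(e, f, d) per derivation (toy)
print(p_d_given_f(gamma, hs, 0))             # probability of derivation 0
```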
The system realized in this embodiment adopts the following features:
(1) bidirectional translation probabilities: Trans(e|f) and Trans(f|e)
(2) bidirectional lexical translation probabilities: Lex_Trans(e|f) and Lex_Trans(f|e)
(3) language model probability: LM(e)
(4) number of rules used in the translation process: Num(Rule)
(5) number of phrases used in the translation process: NumPhrase(f, e)
(6) probability of the maximum entropy distortion (reordering) model: Distortion(f, e)
(7) number of words in the generated translation: NumWord(e)
This amounts to 9 features in total, of which each model uses 8. Some features are shared by the two models, such as the bidirectional translation probabilities, the bidirectional lexical translation probabilities, the language model probability and the number of generated words; some are exclusive: SCFG uses the "number of rules used in the translation process" feature, while BTG uses the "probability of the maximum entropy distortion model". Shared features need not be re-scored during fusion.
Other steps and parameters are identical to embodiment two or three.
Embodiment five: this embodiment differs from one of embodiments two to four in that a maximum entropy model is used when training the maximum-entropy BTG reordering model in step three. Maximum entropy training is a convex optimization, and the maximum entropy solution is the most uniform probability distribution satisfying the feature constraints. Its basic training formula is as follows:

p* = argmax_{p : E_p(f) = E_p̄(f)} H(Y|X)

This formula means: among the distributions satisfying the constraint E_p(f) = E_p̄(f), choose the probability p* that maximizes the conditional entropy H(Y|X), where E_p(f) is the expectation of feature f under the model and E_p̄(f) is the expectation of feature f in the sample. The conditional entropy expands as:

H(Y|X) = Σ_{(x,y)} p(y|x) p̄(x) log (1 / p(y|x))

The training algorithm adopted is the commonly used limited-memory quasi-Newton method. Its formulas are obtained as follows: first f(x) is expanded at x_k into a second-order Taylor series:

f(x) ≈ φ(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T ∇²f(x_k)(x − x_k)

To find the extreme point of φ(x), set ∇φ(x) = 0, i.e. ∇f(x_k) + ∇²f(x_k)(x − x_k) = 0; if ∇²f(x_k) is invertible, the iterative formula of Newton's method is obtained:

x_{k+1} = x_k − ∇²f(x_k)^{-1} ∇f(x_k)

where ∇²f(x_k) is the Hessian matrix. As in LBFGS, the Hessian is approximated by a correction matrix H_k:

H_{k+1} = (I − ρ_k s_k y_k^T) H_k (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T

s_k = x_{k+1} − x_k,   y_k = ∇f(x_{k+1}) − ∇f(x_k),   ρ_k = 1 / (y_k^T s_k)

The conventional choice is to initialize the second-derivative (Hessian) matrix H_0 to the identity matrix I.
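The update transcribes directly into code. The NumPy sketch below (our illustration) performs one inverse-Hessian correction starting from H_0 = I; a full L-BFGS implementation would avoid materializing H_k and keep only the last few (s_k, y_k) pairs:

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS inverse-Hessian update:
    H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T,
    with s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k)."""
    rho = 1.0 / float(y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

H = np.eye(2)                       # H_0 initialized to the identity matrix
s = np.array([0.1, -0.2])           # x_{k+1} - x_k
y = np.array([0.3, 0.1])            # gradient difference
H = bfgs_update(H, s, y)
print(H)
```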
Other steps and parameters are identical to one of embodiments one to four.
Embodiment six: this embodiment differs from one of embodiments two to five in that the classical minimum error rate training (MERT) algorithm in step three represents the training process with the following formula (the training flowchart is shown in Fig. 11):

e*(f, γ) = argmax_{e ∈ C_f} { a(e, f) + γ · b(e, f) }

where C_f is the set of all translation candidates of the source sentence f; the formula finds the best translation result e*(f, γ) as the slope γ varies. Fig. 12(a) and Fig. 12(b) use a set of six candidate translations C_f = {e1, e2, e3, e4, e5, e6} as MERT training examples measured by score and by error, respectively.
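The formula amounts to a line search over the slope γ for each candidate set. A brute-force sketch with made-up (a, b) score pairs (exact MERT sweeps γ along the upper envelope rather than a fixed grid, but the argmax per γ is the same):

```python
def mert_line_search(candidates, gammas):
    """For each slope gamma, pick e*(f, gamma) as the candidate maximizing
    a(e, f) + gamma * b(e, f) over C_f; returns the winning index per gamma."""
    return [max(range(len(candidates)),
                key=lambda i: candidates[i][0] + g * candidates[i][1])
            for g in gammas]

C_f = [(1.0, 0.0), (0.5, 0.8), (0.2, 1.5)]     # (a(e,f), b(e,f)) per candidate
print(mert_line_search(C_f, [0.0, 0.5, 1.0, 2.0]))   # best candidate per slope
```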
Other steps and parameters are identical to one of embodiments one to five.
Embodiment seven: this embodiment differs from one of embodiments two to six in that in step four the test set is decoded to generate the translation result, specifically:

The whole decoder is built on the CYK algorithm and adopts a beam search strategy. With Viterbi, the top score is chosen; with crunching, the partial feature scores are summed, the partial features being all features except the common language model and word penalty features. Both algorithms are marked at their key steps (the main algorithm is shown in Fig. 15, and the key function Add_Edge that it calls is shown in Fig. 16). The decoding formula for eliminating spurious ambiguity is:

ê = argmax_e Σ_{d ∈ D(e,f) ∩ ND(e)} P(f, d | e)

The decoding formula for generating a consensus translation is:

ê = argmin_e Σ_{e′ ∈ T(f)} Loss(e, e′) P(e′ | f)

where P(e′|f) is the probability of generating the target sentence from the source sentence; P(e, d|f) is the probability after adding the hidden derivation variable d; D(e, f) is the set of all derivations linking the source and target sentences; ND(e) restricts to derivations generating the n-best translation results; Loss(e, e′) is the loss function used to compute the minimum Bayes risk; and T(f) is the evidence (hypothesis) space.

In consensus decoding, the translation probability from source to target becomes a sum over the derivations generating the same target string:

P(e′|f) = Σ_{d ∈ D(e′,f) ∩ ND(e′)} P(f, d | e′)

The foundational work on statistical machine translation modeled the translation process with the source-channel model, which is why subsequent machine translation research refers to the whole translation process as decoding. The task of a machine translation system is to translate an input source sentence f into a target sentence e; machine translation fusion combines several systems so that they draw on each other's translation information. This embodiment adopts CYK decoding, whose process is shown in Fig. 13 and Fig. 14: in Fig. 13 each cell represents a translation unit whose content is the translation rule used to generate that partial translation, the red lines mark the rules used by the final best translation, and Fig. 14 shows the syntax-tree form of the translation result.
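The following compact Python sketch stands in for the CYK-with-beam decoder of Figs. 15 and 16; the rule table, the scores and the orientation_score stub are our illustrative inventions (in the real system the orientation probability comes from the maximum entropy reordering model):

```python
import heapq
from collections import defaultdict

def orientation_score(order):
    """Stand-in for the maximum entropy reordering model: in the real system
    this probability is predicted from boundary-word features."""
    return 0.6 if order == "inverted" else 0.4

def cyk_beam_decode(words, rules, beam=5):
    """CYK decoding over source spans with beam search: cell (i, j) keeps the
    `beam` best partial translations of words[i:j]; `rules` maps a source
    phrase to scored translations."""
    n = len(words)
    chart = defaultdict(list)                  # (i, j) -> [(score, translation)]
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j, cell = i + span, []
            for trans, sc in rules.get(" ".join(words[i:j]), []):
                cell.append((sc, trans))                        # lexical rule
            for k in range(i + 1, j):                           # BTG combination
                for s1, t1 in chart[(i, k)]:
                    for s2, t2 in chart[(k, j)]:
                        cell.append((s1 * s2 * orientation_score("straight"),
                                     t1 + " " + t2))            # straight order
                        cell.append((s1 * s2 * orientation_score("inverted"),
                                     t2 + " " + t1))            # inverted order
            chart[(i, j)] = heapq.nlargest(beam, cell)          # beam pruning
    return chart[(0, n)][0] if chart[(0, n)] else None

rules = {"tianshang de": [("in the sky", 0.9)], "yuncai": [("clouds", 0.8)]}
print(cyk_beam_decode(["tianshang", "de", "yuncai"], rules, beam=3))
# e.g. (0.432..., 'clouds in the sky')
```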
Other steps and parameters are identical to one of embodiments two to six.

Claims (7)

1. A device for fusing multiple machine translation systems, characterized in that the device comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, several machine translation system trainers and a decoder;
the monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora; the phrase extractor extracts phrases from the bilingual corpus and puts them into a phrase table; the language model generator trains a language model from the monolingual corpus; the machine translation systems before fusion are trained using the phrase table and the language model, and the parameter weights obtained in training serve as the weights of the final decoder; the decoder decodes the test data into translation results and evaluates the translation results to output a score.
2. A method for fusing multiple machine translation systems using the device of claim 1, characterized in that the method is realized according to the following steps:
one, preprocessing for the machine translation systems: the monolingual/bilingual preprocessor processes the monolingual and bilingual corpora, the language model generator produces the language model, and the phrase extractor extracts phrases;
two, generating a translation hypergraph for each translation system;
three, in the machine translation system trainers, fusing the two translation hypergraphs through their shared features and training on the training set;
wherein training comprises two parts: each individual machine translation system before fusion uses a maximum-entropy-trained BTG reordering model, and the fused machine translation system is tuned with minimum error rate training (MERT);
four, in the decoder, decoding the test set to generate the translation results and scoring them, which completes the method for fusing multiple machine translation systems.
3. The method for fusing multiple machine translation systems according to claim 2, characterized in that the preprocessing for the machine translation systems in step one is specifically:
(1) word-segmenting the source language and the target language;
(2) part-of-speech tagging the sentences that require it, and aligning the bilingual text at the same time;
(3) syntactically parsing the sentences that require it;
(4) combining the alignment information with the part-of-speech and syntactic information;
(5) extracting phrases and computing the phrase-related feature scores.
4. The method for fusing multiple machine translation systems according to claim 3, characterized in that fusing the two translation hypergraphs in step three is specifically:
the translation process is modeled by a hypergraph, so a hidden variable d is first introduced to represent each derivation, and P(e|f) can be written as:
p(e|f) = Σ_d p(e, d|f)
each term on the right-hand side is then expanded by the chain rule of probability:
p(e, d|f) = p(d|f) p(e|d, f)
which decomposes p(e, d|f) into two factors, each corresponding to a submodel; the second submodel p(e|d, f) corresponds to obtaining the target sentence from the source sentence and the derivation, and since the target sentence is determined once the source sentence and the derivation are fixed, this factor can be ignored, giving
p(e, d|f) = p(d|f) = exp(γ·h(e, f, d)) / Σ_{e′,d′} exp(γ·h(e′, f, d′))
where h(e, f, d) is the feature vector and γ is the feature weight vector; this completes the fusion of the two translation hypergraphs.
5. The method for fusing multiple machine translation systems according to claim 4, characterized in that a maximum entropy model is used when training the maximum-entropy BTG reordering model in step three; maximum entropy training is a convex optimization, and the maximum entropy solution is the most uniform probability distribution satisfying the feature constraints; its basic training formula is as follows:
p* = argmax_{p : E_p(f) = E_p̄(f)} H(Y|X)
meaning: among the distributions satisfying the constraint, choose the probability p* that maximizes the conditional entropy H(Y|X), where E_p(f) is the expectation of feature f under the model and E_p̄(f) is the expectation of feature f in the sample; the conditional entropy expands as:
H(Y|X) = Σ_{(x,y)} p(y|x) p̄(x) log (1 / p(y|x))
the training algorithm adopted is the commonly used limited-memory quasi-Newton method: first f(x) is expanded at x_k into a second-order Taylor series:
f(x) ≈ φ(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T ∇²f(x_k)(x − x_k)
to find the extreme point of φ(x), set ∇φ(x) = 0, i.e. ∇f(x_k) + ∇²f(x_k)(x − x_k) = 0; if ∇²f(x_k) is invertible, the iterative formula of Newton's method is obtained:
x_{k+1} = x_k − ∇²f(x_k)^{-1} ∇f(x_k)
where ∇²f(x_k) is the Hessian matrix; as in LBFGS, the Hessian is approximated by the correction matrix H_k:
H_{k+1} = (I − ρ_k s_k y_k^T) H_k (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T
s_k = x_{k+1} − x_k,   y_k = ∇f(x_{k+1}) − ∇f(x_k),   ρ_k = 1 / (y_k^T s_k)
the traditional choice is to initialize the second-derivative (Hessian) matrix H_0 to the identity matrix I.
6. The method for fusing multiple machine translation systems according to claim 5, characterized in that the classical minimum error rate training (MERT) algorithm in step three represents the training process with the formula:
e*(f, γ) = argmax_{e ∈ C_f} { a(e, f) + γ · b(e, f) }
where C_f is the set of all translation candidates of the source sentence f, and the formula finds the best translation result e*(f, γ) as the slope γ varies.
7. The method for fusing multiple machine translation systems according to claim 2, 3, 4, 5 or 6, characterized in that decoding the test set to generate the translation result in step four is specifically:
the whole decoder is built on the CYK algorithm and adopts a beam search strategy; with Viterbi, the top score is chosen; with crunching, the partial feature scores are summed, the partial features being all features except the common language model and word penalty features; both algorithms are marked at their key steps; the decoding formula for eliminating spurious ambiguity is:
ê = argmax_e Σ_{d ∈ D(e,f) ∩ ND(e)} P(f, d | e)
the decoding formula for generating a consensus translation is:
ê = argmin_e Σ_{e′ ∈ T(f)} Loss(e, e′) P(e′ | f)
where P(e′|f) is the probability of generating the target sentence from the source sentence; P(e, d|f) is the probability after adding the hidden derivation variable d; D(e, f) is the set of all derivations linking the source and target sentences; ND(e) restricts to derivations generating the n-best translation results; Loss(e, e′) is the loss function used to compute the minimum Bayes risk; and T(f) is the evidence (hypothesis) space;
in consensus decoding, the translation probability from source to target becomes a sum over the derivations generating the same target string:
P(e′|f) = Σ_{d ∈ D(e′,f) ∩ ND(e′)} P(f, d | e′).