CN107562734A - Translation template determination, machine translation method and device - Google Patents

Publication number
CN107562734A
Authority
CN
China
Prior art keywords
translation
phrase
template
instance
translation template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610506589.1A
Other languages
Chinese (zh)
Inventor
史黎鑫
张海波
卞华明
管陶然
刘禹
赵宇
骆卫华
林锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610506589.1A
Publication of CN107562734A
Legal status: Pending

Abstract

This application discloses a translation template determination method and device, and a machine translation method and device, which increase the number of translation templates obtained from a single translation instance so that more correct and effective translation templates can be obtained, thereby improving the accuracy of machine translation. The translation template determination method provided by this application includes: matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance; determining the variable label of each matched phrase; and, according to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.

Description

Translation template determination, machine translation method and device
Technical field
This application relates to the field of machine translation technology, and in particular to a translation template determination method and device, and a machine translation method and device.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another natural language (the target language); it generally refers to the translation of sentences and full texts between natural languages. A statistical machine translation system has strong generalization ability: by learning automatically from large-scale parallel data it can translate any sentence, but the quality of the translation result often cannot be guaranteed. Translation memory was introduced to make effective use of existing high-quality parallel sentence pairs. Translation memory (TM) is a computer-aided translation technology: a language database that stores source texts and their translations. Traditional translation memory is generally used in computer-aided translation (CAT). A common approach at present is to build a template library and a terminology library from translation instances and, by using the translation instance library, the terminology library, and the template library together, to make maximum use of the existing bilingual parallel corpus and obtain higher-quality translation results. Abstracting translation instances to obtain translation templates is a very important module in a translation memory system. A translation instance may be a preset training sentence, that is, a single sentence. A translation template keeps the overall frame of a sentence unchanged and lets the content within the frame vary subject to grammatical, pragmatic, and similar constraints, so that it can recognize and generate a class of sentences; it is, to some extent, an abstraction of the sentence. Here, pragmatics refers to the use of language in a specific context, and the grammatical and pragmatic constraints are language rules applied while building translation templates; these rules generally describe syntactic, semantic, and pragmatic knowledge.
A monolingual template is usually a sequence of constants and variables. Specific words and phrases are constants; a variable stands for a class of words or phrases that can be abstracted and generalized. For example, in the template "I like eating $x1.", "I like eating" and "." are the constants: they are identical in every sentence that matches the template. "$x1" is the variable: it can differ between sentences that match the template. In "I like eating apple." and "I like eating orange.", "apple" and "orange" both correspond to the variable part of the template. A monolingual template in the translation template library therefore consists of two parts, constants and variables. The constants are the fixed part of the template, while a variable usually also carries conditions that the phrase filling it must satisfy during translation. A translation template must be abstract enough to have reasonable coverage, but not so abstract that translation accuracy suffers. The template abstraction method therefore directly affects the performance of the translation memory system.
Prior-art methods for extracting translation templates follow two main approaches. The first relies only on information such as the structure and semantics of a translation instance itself, or of translation instances relative to one another, and designs algorithms that extract translation templates automatically without any other information. The second partially generalizes translation instances on the basis of high-quality phrase fragments obtained in advance, that is, a preset phrase set, and thereby extracts translation templates automatically. The second approach first obtains high-quality phrase fragments from a data set, mainly fragments with independent meaning such as noun phrases. With these fragments in hand, it compares them against the translation instances in the data set and generalizes the instances to obtain the corresponding translation templates. Template extraction based on high-quality phrases resembles dictionary-based word segmentation and mainly uses methods such as forward maximum matching and reverse maximum matching.
The forward maximum matching algorithm starts from the left end of the sentence and matches the phrases in the sentence one by one against the preset phrase set. If the current phrase is in the phrase set, it is a matched phrase and is replaced in the sentence with a variable (the so-called generalization); this continues until the whole sentence has been traversed. For example, take the translation instance "we play in Safari Park" and assume that the maximum phrase length is max = 5, that is, a phrase contains at most 5 words. Extracting a translation template from this instance with the forward maximum matching algorithm proceeds as follows:
Step 1: Traverse the sentence word by word from the left. Suppose, for example, that "I, we, Safari Park" forms a phrase set of three phrases. First determine whether the phrase set contains a phrase beginning with the current word (here "I"). If not, move one word to the right and check again; if so, go to the next step;
Step 2: Set the phrase length len = max, take the fragment seg of length len to the right of the current position (seg = "we are wild"), and look seg up in the phrase set;
Step 3: If the phrase set does not contain the fragment, decrease len by 1 and take the seg fragment again;
Step 4: Repeat the lookup of Step 2 and the shortening of Step 3 until seg is found in the phrase set, then exit the loop;
Step 5: Replace the current seg fragment in the translation instance with a variable label, move right by len words, and start again from Step 1 until the translation instance has been fully traversed. Here len is its current value, which equals the length of the seg fragment just matched: because Step 3 decreases len by 1 whenever seg is not matched, the current value of len is the length of the matched fragment at the moment seg is matched.
Matched fragments are replaced with variables; unmatched fragments remain constants. For example, with the phrase set "I, we, Safari Park", the sentence "we play in Safari Park" becomes "$x1 play in $x2", where "$x1" and "$x2" are the variable parts and the rest is the constant part.
Step 6: Post-process the translation instance in which replacement has finished and variable labels have been substituted, including merging adjacent variable labels, to obtain the final translation template.
As an example of this post-processing, suppose the translation instance "I have a red school bag." becomes "I have a $x1 $x2" after Steps 1 to 5. The post-processing in Step 6 merges the two adjacent variables $x1 and $x2, giving the final translation template "I have a $x1".
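The prior-art procedure in Steps 1 to 6 can be illustrated with the following minimal sketch. It assumes word-level tokens separated by spaces and a single unnumbered label "$x"; the names `forward_max_match` and `merge_adjacent` are hypothetical and do not come from this application.

```python
def forward_max_match(tokens, phrase_set, max_len=5):
    """Greedy left-to-right longest match; each matched phrase becomes '$x'."""
    out = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate fragment first, then shrink it one word at a time.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in phrase_set:
                matched = length
                break
        if matched:
            out.append("$x")       # generalize the matched fragment
            i += matched           # move right by the matched length
        else:
            out.append(tokens[i])  # unmatched word stays a constant
            i += 1
    return out


def merge_adjacent(template):
    """Post-processing: collapse runs of adjacent variable labels into one."""
    merged = []
    for tok in template:
        if tok == "$x" and merged and merged[-1] == "$x":
            continue
        merged.append(tok)
    return merged


# Phrase set from the example above, stored as tuples of words.
phrases = {("I",), ("we",), ("Safari", "Park")}
template = merge_adjacent(forward_max_match("we play in Safari Park".split(), phrases))
print(" ".join(template))  # $x play in $x
```

Because the longest fragment always wins at each position, the sketch returns exactly one template per sentence, which is the limitation discussed next.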
The reverse maximum matching algorithm works in the same way, except that it starts at the end of the sentence and traverses toward the beginning while matching against the preset phrase set, and it indexes the phrase set by the last word of each phrase.
The forward and reverse maximum matching algorithms share the same principle: each replaces the longest phrase obtainable at the current position with a variable label, so only one translation template is finally generated per sentence. Moreover, when phrases in the phrase set are nested, only the translation template with the longest variable can be obtained. In the example shown in Fig. 1, the forward and reverse maximum matching algorithms can obtain only the translation template "the exterior offers $x"; the other translation template corresponding to the same translation instance, "the exterior offers $x added storage", cannot be obtained.
In summary, the prior-art forward and reverse maximum matching algorithms can obtain only one translation template per translation instance. Few translation templates can be obtained and more correct, effective translation templates are missed, so translation quality suffers.
Summary of the invention
The embodiments of this application provide a translation template determination method and device, and a machine translation method and device, which increase the number of translation templates obtained from a single translation instance so that more correct and effective translation templates can be obtained, thereby improving the accuracy of machine translation.
A translation template determination method provided by an embodiment of this application includes:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.
Here, the translation instance may also be called a training sentence, that is, a single sentence.
The phrase set may also be called a word set; it may include single words and/or short phrases made up of several words.
With this method, the translation instance is matched against the preset phrase set to determine the matched phrases in the translation instance and their variable labels, and the phrases in the translation instance are combined with the variable labels of the matched phrases according to the position of each phrase, yielding translation templates in at least one combination form. This increases the number of translation templates obtained from a single translation instance, so that more correct and effective translation templates can be obtained and the accuracy of machine translation can be improved.
Optionally, the method further includes: for a translation template that contains the variable labels of adjacent matched phrases, merging the variable labels of the adjacent matched phrases in that translation template.
This step may be called post-processing of the translation template.
Optionally, the method further includes: for a translation template that contains the variable labels of several matched phrases, numbering the variable label of each matched phrase in that translation template, for example by appending the numerals 1, 2, 3, ... in order, starting from the variable label of the first matched phrase, so that different variable labels can be distinguished. The numeral may be appended after the corresponding variable label.
This step may also be called post-processing of the translation template.
Optionally, the method further includes: filtering the translation templates according to preset rules. This makes the final translation templates more representative and better suited to actual needs; filtering also reduces storage space by avoiding the storage of a large number of translation templates, and improves subsequent machine translation.
Optionally, filtering the translation templates according to preset rules specifically includes:
Filtering out translation templates that satisfy one or a combination of the following conditions:
The coverage of the translation template is below a preset coverage threshold;
The level of abstraction of the translation template is below a preset abstraction threshold;
The number of words remaining in the translation template after the variable labels are removed is below a preset number threshold;
Here, the coverage of a translation template is determined by the number of translation instances that the template covers;
The level of abstraction of a translation template is determined by the coverage of the template, the length of the template, and the lengths of the translation instances that the template covers.
Optionally, combining the phrases in the translation instance with the variable labels of the matched phrases according to the position of each phrase in the translation instance, to obtain translation templates in at least one combination form, specifically includes:
Using the phrases in the translation instance and the variable labels of the matched phrases to determine an L*L two-dimensional matrix, where L is the length of the translation instance;
Taking, as the resulting translation templates, those translation templates at the upper-right corner position of the two-dimensional matrix that contain a variable label.
Note that the two-dimensional matrix above is only one preferred way of obtaining multiple translation templates for the same translation instance; those skilled in the art may conceive of other implementations.
Optionally, the translation instance is a monolingual translation instance.
Correspondingly, a machine translation method provided by an embodiment of this application includes:
Determining the source sentence to be translated;
Translating the source sentence into a target sentence using preset translation templates;
Here, the translation templates are preset as follows:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.
Corresponding to the translation template determination method above, a translation template determination device provided by an embodiment of this application includes:
A first unit, configured to match a translation instance against a preset phrase set and determine the matched phrases in the translation instance;
A second unit, configured to determine the variable labels of the matched phrases;
A third unit, configured to combine, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.
Optionally, the third unit is further configured to: for a translation template that contains the variable labels of adjacent matched phrases, merge the variable labels of the adjacent matched phrases in that translation template.
Optionally, the third unit is further configured to: for a translation template that contains the variable labels of several matched phrases, number the variable label of each matched phrase in that translation template.
Optionally, the third unit is further configured to: filter the translation templates according to preset rules.
Optionally, when filtering the translation templates according to preset rules, the third unit specifically:
Filters out translation templates that satisfy one or a combination of the following conditions:
The coverage of the translation template is below a preset coverage threshold;
The level of abstraction of the translation template is below a preset abstraction threshold;
The number of words remaining in the translation template after the variable labels are removed is below a preset number threshold;
Here, the coverage of a translation template is determined by the number of translation instances that the template covers;
The level of abstraction of a translation template is determined by the coverage of the template, the length of the template, and the lengths of the translation instances that the template covers.
Optionally, when combining the phrases in the translation instance with the variable labels of the matched phrases according to the position of each phrase in the translation instance to obtain translation templates in at least one combination form, the third unit specifically:
Uses the phrases in the translation instance and the variable labels of the matched phrases to determine an L*L two-dimensional matrix, where L is the length of the translation instance;
Takes, as the resulting translation templates, those translation templates at the upper-right corner position of the two-dimensional matrix that contain a variable label.
Optionally, the translation instance is a monolingual translation instance.
Corresponding to the machine translation method above, a machine translation device provided by an embodiment of this application includes:
A determining unit, configured to determine the source sentence to be translated;
A translation unit, configured to translate the source sentence into a target sentence using preset translation templates;
A presetting unit, configured to preset the translation templates as follows:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of translation template extraction using the forward maximum matching algorithm in the prior art;
Fig. 2 is an overall flow diagram of a translation template determination method provided by an embodiment of this application;
Fig. 3 is a detailed flow diagram of a translation template determination method provided by an embodiment of this application;
Fig. 4 is a schematic diagram of determining translation templates through a two-dimensional matrix, provided by an embodiment of this application;
Fig. 5 is a schematic diagram of two-dimensional matrix initialization provided by an embodiment of this application;
Fig. 6 is a schematic diagram of adding phrase fragments to the two-dimensional matrix, provided by an embodiment of this application;
Fig. 7 is a schematic diagram of adding phrase fragments to the two-dimensional matrix, provided by an embodiment of this application;
Fig. 8 is a schematic diagram of the two-dimensional matrix after phrase fragments have been added, provided by an embodiment of this application;
Fig. 9 is a schematic diagram of the translation templates finally obtained, provided by an embodiment of this application;
Fig. 10 is a structural diagram of a translation template determination device provided by an embodiment of this application;
Fig. 11 is a structural diagram of a machine translation device provided by an embodiment of this application.
Detailed description of the embodiments
The embodiments of this application provide a translation template determination method and device, and a machine translation method and device, which increase the number of translation templates obtained from a single translation instance so that more correct and effective translation templates can be obtained, thereby improving the accuracy of machine translation.
The translation template determination method proposed in the embodiments of this application is a translation template extraction method based on dynamic programming. It first obtains all possible translation templates for a translation instance and then filters them by indicators such as level of abstraction and coverage, which effectively increases the number of effective translation templates.
Referring to Fig. 2, a translation template determination method provided by an embodiment of this application includes:
S101: matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Here, the translation instance may also be called a training sentence, that is, a single sentence.
Optionally, the translation instance is a monolingual translation instance.
The phrase set may also be called a word set; it may include single words and/or short phrases made up of several words.
For example, if the translation instance is "we play in Safari Park" and the preset phrase set includes "we" and "Safari Park", the matched phrases in the translation instance include "we" and "Safari Park".
S102: determining the variable labels of the matched phrases;
An example of a variable label is $x.
S103: according to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.
For example, when there are several matched phrases, several combinations are possible: the variable label of any one matched phrase may replace the phrase at its position, the variable labels of any two matched phrases may replace the phrases at their positions, and so on. Multiple translation templates can therefore be obtained.
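As an illustration only, the combinations described above can be enumerated by choosing every non-empty subset of matched spans and replacing the chosen spans with a variable label. The sketch below assumes space-separated tokens, a single label "$x", and that overlapping spans are never replaced together; `find_matches`, `overlaps`, and `enumerate_templates` are hypothetical names, not part of this application.

```python
from itertools import combinations


def find_matches(tokens, phrase_set):
    """Return (start, end) spans, end exclusive, whose words form a phrase in the set."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            if tuple(tokens[i:j]) in phrase_set:
                spans.append((i, j))
    return spans


def overlaps(spans):
    """True if any two of the chosen spans share a token."""
    ordered = sorted(spans)
    return any(a[1] > b[0] for a, b in zip(ordered, ordered[1:]))


def enumerate_templates(tokens, phrase_set, label="$x"):
    """Replace every non-empty, non-overlapping subset of matched spans with the label."""
    spans = find_matches(tokens, phrase_set)
    templates = set()
    for r in range(1, len(spans) + 1):
        for chosen in combinations(spans, r):
            if overlaps(chosen):
                continue
            end_of = dict(chosen)  # start index -> end index of a chosen span
            out, i = [], 0
            while i < len(tokens):
                if i in end_of:
                    out.append(label)
                    i = end_of[i]
                else:
                    out.append(tokens[i])
                    i += 1
            templates.add(" ".join(out))
    return templates


phrases = {("we",), ("Safari", "Park")}
for t in sorted(enumerate_templates("we play in Safari Park".split(), phrases)):
    print(t)
```

For the example instance this yields three templates (one per single replacement plus one with both phrases replaced), whereas forward or reverse maximum matching yields only one.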
As can be seen, with this method the translation instance is matched against the preset phrase set to determine the matched phrases in the translation instance and their variable labels, and the phrases in the translation instance are combined with the variable labels of the matched phrases according to the position of each phrase, yielding translation templates in at least one combination form. This increases the number of translation templates obtained from a single translation instance, so that more correct and effective translation templates can be obtained and the accuracy of machine translation can be improved.
Optionally, the method further includes: for a translation template that contains the variable labels of adjacent matched phrases, merging the variable labels of the adjacent matched phrases in that translation template.
This step may be called post-processing of the translation template.
Optionally, the method further includes: for a translation template that contains the variable labels of several matched phrases, numbering the variable label of each matched phrase in that translation template, for example by appending the numerals 1, 2, 3, ... in order, starting from the variable label of the first matched phrase, so that different variable labels can be distinguished. The numeral may be appended after the corresponding variable label. For example, in the translation template "$x1 play in $x2" obtained from the translation instance "we play in Safari Park", 1 and 2 are the appended numerals.
This step may also be called post-processing of the translation template.
Optionally, the method further includes: filtering the translation templates according to preset rules. This makes the final translation templates more representative and better suited to actual needs; filtering also reduces storage space by avoiding the storage of a large number of translation templates, and improves subsequent machine translation.
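The numbering step can be sketched as follows, assuming the template is held as a list of tokens in which every variable is the bare label "$x"; the function name `number_variables` is hypothetical.

```python
def number_variables(template_tokens, label="$x"):
    """Append 1, 2, 3, ... to successive variable labels, left to right."""
    out, counter = [], 0
    for tok in template_tokens:
        if tok == label:
            counter += 1
            out.append(f"{label}{counter}")
        else:
            out.append(tok)  # constants are left unchanged
    return out


print(" ".join(number_variables(["$x", "play", "in", "$x"])))  # $x1 play in $x2
```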
Optionally, filtering the translation templates according to preset rules specifically includes:
Filtering out translation templates that satisfy one or a combination of the following conditions:
The coverage of the translation template is below a preset coverage threshold;
The level of abstraction of the translation template is below a preset abstraction threshold;
The number of words remaining in the translation template after the variable labels are removed is below a preset number threshold;
Here, the coverage of a translation template is determined by the number of translation instances that the template covers;
The level of abstraction of a translation template is determined by the coverage of the template, the length of the template, and the lengths of the translation instances that the template covers.
For example, given the translation instance "I have an apple" and the translation template "I have an $x", the translation template covers the translation instance. The coverage of a translation template is thus the number of translation instances that the template can cover.
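A minimal sketch of coverage counting and of two of the filtering conditions is given below. It assumes that a variable may stand for one or more space-separated words and that a template covers an instance when the whole instance matches; the threshold values and the names `template_to_regex`, `coverage`, and `keep_template` are illustrative assumptions. The abstraction condition is left out because its formula is not reproduced in this text.

```python
import re


def template_to_regex(template):
    """Turn a template such as 'I have an $x' into a regular expression."""
    parts = []
    for tok in template.split():
        if tok.startswith("$x"):
            parts.append(r"\S+(?: \S+)*")  # a variable matches one or more words
        else:
            parts.append(re.escape(tok))   # a constant must match literally
    return re.compile(" ".join(parts))


def coverage(template, instances):
    """Number of translation instances fully matched by the template."""
    pattern = template_to_regex(template)
    return sum(1 for s in instances if pattern.fullmatch(s))


def keep_template(template, instances, min_coverage=2, min_constant_words=2):
    """Keep a template only if it passes both (assumed) thresholds."""
    constants = [t for t in template.split() if not t.startswith("$x")]
    return (coverage(template, instances) >= min_coverage
            and len(constants) >= min_constant_words)


instances = ["I have an apple", "I have an orange", "we play in Safari Park"]
print(coverage("I have an $x", instances))       # 2
print(keep_template("I have an $x", instances))  # True
print(keep_template("$x", instances))            # False
```

The bare template "$x" covers every instance but keeps no constant words, which is the kind of over-abstract template the word-count condition is meant to remove.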
Explanation of the level of abstraction of a translation template:
Level of abstraction (abs): the shorter a translation template, the more abstract it is; the more words its variables contain, the more abstract it is. Optionally, the embodiments of this application calculate the level of abstraction of a translation template using the following formula:
Here, len_template denotes the length of the translation template (a variable counts as one word), len_i denotes the length of the i-th translation instance (that is, sentence) covered by the translation template, and n is the coverage of the translation template.
Optionally, combining the phrases in the translation instance with the variable labels of the matched phrases according to the position of each phrase in the translation instance, to obtain translation templates in at least one combination form, specifically includes:
Using the phrases in the translation instance and the variable labels of the matched phrases to determine an L*L two-dimensional matrix, where L is the length of the translation instance;
Taking, as the resulting translation templates, those translation templates at the upper-right corner position of the two-dimensional matrix that contain a variable label.
In the embodiments of this application, a two-dimensional matrix is set up for each translation instance, and each element position of the matrix stores combination forms of the phrases in the translation instance and the variable labels of the matched phrases.
The translation templates at the upper-right corner position of the two-dimensional matrix may be those in the top-right-most position above the diagonal. The translation templates in positions above the diagonal and those in positions below it repeat each other, so only the element positions on one side of the diagonal need to be used to determine translation templates: either the translation templates in the top-right-most position of the matrix or those in the bottom-left-most position can be taken, excluding any that contain no variable label.
Note that the two-dimensional matrix above is only one preferred way of obtaining multiple translation templates for the same translation instance; those skilled in the art may conceive of other implementations.
Correspondingly, a machine translation method provided by an embodiment of this application includes:
Determining the source sentence to be translated;
Translating the source sentence into a target sentence using preset translation templates;
Here, the translation templates are preset as follows:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combination form.
A more detailed description of the technical solution provided by the embodiments of the present application is given below.
The embodiments of the present application propose a translation template extraction method based on dynamic programming. Referring to Fig. 3, all high-quality phrases in a translation instance are first marked according to the phrase set. Here the high-quality phrases are the matched phrases, and marking a high-quality phrase means determining the variable label of the matched phrase. On this basis, dynamic programming converts the extraction of translation templates into the splicing of different fragments of the sentence (different phrases, different words, or short clauses made up of several words), so that all possible translation templates are obtained, and the translation templates are then post-processed. Afterwards, indicators such as coverage and degree of abstraction are calculated for all templates and used for screening and filtering, yielding the final template library.
The algorithm traverses all phrase fragments in the translation instance that meet the length requirement, marks the positions of the high-quality phrases among them according to the phrase set, and on that basis extracts templates from the translation instance using dynamic programming.
Assume the translation instance is s = s1…sL, where L is the length of the translation instance. The length of a translation instance is the number of phrases or words it contains, and operations on a translation instance are performed after word segmentation. For example, if word segmentation splits the translation instance "We are Chinese." into five tokens, the length of the translation instance is 5. The extraction process defines a continuous word string in the translation instance as a phrase fragment seg[m, n], which starts at the m-th word and ends at the n-th word. For each high-quality phrase marked in the translation instance, both the generalized and the ungeneralized form are retained. For example, if "we" is a matched phrase, both "we" (the ungeneralized form) and "$x" (the generalized form) are retained, and they are combined by position when translation templates are extracted. A seg containing that phrase therefore has multiple forms, and the possible combinations of each seg[m, n] are enumerated exhaustively in bottom-up order.
Take the input translation instance "A B C D E F G" as an example; its length is 7. In the embodiments of the present application, as shown in Fig. 4, all phrase fragments seg[m, n] are represented by maintaining a two-dimensional array, and each position in the two-dimensional table shown in Fig. 4 stores the combined forms of its corresponding phrase fragment. That is, each cell of the two-dimensional matrix represents a seg[m, n] as described above, and the shaded part of the matrix stores the different combinations of all phrase fragments produced during template extraction. The extraction process traverses all segs, and the resulting set of translation templates is stored at the top-right-most or bottom-left-most position of the matrix. Specifically, assume the preset phrase set includes "A", "A B", "F", and "E F". The embodiments of the present application carry out template extraction by maintaining a two-dimensional matrix, as follows:
Step 1: Initialize the matrix. With a phrase fragment length of 1, seg[i, i] and the segs of the matched high-quality phrases are loaded. Initialization must mark the positions of all high-quality phrase fragments that occur in the matrix, that is, determine the variable label "$x" of each matched phrase. Accordingly, the variable label "$x" is recorded at seg[1,1] (the position of matched phrase "A"), seg[1,2] (matched phrase "A B"), seg[5,6] (matched phrase "E F"), and seg[6,6] (matched phrase "F"). This can be understood as adding the value "$x" to the seg to indicate that the fragment has a matched phrase, as shown in Fig. 5.
Step 2: Increase the phrase fragment length step by step and combine the various forms of the fragments, using "$x", the variable label, to denote a part that can be replaced by a variable. That is, the phrase fragment length is increased by 1 and the cells seg[i, i+1] of the matrix are filled in. For example, seg[1,2] is composed of seg[1,1] + seg[2,2]. seg[1,1] stores two forms, "A" and "$x" (for matched phrase "A"), and seg[2,2] stores one form, "B". After merging, seg[1,2], whose fragment length is 2, contains three combined forms: "$x", "A B", and "$x B". Here "$x" is the variable label recorded in Step 1 for matched phrase "A B", "A B" is the combination of "A" from seg[1,1] with "B" from seg[2,2], and "$x B" is the combination of "$x" from seg[1,1] with "B" from seg[2,2], as shown in Fig. 6.
Similarly, the phrase fragment length is increased by 1 again and the cells seg[i, i+2] of the matrix are filled in. For example, seg[1,3] is composed of seg[1,1] + seg[2,3] and seg[1,2] + seg[3,3], as shown in Fig. 7.
This continues until the matrix is completely filled; see Fig. 8. The value at the top-right corner of the matrix, seg[1,7], is the set of all combined forms of phrases and matched-phrase variable labels obtained for the translation instance "A B C D E F G" with the phrase set "A", "A B", "F", "E F". Among them, "A B C D E F G" contains no variable label, is not a translation template, and must be removed. The remaining translation templates are post-processed, and the final result is shown in Fig. 9. Processing the translation instance "A B C D E F G" in this way yields 9 translation templates in total.
Algorithm is implemented as follows:
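As a minimal sketch of the matrix-filling procedure in Steps 1 and 2 (an illustration under stated assumptions, not the listing from the original filing), the following Python fills the upper triangle of the matrix bottom-up. The names `extract_forms`, `templates`, and `VAR` are hypothetical, indices are 0-based, and post-processing (merging and numbering variable labels) is left out.

```python
from itertools import product

VAR = "$x"  # variable label; the name VAR is an assumption of this sketch


def extract_forms(words, phrases):
    """Fill seg[m][n] (0-based, inclusive) with every combined form of words[m..n]."""
    L = len(words)
    phrase_set = {tuple(p.split()) for p in phrases}
    seg = [[set() for _ in range(L)] for _ in range(L)]
    # Bottom-up: fragment length 1 first, then 2, ... up to L.
    for length in range(1, L + 1):
        for m in range(L - length + 1):
            n = m + length - 1
            span = tuple(words[m:n + 1])
            cell = seg[m][n]
            if length == 1:
                cell.add(span)          # ungeneralized single word
            if span in phrase_set:
                cell.add((VAR,))        # generalized form of a matched phrase
            # Splice every split point: seg[m][k] + seg[k+1][n].
            for k in range(m, n):
                for left, right in product(seg[m][k], seg[k + 1][n]):
                    cell.add(left + right)
    return seg


def templates(words, phrases):
    """Forms in the top-right cell that contain at least one variable label."""
    forms = extract_forms(words, phrases)[0][len(words) - 1]
    return sorted(" ".join(f) for f in forms if VAR in f)


if __name__ == "__main__":
    for t in templates("A B C D E F G".split(), ["A", "A B", "F", "E F"]):
        print(t)
```

On the example above, the top-right cell of this sketch holds nine combined forms, eight of which contain a variable label. Sets are used for each cell so that forms reached through different split points are stored only once.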
Based on the above process, the embodiments of the present application can extract translation templates from all translation instances according to the set of high-quality phrases and obtain all possible translation templates. For raw data containing 110,000 translation instances, template extraction with a phrase set of 16,000 high-quality phrases yielded 1.3 million translation templates in total. Because the extraction process favors recall, the resulting templates include a large number of low-quality ones, so the embodiments of the present application add a filtering operation on the translation templates.
The embodiments of the present application filter translation templates by aspects such as coverage and degree of abstraction.
Coverage (cov): the number of translation instances in the entire translation instance library that a translation template can cover is the coverage of that translation template.
Degree of abstraction (abs): for a given translation template, the shorter the template, the more abstract it is; the more words its variables contain, the more abstract it is.
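Coverage as defined above can be sketched by counting the instances a template matches. The text does not specify how a template "covers" an instance, so this sketch assumes that a variable label (any token starting with "$") stands for one or more whitespace-separated words and that all other tokens must match literally. The function names `template_to_regex` and `coverage` are hypothetical.

```python
import re


def template_to_regex(template):
    """Compile a template into a regex; assumes '$'-prefixed tokens are variables."""
    parts = []
    for tok in template.split():
        if tok.startswith("$"):
            parts.append(r"\S+(?: \S+)*")  # assumption: one or more words
        else:
            parts.append(re.escape(tok))   # literal word
    return re.compile(" ".join(parts))


def coverage(template, instances):
    """Number of instances that the template matches in full."""
    pattern = template_to_regex(template)
    # Normalize whitespace so tokens are separated by single spaces.
    return sum(1 for s in instances if pattern.fullmatch(" ".join(s.split())))
```

No formula for the degree of abstraction is given in the text, so none is sketched here.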
Based on the above definitions of coverage and degree of abstraction, a coverage threshold and an abstraction threshold are preset, all extracted translation templates are filtered, and the following filter conditions are formulated:
Condition 1: translation template coverage >= 5;
Condition 2: translation template degree of abstraction >= 0.5;
Condition 3: after the variable labels are removed from the translation template, the number of remaining words >= 3.
A translation template that satisfies one or a combination of the above conditions is kept and enters the translation template library; otherwise it is filtered out.
For example, if the translation template "Lucy is a good $X1" covers only the single translation instance "Lucy is a good girl", its coverage is 1; the template is considered unrepresentative and can be filtered out;
The translation template "$X1 and $X2" has a degree of abstraction of 0.2; the template is considered to abstract and generalize too much and can be filtered out;
For the translation template "$X1 and $X2", after the variables "$X1" and "$X2" are removed only the one word "and" is left, so the number of remaining words is 1; the template is considered too short and too abstract and can be filtered out.
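The three conditions can be sketched as a small predicate. Coverage and degree of abstraction are passed in as precomputed numbers, since the text gives no formula for the latter. The text leaves open whether one condition or a combination must hold, so the sketch exposes this as a hypothetical `require_all` flag; it defaults to requiring all three, which is the reading under which each of the three worked examples is filtered out. All function and parameter names are assumptions.

```python
def remaining_words(template):
    """Count tokens left after dropping variable labels ('$'-prefixed tokens)."""
    return sum(1 for tok in template.split() if not tok.startswith("$"))


def check_conditions(template, cov, abs_, min_cov=5, min_abs=0.5, min_words=3):
    """Evaluate Conditions 1-3; thresholds default to the values in the text."""
    return (
        cov >= min_cov,                          # Condition 1: coverage
        abs_ >= min_abs,                         # Condition 2: degree of abstraction
        remaining_words(template) >= min_words,  # Condition 3: remaining words
    )


def keep_template(template, cov, abs_, require_all=True):
    """Keep if all conditions hold (default) or, optionally, if any one holds."""
    results = check_conditions(template, cov, abs_)
    return all(results) if require_all else any(results)
```

With `require_all=False` the same function implements the looser "any one condition" reading.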
This completes the translation template extraction process proposed by the embodiments of the present application.
It should be noted that the technical solution provided by the embodiments of the present application may extract templates based on forward maximum matching or on reverse maximum matching.
In summary, the embodiments of the present application extract templates based on dynamic programming: following the idea of dynamic programming, all phrase fragments of a translation instance are traversed and all of their forms are combined, yielding multiple translation templates for the translation instance. This effectively raises the recall of translation template extraction, avoids wasting valid translation templates, and ensures the richness of the final translation template library. In addition, the subsequent filtering operation ensures the validity of the translation templates.
Corresponding to the above translation template determination method, the embodiments of the present application provide a translation template determination device which, referring to Fig. 10, includes:
A first unit 11, configured to match a translation instance against a preset phrase set and determine the matched phrases in the translation instance;
A second unit 12, configured to determine the variable labels of the matched phrases;
A third unit 13, configured to combine, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
Optionally, the third unit is further configured to: for a translation template in which the variable labels of adjacent matched phrases occur, merge the variable labels of the adjacent matched phrases in that translation template.
Optionally, the third unit is further configured to: for a translation template in which the variable labels of multiple matched phrases occur, number the variable label of each matched phrase in that translation template.
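The two post-processing operations just described, merging adjacent variable labels and numbering the labels, can be sketched as follows. The function names are hypothetical, a template is assumed to be a list of tokens with "$x" as the unnumbered variable label, and the "$X1", "$X2" numbering style follows the examples given earlier in the description.

```python
def merge_adjacent(tokens, var="$x"):
    """Collapse runs of adjacent variable labels into a single label."""
    out = []
    for tok in tokens:
        if tok == var and out and out[-1] == var:
            continue  # skip a label that directly follows another label
        out.append(tok)
    return out


def number_variables(tokens, var="$x"):
    """Replace each variable label with $X1, $X2, ... in left-to-right order."""
    out, count = [], 0
    for tok in tokens:
        if tok == var:
            count += 1
            out.append(f"$X{count}")
        else:
            out.append(tok)
    return out
```

In this sketch merging is applied before numbering, so that a merged run receives a single number.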
Optionally, the third unit is further configured to filter the translation templates according to preset rules.
Optionally, the third unit filtering the translation templates according to preset rules specifically includes:
Filtering out translation templates that satisfy one or a combination of the following conditions:
The coverage of the translation template is less than a preset coverage threshold;
The degree of abstraction of the translation template is less than a preset abstraction threshold;
The number of words remaining after the variable labels are removed from the translation template is less than a preset quantity threshold;
Wherein the coverage of a translation template is determined from the number of translation instances covered by the translation template;
The degree of abstraction of a translation template is determined from the coverage of the translation template, the length of the translation template, and the lengths of the translation instances covered by the translation template.
Optionally, the third unit combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combined form specifically includes:
Determining a two-dimensional L*L matrix using the phrases in the translation instance and the variable labels of the matched phrases, where L is the length of the translation instance;
Taking, among the translation templates at the upper-right corner position of the two-dimensional matrix, those that contain a variable label as the obtained translation templates.
Corresponding to the above machine translation method, and referring to Fig. 11, the embodiments of the present application provide a machine translation device, including:
A determining unit 21, configured to determine a source sentence to be translated;
A translation unit 22, configured to translate the source sentence into a target sentence using a preset translation template;
A presetting unit 23, configured to preset the translation template as follows:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
Combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
The above presetting unit can be understood as the above translation template determination device.
Any of the above units may be implemented by a hardware entity such as a processor. The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a complex programmable logic device (CPLD).
In summary, the embodiments of the present application propose a practical scheme for automatically extracting translation templates. Templates are extracted from translation instances on the basis of the high-quality phrase fragments obtained, which can effectively improve the efficiency and quality of template extraction and thereby the translation quality of a translation memory system.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.

Claims (16)

1. A translation template determination method, characterized by comprising:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
Combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
2. The method according to claim 1, characterized in that the method further comprises: for a translation template in which the variable labels of adjacent matched phrases occur, merging the variable labels of the adjacent matched phrases in that translation template.
3. The method according to claim 1, characterized in that the method further comprises: for a translation template in which the variable labels of multiple matched phrases occur, numbering the variable label of each matched phrase in that translation template.
4. The method according to claim 1, characterized in that the method further comprises: filtering the translation templates according to preset rules.
5. The method according to claim 4, characterized in that filtering the translation templates according to preset rules specifically comprises:
Filtering out translation templates that satisfy one or a combination of the following conditions:
The coverage of the translation template is less than a preset coverage threshold;
The degree of abstraction of the translation template is less than a preset abstraction threshold;
The number of words remaining after the variable labels are removed from the translation template is less than a preset quantity threshold;
Wherein the coverage of a translation template is determined from the number of translation instances covered by the translation template;
The degree of abstraction of a translation template is determined from the coverage of the translation template, the length of the translation template, and the lengths of the translation instances covered by the translation template.
6. The method according to claim 1, characterized in that combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combined form specifically comprises:
Determining a two-dimensional L*L matrix using the phrases in the translation instance and the variable labels of the matched phrases, where L is the length of the translation instance;
Taking, among the translation templates at the upper-right corner position of the two-dimensional matrix, those that contain a variable label as the obtained translation templates.
7. The method according to any one of claims 1 to 6, characterized in that the translation instance is a monolingual translation instance.
8. A machine translation method, characterized by comprising:
Determining a source sentence to be translated;
Translating the source sentence into a target sentence using a preset translation template;
Wherein the translation template is preset as follows:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
Combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
9. A translation template determination device, characterized by comprising:
A first unit, configured to match a translation instance against a preset phrase set and determine the matched phrases in the translation instance;
A second unit, configured to determine the variable labels of the matched phrases;
A third unit, configured to combine, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
10. The device according to claim 9, characterized in that the third unit is further configured to: for a translation template in which the variable labels of adjacent matched phrases occur, merge the variable labels of the adjacent matched phrases in that translation template.
11. The device according to claim 9, characterized in that the third unit is further configured to: for a translation template in which the variable labels of multiple matched phrases occur, number the variable label of each matched phrase in that translation template.
12. The device according to claim 9, characterized in that the third unit is further configured to filter the translation templates according to preset rules.
13. The device according to claim 12, characterized in that the third unit filtering the translation templates according to preset rules specifically comprises:
Filtering out translation templates that satisfy one or a combination of the following conditions:
The coverage of the translation template is less than a preset coverage threshold;
The degree of abstraction of the translation template is less than a preset abstraction threshold;
The number of words remaining after the variable labels are removed from the translation template is less than a preset quantity threshold;
Wherein the coverage of a translation template is determined from the number of translation instances covered by the translation template;
The degree of abstraction of a translation template is determined from the coverage of the translation template, the length of the translation template, and the lengths of the translation instances covered by the translation template.
14. The device according to claim 9, characterized in that the third unit combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combined form specifically comprises:
Determining a two-dimensional L*L matrix using the phrases in the translation instance and the variable labels of the matched phrases, where L is the length of the translation instance;
Taking, among the translation templates at the upper-right corner position of the two-dimensional matrix, those that contain a variable label as the obtained translation templates.
15. The device according to any one of claims 9 to 14, characterized in that the translation instance is a monolingual translation instance.
16. A machine translation device, characterized by comprising:
A determining unit, configured to determine a source sentence to be translated;
A translation unit, configured to translate the source sentence into a target sentence using a preset translation template;
A presetting unit, configured to preset the translation template as follows:
Matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
Determining the variable labels of the matched phrases;
Combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
CN201610506589.1A 2016-06-30 2016-06-30 Translation template determination, machine translation method and device Pending CN107562734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610506589.1A CN107562734A (en) 2016-06-30 2016-06-30 Translation template determination, machine translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610506589.1A CN107562734A (en) 2016-06-30 2016-06-30 Translation template determination, machine translation method and device

Publications (1)

Publication Number Publication Date
CN107562734A true CN107562734A (en) 2018-01-09

Family

ID=60968894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610506589.1A Pending CN107562734A (en) 2016-06-30 2016-06-30 Translation template determination, machine translation method and device

Country Status (1)

Country Link
CN (1) CN107562734A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1512395A (en) * 2002-12-27 2004-07-14 联想(北京)有限公司 Establishing method for open type natural language
CN101706777A (en) * 2009-11-10 2010-05-12 中国科学院计算技术研究所 Method and system for extracting resequencing template in machine translation
EP2199925A1 (en) * 2008-12-03 2010-06-23 Xerox Corporation Dynamic translation memory using statistical machine translation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408307A (en) * 2021-07-14 2021-09-17 北京理工大学 Neural machine translation method based on translation template
CN113408307B (en) * 2021-07-14 2022-06-14 北京理工大学 Neural machine translation method based on translation template

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1249220; Country of ref document: HK)
RJ01 Rejection of invention patent application after publication

Application publication date: 20180109