CN107562734A - Translation template determination, machine translation method and device - Google Patents
Translation template determination, machine translation method and device
- Publication number
- CN107562734A (application CN201610506589.1A)
- Authority
- CN
- China
- Prior art keywords
- translation
- phrase
- template
- instance
- translation template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
This application discloses a translation template determination method and device, and a machine translation method and device, which increase the number of translation templates obtained from a single translation instance so that more correct and effective translation templates are available, thereby improving the accuracy of machine translation. The translation template determination method provided by the application includes: matching a translation instance against a preset phrase set to determine the matching phrases in the translation instance; determining the variable label of each matching phrase; and, according to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
Description
Technical field
The present application relates to the technical field of machine translation, and in particular to a translation template determination method and device, and a machine translation method and device.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural source language into another natural target language; it generally refers to the translation of sentences and full texts between natural languages. A statistical machine translation system has strong generalization ability: by learning automatically from large-scale parallel data it can translate any sentence, but the quality of the result often cannot be guaranteed. To make effective use of existing high-quality parallel sentence pairs, translation memory methods were introduced. A translation memory (TM) is a computer-aided translation technology: a language database that stores source texts and their translations. Traditional translation memory is generally used in computer-aided translation (CAT). A common approach at present is to build a template library and a terminology library from translation instances and, by using the translation instance library, terminology library and template library together, to make maximum use of existing bilingual parallel corpora and obtain higher-quality translations. Abstracting translation instances to obtain translation templates is a very important module of a translation memory system. A translation instance may be a preset training sentence, that is, a single sentence. A translation template keeps the overall frame of a sentence unchanged while the content inside the frame varies under grammatical, pragmatic and other constraints; it is a kind of translation instance pair used to recognize and generate sentences, and is to some extent an abstraction of sentences. Here, pragmatics refers to the use of language in a specific context, and the grammatical and pragmatic constraints are language rules applied while building translation templates; these rules generally describe syntactic, semantic and pragmatic knowledge.
A monolingual template is usually a sequence of constants and variables. Specific words and phrases are constants, and a variable stands for a class of words or phrases that can be abstracted and generalized. For example, in the template "I like eating $x1.", "I like eating" and "." are the constants: every sentence that matches the template has the same constant part. "$x1" is the variable: different sentences that match the template can differ in the variable part. In "I like eating apple." and "I like eating orange.", for instance, "apple" and "orange" both correspond to the variable part of the template. A monolingual template in the translation template library therefore consists of two parts, constants and variables. The constants are the fixed part of the template, while a variable usually also carries conditions and restrictions that the phrase filling the variable must satisfy during translation. A translation template must be abstract enough to have a certain coverage, but not so abstract that the translation loses accuracy. The method used to abstract translation templates therefore directly affects the performance of a translation memory system.
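The constant/variable split can be illustrated with a small sketch. It assumes, purely for illustration, that variables are written as "$x" followed by optional digits and that a sentence matches a template when the constants match literally and each variable absorbs at least one character; the function names are hypothetical and the conditions a variable may carry are not modelled.

```python
import re

VARIABLE = r"\$x\d*"  # assumed spelling of a variable label, e.g. $x or $x1


def template_to_regex(template):
    """Turn a template string into a compiled regular expression.

    Constants are matched literally; every variable becomes a capture group.
    """
    parts = re.split(f"({VARIABLE})", template)
    pattern = "".join(
        "(.+?)" if re.fullmatch(VARIABLE, part) else re.escape(part)
        for part in parts
    )
    return re.compile(pattern)


def match_template(template, sentence):
    """Return the text filling each variable, or None if the sentence does not match."""
    found = template_to_regex(template).fullmatch(sentence)
    return list(found.groups()) if found else None


print(match_template("I like eating $x1.", "I like eating apple."))
print(match_template("I like eating $x1.", "I like eating orange."))
print(match_template("I like eating $x1.", "I like apples."))
```

Both "apple" and "orange" fill the same variable slot, while the third sentence fails because its constant part differs.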
In the prior art, translation templates are extracted mainly in two ways. First, based on information such as the structure and semantics of the translation instances themselves or of their mutual relations, and without relying on other information, an algorithm is designed to extract translation templates automatically. Second, based on high-quality phrase fragments that have already been obtained, that is, a preset phrase set, translation instances are partially generalized so that translation templates are extracted automatically. The second approach first requires high-quality phrase fragments to be obtained from a data set, mainly phrase fragments with an independent meaning such as noun phrases. Once these are available, the translation instances in the data set are compared against them and generalized to obtain the corresponding translation templates. Template extraction based on high-quality phrases resembles dictionary-based word segmentation, and mainly uses forward maximum matching, reverse maximum matching and similar methods.
The forward maximum matching algorithm starts from the left end of the sentence and matches phrases one by one against the preset phrase set. If the current phrase is in the phrase set, it is a matching phrase and is replaced in the sentence by a variable (this is the so-called generalization), until the whole sentence has been traversed. For example, take the translation instance "we play in Safari Park" and assume a maximum phrase length max = 5, that is, a phrase contains at most 5 words. Extracting a translation template from this instance with the forward maximum matching algorithm proceeds as follows:
Step 1: Traverse the sentence word by word from the front. Suppose, for example, that "I", "we" and "Safari Park" form a phrase set of three phrases. First determine whether the phrase set contains a phrase that begins with the current word ("I" in the original example). If not, move one word to the right and check again; if so, go to the next step.
Step 2: Set the phrase length len = max, take the fragment seg of length len to the right of the current position (seg = "we are wild" in the original example), and look up seg in the phrase set.
Step 3: If the phrase set does not contain the fragment, decrease len by 1 and take a new seg fragment.
Step 4: Repeat the lookup of step 2 until the seg fragment is found in the phrase set, then exit the loop.
Step 5: Replace the current seg fragment in the translation instance with a variable label, move len words to the right, and start again from step 1 until the translation instance has been traversed. Here len is the current value of len for the current seg fragment, that is, the length of the seg fragment just matched: in step 3, len is decreased by 1 whenever seg is not matched, so when a seg fragment is matched the current value of len equals the length of that fragment.
Matched fragments are replaced by variables, and unmatched fragments are kept as constants. For example, with the phrase set "I", "we", "Safari Park", the sentence "we play in Safari Park" can be written as "$x1 play in $x2", where "$x1" and "$x2" are the variable parts and the rest is the constant part.
Step 6: Post-process the translation instance in which variable labels have been substituted and replacement has finished, including merging adjacent variable labels, to obtain the final translation template.
As an example of this post-processing, suppose steps 1 to 5 turn the translation instance "I have a red school bag." into "I have a $x1 $x2". The post-processing in step 6 can then merge the two adjacent variables $x1 and $x2, giving the final translation template "I have a $x1".
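Steps 1 to 6 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the sentence is already tokenized into words, represents the phrase set as a set of token tuples, folds the index check of step 1 into the longest-first lookup, and merges adjacent labels (step 6) while scanning. The function name and the example phrase sets are hypothetical.

```python
def forward_maximum_match(tokens, phrases, max_len=5):
    """Greedy left-to-right generalization of one tokenized sentence.

    tokens  -- list of words
    phrases -- set of tuples of words (the preset phrase set)
    max_len -- maximum phrase length in words
    """
    template = []
    counter = 0
    i = 0
    while i < len(tokens):
        matched = 0
        # Steps 2-4: try the longest fragment first, shrinking by one word each time.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in phrases:
                matched = length
                break
        if matched:
            # Step 5: replace the fragment with a variable label.
            # Step 6: a label directly after another label is merged into it.
            if not (template and template[-1].startswith("$x")):
                counter += 1
                template.append(f"$x{counter}")
            i += matched
        else:
            # Unmatched words stay in the template as constants.
            template.append(tokens[i])
            i += 1
    return template


phrase_set = {("I",), ("we",), ("Safari", "Park")}
print(forward_maximum_match("we play in Safari Park".split(), phrase_set))
```

Because the longest match is always taken and consumed, the function returns exactly one template per sentence, which is the limitation the application sets out to address.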
Similarly, the reverse maximum matching algorithm differs from the forward maximum matching algorithm in that it traverses and matches against the preset phrase set from the end of the sentence toward its beginning, and uses the last word of each phrase in the phrase set as the index.
The forward and reverse maximum matching algorithms work on similar principles: both replace the longest phrase obtainable at the current position with a variable label, so only one translation template is finally generated for a sentence. Moreover, when phrases in the phrase set are nested, only the single translation template with the longest variable can be obtained. In the example shown in Fig. 1, the forward and reverse maximum matching algorithms can only obtain the translation template "the exterior offers $x", while another translation template of the same translation instance, "the exterior offers $x added storage", cannot be obtained.
In summary, the forward or reverse maximum matching algorithm of the prior art obtains only one translation template per translation instance. Few translation templates can be obtained, more correct and effective translation templates are missed, and translation quality suffers as a result.
Summary of the invention
The embodiment of the present application provides a kind of translation template and determines method and device, and a kind of machine translation method and dress
Put, to improve the quantity of the translation template obtained by same translation instance, more correct effective translation moulds can be obtained
Plate, so as to improve the accuracy of machine translation.
A translation template determination method provided by the embodiments of the present application includes:
Matching a translation instance against a preset phrase set, and determining the matching phrases in the translation instance;
Determining the variable label of each matching phrase;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
The translation instance may also be called a training sentence, that is, a single sentence.
The phrase set may also be called a word set, and may include words and/or short phrases made up of several words.
With this method, the translation instance is matched against the preset phrase set to determine the matching phrases and their variable labels, and the phrases in the translation instance are combined with the variable labels of the matching phrases according to the position of each phrase, giving translation templates in at least one combined form. This increases the number of translation templates obtained from a single translation instance, yields more correct and effective translation templates, and can in turn improve the accuracy of machine translation.
Optionally, the method further includes: for a translation template that contains the variable labels of adjacent matching phrases, merging the variable labels of the adjacent matching phrases in that translation template.
This step is called post-processing of the translation template.
Optionally, the method further includes: for a translation template that contains the variable labels of several matching phrases, numbering the variable label of each matching phrase in that translation template. For example, starting from the variable label of the first matching phrase, the numerals 1, 2, 3, ... are added in turn to distinguish the variable labels; each numeral can be appended to the corresponding variable label.
This step is also called post-processing of the translation template.
Optionally, the method further includes: filtering the translation templates according to preset rules. The translation templates finally obtained are then more representative and better meet practical needs; filtering also reduces storage space and avoids storing a large number of translation templates, so that subsequent machine translation works better.
Optionally, filtering the translation templates according to preset rules specifically includes:
Filtering out translation templates that meet one or a combination of the following conditions:
The coverage of the translation template is less than a preset coverage threshold;
The abstraction level of the translation template is less than a preset abstraction level threshold;
The number of words left in the translation template after its variable labels are removed is less than a preset number threshold;
Here, the coverage of a translation template is determined from the number of translation instances that the translation template covers;
The abstraction level of a translation template is determined from the coverage of the translation template, the length of the translation template, and the lengths of the translation instances that the translation template covers.
Optionally, combining the phrases in the translation instance with the variable labels of the matching phrases according to the position of each phrase in the translation instance, to obtain translation templates in at least one combined form, specifically includes:
Determining an L*L two-dimensional matrix using the phrases in the translation instance and the variable labels of the matching phrases, where L is the length of the translation instance;
Taking, as the translation templates obtained, those translation templates in the upper-right corner position of the two-dimensional matrix that contain a variable label.
Note that using a two-dimensional matrix is only one preferred way of obtaining multiple translation templates for the same translation instance; those skilled in the art may also conceive of other implementations.
Optionally, the translation instance is a monolingual translation instance.
Correspondingly, a machine translation method provided by the embodiments of the present application includes:
Determining a source sentence to be translated;
Translating the source sentence into a target sentence using preset translation templates;
Here, the translation templates are preset as follows:
Matching a translation instance against a preset phrase set, and determining the matching phrases in the translation instance;
Determining the variable label of each matching phrase;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
Corresponding to the above translation template determination method, a translation template determination device provided by the embodiments of the present application includes:
A first unit, configured to match a translation instance against a preset phrase set and determine the matching phrases in the translation instance;
A second unit, configured to determine the variable label of each matching phrase;
A third unit, configured to combine, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
Optionally, the third unit is further configured to: for a translation template that contains the variable labels of adjacent matching phrases, merge the variable labels of the adjacent matching phrases in that translation template.
Optionally, the third unit is further configured to: for a translation template that contains the variable labels of several matching phrases, number the variable label of each matching phrase in that translation template.
Optionally, the third unit is further configured to: filter the translation templates according to preset rules.
Optionally, when the third unit filters the translation templates according to preset rules, this specifically includes:
Filtering out translation templates that meet one or a combination of the following conditions:
The coverage of the translation template is less than a preset coverage threshold;
The abstraction level of the translation template is less than a preset abstraction level threshold;
The number of words left in the translation template after its variable labels are removed is less than a preset number threshold;
Here, the coverage of a translation template is determined from the number of translation instances that the translation template covers;
The abstraction level of a translation template is determined from the coverage of the translation template, the length of the translation template, and the lengths of the translation instances that the translation template covers.
Optionally, when the third unit combines the phrases in the translation instance with the variable labels of the matching phrases according to the position of each phrase in the translation instance, to obtain translation templates in at least one combined form, this specifically includes:
Determining an L*L two-dimensional matrix using the phrases in the translation instance and the variable labels of the matching phrases, where L is the length of the translation instance;
Taking, as the translation templates obtained, those translation templates in the upper-right corner position of the two-dimensional matrix that contain a variable label.
Optionally, the translation instance is a monolingual translation instance.
Corresponding to the above machine translation method, a machine translation device provided by the embodiments of the present application includes:
A determining unit, configured to determine a source sentence to be translated;
A translation unit, configured to translate the source sentence into a target sentence using preset translation templates;
A presetting unit, configured to preset the translation templates as follows:
Matching a translation instance against a preset phrase set, and determining the matching phrases in the translation instance;
Determining the variable label of each matching phrase;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of translation template extraction using the forward maximum matching algorithm in the prior art;
Fig. 2 is an overall flow diagram of a translation template determination method provided by an embodiment of the present application;
Fig. 3 is a detailed flow diagram of a translation template determination method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of determining translation templates through a two-dimensional matrix, provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of two-dimensional matrix initialization provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of adding phrase fragments to the two-dimensional matrix, provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of adding phrase fragments to the two-dimensional matrix, provided by an embodiment of the present application;
Fig. 8 is a schematic diagram of the two-dimensional matrix after phrase fragments have been added, provided by an embodiment of the present application;
Fig. 9 is a schematic diagram of the translation templates finally obtained, provided by an embodiment of the present application;
Fig. 10 is a structural diagram of a translation template determination device provided by an embodiment of the present application;
Fig. 11 is a structural diagram of a machine translation device provided by an embodiment of the present application.
Detailed description
The embodiment of the present application provides a kind of translation template and determines method and device, and a kind of machine translation method and dress
Put, to improve the quantity of the translation template obtained by same translation instance, more correct effective translation moulds can be obtained
Plate, so as to improve the accuracy of machine translation.
The translation template determination method proposed in the embodiments of the present application is a translation template extraction method based on dynamic programming. It first obtains all possible translation templates for a translation instance and then filters them by indicators such as the abstraction level and coverage of each template, effectively expanding the number of valid translation templates.
Referring to Fig. 2, a translation template determination method provided by an embodiment of the present application includes:
S101: Match a translation instance against a preset phrase set, and determine the matching phrases in the translation instance.
The translation instance may also be called a training sentence, that is, a single sentence.
Optionally, the translation instance is a monolingual translation instance.
The phrase set may also be called a word set, and may include words and/or short phrases made up of several words.
For example, if the translation instance is "we play in Safari Park" and the preset phrase set includes "we" and "Safari Park", the matching phrases in the translation instance are "we" and "Safari Park".
S102: Determine the variable label of each matching phrase.
A variable label is, for example, $x.
S103: According to the position of each phrase in the translation instance, combine the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
For example, when there are several matching phrases, many combinations are possible: the variable label of any one matching phrase replaces the phrase at the corresponding position, then the variable labels of any two matching phrases replace the phrases at the corresponding positions, and so on. Multiple translation templates can therefore be obtained.
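The combination idea in S103 can be sketched by enumerating every non-empty subset of the matching phrases and generalizing only the chosen ones. The sketch assumes the matches have already been located and do not overlap; the span representation (inclusive start and end token indices) and the function name are illustrative assumptions, and nested phrases are not handled here.

```python
from itertools import combinations


def combine_templates(tokens, matches, label="$x"):
    """Generate one template per non-empty subset of matched spans.

    tokens  -- list of words of the translation instance
    matches -- list of (start, end) inclusive token spans, non-overlapping
    """
    templates = []
    for size in range(1, len(matches) + 1):
        for chosen in combinations(matches, size):
            span_end = {start: end for start, end in chosen}
            out = []
            i = 0
            while i < len(tokens):
                if i in span_end:
                    out.append(label)       # generalized form of the phrase
                    i = span_end[i] + 1
                else:
                    out.append(tokens[i])   # word kept as a constant
                    i += 1
            templates.append(" ".join(out))
    return templates


sentence = "we play in Safari Park".split()
print(combine_templates(sentence, [(0, 0), (3, 4)]))
```

With two matching phrases this yields three templates instead of the single one produced by maximum matching.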
As can be seen, with this method the translation instance is matched against the preset phrase set to determine the matching phrases and their variable labels, and the phrases in the translation instance are combined with the variable labels of the matching phrases according to the position of each phrase, giving translation templates in at least one combined form. This increases the number of translation templates obtained from a single translation instance, yields more correct and effective translation templates, and can in turn improve the accuracy of machine translation.
Optionally, the method further includes: for a translation template that contains the variable labels of adjacent matching phrases, merging the variable labels of the adjacent matching phrases in that translation template.
This step is called post-processing of the translation template.
Optionally, the method further includes: for a translation template that contains the variable labels of several matching phrases, numbering the variable label of each matching phrase in that translation template. For example, starting from the variable label of the first matching phrase, the numerals 1, 2, 3, ... are added in turn to distinguish the variable labels; each numeral can be appended to the corresponding variable label. For example, in the translation template "$x1 play in $x2" obtained from the translation instance "we play in Safari Park", 1 and 2 are the added numerals.
This step is also called post-processing of the translation template.
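The two post-processing steps can be sketched together. The sketch assumes a template is a list of tokens in which every unnumbered variable is the single token "$x"; following the wording above, labels are numbered only when more than one remains after merging. The function name is hypothetical.

```python
def postprocess(tokens, label="$x"):
    """Merge adjacent variable labels, then number the labels if several remain."""
    merged = []
    for token in tokens:
        # Drop a label that directly follows another label.
        if token == label and merged and merged[-1] == label:
            continue
        merged.append(token)

    if merged.count(label) < 2:
        return merged  # nothing to distinguish, keep the plain label

    numbered = []
    counter = 0
    for token in merged:
        if token == label:
            counter += 1
            numbered.append(f"{label}{counter}")
        else:
            numbered.append(token)
    return numbered


print(postprocess(["$x", "play", "in", "$x"]))
print(postprocess(["I", "have", "a", "$x", "$x"]))
```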
Optionally, the method further includes: filtering the translation templates according to preset rules. The translation templates finally obtained are then more representative and better meet practical needs; filtering also reduces storage space and avoids storing a large number of translation templates, so that subsequent machine translation works better.
Optionally, filtering the translation templates according to preset rules specifically includes:
Filtering out translation templates that meet one or a combination of the following conditions:
The coverage of the translation template is less than a preset coverage threshold;
The abstraction level of the translation template is less than a preset abstraction level threshold;
The number of words left in the translation template after its variable labels are removed is less than a preset number threshold;
Here, the coverage of a translation template is determined from the number of translation instances that the translation template covers;
The abstraction level of a translation template is determined from the coverage of the translation template, the length of the translation template, and the lengths of the translation instances that the translation template covers.
For example, given the translation instance "I have an apple" and the translation template "I have a $x", the translation template covers the translation instance. The coverage of a translation template is the number of translation instances that the template can cover.
The abstraction level of a translation template is explained as follows:
Abstraction level (abs): the shorter a translation template, the more abstract it is; the more words its variables contain, the more abstract it is. Optionally, the embodiments of the present application calculate the abstraction level of a translation template with the following formula:
(The formula is not reproduced in this text.)
Here, len_template is the length of the translation template (a variable counts as one word), len_i is the length of the i-th translation instance (that is, sentence) covered by the translation template, and n is the coverage of the translation template.
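The three filter conditions can be sketched as a simple predicate. Because the abstraction formula is not available in this text, the sketch takes the abstraction level as a precomputed number rather than computing it; the class name, field names and threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateStats:
    tokens: List[str]    # template tokens; variables are written "$x", "$x1", ...
    coverage: int        # number of translation instances the template covers
    abstraction: float   # abstraction level, assumed to be computed elsewhere


def constant_word_count(tokens):
    """Number of words left once the variable labels are removed."""
    return sum(1 for token in tokens if not token.startswith("$x"))


def should_filter(stats, min_coverage, min_abstraction, min_constants):
    """True if the template meets at least one of the three filter conditions."""
    return (
        stats.coverage < min_coverage
        or stats.abstraction < min_abstraction
        or constant_word_count(stats.tokens) < min_constants
    )


def filter_templates(all_stats, min_coverage=2, min_abstraction=0.1, min_constants=2):
    """Keep only the templates that meet none of the filter conditions."""
    return [
        stats for stats in all_stats
        if not should_filter(stats, min_coverage, min_abstraction, min_constants)
    ]


kept = filter_templates([
    TemplateStats(["I", "have", "a", "$x"], coverage=3, abstraction=0.5),
    TemplateStats(["$x1", "play", "in", "$x2"], coverage=1, abstraction=0.5),
    TemplateStats(["$x"], coverage=10, abstraction=0.9),
])
print([" ".join(stats.tokens) for stats in kept])
```

In this example the second template is dropped for low coverage and the third because no constant words remain after its variable is removed.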
Optionally, combining the phrases in the translation instance with the variable labels of the matching phrases according to the position of each phrase in the translation instance, to obtain translation templates in at least one combined form, specifically includes:
Determining an L*L two-dimensional matrix using the phrases in the translation instance and the variable labels of the matching phrases, where L is the length of the translation instance;
Taking, as the translation templates obtained, those translation templates in the upper-right corner position of the two-dimensional matrix that contain a variable label.
Wherein, any translation instance is directed in the embodiment of the present application, a two-dimensional matrix, every unitary of two-dimensional matrix are set
Plain position, preserve the combining form of the variable label of phrase and match phrase in translation instance.
Translation template in the two-dimensional matrix in upper right Angle Position, can be in the two-dimensional matrix it is more than diagonal most
Translation template in the position in the upper right corner.Translation template in two-dimensional matrix in the position of diagonal above and below is to repeat
, so, only take the element position of diagonal above and below in two-dimensional matrix to determine translation template, can take two
The translation template in most upper right Angle Position in matrix is tieed up, or takes the translation template in two-dimensional matrix in most lower-left Angle Position, when
So, the translation template therein not comprising variable label is excluded.
Note that using a two-dimensional matrix is only one preferred way of obtaining multiple translation templates for the same translation instance; those skilled in the art may also conceive of other implementations.
Correspondingly, a machine translation method provided by the embodiments of the present application includes:
Determining a source sentence to be translated;
Translating the source sentence into a target sentence using preset translation templates;
Here, the translation templates are preset as follows:
Matching a translation instance against a preset phrase set, and determining the matching phrases in the translation instance;
Determining the variable label of each matching phrase;
According to the position of each phrase in the translation instance, combining the phrases in the translation instance with the variable labels of the matching phrases to obtain translation templates in at least one combined form.
A more detailed illustration of the technical solution provided by the embodiments of the present application is given below.
The embodiments of the present application propose a translation template extraction method based on dynamic programming. Referring to Fig. 3, all high-quality phrases in the translation instance are first marked according to the phrase set; a high-quality phrase is a matching phrase as described above, and marking it means determining its variable label. On this basis, dynamic programming turns the extraction of translation templates into the splicing of different fragments of the sentence (different phrases, different words, or short phrases made up of several words), which yields all possible translation templates; the translation templates are then post-processed. Finally, indicators such as coverage and abstraction level are computed for all templates and used to screen and filter them, giving the final template library.
The algorithm traverses all phrase fragments in the translation instance that satisfy the length requirement and marks the positions of the high-quality phrases among them according to the phrase set. Based on these marks, template extraction is performed on the translation instance using dynamic programming.
Assume a translation instance s = s1…sL, where L is the length of the translation instance. The length of a translation instance is the number of phrases or words it contains, and all operations on a translation instance are carried out after word segmentation. For example, the (Chinese) translation instance "We are Chinese." is segmented into five tokens, so its length is 5. The extraction process defines a continuous word string in the translation instance as a phrase fragment seg[m, n], which starts at the m-th word and ends at the n-th word. For each high-quality phrase marked in the translation instance, both a generalized and a non-generalized form are retained. For example, if "we" is a matched phrase, both "we" (the non-generalized form) and "$x" (the generalized form) are retained, and they are combined according to their positions when translation templates are extracted. Any seg that contains the phrase therefore has several forms, and the possible combinations of every seg[m, n] are enumerated exhaustively in bottom-up order.
Take the input translation instance "A B C D E F G" as an example; its length is 7. In this embodiment, as shown in Fig. 4, all phrase fragments seg[m, n] are represented by maintaining a two-dimensional array, and each position of the two-dimensional table in Fig. 4 stores the combined forms of its corresponding phrase fragment. That is, each cell of the two-dimensional matrix represents one seg[m, n] as defined above. The shaded part of the matrix stores the different combinations of all phrase fragments produced during template extraction. The extraction process traverses all segs, and the resulting set of translation templates is stored at the top-right corner position or the bottom-left corner position of the matrix. Specifically, assuming that the preset phrase set includes "A", "A B", "F" and "E F", the embodiment carries out template extraction by maintaining a two-dimensional matrix as follows:
Step one: the matrix is initialized. With the phrase fragment length set to 1, every seg[i, i] and the seg of every matched high-quality phrase are loaded. During initialization, the positions of all high-quality phrase fragments that occur in the matrix must be marked, that is, the variable label "$x" of each matched phrase is determined. Accordingly, the variable label "$x" is recorded at seg[1, 1] (the position of matched phrase "A"), seg[1, 2] (the position of matched phrase "A B"), seg[5, 6] (the position of matched phrase "E F") and seg[6, 6] (the position of matched phrase "F"). This can be understood as adding the value "$x" to the seg to indicate that the fragment has a matched phrase, as shown in Fig. 5.
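The initialization in step one can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function name `mark_matches`, the dictionary keyed by 1-based `(m, n)` pairs and the whitespace tokenization are assumptions made for the example.

```python
def mark_matches(tokens, phrase_set, var="$x"):
    """Initialize seg[(m, n)] for 1-based, inclusive spans.

    Every single-token span receives its literal form; every span whose
    text occurs in phrase_set additionally receives the variable label.
    (Hypothetical helper, not taken from the application.)
    """
    L = len(tokens)
    seg = {}
    for i in range(1, L + 1):
        seg[(i, i)] = {tokens[i - 1]}
    for m in range(1, L + 1):
        for n in range(m, L + 1):
            if " ".join(tokens[m - 1:n]) in phrase_set:
                seg.setdefault((m, n), set()).add(var)
    return seg


tokens = "A B C D E F G".split()
phrases = {"A", "A B", "F", "E F"}
seg = mark_matches(tokens, phrases)
marked = sorted(span for span, forms in seg.items() if "$x" in forms)
print(marked)  # [(1, 1), (1, 2), (5, 6), (6, 6)]
```

The marked positions agree with the four positions listed in step one for the phrase set "A", "A B", "F", "E F".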
Step two: the phrase fragment length is increased step by step, and the various forms of the fragments are combined, with "$x" (the variable label) denoting a part that can be replaced by a variable. That is, the phrase fragment length is increased by 1 and the entries seg[i, i+1] of the matrix are filled in. For example, seg[1, 2] is composed of seg[1, 1] + seg[2, 2]. seg[1, 1] stores two forms, "A" and "$x" (for matched phrase "A"), and seg[2, 2] stores one form, "B". After merging, the fragment seg[1, 2] of length 2 therefore contains three combined forms: "$x", "A B" and "$x B". Here "$x" is the label recorded in step one for matched phrase "A B", "A B" is the combination of "A" stored in seg[1, 1] with "B" stored in seg[2, 2], and "$x B" is the combination of "$x" stored in seg[1, 1] with "B" stored in seg[2, 2], as shown in Fig. 6.
Similarly, the phrase fragment length is increased by 1 again and the entries seg[i, i+2] of the matrix are filled in. For example, seg[1, 3] is composed of seg[1, 1] + seg[2, 3] and of seg[1, 2] + seg[3, 3], as shown in Fig. 7.
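The combination in step two amounts to concatenating every form of a left fragment with every form of the right fragment. The short sketch below reproduces the seg[1, 2] and seg[1, 3] examples; the helper name `combine` and the use of Python sets are assumptions for illustration only.

```python
def combine(left_forms, right_forms):
    """Concatenate each form of the left fragment with each form of the right."""
    return {f"{a} {b}" for a in left_forms for b in right_forms}


seg_1_1 = {"A", "$x"}   # "A" is a matched phrase
seg_2_2 = {"B"}
seg_3_3 = {"C"}
seg_2_3 = combine(seg_2_2, seg_3_3)

# seg[1, 2] already holds "$x" from initialization (matched phrase "A B").
seg_1_2 = {"$x"} | combine(seg_1_1, seg_2_2)
print(sorted(seg_1_2))  # ['$x', '$x B', 'A B']

# seg[1, 3] merges both split points; duplicate forms collapse in the set.
seg_1_3 = combine(seg_1_1, seg_2_3) | combine(seg_1_2, seg_3_3)
print(sorted(seg_1_3))  # ['$x B C', '$x C', 'A B C']
```

Using a set means that a form reachable through several split points is stored only once.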
The process continues in the same way until the matrix is completely filled. Referring to Fig. 8, the value at the top-right corner of the matrix, seg[1, 7], contains all combined forms of the phrases and the variable labels of the matched phrases obtained for the translation instance "A B C D E F G" based on the phrase set "A", "A B", "F", "E F". Among them, "A B C D E F G" contains no variable label; it is not a translation template and must be removed. The remaining translation templates are post-processed, and the final result is shown in Fig. 9: processing the translation instance "A B C D E F G" according to this embodiment yields 9 translation templates in total.
The algorithm is implemented as follows:
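No listing survives at this point in the text. The following is a minimal sketch of the bottom-up procedure described in steps one and two, written under stated assumptions: the function name `extract_templates`, the 1-based span dictionary and whitespace tokenization are hypothetical, and the post-processing step is omitted, so the result need not coincide with Fig. 9.

```python
def extract_templates(tokens, phrase_set, var="$x"):
    """Enumerate template candidates for one translation instance.

    seg[(m, n)] holds every combined form of the fragment from the m-th
    to the n-th token (1-based, inclusive), filled by increasing length.
    """
    L = len(tokens)
    seg = {}
    for length in range(1, L + 1):
        for m in range(1, L - length + 2):
            n = m + length - 1
            forms = set()
            if length == 1:
                forms.add(tokens[m - 1])           # literal single token
            if " ".join(tokens[m - 1:n]) in phrase_set:
                forms.add(var)                     # generalized form
            for k in range(m, n):                  # every split point
                for a in seg[(m, k)]:
                    for b in seg[(k + 1, n)]:
                        forms.add(f"{a} {b}")
            seg[(m, n)] = forms
    # Keep only forms of the whole instance that contain a variable label.
    return {t for t in seg[(1, L)] if var in t.split()}


templates = extract_templates("A B C D E F G".split(),
                              {"A", "A B", "F", "E F"})
for t in sorted(templates):
    print(t)
```

The number of forms can grow quickly with the number of matched phrases, which is consistent with the filtering stage described afterwards.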
Based on the above process, the embodiment of the present application can extract translation templates from all translation instances according to the set of high-quality phrases and obtain all possible translation templates. For raw data containing 110,000 translation instances, template extraction with a phrase set of 16,000 high-quality phrases yields 1.3 million translation templates in total. Because the extraction process is designed to maximize recall, the translation templates obtained include a large number of low-quality templates. The embodiment of the present application therefore adds a filtering operation on the translation templates.
The embodiment of the present application filters translation templates in terms of coverage, abstraction degree and the like.
Coverage (cov): the number of translation instances in the whole translation instance library that a translation template can cover is the coverage of that translation template.
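Coverage as defined above can be illustrated with a small sketch. The matching rule used here is an assumption, since the text does not specify one: a variable label is taken to match one or more whitespace-separated tokens, and a template covers an instance only if it matches the whole instance. The names `template_to_regex` and `coverage` are hypothetical.

```python
import re


def template_to_regex(template):
    """Compile a template into a regex; tokens starting with '$' are variables."""
    parts = []
    for tok in template.split():
        if tok.startswith("$"):
            parts.append(r"\S+(?: \S+)*")   # assumed: one or more tokens
        else:
            parts.append(re.escape(tok))
    return re.compile(" ".join(parts))


def coverage(template, instances):
    """Count the instances that the template matches in full."""
    pattern = template_to_regex(template)
    return sum(1 for s in instances if pattern.fullmatch(s))


library = ["Lucy is a good girl", "Tom is a good boy"]
print(coverage("Lucy is a good $X1", library))  # 1
print(coverage("$X1 is a good $X2", library))   # 2
```

With this toy library, "Lucy is a good $X1" covers a single instance, which is the situation described in the filtering example of the text.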
Abstraction degree (abs): for a given translation template, the shorter its length, the more abstract the template; the more words its variables contain, the more abstract the template.
Based on the above definitions of coverage and abstraction degree, a coverage threshold and an abstraction degree threshold are preset, and all extracted translation templates are filtered under the following conditions:
Condition one: the coverage of the translation template is >= 5;
Condition two: the abstraction degree of the translation template is >= 0.5;
Condition three: after the variable labels are removed from the translation template, the number of remaining words is >= 3.
Translation templates that satisfy one of the above conditions, or a combination of them, are retained and enter the translation template library; all others are filtered out.
For example, if the translation template "Lucy is a good $X1" covers only the single translation instance "Lucy is a good girl", its coverage is 1; the translation template is considered unrepresentative and can be filtered out.
The translation template "$X1 and $X2" has an abstraction degree of 0.2; the translation template is considered to be abstracted and generalized too far and can be filtered out.
After the variables "$X1" and "$X2" are removed from the translation template "$X1 and $X2", only the single word "and" remains, so the number of remaining words is 1; the translation template is considered too short and too abstract and can be filtered out.
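The filter conditions can be sketched as a simple predicate. Two assumptions are made here and should be read as such: the abstraction degree is passed in as a precomputed number because the text gives no formula for it, and the sketch keeps a template only when all three conditions hold, which is one possible reading of "one of the conditions or a combination". The names `remaining_words` and `keep_template` are hypothetical.

```python
def remaining_words(template):
    """Number of tokens left once variable labels (tokens starting with '$') are removed."""
    return sum(1 for tok in template.split() if not tok.startswith("$"))


def keep_template(template, cov, abstraction,
                  min_cov=5, min_abs=0.5, min_words=3):
    """Return True if the template passes all three threshold checks.

    cov and abstraction are supplied by the caller; how abstraction is
    computed is not specified in the text, so it is treated as an input.
    """
    return (cov >= min_cov
            and abstraction >= min_abs
            and remaining_words(template) >= min_words)


print(remaining_words("$X1 and $X2"))                                  # 1
print(keep_template("$X1 and $X2", cov=100, abstraction=0.9))         # False
print(keep_template("Lucy is a good $X1", cov=1, abstraction=0.9))    # False
print(keep_template("Lucy is a good $X1", cov=7, abstraction=0.6))    # True
```

The coverage and abstraction values in the calls are made-up inputs chosen only to exercise each condition; the default thresholds are the ones stated in conditions one to three.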
This concludes the translation template extraction process proposed in the embodiment of the present application.
It should be noted that the technical solution provided by the embodiments of the present application may perform template extraction based on the forward maximum matching method or based on the reverse maximum matching method.
In summary, the embodiment of the present application extracts templates by dynamic programming: following the idea of dynamic programming, it traverses all phrase fragments of a translation instance and combines all of their forms, so that multiple translation templates are obtained for the translation instance. This effectively raises the recall of translation template extraction, avoids wasting valid translation templates, and ensures the richness of the final translation template library. The subsequent filtering operation in turn ensures the validity of the translation templates.
Corresponding to the above translation template determination method, an embodiment of the present application provides a translation template determination device which, referring to Fig. 10, includes:
a first unit 11, configured to match a translation instance against a preset phrase set and determine the matched phrases in the translation instance;
a second unit 12, configured to determine the variable label of each matched phrase;
a third unit 13, configured to combine, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
Optionally, the third unit is further configured to: for a translation template in which variable labels of adjacent matched phrases exist, merge the variable labels of the adjacent matched phrases in the translation template.
Optionally, the third unit is further configured to: for a translation template in which variable labels of multiple matched phrases exist, number the variable label of each matched phrase in the translation template.
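The two optional post-processing operations, merging adjacent variable labels and numbering the labels, can be sketched as below. The function name `postprocess` and the "$x1", "$x2" numbering format are assumptions for illustration; the text only states that the labels are merged and numbered.

```python
def postprocess(template, var="$x"):
    """Merge adjacent variable labels, then number them if several remain."""
    merged = []
    for tok in template.split():
        if tok == var and merged and merged[-1] == var:
            continue                    # collapse adjacent labels into one
        merged.append(tok)
    if merged.count(var) > 1:           # number only when several labels exist
        counter = 0
        for i, tok in enumerate(merged):
            if tok == var:
                counter += 1
                merged[i] = f"{var}{counter}"
    return " ".join(merged)


print(postprocess("$x $x C D $x G"))  # $x1 C D $x2 G
print(postprocess("A B C D $x G"))    # A B C D $x G
```

A template with a single variable label is left unnumbered here, following the wording that numbering applies when variable labels of multiple matched phrases exist.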
Optionally, the third unit is further configured to filter the translation templates according to preset rules.
Optionally, the third unit filtering the translation templates according to preset rules specifically includes:
filtering out translation templates that satisfy one of the following conditions or a combination thereof:
the coverage of the translation template is less than a preset coverage threshold;
the abstraction degree of the translation template is less than a preset abstraction degree threshold;
the number of words remaining after the variable labels are removed from the translation template is less than a preset quantity threshold;
wherein the coverage of the translation template is determined according to the number of translation instances covered by the translation template;
the abstraction degree of the translation template is determined according to the coverage of the translation template, the length of the translation template, and the length of the translation instances covered by the translation template.
Optionally, the third unit combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combined form specifically includes:
determining an L*L two-dimensional matrix using the phrases in the translation instance and the variable labels of the matched phrases, where L is the length of the translation instance;
taking, among the translation templates at the top-right corner position of the two-dimensional matrix, those that contain a variable label as the obtained translation templates.
Corresponding to the above machine translation method, and referring to Fig. 11, an embodiment of the present application provides a machine translation device, including:
a determining unit 21, configured to determine a source sentence to be translated;
a translation unit 22, configured to translate the source sentence into a target sentence using a preset translation template;
a presetting unit 23, configured to preset the translation template in the following manner:
matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
determining the variable label of each matched phrase;
combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
The above presetting unit can be understood as the translation template determination device described above.
Any of the above units may be implemented by a hardware entity such as a processor. The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or a complex programmable logic device (CPLD).
In summary, the embodiment of the present application proposes a practicable scheme for the automatic extraction of translation templates. Performing template extraction on translation instances on the basis of the high-quality phrase fragments obtained can effectively improve the efficiency and quality of template extraction, and thereby the translation quality of a translation memory system.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system or a computer program product. The present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.
Claims (16)
1. A translation template determination method, characterized by including:
matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
determining the variable label of each matched phrase;
combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
2. The method according to claim 1, characterized in that the method further includes: for a translation template in which variable labels of adjacent matched phrases exist, merging the variable labels of the adjacent matched phrases in the translation template.
3. The method according to claim 1, characterized in that the method further includes: for a translation template in which variable labels of multiple matched phrases exist, numbering the variable label of each matched phrase in the translation template.
4. The method according to claim 1, characterized in that the method further includes: filtering the translation templates according to preset rules.
5. The method according to claim 4, characterized in that filtering the translation templates according to preset rules specifically includes:
filtering out translation templates that satisfy one of the following conditions or a combination thereof:
the coverage of the translation template is less than a preset coverage threshold;
the abstraction degree of the translation template is less than a preset abstraction degree threshold;
the number of words remaining after the variable labels are removed from the translation template is less than a preset quantity threshold;
wherein the coverage of the translation template is determined according to the number of translation instances covered by the translation template;
the abstraction degree of the translation template is determined according to the coverage of the translation template, the length of the translation template, and the length of the translation instances covered by the translation template.
6. The method according to claim 1, characterized in that combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combined form specifically includes:
determining an L*L two-dimensional matrix using the phrases in the translation instance and the variable labels of the matched phrases, where L is the length of the translation instance;
taking, among the translation templates at the top-right corner position of the two-dimensional matrix, those that contain a variable label as the obtained translation templates.
7. The method according to any one of claims 1 to 6, characterized in that the translation instance is a monolingual translation instance.
8. A machine translation method, characterized by including:
determining a source sentence to be translated;
translating the source sentence into a target sentence using a preset translation template;
wherein the translation template is preset in the following manner:
matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
determining the variable label of each matched phrase;
combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
9. A translation template determination device, characterized by including:
a first unit, configured to match a translation instance against a preset phrase set and determine the matched phrases in the translation instance;
a second unit, configured to determine the variable label of each matched phrase;
a third unit, configured to combine, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
10. The device according to claim 9, characterized in that the third unit is further configured to: for a translation template in which variable labels of adjacent matched phrases exist, merge the variable labels of the adjacent matched phrases in the translation template.
11. The device according to claim 9, characterized in that the third unit is further configured to: for a translation template in which variable labels of multiple matched phrases exist, number the variable label of each matched phrase in the translation template.
12. The device according to claim 9, characterized in that the third unit is further configured to filter the translation templates according to preset rules.
13. The device according to claim 12, characterized in that the third unit filtering the translation templates according to preset rules specifically includes:
filtering out translation templates that satisfy one of the following conditions or a combination thereof:
the coverage of the translation template is less than a preset coverage threshold;
the abstraction degree of the translation template is less than a preset abstraction degree threshold;
the number of words remaining after the variable labels are removed from the translation template is less than a preset quantity threshold;
wherein the coverage of the translation template is determined according to the number of translation instances covered by the translation template;
the abstraction degree of the translation template is determined according to the coverage of the translation template, the length of the translation template, and the length of the translation instances covered by the translation template.
14. The device according to claim 9, characterized in that the third unit combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases to obtain translation templates in at least one combined form specifically includes:
determining an L*L two-dimensional matrix using the phrases in the translation instance and the variable labels of the matched phrases, where L is the length of the translation instance;
taking, among the translation templates at the top-right corner position of the two-dimensional matrix, those that contain a variable label as the obtained translation templates.
15. The device according to any one of claims 9 to 14, characterized in that the translation instance is a monolingual translation instance.
16. A machine translation device, characterized by including:
a determining unit, configured to determine a source sentence to be translated;
a translation unit, configured to translate the source sentence into a target sentence using a preset translation template;
a presetting unit, configured to preset the translation template in the following manner:
matching a translation instance against a preset phrase set to determine the matched phrases in the translation instance;
determining the variable label of each matched phrase;
combining, according to the position of each phrase in the translation instance, the phrases in the translation instance with the variable labels of the matched phrases, to obtain translation templates in at least one combined form.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610506589.1A CN107562734A (en) | 2016-06-30 | 2016-06-30 | Translation template determination, machine translation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107562734A true CN107562734A (en) | 2018-01-09 |
Family
ID=60968894
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610506589.1A Pending CN107562734A (en) | 2016-06-30 | 2016-06-30 | Translation template determination, machine translation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107562734A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408307A (en) * | 2021-07-14 | 2021-09-17 | 北京理工大学 | Neural machine translation method based on translation template |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512395A (en) * | 2002-12-27 | 2004-07-14 | 联想(北京)有限公司 | Establishing method for open type natural language |
CN101706777A (en) * | 2009-11-10 | 2010-05-12 | 中国科学院计算技术研究所 | Method and system for extracting resequencing template in machine translation |
EP2199925A1 (en) * | 2008-12-03 | 2010-06-23 | Xerox Corporation | Dynamic translation memory using statistical machine translation |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408307A (en) * | 2021-07-14 | 2021-09-17 | 北京理工大学 | Neural machine translation method based on translation template |
CN113408307B (en) * | 2021-07-14 | 2022-06-14 | 北京理工大学 | Neural machine translation method based on translation template |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1249220; Country of ref document: HK |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180109 |