CN101452446A - Target language word deforming method and device - Google Patents

Target language word deforming method and device

Info

Publication number
CN101452446A
Authority
CN
China
Prior art keywords
language
mentioned
word
target language
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007101865456A
Other languages
Chinese (zh)
Inventor
刘占一
王海峰
吴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Priority to CNA2007101865456A (CN101452446A)
Priority to JP2008308753A (JP2009140499A)
Priority to US12/328,476 (US20090164206A1)
Publication of CN101452446A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment

Abstract

The invention provides a method and a device for training a target language word deformation model based on a bilingual corpus, a target language word deformation method and a device, as well as a translation method and a system for translating a source language text into a target language. In the method for training the target language word deformation model based on the bilingual corpus, the bilingual corpus comprises a plurality of pairs of source language materials and target language materials which are aligned. The method comprises: pre-processing the source language materials and the target language materials in the bilingual corpus; extracting a template containing target language word deformation information based on the pre-processed source language materials and target language materials; and utilizing the template to train the target language word deformation model.

Description

Target language word deforming method and device
Technical field
The present invention relates to target language word inflection ("word deforming") in corpus-based automatic machine translation. In particular, it relates to a method and device for training a target language word inflection model based on a bilingual corpus, a target language word inflection method and device, and a translation method and translation system for translating a source language text into a target language text.
Background
Word inflection occurs in many languages. In English, for example, verbs inflect for tense and nouns inflect for number. Inflection therefore carries information such as time, quantity and emotion, and this information helps in understanding a sentence of the language accurately.
At present, automatic machine translation relies on two main techniques: rule-based methods and corpus-based methods. A rule-based method builds a translation model from translation rules and then translates with that model, whereas a corpus-based method trains and builds its translation model from a bilingual corpus.
In a rule-based method, inflected target language words can be generated by the translation rules. However, translation rules are usually written by hand, which is time-consuming. Translation rules also depend on deep syntactic analysis, and in spoken-language translation the sentence structure is so flexible that accurate analysis results are hard to obtain.
In a corpus-based method, inflected target language words come from the bilingual corpus: a translation model can output a particular inflected form only if the bilingual corpus it was trained on contains that form. The size of the bilingual corpus therefore strongly affects translation accuracy.
Rule-based and corpus-based methods are described in detail in "Principles of Machine Translation" by Zhao Tiejun et al. (Harbin Institute of Technology Press, May 2001), in "Machine Translation: an Introductory Guide" by D.J. Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys and Louisa Sadler (Blackwells-NCC, 1994), and in "Machine Translation over Fifty Years" by John Hutchins (Histoire, Epistemologie, Langage, Tome XXII, pp. 7-31, 2001).
Summary of the invention
The present invention has been made in view of the above technical problems. Its object is to provide a method and device for training a target language word inflection model based on a bilingual corpus, a target language word inflection method and device, and a translation method and translation system for translating a source language text into a target language text.
According to a first aspect of the invention, there is provided a method for training a target language word inflection model based on a bilingual corpus, wherein the bilingual corpus comprises a plurality of pairs of aligned source language material and target language material. The method comprises: establishing an initial target language word inflection model; pre-processing the source language material and the target language material in the bilingual corpus; extracting, from the pre-processed source language material and target language material, templates containing target language word inflection information; and training the target language word inflection model with the templates.
According to a second aspect of the invention, there is provided a target language word inflection method, wherein a source language text has been translated into an initial target language translation, and the source language text has been pre-processed so that the source language words it contains are in base form and tagged with their part of speech. The method comprises: training a target language word inflection model with the above method for training a target language word inflection model based on a bilingual corpus; and inflecting the target language words in the target language translation with the target language word inflection model.
According to a third aspect of the invention, there is provided a translation method for translating a source language text into a target language translation, comprising: pre-processing the source language text to obtain its source language word sequence, wherein the source language words in the sequence are reduced to base form and tagged with their part of speech; translating the pre-processed source language text into an initial target language translation with a corpus-based translation model; and editing the initial target language translation with the above target language word inflection method to obtain the final target language translation.
According to a fourth aspect of the invention, there is provided a device for training a target language word inflection model based on a bilingual corpus, wherein the bilingual corpus comprises a plurality of pairs of aligned source language material and target language material. The device comprises: an initial model establishing unit for establishing an initial target language word inflection model; a corpus pre-processing unit for pre-processing the source language material and the target language material in the bilingual corpus; a template extracting unit for extracting, from the pre-processed source language material and target language material, templates containing target language word inflection information; and a training unit for training the target language word inflection model with the templates.
According to a fifth aspect of the invention, there is provided a target language word inflection device, wherein a source language text has been translated into a target language translation, and the source language text has been pre-processed so that the source language words it contains are in base form and tagged with their part of speech. The device comprises: a target language word inflection model trained by the above device for training a target language word inflection model based on a bilingual corpus; and a word inflection unit for inflecting the target language words in the target language translation with the target language word inflection model.
According to a sixth aspect of the invention, there is provided a translation system for translating a source language text into a target language translation, comprising: a text pre-processing unit for pre-processing the source language text to obtain its source language word sequence, wherein the source language words in the sequence are reduced to base form and tagged with their part of speech; a corpus-based translation model for translating the pre-processed source language text into an initial target language translation; and the above target language word inflection device for editing the initial target language translation to obtain the final target language translation.
Description of drawings
Fig. 1 is a flowchart of the method for training a target language word inflection model based on a bilingual corpus according to an embodiment of the invention;
Fig. 2 is a flowchart of the template extraction step in the embodiment shown in Fig. 1;
Fig. 3 is a flowchart of the target language word inflection method according to an embodiment of the invention;
Fig. 4 is a flowchart of the word inflection step in the embodiment shown in Fig. 3;
Fig. 5 is a flowchart of the translation method for translating a source language text into a target language translation according to an embodiment of the invention;
Fig. 6 is a schematic block diagram of the device for training a target language word inflection model based on a bilingual corpus according to an embodiment of the invention;
Fig. 7 is a schematic block diagram of the template extracting unit in the embodiment of Fig. 6;
Fig. 8 is a schematic block diagram of the target language word inflection device according to an embodiment of the invention;
Fig. 9 is a schematic block diagram of the word inflection unit in the embodiment of Fig. 8;
Fig. 10 is a schematic block diagram of the translation system for translating a source language text into a target language translation according to an embodiment of the invention.
Embodiments
The above and other objects, features and advantages of the invention will become more apparent from the following detailed description of the preferred embodiments, taken in conjunction with the accompanying drawings.
Fig. 1 is a flowchart of the method for training a target language word inflection model based on a bilingual corpus according to an embodiment of the invention. The embodiment is described in detail below with reference to the drawing. The Target Language Word Inflection (TLWI) model trained by the method of this embodiment is used in the target language word inflection method and in the translation method for translating a source language text into a target language translation, both described in later embodiments.
In this embodiment, the bilingual corpus comprises a plurality of pairs of aligned source language material and target language material, which may take the form of phrases, sentences or paragraphs. For ease of explanation, this embodiment and the later ones assume that the material takes the form of sentences. The bilingual corpus is then a bilingual example sentence base in which the source language sentences and target language sentences are sentence-aligned.
As shown in Fig. 1, an initial target language word inflection model is first established in step 101. In this embodiment, the TLWI model may be a direct probability model, for example one of the form P(action|condition), or a pattern recognition model, for example one based on a support vector machine (SVM) or on a decision tree.
Next, in step 105, the source language sentences and target language sentences in the bilingual example sentence base are pre-processed. Specifically, for each pair of aligned source and target language sentences in the base, the source language sentence is pre-processed so that its words are in base form and tagged with their part of speech, and the target language sentence is pre-processed in the same way, so that its words are also in base form and tagged with their part of speech.
Step 105 is explained below taking Chinese as the source language and English as the target language. First, the Chinese sentence is segmented into a Chinese word sequence, and each Chinese word in the sequence is tagged with its part of speech. Techniques for segmenting Chinese sentences are known to those of ordinary skill in the art and are not described here. Then, each English word in the English sentence is reduced to its base form and tagged with its part of speech.
Next, in step 110, templates containing target language word inflection information are extracted from each pair of aligned source and target language sentences pre-processed in step 105.
Fig. 2 shows a flowchart of the template extraction step 110. As shown in Fig. 2, first, in step 1101, the source language words in the pre-processed source language sentence are aligned with the target language words in the pre-processed target language sentence to obtain word alignment information. Any existing or future alignment technique may be used for this word alignment.
Then, in step 1105, the original target language sentence is compared with the corresponding pre-processed target language sentence to find the target language words that differ between them, that is, the target language words that are inflected in the original target language sentence.
In step 1110, using the word alignment information obtained in step 1101, the source language words aligned with the differing target language words found in step 1105 are obtained from the pre-processed source language sentence.
Then, in step 1115, a template containing target language word inflection information is generated from the differing target language word found in step 1105, the aligned source language word obtained in step 1110, and the context of that source language word in the original source language sentence.
In this embodiment, the target language word inflection information comprises: the part of speech of the source language word; a combination of context information of that source language word, serving as the condition; and the inflection of the target language word aligned with that source language word, serving as the action. In other words, each generated template consists of a part-of-speech part, a condition part and an action part.
Further, the combinations of context information used in the condition part of a template may be predefined, for example: a) the preceding source language word; b) the preceding and the following source language word; c) the source language word two positions before (separated by one word); d) the source language word two positions after (separated by one word).
For example, suppose a Chinese sentence consists of 7 Chinese words, "C1/P1 C2/P2 C3/P3 C4/P4 C5/P5 C6/P6 C7/P7", where Ci denotes a Chinese word and Pi its part of speech. Suppose "C4/P4" is the Chinese word aligned with an inflected English word "W4/P4". With the combinations above, the conditions of the generated templates are: a) -1 C3; b) -1 C3, +1 C5; c) -2 C2; d) +2 C6.
Of course, those of ordinary skill in the art will appreciate that the combinations of context information are not limited to those listed above and may take other forms.
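The four example combinations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CONTEXT_PATTERNS` and `context_conditions` are hypothetical and do not come from the patent, and a condition is represented here as a tuple of (relative offset, word) pairs.

```python
from typing import List, Tuple

# Each pattern is a tuple of relative offsets; the four entries mirror
# the example combinations a) to d) in the text (an assumed encoding).
CONTEXT_PATTERNS: List[Tuple[int, ...]] = [(-1,), (-1, +1), (-2,), (+2,)]

Condition = Tuple[Tuple[int, str], ...]


def context_conditions(words: List[str], index: int) -> List[Condition]:
    """Return one condition per pattern for the word at `index` (0-based).

    A pattern is skipped when any of its offsets falls outside the sentence.
    """
    conditions: List[Condition] = []
    for pattern in CONTEXT_PATTERNS:
        if all(0 <= index + off < len(words) for off in pattern):
            conditions.append(tuple((off, words[index + off]) for off in pattern))
    return conditions


words = ["C1", "C2", "C3", "C4", "C5", "C6", "C7"]
for condition in context_conditions(words, 3):  # C4 sits at 0-based index 3
    print(condition)
```

For C4 this yields the four conditions listed in the example; for a word near a sentence boundary, only the patterns that fit inside the sentence are produced, which is a design choice of this sketch rather than something the text specifies.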
Returning to Fig. 1, after the templates containing target language word inflection information have been extracted, they are used in step 115 to train the target language word inflection model. Specifically, the training algorithm is chosen to match the type of model adopted for the TLWI model. Such training algorithms are known to those of ordinary skill in the art and are not described here.
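The text leaves the training algorithm open. As one possible reading of the P(action|condition) form, the sketch below estimates the probability by relative frequency over the extracted templates. The function name `train_tlwi`, the string encoding of conditions and the toy counts are all assumptions made for illustration; the patent does not prescribe this estimator, and the sketch ignores cases where a matching word is not inflected at all.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Template = Tuple[str, str, str]  # (part of speech, condition, action)


def train_tlwi(templates: Iterable[Template]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate P(action | part of speech, condition) by relative frequency."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for pos, condition, action in templates:
        counts[(pos, condition)][action] += 1
    model: Dict[Tuple[str, str], Dict[str, float]] = {}
    for key, action_counts in counts.items():
        total = sum(action_counts.values())
        model[key] = {action: n / total for action, n in action_counts.items()}
    return model


# Toy template instances; the counts are invented for illustration only.
templates = [
    ("n", "-1 these", "n+s"),
    ("n", "-1 these", "n+s"),
    ("n", "-1 these", "n+es"),
    ("v", "-1 just +1 guo", "v+ed"),
]
model = train_tlwi(templates)
print(model[("n", "-1 these")])
```

An SVM or decision tree, which the text also mentions, would instead treat the part of speech and context words as features and the action as the class label.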
The method of this embodiment for training a target language word inflection model based on a bilingual corpus is illustrated below with a concrete example.
Suppose a pair of aligned Chinese and English sentences in a Chinese-English bilingual example sentence base is:
Chs: That girl has just washed these apples. (English gloss of the Chinese sentence)
Eng: The girl just washed these apples.
Pre-processing these two sentences gives the following pre-processed Chinese and English sentences (Chinese words are shown as English glosses; "guo" stands for the auxiliary word that follows the verb):
Chs: that/pron girl/n just/adv wash/v guo/u these/pron apple/n ./w
Eng: The/art girl/n just/adv wash/v these/pron apple/n ./w
Table 1 shows the pre-processed Chinese sentence:
Table 1
Word     Part of speech
that     pron (pronoun)
girl     n (noun)
just     adv (adverb)
wash     v (verb)
guo      u (auxiliary word)
these    pron (pronoun)
apple    n (noun)
.        w (punctuation)
Table 2 shows the pre-processed English sentence:
Table 2
Word     Part of speech
The      art (article)
girl     n (noun)
just     adv (adverb)
wash     v (verb)
these    pron (pronoun)
apple    n (noun)
.        w (punctuation)
Then, word alignment is performed on the pre-processed Chinese and English sentences to obtain the word alignment information shown in Table 3.
Table 3
Chinese word   English word
that           The
girl           girl
just           just
wash           wash
guo            -
these          these
apple          apple
.              .
Then, the English words in the original English sentence are compared with those in the pre-processed English sentence to find the words that differ. The comparison yields 2 differing English words:
Before pre-processing   After pre-processing
washed                  wash
apples                  apple
The Chinese words in the Chinese sentence aligned with these two differing English words are therefore "wash" and "apple", respectively.
From the two differing English words, the aligned Chinese words and their context in the original Chinese sentence, templates containing English word inflection information are generated, as shown in Table 4.
Table 4
Template   Part of speech   Condition          Action
P1         v (verb)         -1 just, +1 guo    v+ed
P2         n (noun)         -1 these           n+s
As shown in Table 4, template P1 is generated from the inflection "wash|washed". It states that, in a Chinese sentence, for a Chinese word whose part of speech is v (verb), if the preceding Chinese word is "just" and the following Chinese word is "guo", then the English word aligned with this Chinese word is inflected by adding the suffix "ed". Template P2 is generated from the inflection "apple|apples". It states that, in a Chinese sentence, for a Chinese word whose part of speech is n (noun), if the preceding Chinese word is "these", then the English word aligned with this Chinese word is inflected by adding the suffix "s".
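The extraction steps 1105 to 1115 can be traced on this example with a short Python sketch. It is a minimal illustration, not the patent's implementation: the function names, the English glosses standing in for the Chinese words, the string encoding of conditions and the choice of only two context patterns are assumptions. The sketch emits one template per context pattern, so the two rows of Table 4 appear among its output.

```python
from typing import Dict, List, Tuple


def inflection_action(lemma: str, surface: str, pos: str) -> str:
    """Describe a suffix change as in Table 4, e.g. wash -> washed gives 'v+ed'."""
    if surface.startswith(lemma):
        return f"{pos}+{surface[len(lemma):]}"
    return f"{pos}:{lemma}->{surface}"  # fallback for irregular forms (assumption)


def extract_templates(
    src: List[Tuple[str, str]],      # pre-processed source: (word, part of speech)
    tgt_original: List[str],         # original target words
    tgt_lemmas: List[str],           # pre-processed target words (base forms)
    alignment: Dict[int, int],       # target index -> source index (step 1101)
    patterns: List[Tuple[int, ...]],
) -> List[Tuple[str, str, str]]:
    src_words = [word for word, _ in src]
    templates = []
    for j, (surface, lemma) in enumerate(zip(tgt_original, tgt_lemmas)):
        if surface == lemma or j not in alignment:
            continue                 # step 1105: keep only inflected, aligned words
        i = alignment[j]             # step 1110: aligned source word
        pos = src[i][1]
        action = inflection_action(lemma, surface, pos)
        for pattern in patterns:     # step 1115: one template per context pattern
            if all(0 <= i + off < len(src_words) for off in pattern):
                condition = " ".join(f"{off:+d} {src_words[i + off]}" for off in pattern)
                templates.append((pos, condition, action))
    return templates


src = [("that", "pron"), ("girl", "n"), ("just", "adv"), ("wash", "v"),
       ("guo", "u"), ("these", "pron"), ("apple", "n"), (".", "w")]
tgt_original = ["The", "girl", "just", "washed", "these", "apples", "."]
tgt_lemmas = ["The", "girl", "just", "wash", "these", "apple", "."]
alignment = {0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 6: 7}

templates = extract_templates(src, tgt_original, tgt_lemmas, alignment, [(-1,), (-1, +1)])
for template in templates:
    print(template)
```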
Finally, after all templates have been extracted from the Chinese-English bilingual example sentence base, they are used to train the TLWI model.
As the above description shows, the method of this embodiment trains the TLWI model on a pre-processed bilingual corpus. It uses only shallow analysis of the corpus and needs no accurate deep analysis. The trained TLWI model is applicable to spoken-language translation systems and other corpus-based translation systems, and can improve translation quality.
Under the same inventive concept, Fig. 3 is a flowchart of the target language word inflection method according to an embodiment of the invention. The embodiment is described in detail below with reference to the drawing. Descriptions of parts identical to those of the previous embodiment are omitted where appropriate.
The target language word inflection method of this embodiment further refines a target language translation. In this embodiment, the target language translation is obtained by translating a source language text with a corpus-based translation model, and the source language text has been pre-processed so that the source language words it contains are in base form and tagged with their part of speech.
The corpus-based translation model may be any existing or future corpus-based translation model, for example a Statistical Machine Translation (SMT) model.
As shown in Fig. 3, in step 301 the TLWI model is trained with the method for training a target language word inflection model based on a bilingual corpus described in the previous embodiment.
Then, in step 310, the trained TLWI model is used to inflect the target language words in the target language translation.
Fig. 4 shows a flowchart of the word inflection step 310. As shown in Fig. 4, first, in step 3101, it is determined for each source language word in the source language text, from its part of speech and the TLWI model, whether a corresponding template exists.
If a corresponding template exists, step 3105 checks whether the context of the source language word satisfies the condition in the template. If it does, then in step 3110 the inflection action in the template is applied to the target language word in the target language translation that is aligned with this source language word. If it does not, step 3101 is performed for the next source language word.
If step 3101 determines that the source language word has no corresponding template, step 3101 is likewise performed for the next source language word.
Through these steps, the target language words in the target language translation that need inflection are found and inflected.
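Steps 3101 to 3110 can be sketched as a loop over the source words. The sketch below is illustrative only: the function names, the tuple encoding of conditions and the English glosses for the Chinese words are assumptions, and it applies just the first matching template (the case of several matching templates is handled separately in the method). The templates are those of Table 4, and the input is the sentence used in the translation example of this document.

```python
from typing import Dict, List, Tuple

Condition = Tuple[Tuple[int, str], ...]
Template = Tuple[str, Condition, str]  # (part of speech, condition, action)


def condition_holds(words: List[str], i: int, condition: Condition) -> bool:
    """Step 3105: check every (offset, word) pair of the condition around position i."""
    return all(0 <= i + off < len(words) and words[i + off] == w for off, w in condition)


def apply_action(word: str, action: str) -> str:
    """Apply a suffix action such as 'n+s' or 'v+ed' to a base-form word."""
    _, _, suffix = action.partition("+")
    return word + suffix


def inflect(src: List[Tuple[str, str]], tgt: List[str],
            alignment: Dict[int, int], templates: List[Template]) -> List[str]:
    """alignment maps a source index to the index of its aligned target word."""
    src_words = [word for word, _ in src]
    result = list(tgt)
    for i, (_, pos) in enumerate(src):
        for tpl_pos, condition, action in templates:
            if tpl_pos != pos:                      # step 3101: template for this POS?
                continue
            if i in alignment and condition_holds(src_words, i, condition):
                j = alignment[i]
                result[j] = apply_action(result[j], action)  # step 3110
                break                               # first matching template only
    return result


TEMPLATES: List[Template] = [
    ("v", ((-1, "just"), (+1, "guo")), "v+ed"),     # P1 of Table 4
    ("n", ((-1, "these"),), "n+s"),                 # P2 of Table 4
]
src = [("these", "pron"), ("boy", "n"), ("just", "adv"), ("see", "v"),
       ("guo", "u"), ("TV", "n"), (".", "w")]
initial = ["These", "boy", "just", "watch", "TV", "."]
alignment = {0: 0, 1: 1, 2: 2, 3: 3, 5: 4, 6: 5}

final = inflect(src, initial, alignment, TEMPLATES)
print(" ".join(final))
```

Note that "TV" is also a noun but is left unchanged, because its preceding source word is not "these" and the condition of P2 therefore fails.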
Further, when the check in step 3105 shows that the source language word satisfies the conditions of several corresponding templates, step 3110 applies the inflection action of each of these templates in turn to the target language word aligned with the source language word, which yields several candidate target language translations. Then, in step 3115, a fluency score is computed for each candidate target language translation from the language model of the target language, and in step 3120 a score is computed from the TLWI model for the template used to obtain that candidate. In step 3125, the fluency score and the template score are combined, for example as a product or a weighted sum, and the combined value is the score of the candidate target language translation.
Finally, in step 3130, the candidate target language translation with the highest score among all candidates is selected as the final target language translation.
The selection of the final target language translation from the candidate target language translations described above can be expressed by the following formula:
$$\hat{e} = \arg\max_{e} \{ P_{LM}(e) \, f_{TLWI}(e) \}$$
where $e$ denotes a candidate target language translation, $P_{LM}(\cdot)$ denotes the language model of the target language, $f_{TLWI}(\cdot)$ denotes the TLWI model, $\arg\max$ selects the candidate that maximizes the expression in braces, and $\hat{e}$ denotes the final target language translation.
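The selection rule can be sketched in Python with the product form of the combined score. Everything concrete here is assumed for illustration: the function name `select_translation`, the two candidate sentences, and the toy language model and TLWI scores, which are invented numbers rather than results.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str]  # (candidate translation, action of the template that produced it)


def select_translation(candidates: List[Candidate],
                       p_lm: Callable[[str], float],
                       f_tlwi: Callable[[str], float]) -> str:
    """Return the candidate e that maximizes P_LM(e) * f_TLWI(e)."""
    best = max(candidates, key=lambda c: p_lm(c[0]) * f_tlwi(c[1]))
    return best[0]


# Toy scores, invented for illustration only.
toy_lm = {
    "These boys just watched TV .": 0.020,
    "These boys just watching TV .": 0.002,
}
toy_tlwi = {"v+ed": 0.7, "v+ing": 0.3}

candidates = [
    ("These boys just watched TV .", "v+ed"),
    ("These boys just watching TV .", "v+ing"),
]
best = select_translation(candidates, toy_lm.__getitem__, toy_tlwi.__getitem__)
print(best)
```

Replacing the product in the `key` function with a weighted sum gives the other combination mentioned in the text.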
As the above description shows, the target language word inflection method of this embodiment uses the trained TLWI model to inflect the target language words in the target language translation, which improves translation quality. When there are several candidate target language translations, it combines the language model with the TLWI model to select the best inflection and thus obtain the best target language translation.
Under the same inventive concept, Fig. 5 is a flowchart of the translation method for translating a source language text into a target language translation according to an embodiment of the invention. The embodiment is described in detail below with reference to the drawing. Descriptions of parts identical to those of the previous embodiments are omitted where appropriate.
As shown in Fig. 5, first, in step 501, the input source language text is pre-processed to obtain its source language word sequence, in which the source language words are reduced to base form and tagged with their part of speech. For example, if the source language text is a Chinese sentence, step 501 segments the Chinese sentence into a Chinese word sequence and then tags each Chinese word in the sequence with its part of speech.
Then, in step 505, the pre-processed source language text is translated into an initial target language translation with a corpus-based translation model. As mentioned above, this corpus-based translation model may be an SMT model or the like.
Then, in step 510, the initial target language translation obtained in step 505 is edited with the target language word inflection method described in the previous embodiment to obtain the final target language translation.
The translation method of this embodiment is illustrated below with an example in which the source language is Chinese, the target language is English, and the corpus-based translation model is an SMT model. The input sentence is the Chinese sentence meaning "These boys have just watched TV". The sentence is first pre-processed, which gives (with Chinese words shown as English glosses) "these/pron boy/n just/adv see/v guo/u TV/n ./w". The SMT model then produces the initial English translation "These/pron boy/n just/adv watch/v TV/n ./w". Finally, the initial English translation is edited with the TLWI model: "boy" is inflected to "boys" and "watch" is inflected to "watched", which gives the final English translation "These boys just watched TV."
As the above description shows, the translation method of this embodiment translates with a corpus-based translation model and then uses the TLWI model to inflect the target language words in the target language translation, thereby obtaining a more accurate translation.
Under same inventive concept, Fig. 6 is the schematic block diagram of the device based on bilingualism corpora training objective language word distorted pattern according to an embodiment of the invention.Below in conjunction with accompanying drawing, present embodiment is described in detail.Utilize the target language word deforming TLWI model of the device training of present embodiment will be used to the target language word deforming device of describing in conjunction with following embodiment and source language text is translated as the translation system of target language translation.
As previously mentioned, bilingualism corpora comprises many to having carried out source language language material and the target language language material of alignment, and wherein language material can be any in phrase, sentence and the paragraph.Usually, bilingualism corpora adopts sentential form, promptly bilingual example sentence storehouse.
As shown in Figure 6, the device 600 based on bilingualism corpora training objective language word distorted pattern of present embodiment comprises: initial model is set up unit 601, and it sets up initial TLWI model; Language material pretreatment unit 602, source language language material in its pre-service bilingualism corpora and target language language material; Template extracting unit 603, it extracts the template that comprises target language word deforming information based on the pretreated source language language material and the target language language material that obtain by language material pretreatment unit 602; And training unit 604, it utilizes the template that is extracted by template extracting unit 603, training TLWI model.
As previously mentioned, the TLWI model can adopt probability model, pattern recognition model etc., and training unit 604 uses corresponding training algorithm to the training of TLWI model.
In language material pretreatment unit 602, by source language language material pretreatment unit the source language language material in the bilingualism corpora is carried out pre-service, so that the source language word in the pretreated source language language material is original shape and indicates part of speech, simultaneously, by target language language material pretreatment unit the target language language material is carried out pre-service, so that the target language word in the pretreated target language language material is original shape and indicates part of speech.
For example, when the source language language material is a Chinese sentence, when the target language language material is English sentence, in source language language material pretreatment unit, be the Chinese word sequence with the Chinese sentence cutting at first by the cutting unit, then by the part-of-speech tagging unit to each the Chinese word lex pos in this Chinese word sequence.In target language language material processing unit, each English word in the English sentence is reduced to original shape, and each English word is shown part of speech.
Fig. 7 shows a schematic block diagram of the template extracting unit 603. As shown in Fig. 7, the template extracting unit 603 comprises: an alignment unit 6031, which, for each of the preprocessed pairs of aligned source language material and target language material, aligns the words in the preprocessed source language material with the words in the preprocessed target language material to obtain word alignment information; a search unit 6032, which searches for target language words that differ between the original target language material and the preprocessed target language material; an obtaining unit 6033, which obtains, according to the word alignment information obtained by the alignment unit 6031, the source language word aligned with each differing target language word found by the search unit 6032; and a template generation unit 6034, which generates a template containing target language word inflection information from the differing target language word, the aligned source language word, and the context information of that source language word in the original source language material. In this way, corresponding templates are generated for each pair of source language material and target language material in the bilingual corpus. All of these templates are stored in a template storage unit 6035 and are used to train the TLWI model.
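The search, obtaining, and template generation steps can be sketched as below. The word alignment is assumed to be given (here as a hypothetical dict from target index to source index), the template is represented as a plain dict, and the action is encoded as a suffix string; none of these representations are specified by the source.

```python
def describe_action(lemma, surface):
    """Encode the inflection as a suffix when possible (illustrative encoding)."""
    if surface.startswith(lemma):
        return "+" + surface[len(lemma):]
    return "->" + surface


def extract_templates(src_tagged, tgt_original, tgt_lemmas, alignment):
    """Build one template per target word whose surface form differs from its base form.

    src_tagged: list of (word, pos); tgt_original and tgt_lemmas: parallel token
    lists; alignment: dict mapping target index to source index (assumed given).
    """
    templates = []
    for t_idx, (surface, lemma) in enumerate(zip(tgt_original, tgt_lemmas)):
        if surface == lemma:
            continue  # search step: only differing words are of interest
        s_idx = alignment.get(t_idx)  # obtaining step: aligned source word
        if s_idx is None:
            continue
        _, pos = src_tagged[s_idx]
        prev_word = src_tagged[s_idx - 1][0] if s_idx > 0 else "<s>"
        templates.append(
            {"pos": pos, "condition": {"prev": prev_word},
             "action": describe_action(lemma, surface)}
        )
    return templates


src_tagged = [("我", "PN"), ("有", "VV"), ("两", "CD"), ("本", "M"), ("书", "NN")]
extracted = extract_templates(
    src_tagged,
    ["I", "have", "two", "books"],
    ["I", "have", "two", "book"],
    {0: 0, 1: 1, 2: 2, 3: 4},
)
```

Only the previous-word context is used here for brevity; other context combinations could be added to the condition in the same way.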
As described above, the target language word inflection information comprises: the part of speech of a source language word; a combination of the context information of that source language word, serving as the condition; and the inflection behavior of the target language word aligned with that source language word, serving as the action. The combinations of context information of the source language word can be determined in advance and may include, for example: the source language word immediately preceding the source language word; the immediately preceding source language word together with the immediately following source language word; the source language word preceding it at a gap of one word; and the source language word following it at a gap of one word. The combinations of context information are of course not limited to these, and other combinations may also be adopted.
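As a rough illustration, the four example context combinations could be collected for a source word at position `i` as shown below. The reading of "at a gap of one word" as positions `i - 2` and `i + 2`, the key names, and the padding token are assumptions of this sketch.

```python
def context_conditions(words, i, pad="<pad>"):
    """Return the four example context combinations for words[i]."""

    def at(k):
        # Positions outside the sentence are replaced by a padding token.
        return words[k] if 0 <= k < len(words) else pad

    return {
        "prev": (at(i - 1),),                  # immediately preceding word
        "prev_next": (at(i - 1), at(i + 1)),   # preceding and following words
        "prev_gap1": (at(i - 2),),             # preceding word, one word apart
        "next_gap1": (at(i + 2),),             # following word, one word apart
    }


conditions = context_conditions(["我", "有", "两", "本", "书"], 4)
```

Each entry can serve as the condition part of a template, paired with the part of speech of `words[i]` and an action.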
It should be noted that the device 600 of this embodiment for training a TLWI model based on a bilingual corpus, and each of its components, may be built from dedicated circuits or chips, or may be implemented by a computer (processor) executing a corresponding program. In operation, the device 600 of this embodiment can carry out the method for training a TLWI model based on a bilingual corpus of the embodiments shown in Fig. 1 and Fig. 2.
Under the same inventive concept, Fig. 8 is a schematic block diagram of a target language word inflection device according to an embodiment of the invention. This embodiment is described in detail below with reference to the drawing. Descriptions of parts identical to those of the preceding embodiments are omitted as appropriate.
In this embodiment, a source language text is translated into a target language translation by a corpus-based translation model. The source language text is preprocessed so that the source language words it contains are in base form and tagged with part of speech, and the preprocessed source language text is stored in an associated storage unit.
As shown in Fig. 8, the target language word inflection device 800 of this embodiment comprises: a TLWI model 801, which has been trained using the device 600 for training a TLWI model based on a bilingual corpus described in the preceding embodiment; and a word inflection unit 802, which uses the TLWI model 801 to inflect the target language words in the target language translation.
Fig. 9 shows a schematic block diagram of the word inflection unit 802. As shown in Fig. 9, when target language words are inflected, in the word inflection unit 802 a template determining unit 8021 first determines, according to the part of speech of each source language word in the preprocessed source language text and the TLWI model 801, whether a corresponding template exists. Next, when the template determining unit 8021 determines that a corresponding template exists, a condition verifying unit 8022 verifies whether the context information of the source language word satisfies the condition in the corresponding template. Then, when the condition verifying unit 8022 finds that the condition in the corresponding template is satisfied, an action execution unit 8023 performs the inflection action in the corresponding template on the target language word aligned with the source language word, thereby obtaining the final target language translation.
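The three steps performed by units 8021 to 8023 can be sketched as a single loop. The template layout (`pos`, `prev`, `suffix`), the alignment format, and the rule of applying only the first matching template are simplifying assumptions for this illustration.

```python
def inflect(src_tagged, tgt_lemmas, alignment, templates):
    """Apply matching templates to the base-form target words.

    alignment maps a source index to a target index (assumed to be available
    from the translation step).
    """
    output = list(tgt_lemmas)
    for s_idx, (_, pos) in enumerate(src_tagged):
        # Template determination: templates registered for this part of speech.
        candidates = [t for t in templates if t["pos"] == pos]
        prev_word = src_tagged[s_idx - 1][0] if s_idx > 0 else "<s>"
        for template in candidates:
            # Condition verification: compare the context of the source word.
            if template["prev"] != prev_word:
                continue
            # Action execution: inflect the aligned target word.
            t_idx = alignment.get(s_idx)
            if t_idx is not None:
                output[t_idx] += template["suffix"]
            break
    return output


templates = [{"pos": "NN", "prev": "本", "suffix": "s"}]
src_tagged = [("我", "PN"), ("有", "VV"), ("两", "CD"), ("本", "M"), ("书", "NN")]
result = inflect(src_tagged, ["I", "have", "two", "book"],
                 {0: 0, 1: 1, 2: 2, 4: 3}, templates)
```

Source words with no template for their part of speech, or whose context fails the condition, leave the aligned target word in its base form.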
Further, when the condition verifying unit 8022 finds that the source language word satisfies the conditions of a plurality of corresponding templates, the action execution unit 8023 performs the inflection action of each of those templates on the target language word aligned with the source language word, so as to obtain a plurality of candidate target language translations, and these candidate translations are stored in a storage unit. Then, for each of the candidate target language translations, a fluency computing unit calculates a fluency score of the candidate according to a language model of the target language, and a template score computing unit calculates, according to the TLWI model 801, a score of the template used to obtain the candidate. A combined score obtaining unit then obtains the combination of the fluency score and the template score as the score of the candidate target language translation. The combination may be, for example, a product or a weighted sum. Finally, a selecting unit selects the candidate target language translation with the highest score as the final target language translation.
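A minimal sketch of the candidate selection follows, using the product as the combination. The bigram table `BIGRAMS`, its back-off value, and the template scores are invented toy numbers; a real system would take them from a trained language model and the trained TLWI model.

```python
# Toy bigram probabilities (hypothetical values for illustration only).
BIGRAMS = {("two", "books"): 0.2, ("two", "book"): 0.01}


def fluency_score(tokens, default=0.1):
    """Score a candidate as the product of its bigram probabilities."""
    score = 1.0
    for left, right in zip(tokens, tokens[1:]):
        score *= BIGRAMS.get((left, right), default)
    return score


def select_translation(candidates, combine=lambda fluency, template: fluency * template):
    """Pick the candidate whose combined score is highest.

    candidates: list of (tokens, template_score) pairs.
    """
    scored = [(combine(fluency_score(tokens), t_score), tokens)
              for tokens, t_score in candidates]
    return max(scored, key=lambda pair: pair[0])[1]


candidates = [
    (["I", "have", "two", "books"], 0.6),
    (["I", "have", "two", "book"], 0.4),
]
best = select_translation(candidates)
```

Passing a different `combine` function, for example a weighted sum, changes the combination without touching the rest of the selection logic.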
It should be noted that the target language word inflection device 800 of this embodiment, and each of its components, may be built from dedicated circuits or chips, or may be implemented by a computer (processor) executing a corresponding program. In operation, the target language word inflection device 800 of this embodiment can carry out the target language word inflection method of the embodiments shown in Fig. 3 and Fig. 4.
Under the same inventive concept, Fig. 10 is a schematic block diagram of a translation system for translating a source language text into a target language translation according to an embodiment of the invention. This embodiment is described in detail below with reference to the drawing. Descriptions of parts identical to those of the preceding embodiments are omitted as appropriate.
As shown in Fig. 10, the translation system 1000 of this embodiment for translating a source language text into a target language translation comprises: a text preprocessing unit 1001, which preprocesses the input source language text to obtain a source language word sequence of the source language text, wherein the source language words in the sequence are reduced to base form and tagged with part of speech; a corpus-based translation model 1002, which translates the source language text preprocessed by the text preprocessing unit 1001 into an initial target language translation; and a target language word inflection device, which may be the target language word inflection device 800 described in the preceding embodiment and which edits the initial target language translation to obtain the final target language translation.
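The three-stage structure of the translation system can be illustrated with interchangeable callables. Everything below the `translate` function is a toy stand-in: the dictionaries `POS` and `DICT`, the word-by-word "translation model", and the single hard-coded inflection rule are assumptions used only to make the pipeline runnable.

```python
POS = {"我": "PN", "有": "VV", "两": "CD", "本": "M", "书": "NN"}
DICT = {"我": "I", "有": "have", "两": "two", "书": "book"}


def translate(text, preprocess, translation_model, inflector):
    """Preprocess, produce an initial translation, then inflect it."""
    src_tagged = preprocess(text)
    draft, alignment = translation_model(src_tagged)
    return inflector(src_tagged, draft, alignment)


def preprocess(text):
    # Toy preprocessing: every character is treated as one word.
    return [(ch, POS.get(ch, "UNK")) for ch in text]


def translation_model(src_tagged):
    # Toy word-by-word lookup that also records source-to-target alignment.
    draft, alignment = [], {}
    for s_idx, (word, _) in enumerate(src_tagged):
        if word in DICT:
            alignment[s_idx] = len(draft)
            draft.append(DICT[word])
    return draft, alignment


def inflector(src_tagged, draft, alignment):
    # Toy rule: pluralize a noun whose previous source word is the measure word 本.
    out = list(draft)
    for s_idx, (_, pos) in enumerate(src_tagged):
        if pos == "NN" and s_idx > 0 and src_tagged[s_idx - 1][0] == "本" and s_idx in alignment:
            out[alignment[s_idx]] += "s"
    return out


final = translate("我有两本书", preprocess, translation_model, inflector)
```

Keeping the stages as separate callables mirrors the unit structure of the system: any corpus-based translation model or inflection device with the same interface could be substituted.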
For example, if the source language text is a Chinese sentence, the text preprocessing unit 1001 segments the Chinese sentence into a Chinese word sequence and then tags each Chinese word in the sequence with its part of speech.
As described above, the corpus-based translation model may be any existing or future corpus-based translation model, for example an SMT model.
It should be noted that the translation system 1000 of this embodiment for translating a source language text into a target language translation, and each of its components, may be built from dedicated circuits or chips, or may be implemented by a computer (processor) executing a corresponding program. In operation, the translation system 1000 of this embodiment can carry out the translation method for translating a source language text into a target language translation of the embodiment shown in Fig. 5.
Although the method and device for training a TLWI model based on a bilingual corpus, the target language word inflection method and device, and the translation method and system for translating a source language text into a target language translation have been described in detail above through some exemplary embodiments, these embodiments are not exhaustive, and those skilled in the art may make various changes and modifications within the spirit and scope of the invention. Therefore, the invention is not limited to these embodiments, and the scope of the invention is defined only by the appended claims.

Claims (28)

1. A method for training a target language word inflection model based on a bilingual corpus, wherein the bilingual corpus comprises a plurality of pairs of aligned source language material and target language material, the method comprising:
establishing an initial target language word inflection model;
preprocessing the source language material and the target language material in the bilingual corpus;
extracting, based on the preprocessed source language material and target language material, templates containing target language word inflection information; and
training the target language word inflection model using the templates.
2. The method for training a target language word inflection model based on a bilingual corpus according to claim 1, wherein the step of preprocessing the source language material and the target language material in the bilingual corpus comprises:
for each of the plurality of pairs of aligned source language material and target language material,
preprocessing the source language material so that each source language word in the preprocessed source language material is in base form and tagged with part of speech; and
preprocessing the target language material so that each target language word in the preprocessed target language material is in base form and tagged with part of speech.
3. The method for training a target language word inflection model based on a bilingual corpus according to claim 1 or 2, wherein the step of extracting templates containing target language word inflection information comprises:
for each of the preprocessed pairs of aligned source language material and target language material,
aligning the words in the preprocessed source language material with the words in the preprocessed target language material to obtain word alignment information;
searching for a target language word that differs between the original target language material and the preprocessed target language material;
obtaining, according to the word alignment information, the source language word aligned with the differing target language word; and
generating the template according to the differing target language word, the aligned source language word, and the context information of the aligned source language word in the original source language material.
4. The method for training a target language word inflection model based on a bilingual corpus according to any one of claims 1 to 3, wherein the target language word inflection information comprises: the part of speech of a source language word; a combination of the context information of the source language word, as a condition; and the inflection behavior of the target language word aligned with the source language word, as an action.
5. The method for training a target language word inflection model based on a bilingual corpus according to claim 4, wherein the combination of context information comprises: the preceding word; the preceding word and the following word; the preceding word at a gap of one word; and the following word at a gap of one word.
6. The method for training a target language word inflection model based on a bilingual corpus according to any one of claims 1 to 5, wherein the source language is Chinese and the target language is English.
7. The method for training a target language word inflection model based on a bilingual corpus according to claim 6, wherein the step of preprocessing the source language material comprises:
segmenting the source language material into a source language word sequence; and
tagging each source language word in the source language word sequence with its part of speech.
8. The method for training a target language word inflection model based on a bilingual corpus according to any one of claims 1 to 7, wherein the material is at least one of a sentence, a phrase, and a paragraph.
9. The method for training a target language word inflection model based on a bilingual corpus according to any one of claims 1 to 8, wherein the target language word inflection model is a probability model.
10. The method for training a target language word inflection model based on a bilingual corpus according to any one of claims 1 to 8, wherein the target language word inflection model is a pattern recognition model.
11. A target language word inflection method, wherein a source language text is translated into a target language translation and the source language text is preprocessed so that the source language words it contains are in base form and tagged with part of speech, the method comprising:
training a target language word inflection model using the method for training a target language word inflection model based on a bilingual corpus according to any one of claims 1 to 10; and
inflecting the target language words in the target language translation using the target language word inflection model.
12. The target language word inflection method according to claim 11, wherein the step of inflecting the target language words in the target language translation comprises:
determining, according to the part of speech of each source language word and the target language word inflection model, whether a corresponding template exists; and
if the corresponding template exists,
verifying whether the context information of the source language word satisfies the condition in the corresponding template; and
if the condition is satisfied, performing the action in the corresponding template on the target language word in the target language translation that is aligned with the source language word.
13. The target language word inflection method according to claim 12, wherein, when the result of the verifying step is that the source language word satisfies the conditions of a plurality of corresponding templates, the actions in the plurality of corresponding templates are respectively performed on the target language word aligned with the source language word to obtain a plurality of candidate target language translations;
the method further comprising:
for each of the plurality of candidate target language translations,
calculating a fluency score of the candidate target language translation according to a language model of the target language;
calculating, according to the target language word inflection model, a score of the template used to obtain the candidate target language translation; and
obtaining a combination of the fluency score and the score of the template as the score of the candidate target language translation; and
selecting the candidate target language translation with the highest score among the plurality of candidate target language translations as the final target language translation.
14. A translation method for translating a source language text into a target language translation, comprising:
preprocessing the source language text to obtain a source language word sequence of the source language text, wherein the source language words in the source language word sequence are reduced to base form and tagged with part of speech;
translating the preprocessed source language text into an initial target language translation using a corpus-based translation model; and
editing the initial target language translation using the target language word inflection method according to any one of claims 11 to 13 to obtain a final target language translation.
15. A device for training a target language word inflection model based on a bilingual corpus, wherein the bilingual corpus comprises a plurality of pairs of aligned source language material and target language material, the device comprising:
an initial model establishing unit for establishing an initial target language word inflection model;
a corpus preprocessing unit for preprocessing the source language material and the target language material in the bilingual corpus;
a template extracting unit for extracting, based on the preprocessed source language material and target language material, templates containing target language word inflection information; and
a training unit for training the target language word inflection model using the templates.
16. The device for training a target language word inflection model based on a bilingual corpus according to claim 15, wherein the corpus preprocessing unit comprises:
a source language material preprocessing unit for preprocessing the source language material so that each source language word in the preprocessed source language material is in base form and tagged with part of speech; and
a target language material preprocessing unit for preprocessing the target language material so that each target language word in the preprocessed target language material is in base form and tagged with part of speech.
17. The device for training a target language word inflection model based on a bilingual corpus according to claim 15 or 16, wherein the template extracting unit comprises:
an alignment unit for aligning, for each of the preprocessed pairs of aligned source language material and target language material, the words in the preprocessed source language material with the words in the preprocessed target language material to obtain word alignment information;
a search unit for searching for a target language word that differs between the original target language material and the preprocessed target language material;
an obtaining unit for obtaining, according to the word alignment information, the source language word aligned with the differing target language word; and
a template generation unit for generating the template according to the differing target language word, the aligned source language word, and the context information of the aligned source language word in the original source language material.
18. The device for training a target language word inflection model based on a bilingual corpus according to any one of claims 15 to 17, wherein the target language word inflection information comprises: the part of speech of a source language word; a combination of the context information of the source language word, as a condition; and the inflection behavior of the target language word aligned with the source language word, as an action.
19. The device for training a target language word inflection model based on a bilingual corpus according to claim 18, wherein the combination of context information comprises: the preceding source language word; the preceding source language word and the following source language word; the preceding source language word at a gap of one word; and the following source language word at a gap of one word.
20. The device for training a target language word inflection model based on a bilingual corpus according to any one of claims 15 to 19, wherein the source language is Chinese and the target language is English.
21. The device for training a target language word inflection model based on a bilingual corpus according to claim 20, wherein the source language material preprocessing unit comprises:
a segmentation unit for segmenting the source language material into a source language word sequence; and
a part-of-speech tagging unit for tagging each source language word in the source language word sequence with its part of speech.
22. The device for training a target language word inflection model based on a bilingual corpus according to any one of claims 15 to 21, wherein the material is at least one of a sentence, a phrase, and a paragraph.
23. The device for training a target language word inflection model based on a bilingual corpus according to any one of claims 15 to 22, wherein the target language word inflection model is a probability model.
24. The device for training a target language word inflection model based on a bilingual corpus according to any one of claims 15 to 22, wherein the target language word inflection model is a pattern recognition model.
25. A target language word inflection device, wherein a source language text is translated into a target language translation and the source language text is preprocessed so that the source language words it contains are in base form and tagged with part of speech, the device comprising:
a target language word inflection model trained using the device for training a target language word inflection model based on a bilingual corpus according to any one of claims 15 to 24; and
a word inflection unit for inflecting the target language words in the target language translation using the target language word inflection model.
26. The target language word inflection device according to claim 25, wherein the word inflection unit comprises:
a template determining unit for determining, according to the part of speech of each source language word and the target language word inflection model, whether a corresponding template exists;
a condition verifying unit for verifying, when the template determining unit determines that the corresponding template exists, whether the context information of the source language word satisfies the condition in the corresponding template; and
an action execution unit for performing, when the condition verifying unit finds that the condition in the corresponding template is satisfied, the action in the corresponding template on the target language word aligned with the source language word.
27. The target language word inflection device according to claim 26, wherein, when the condition verifying unit finds that the source language word satisfies the conditions of a plurality of corresponding templates, the action execution unit respectively performs the actions in the plurality of corresponding templates on the target language word aligned with the source language word to obtain a plurality of candidate target language translations;
the device further comprising:
a fluency computing unit for calculating, for each of the plurality of candidate target language translations, a fluency score of the candidate target language translation according to a language model of the target language;
a template score computing unit for calculating, according to the target language word inflection model, a score of the template used to obtain the candidate target language translation;
a combined score obtaining unit for obtaining a combination of the fluency score and the score of the template as the score of the candidate target language translation; and
a selecting unit for selecting the candidate target language translation with the highest score among the plurality of candidate target language translations as the final target language translation.
28. A translation system for translating a source language text into a target language translation, comprising:
a text preprocessing unit for preprocessing the source language text to obtain a source language word sequence of the source language text, wherein the source language words in the source language word sequence are reduced to base form and tagged with part of speech;
a corpus-based translation model for translating the preprocessed source language text into an initial target language translation; and
the target language word inflection device according to any one of claims 25 to 27, for editing the initial target language translation to obtain a final target language translation.
CNA2007101865456A 2007-12-07 2007-12-07 Target language word deforming method and device Pending CN101452446A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CNA2007101865456A CN101452446A (en) 2007-12-07 2007-12-07 Target language word deforming method and device
JP2008308753A JP2009140499A (en) 2007-12-07 2008-12-03 Method and apparatus for training target language word inflection model based on bilingual corpus, tlwi method and apparatus, and translation method and system for translating source language text into target language
US12/328,476 US20090164206A1 (en) 2007-12-07 2008-12-04 Method and apparatus for training a target language word inflection model based on a bilingual corpus, a tlwi method and apparatus, and a translation method and system for translating a source language text into a target language translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2007101865456A CN101452446A (en) 2007-12-07 2007-12-07 Target language word deforming method and device

Publications (1)

Publication Number Publication Date
CN101452446A true CN101452446A (en) 2009-06-10

Family

ID=40734682

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101865456A Pending CN101452446A (en) 2007-12-07 2007-12-07 Target language word deforming method and device

Country Status (3)

Country Link
US (1) US20090164206A1 (en)
JP (1) JP2009140499A (en)
CN (1) CN101452446A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023969A (en) * 2009-09-10 2011-04-20 株式会社东芝 Methods and devices for acquiring weighted language model probability and constructing weighted language model
CN101788978B (en) * 2009-12-30 2011-12-07 中国科学院自动化研究所 Chinese and foreign spoken language automatic translation method combining Chinese pinyin and character
CN101989260B (en) * 2009-08-01 2012-08-22 中国科学院计算技术研究所 Training method and decoding method of decoding feature weight of statistical machine
CN103678285A (en) * 2012-08-31 2014-03-26 富士通株式会社 Machine translation method and machine translation system
CN103729347A (en) * 2012-10-10 2014-04-16 株式会社东芝 Machine translation apparatus, method and program
CN104412256A (en) * 2012-07-02 2015-03-11 微软公司 Generating localized user interfaces
CN106156007A (en) * 2015-03-24 2016-11-23 吕海港 A kind of English-Chinese statistical machine translation method of word original shape
CN107704456A (en) * 2016-08-09 2018-02-16 松下知识产权经营株式会社 Identify control method and identification control device
CN109448458A (en) * 2018-11-29 2019-03-08 郑昕匀 A kind of Oral English Training device, data processing method and storage medium
CN110008307A (en) * 2019-01-18 2019-07-12 中国科学院信息工程研究所 A kind of rule-based and statistical learning deformation entity recognition method and device
CN110162753A (en) * 2018-11-08 2019-08-23 腾讯科技(深圳)有限公司 For generating the method, apparatus, equipment and computer-readable medium of text template
CN112380877A (en) * 2020-11-10 2021-02-19 天津大学 Construction method of machine translation test set used in discourse-level English translation

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090326916A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Unsupervised chinese word segmentation for statistical machine translation
US8756062B2 (en) * 2010-12-10 2014-06-17 General Motors Llc Male acoustic model adaptation based on language-independent female speech data
US8838433B2 (en) 2011-02-08 2014-09-16 Microsoft Corporation Selection of domain-adapted translation subcorpora
CN102193915B (en) * 2011-06-03 2012-11-28 南京大学 Participle-network-based word alignment fusion method for computer-aided Chinese-to-English translation
CN112602098A (en) 2018-05-08 2021-04-02 谷歌有限责任公司 Alignment to sequence data selector
CN110147556B (en) * 2019-04-22 2022-11-25 云知声(上海)智能科技有限公司 Construction method of multidirectional neural network translation system
CN111539228B (en) * 2020-04-29 2023-08-08 支付宝(杭州)信息技术有限公司 Vector model training method and device and similarity determining method and device
CN112836528B (en) * 2021-02-07 2023-10-03 语联网(武汉)信息技术有限公司 Machine post-translation editing method and system
CN113761944B (en) * 2021-05-20 2024-03-15 腾讯科技(深圳)有限公司 Corpus processing method, device and equipment for translation model and storage medium
CN113255328B (en) * 2021-06-28 2024-02-02 北京京东方技术开发有限公司 Training method and application method of language model

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5477451A (en) * 1991-07-25 1995-12-19 International Business Machines Corp. Method and system for natural language translation
JPH0844719A (en) * 1994-06-01 1996-02-16 Mitsubishi Electric Corp Dictionary access system
JPH08329081A (en) * 1995-05-30 1996-12-13 Toshiba Corp Method and system for machine translation
US6092034A (en) * 1998-07-27 2000-07-18 International Business Machines Corporation Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models
US7356457B2 (en) * 2003-02-28 2008-04-08 Microsoft Corporation Machine translation using learned word associations without referring to a multi-lingual human authored dictionary of content words
JP2004362249A (en) * 2003-06-04 2004-12-24 Advanced Telecommunication Research Institute International Translation knowledge optimization device, computer program, computer and storage medium for translation knowledge optimization
US7200550B2 (en) * 2004-11-04 2007-04-03 Microsoft Corporation Projecting dependencies to generate target language dependency structure
CN101030197A (en) * 2006-02-28 2007-09-05 株式会社东芝 Method and apparatus for bilingual word alignment, method and apparatus for training bilingual word alignment model
US20080154577A1 (en) * 2006-12-26 2008-06-26 Sehda,Inc. Chunk-based statistical machine translation system

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989260B (en) * 2009-08-01 2012-08-22 中国科学院计算技术研究所 Training method and decoding method of decoding feature weight of statistical machine
CN102023969A (en) * 2009-09-10 2011-04-20 株式会社东芝 Methods and devices for acquiring weighted language model probability and constructing weighted language model
CN101788978B (en) * 2009-12-30 2011-12-07 中国科学院自动化研究所 Chinese and foreign spoken language automatic translation method combining Chinese pinyin and character
CN104412256A (en) * 2012-07-02 2015-03-11 微软公司 Generating localized user interfaces
CN104412256B (en) * 2012-07-02 2017-08-04 微软技术许可有限责任公司 Generate localised users interface
CN103678285A (en) * 2012-08-31 2014-03-26 富士通株式会社 Machine translation method and machine translation system
CN103729347A (en) * 2012-10-10 2014-04-16 株式会社东芝 Machine translation apparatus, method and program
CN106156007A (en) * 2015-03-24 2016-11-23 吕海港 A kind of English-Chinese statistical machine translation method of word original shape
CN107704456A (en) * 2016-08-09 2018-02-16 松下知识产权经营株式会社 Identify control method and identification control device
CN107704456B (en) * 2016-08-09 2023-08-29 松下知识产权经营株式会社 Identification control method and identification control device
CN110162753A (en) * 2018-11-08 2019-08-23 腾讯科技(深圳)有限公司 For generating the method, apparatus, equipment and computer-readable medium of text template
CN110162753B (en) * 2018-11-08 2022-12-13 腾讯科技(深圳)有限公司 Method, apparatus, device and computer readable medium for generating text template
CN109448458A (en) * 2018-11-29 2019-03-08 郑昕匀 A kind of Oral English Training device, data processing method and storage medium
CN110008307A (en) * 2019-01-18 2019-07-12 中国科学院信息工程研究所 A kind of rule-based and statistical learning deformation entity recognition method and device
CN112380877A (en) * 2020-11-10 2021-02-19 天津大学 Construction method of machine translation test set used in discourse-level English translation
CN112380877B (en) * 2020-11-10 2022-07-19 天津大学 Construction method of machine translation test set used in discourse-level English translation

Also Published As

Publication number Publication date
JP2009140499A (en) 2009-06-25
US20090164206A1 (en) 2009-06-25

Similar Documents

Publication Publication Date Title
CN101452446A (en) Target language word deforming method and device
Chakravarthi et al. Improving wordnets for under-resourced languages using machine translation
Schmaltz et al. Adapting sequence models for sentence correction
Camacho Collados et al. A framework for the construction of monolingual and Cross-lingual Semantic Similarity Datasets
CN100437557C (en) Machine translation method and apparatus based on language knowledge base
Muller et al. ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Kumar et al. Factored Statistical Machine Translation System for English to Tamil Language
Woodsend et al. Text rewriting improves semantic role labeling
JP2009151777A (en) Method and apparatus for aligning spoken language parallel corpus
Diab The feasibility of bootstrapping an Arabic WordNet leveraging parallel corpora and an English wordnet
Ueffing Using monolingual source-language data to improve MT performance
Dandapat et al. Improved named entity recognition using machine translation-based cross-lingual information
Dunđer Machine translation system for the industry domain and Croatian language
CN103678288A (en) Automatic proper noun translation method
Pushpananda et al. Statistical machine translation from and into morphologically rich and low resourced languages
Mahata et al. JUNLP@ Dravidian-CodeMix-FIRE2020: Sentiment classification of code-mixed tweets using bi-directional RNN and language tags
Stepanov et al. Language style and domain adaptation for cross-language SLU porting
Bakshi et al. A transformer based approach towards identification of discourse unit segments and connectives
Huang et al. Generating Recommendation Evidence Using Translation Model
Mara English-Wolaytta Machine Translation using Statistical Approach
Abdelsalam et al. Bilingual embeddings and word alignments for translation quality estimation
Ligozat Question classification transfer
Septarina et al. Machine translation of Indonesian: a review
Singh et al. English-Dogri Translation System using MOSES
Makrai et al. Towards abstractive summarization in Hungarian

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication Application publication date: 20090610