CN101770458A - Mechanical translation method based on example phrases - Google Patents

Mechanical translation method based on example phrases

Info

Publication number
CN101770458A
Authority
CN
China
Prior art keywords
phrase
translation
sentence
cutting
source language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910002334A
Other languages
Chinese (zh)
Inventor
何亮
万磊
王进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN200910002334A priority Critical patent/CN101770458A/en
Publication of CN101770458A publication Critical patent/CN101770458A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a machine translation method based on example phrases. The method comprises the following steps: extracting phrases according to word alignment information obtained from a bilingual aligned text to obtain a phrase alignment table; segmenting a source-language sentence into several phrases according to the phrase alignment table and a predetermined principle; and performing phrase-based statistical machine translation on the segmented phrases. The invention improves both translation speed and translation quality. In addition, unknown words in the translation result are translated using a bilingual dictionary in combination with an existing language model of the target language, which further improves translation quality.

Description

Machine translation method based on example phrases
Technical field
The present invention relates to the field of machine translation, specifically corpus-based machine translation, and describes a method of translating using example phrases.
Background technology
Machine translation is the automatic translation of one natural language into another. The problem it sets out to solve is using a computer to automatically translate a sentence or fragment of a source language (SL) into the corresponding sentence or fragment of a target language (TL). There are many types of machine translation system, including example-based machine translation (EBMT) systems and phrase-based machine translation (PBMT) systems.
The basic idea of an EBMT system is to translate by analogy from existing empirical knowledge, without deep syntactic or semantic analysis. It is realized as follows: the system's main knowledge source is a bilingual translation example base. When a source-language sentence S is input, the system finds the sentence S' most similar to S and imitates the translation T' of S': the parts where S and S' do not match are translated and substituted for the corresponding parts of T', and the resulting translation T of S is output. Its characteristic is that, as long as a highly similar or even identical example sentence exists, a high-quality translation can be produced. The EBMT method requires a very large example base as support.
The basic idea of a PBMT system is to use the phrase as the basic unit of translation. During translation the system does not translate each word in isolation but translates several consecutive words together. Because the granularity of translation is larger, phrase-based methods handle local context dependencies easily and translate idioms and common collocations well. In phrase-based methods a phrase can usually be any contiguous string, with no grammatical restriction, so bilingual phrases can easily be extracted automatically from a word-aligned bilingual corpus and used to translate a given source-language sentence. A phrase-based method requires the system to be trained. For training, a bilingual corpus, that is, a set of sentences that are translations of each other, is first input. The word alignment result shows which words in a sentence pair are translations of each other. Phrase extraction is then performed: all contiguous word strings in the corpus that are translations of each other are extracted, regardless of whether a string has any real meaning.
However, EBMT has the following drawback: if the similarity threshold is too high, the matching success rate is low; conversely, if the threshold is too low, fuzzy matching produces poor translations. Raising the matching success rate while maintaining translation quality requires building a large-scale example base, which takes a great deal of time, manpower and material resources. PBMT has the following drawback: when translating a sentence, all possible phrases (any contiguous word string counts as a phrase) and all combinations of those phrases must be considered, which greatly reduces translation speed; moreover, for long sentences or phrases a large amount of ambiguity must be handled during translation, so translation quality suffers. In addition, neither pure EBMT nor pure PBMT considers how to handle unknown words that do not appear in the corpus, in particular the large body of specialized vocabulary. One remedy is to expand the example base or the bilingual aligned corpus to widen its vocabulary coverage. On the one hand, however, building these resources takes a great deal of time, manpower and material resources; on the other hand, whenever new vocabulary appears, the system must be retrained after the corpus is expanded.
Summary of the invention
According to an aspect of the present invention, the phrase-based machine translation method is combined with the example-based idea. Without modifying the existing PBMT system, an example-based method is introduced that makes full use of the existing phrase alignment data and of the example-based method's strength in translating matched sentences quickly and with high quality, so that translation speed and translation quality improve together. At the same time, a bilingual dictionary is used together with the existing language model of the target language to translate unknown words in the translation result. A bilingual dictionary is far easier to build than bilingual sentence pairs, and translating new vocabulary only requires extending the dictionary; the existing system does not need to be retrained.
According to an aspect of the present invention, a machine translation method based on example phrases is provided, the method comprising: performing phrase extraction according to word alignment information obtained from a bilingual aligned text to obtain a phrase alignment table; segmenting a source-language sentence into several phrases according to the phrase alignment table and a predetermined principle; and performing phrase-based statistical machine translation on the segmented phrases.
According to an aspect of the present invention, the method may further comprise: translating unknown words using a bilingual dictionary and a language model of the target language.
According to an aspect of the present invention, the principle on which the step of segmenting the source-language sentence is based is: making the phrase coverage after segmentation the highest, where coverage is the total number of characters in the covered phrases of the source-language sentence divided by the total number of characters in the source-language sentence, and a segmented phrase is covered if it is present in the phrase alignment table.
According to an aspect of the present invention, in the step of segmenting the source-language sentence, subject to the phrase coverage after segmentation being the highest, the number of phrases in the source-language sentence is made the smallest.
According to an aspect of the present invention, subject to the phrase coverage after segmentation being the highest and the number of phrases in the source-language sentence being the smallest, the segmented phrases are made the longest.
According to an aspect of the present invention, the source-language sentence may be segmented into several phrases by solving the graph-theoretic problem of finding the shortest path between two given vertices.
According to an aspect of the present invention, the step of segmenting the source-language sentence by finding the shortest path between two given vertices comprises: defining a vertex between every two adjacent characters of the source-language sentence, and placing one vertex before the first character and one after the last character of the sentence; setting the weights of the edges that connect two vertices in the graph to the same value; and using the A* algorithm or Dijkstra's algorithm to find the shortest path between the first and last vertices.
According to an aspect of the present invention, the step of translating unknown words may comprise: retrieving from the bilingual dictionary the possible translations of each unknown word in the source-language sentence; in the result obtained by performing phrase-based statistical machine translation on the segmented phrases, replacing the unknown word with each of its possible translations; using the language model of the target language to compute the probability of each sentence after replacement; and selecting the replacement with the highest probability as the final translation result.
Description of drawings
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the machine translation method based on example phrases according to an embodiment of the invention;
Fig. 2 is a schematic diagram of constructing a phrase alignment table according to the prior art;
Fig. 3 is an example of the phrase segmentation method according to an embodiment of the invention;
Fig. 4 is an example of phrase-based statistical machine translation according to the prior art;
Fig. 5 is a flowchart of translating unknown words according to an embodiment of the invention.
Embodiment
The system and method of the present invention consist of the following core parts: construction of the phrase alignment table, example phrase segmentation, phrase-based translation, and translation of unknown words.
Fig. 1 shows a flowchart of the machine translation method based on example phrases according to an embodiment of the invention, which comprises the following steps:
At step S100, the phrase alignment table is constructed. GIZA++ is used to obtain word alignment information from the bilingual aligned text, and phrases are then extracted according to that word alignment information to obtain the phrase alignment table. The phrase alignment table consists of three parts: source-language phrase, target-language phrase and probability values. Fig. 2 is an example of constructing a phrase alignment table and illustrates the input of the table construction module. There are several probability values, which together measure the probability of the phrase alignment.
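As a rough illustration of the phrase extraction in step S100, the sketch below takes one sentence pair with its word alignment, extracts every phrase pair that is consistent with the alignment, and attaches one relative-frequency estimate to each pair. It is a simplified sketch under stated assumptions, not the procedure actually run with GIZA++ in the patent: the function names, the toy tokens s1..s3 and t1..t3, the `max_len` limit and the use of a single relative-frequency value (the text mentions several probability values without defining them) are all hypothetical.

```python
from collections import Counter

def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment.

    src, tgt  -- lists of tokens
    alignment -- set of (src_index, tgt_index) links, e.g. from a word aligner
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no word inside the target span may link
            # to a source word outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

def build_phrase_table(aligned_corpus):
    """Map (source phrase, target phrase) to a relative-frequency estimate."""
    pair_counts, src_counts = Counter(), Counter()
    for src, tgt, alignment in aligned_corpus:
        for s, t in extract_phrase_pairs(src, tgt, alignment):
            pair_counts[(s, t)] += 1
            src_counts[s] += 1
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}

# Hypothetical toy sentence pair: s1-t1, s2-t3, s3-t2 (the last two words swap order).
corpus = [(["s1", "s2", "s3"], ["t1", "t2", "t3"], {(0, 0), (1, 2), (2, 1)})]
table = build_phrase_table(corpus)
```

In this toy pair the span "s1 s2" yields no phrase pair, because its target span would also contain t2, which is linked to s3 outside the span. Each entry of `table` mirrors the three parts named in the text: source phrase, target phrase and a probability.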
At step S200, example phrase segmentation is performed. Its input is a source-language sentence. The sentence may be word-segmented in advance, that is, the text is split into words so that, as in English, the words in the sentence are separated by spaces. For instance, the sentence glossed as "machine translation system and method" is split into the space-separated words "machine", "translation", "system", "and" and "method". The benefit is that, after word segmentation, the subsequent phrase segmentation can operate on words instead of on individual characters, which clearly improves translation efficiency. Alternatively, the source-language sentence input to example phrase segmentation may be left without any preprocessing and input as one continuous character string.
In step S200, the source-language sentence is segmented into several source-language phrases according to the phrase alignment table, with the phrases separated by spaces. Segmentation follows the principles below:
First, the phrase coverage of the segmented sentence is the highest (coverage = total number of characters in the covered phrases of the sentence / total number of characters in the sentence), where a segmented phrase is said to be covered if it is present in the phrase alignment table. Second, subject to the first principle, the number of cuts in the sentence is the smallest, that is, after segmentation the sentence contains the fewest space-separated phrases. For example, if segmenting the sentence glossed as "machine translation system and method" leaves two spaces in it, both produced by cutting, its cut count is 2. Third, subject to the two principles above, the segmentation with the longest phrase is preferred: among several candidate segmentations, the one that contains the longest single phrase is chosen, because the longer a phrase is, the less often it occurs in the original aligned text, the less varied its occurrences are, and the more unique and accurate its alignment with the target language is. In the extreme case, the whole sentence occurs in the aligned text and is present in the phrase alignment table; the sentence then needs no segmentation and can be translated directly.
The following is an example of phrase segmentation. Suppose the input sentence is the one glossed as "machine translation system and method" and the phrase alignment table contains the following phrases (given here by their English glosses): a. machine, b. translation, c. system, d. method, e. machine translation, f. translation system, g. machine translation system, h. system and method. Counts are in characters of the source sentence, which has nine: two each for "machine", "translation", "system" and "method", and one for "and". Some of the possible segmentations (not all are listed), with their coverage, cut count and longest phrase length, are shown below, with each segmented phrase in square brackets:
(1) [machine translation system] [and] [method] (coverage: 8/9, cut count: 2, longest phrase: 6)
(2) [machine] [translation] [system and method] (coverage: 9/9, cut count: 2, longest phrase: 5)
(3) [machine translation] [system] [and] [method] (coverage: 8/9, cut count: 3, longest phrase: 4)
(4) [machine translation] [system and method] (coverage: 9/9, cut count: 1, longest phrase: 5)
According to the phrase segmentation principles above, segmentation (4), "[machine translation] [system and method]", is finally selected: its phrase coverage is the highest and, subject to that, its cut count is the smallest.
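The three selection principles can be read as a lexicographic comparison, which the sketch below tries to make concrete. It enumerates every segmentation of a short sentence by brute force (an assumption made for clarity, not something the text prescribes) and ranks each one by coverage, then fewest cuts, then longest covered phrase. The letters are hypothetical stand-ins, one per source character: AB = machine, CD = translation, EF = system, G = and, HI = method.

```python
from itertools import combinations

def segmentations(sentence):
    """Yield every way to split `sentence` into contiguous pieces."""
    n = len(sentence)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [sentence[i:j] for i, j in zip(bounds, bounds[1:])]

def score(pieces, phrase_table):
    """Rank key: (coverage, -cut count, longest covered phrase); larger is better."""
    covered = [p for p in pieces if p in phrase_table]
    coverage = sum(len(p) for p in covered) / sum(len(p) for p in pieces)
    cut_count = len(pieces) - 1
    longest = max((len(p) for p in covered), default=0)
    return (coverage, -cut_count, longest)

def best_segmentation(sentence, phrase_table):
    """Pick the segmentation that wins the lexicographic comparison."""
    return max(segmentations(sentence), key=lambda p: score(p, phrase_table))

# Stand-in phrase table for entries a to h of the example.
PHRASES = {"AB", "CD", "EF", "HI", "ABCD", "CDEF", "ABCDEF", "EFGHI"}
SENTENCE = "ABCDEFGHI"

best = best_segmentation(SENTENCE, PHRASES)
```

With these stand-ins, `best` is `["ABCD", "EFGHI"]`, which corresponds to segmentation (4). Brute force is exponential in sentence length, so it only serves to show the ranking; the graph formulation described later in the text avoids the enumeration.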
At step S300, phrase-based statistical machine translation is performed on the segmented input sentence obtained in step S200. A phrase-based statistical machine translation system consists mainly of four parts: a translation model, a language model of the target language, a reordering model and a decoder. The translation model gives the translation correspondences between source-language and target-language phrases and expresses the strength of each correspondence as a probability; the higher the probability, the more accurate the correspondence. It supplies candidate target-language translations for the source-language sentence. The language model of the target language stores a large number of probabilities that describe how each word relates to the words or phrases before and after it. Its role is to judge how well a sentence S_t conforms to the grammar and usage of the target language, and it is used to select among translation results. This degree is generally measured by a probability P_LM(S_t); the higher P_LM(S_t) is, the better the sentence conforms to the target language. The reordering model adjusts the order of the words or phrases in the translated target-language result. The decoder coordinates these models: it combines the probabilities of the translation model, the language model of the target language and the reordering model to translate the source-language sentence. The output of step S300 is a preliminary target-language sentence, which may contain unknown words that could not be translated; these words remain in their source-language form.
In addition, the present invention may also comprise step S400, in which unknown words are translated to obtain the translation result. Unknown word translation relies on two important parts: a bilingual dictionary with a large vocabulary and a language model of the target language. The bilingual dictionary supplies candidate translations for an unknown word, and the language model of the target language selects the most suitable of them as the translation. The most basic bilingual dictionary entry has two parts: a source-language (SL) word W_SL and a set of corresponding target-language (TL) translations W_TLi. Other information can be added to the dictionary as needed, such as part of speech, or a probability attached to each translation W_TLi that expresses how likely W_SL is to be translated as W_TLi. The language model of the target language is also an essential component of the phrase-based translation in step S300. As there, it judges how well a sentence S_t conforms to the grammar and usage of the target language and is used to select among translation results. This degree is generally measured by a probability P_LM(S_t): the higher P_LM(S_t) is, the better the sentence conforms to the target language, and the candidate translation with the highest P_LM(S_t) is selected as the final translation result.
Fig. 3 shows an example of a preferred phrase segmentation method that follows the segmentation principles above. It converts the phrase segmentation problem into the graph-theoretic problem of finding the shortest path between two given vertices. First, a vertex is defined between every two adjacent characters of the sentence (in languages such as English, a "character" here means a space-separated word), and one further vertex is placed before the first character and one after the last character of the sentence. An edge in the graph indicates that the phrase formed by the characters the edge spans can be found in the phrase alignment table. Every edge has weight 1, meaning that the characters an edge spans are treated as a whole and handled as one phrase; the weight may be set to another value as long as all weights are equal. The A* algorithm or Dijkstra's algorithm is then used to find the shortest path between the first and last vertices. If the sentence contains unknown words, so that the shortest path is infinite, the graph breaks into several connected subgraphs, and the A* algorithm or Dijkstra's algorithm is simply applied to each subgraph. Finally, among all shortest paths of equal length, the one containing the edge with the largest span is selected (the span is the number of characters an edge covers; the larger the span, the longer the corresponding segmented phrase). If there are several subgraphs, the selection is made for each subgraph separately.
In Fig. 3, the phrases covered by edges a, b, c, d, e, f, g and h (corresponding respectively to machine, translation, system, method, machine translation, translation system, machine translation system, and system and method) can be found in the phrase alignment table. From the paths (a, b, h) and (e, h), the path formed by edges e and h, which uses fewer edges, is then selected, giving the segmented input sentence.
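The graph formulation can be sketched with Dijkstra's algorithm on unit-weight edges, as below. This is a minimal sketch under assumptions rather than the patented implementation: the function name is hypothetical, the letters are stand-ins with one letter per source character (AB = machine, CD = translation, EF = system, G = and, HI = method), and the tie-break among equally short paths is folded into the path label as (number of edges, negated largest span). The unknown-word case, in which the graph falls apart into several subgraphs, is not handled; the function simply returns None when the last vertex cannot be reached.

```python
import heapq

def segment_by_shortest_path(sentence, phrase_table):
    """Split `sentence` into the fewest phrases found in `phrase_table`.

    Vertex k sits before character k; vertex len(sentence) sits after the
    last character. An edge u -> v exists when sentence[u:v] is in the table.
    Returns None when the last vertex cannot be reached.
    """
    n = len(sentence)
    # Label per vertex: (edges used, -largest span on the path); smaller is better.
    best = {0: (0, 0)}
    prev = {}
    heap = [(0, 0, 0)]  # (edges used, -largest span, vertex)
    while heap:
        dist, neg_span, u = heapq.heappop(heap)
        if (dist, neg_span) != best[u]:
            continue  # stale queue entry, a better label was found later
        if u == n:
            break
        for v in range(u + 1, n + 1):
            if sentence[u:v] not in phrase_table:
                continue
            label = (dist + 1, min(neg_span, -(v - u)))
            if v not in best or label < best[v]:
                best[v] = label
                prev[v] = u
                heapq.heappush(heap, (label[0], label[1], v))
    if n not in best:
        return None
    # Walk the predecessor links back from the last vertex.
    pieces, v = [], n
    while v != 0:
        pieces.append(sentence[prev[v]:v])
        v = prev[v]
    return pieces[::-1]

# Stand-in phrase table for edges a to h of Fig. 3.
PHRASE_EDGES = {"AB", "CD", "EF", "HI", "ABCD", "CDEF", "ABCDEF", "EFGHI"}
path = segment_by_shortest_path("ABCDEFGHI", PHRASE_EDGES)
```

Here `path` is `["ABCD", "EFGHI"]`, the two-edge path (e, h). Because every edge has the same weight, a plain breadth-first search would find the same path length; Dijkstra is used only because the text names it.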
Fig. 4 shows an example of phrase-based statistical machine translation. The source sentence, glossed as "the meeting will be held in May", has three candidate translations, "The meeting will be held in May", "The meeting will hold in May" and "The meeting will in may be held", with language model probabilities of 0.9, 0.7 and 0.2 respectively, so the candidate with the highest language model probability, "The meeting will be held in May", is chosen.
Fig. 5 shows the basic flow of unknown word translation. At step S510, for each untranslated unknown word W_unknown in the preliminary translation result produced by phrase-based machine translation, the possible translations of that word are retrieved from the bilingual dictionary. Then, at step S520, the following is done for each possible translation T_i of the unknown word: T_i first replaces W_unknown, and the language model of the target language then scores the sentence after replacement, that is, computes its probability. Finally, the replacement with the highest probability is selected as the final translation result. The language model of the target language is generated from target-language text, which is a collection of target-language sentences and is the raw material for building the model. When a sentence contains several unknown words, one unknown word may be translated at a time, although the invention is not limited to this.
An example of unknown word translation follows. In this example the source language is Chinese and the target language is English. Suppose the input source sentence means "Could you tell me the payment terms?" and that the source word meaning "payment" is an unknown word. After the phrase-based translation step, the preliminary result is "Would you please tell me the terms of payment.", in which "payment" stands for the source word that was left untranslated. In the bilingual dictionary that word has the following candidate translations: defray, disburse, pay and payment. Replacing the unknown word with each candidate translation from the bilingual dictionary gives the following intermediate results:
a. Would you please tell me the terms of defray.
b. Would you please tell me the terms of disburse.
c. Would you please tell me the terms of pay.
d. Would you please tell me the terms of payment.
The English language model then scores these four intermediate results. Because "payment terms" has the common rendering "terms of payment", and "Would you please tell me the terms of payment." conforms better to English grammar and usage, the English language model gives this result a higher score. Intermediate results a, b, c and d receive scores of 0.4, 0.4, 0.7 and 0.9 respectively, and the highest-scoring one is selected as the final result: Would you please tell me the terms of payment.
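The replace-and-score loop of this example can be sketched as follows. Everything named here is an assumption for illustration: `<UNK>` is a hypothetical placeholder for the untranslated source word, `BILINGUAL_DICT` is a toy dictionary entry holding the four candidates from the text, and `toy_lm_score` is a stand-in that simply returns the illustrative scores quoted above in place of a real language model.

```python
def translate_unknown_word(preliminary, unknown, candidates, lm_score):
    """Replace `unknown` with each candidate and keep the best-scoring sentence."""
    replaced = [preliminary.replace(unknown, c) for c in candidates]
    return max(replaced, key=lm_score)

# Hypothetical dictionary entry: source word -> candidate target translations.
BILINGUAL_DICT = {"<UNK>": ["defray", "disburse", "pay", "payment"]}

# Stand-in for P_LM: the illustrative scores from the example, keyed by last word.
TOY_SCORES = {"defray": 0.4, "disburse": 0.4, "pay": 0.7, "payment": 0.9}

def toy_lm_score(sentence):
    """Look up the example score by the final word of the sentence."""
    return TOY_SCORES[sentence.rstrip(".").split()[-1]]

preliminary = "Would you please tell me the terms of <UNK>."
final = translate_unknown_word(
    preliminary, "<UNK>", BILINGUAL_DICT["<UNK>"], toy_lm_score
)
```

With these toy scores, `final` is "Would you please tell me the terms of payment.". In a real system the scoring function would be the target-language model already used by the phrase-based decoder, which is why extending the dictionary requires no retraining.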
When the translation method of this patent was applied to Chinese-Korean translation, on a test set made up half of closed-test sentences (test sentences selected from the training set) and half of open-test sentences (test sentences not in the training set), Chinese-to-Korean and Korean-to-Chinese translation speed improved by 80% and 90% respectively relative to a phrase-based machine translation model (such as the open-source translation model Moses), and the number of sentences in the translation results whose fluency was clearly improved increased by 30%. For example:
Example 1
Korean:
Figure G2009100023341D0000071
Chinese: Can you help me cash this traveller's check?
Phrase-based model translation result: Can this cash traveller's check?
Translation result according to the present invention: Please help me cash the traveller's check?
Example 2
Korean:
Figure G2009100023341D0000072
Chinese: Have a little wine?
Phrase-based model translation result: Is wine drunk?
Translation result according to the present invention: May I ask you to drink a glass of wine?
The present invention improves on existing phrase-based machine translation models in both speed and accuracy. Because the decoding of a sentence is reduced from a process whose unit is the character or word to one whose unit is the phrase, the decoding search space shrinks and decoding speeds up; at the same time, decoding in units of phrases reduces the ambiguity among the words inside a phrase and so improves translation accuracy. In addition, the translation of unknown words also improves translation quality.

Claims (8)

1. A machine translation method based on example phrases, the method comprising:
performing phrase extraction according to word alignment information obtained from a bilingual aligned text to obtain a phrase alignment table;
segmenting a source-language sentence into several phrases according to the phrase alignment table and a predetermined principle; and
performing phrase-based statistical machine translation on the segmented phrases.
2. The method of claim 1, characterized in that the method further comprises:
translating unknown words using a bilingual dictionary and a language model of the target language.
3. The method of claim 1 or 2, characterized in that the principle on which the step of segmenting the source-language sentence is based is: making the phrase coverage after segmentation the highest, where coverage is the total number of characters in the covered phrases of the source-language sentence divided by the total number of characters in the source-language sentence, and a segmented phrase is covered if it is present in the phrase alignment table.
4. The method of claim 3, characterized in that, in the step of segmenting the source-language sentence, subject to the phrase coverage after segmentation being the highest, the number of phrases in the source-language sentence is made the smallest.
5. The method of claim 4, characterized in that, subject to the phrase coverage after segmentation being the highest and the number of phrases in the source-language sentence being the smallest, the segmented phrases are made the longest.
6. The method of claim 1 or 2, characterized in that the source-language sentence is segmented into several phrases by solving the graph-theoretic problem of finding the shortest path between two given vertices.
7. The method of claim 6, characterized in that the step of segmenting the source-language sentence by finding the shortest path between two given vertices comprises: defining a vertex between every two adjacent characters of the source-language sentence, and placing one vertex before the first character and one after the last character of the sentence; setting the weights of the edges that connect two vertices in the graph to the same value; and using the A* algorithm or Dijkstra's algorithm to find the shortest path between the first and last vertices.
8. The method of claim 2, characterized in that the step of translating unknown words comprises:
retrieving from the bilingual dictionary the possible translations of each unknown word in the source-language sentence;
in the result obtained by performing phrase-based statistical machine translation on the segmented phrases, replacing the unknown word with each of its possible translations;
using the language model of the target language to compute the probability of each sentence after replacement; and
selecting the replacement with the highest probability as the final translation result.
CN200910002334A 2009-01-07 2009-01-07 Mechanical translation method based on example phrases Pending CN101770458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910002334A CN101770458A (en) 2009-01-07 2009-01-07 Mechanical translation method based on example phrases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910002334A CN101770458A (en) 2009-01-07 2009-01-07 Mechanical translation method based on example phrases

Publications (1)

Publication Number Publication Date
CN101770458A true CN101770458A (en) 2010-07-07

Family

ID=42503325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910002334A Pending CN101770458A (en) 2009-01-07 2009-01-07 Mechanical translation method based on example phrases

Country Status (1)

Country Link
CN (1) CN101770458A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1187651A (en) * 1997-01-07 1998-07-15 株式会社日立制作所 Method and device for managing dictionary
CN1652106A (en) * 2004-02-04 2005-08-10 北京赛迪翻译技术有限公司 Machine translation method and apparatus based on language knowledge base
CN101290616A (en) * 2008-06-11 2008-10-22 中国科学院计算技术研究所 Statistical machine translation method and system

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150329A (en) * 2013-01-06 2013-06-12 清华大学 Word alignment method and device of bitext
CN103970732A (en) * 2014-05-22 2014-08-06 北京百度网讯科技有限公司 Mining method and device of new word translation
CN103970732B (en) * 2014-05-22 2017-05-10 北京百度网讯科技有限公司 Mining method and device of new word translation
CN106663092B (en) * 2014-10-24 2020-03-06 谷歌有限责任公司 Neural-machine translation system with rare word processing
CN111291553B (en) * 2014-10-24 2023-11-21 谷歌有限责任公司 Neural machine translation system with rare word processing
CN106663092A (en) * 2014-10-24 2017-05-10 谷歌公司 Neural machine translation systems with rare word processing
US10936828B2 (en) 2014-10-24 2021-03-02 Google Llc Neural machine translation systems with rare word processing
CN111291553A (en) * 2014-10-24 2020-06-16 谷歌有限责任公司 Neural-machine translation system with rare word processing
CN104375988A (en) * 2014-11-04 2015-02-25 北京第二外国语学院 Word and expression alignment method and device
CN105701089A (en) * 2015-12-31 2016-06-22 成都数联铭品科技有限公司 Post-editing processing method for correction of wrong words in machine translation
CN105740218A (en) * 2015-12-31 2016-07-06 成都数联铭品科技有限公司 Post-editing processing method for mechanical translation
CN105760542A (en) * 2016-03-15 2016-07-13 腾讯科技(深圳)有限公司 Display control method, terminal and server
CN106126505A (en) * 2016-06-20 2016-11-16 清华大学 Parallel phrase learning method and device
CN106126505B (en) * 2016-06-20 2020-01-31 清华大学 Parallel phrase learning method and device
CN106407186B (en) * 2016-10-09 2019-04-30 新译信息科技(深圳)有限公司 Establish the method and device of participle model
CN106407186A (en) * 2016-10-09 2017-02-15 新译信息科技(深圳)有限公司 Word segmentation model building method and apparatus
CN109840331A (en) * 2019-01-31 2019-06-04 沈阳雅译网络技术有限公司 A kind of neural machine translation method based on user-oriented dictionary
CN109840331B (en) * 2019-01-31 2023-04-28 沈阳雅译网络技术有限公司 Neural machine translation method based on user dictionary
CN110245361A (en) * 2019-06-14 2019-09-17 科大讯飞股份有限公司 Phrase is to extracting method, device, electronic equipment and readable storage medium storing program for executing
CN110245361B (en) * 2019-06-14 2023-04-18 科大讯飞股份有限公司 Phrase pair extraction method and device, electronic equipment and readable storage medium
CN110334362A (en) * 2019-07-12 2019-10-15 北京百奥知信息科技有限公司 A method of the solution based on medical nerve machine translation generates untranslated word
CN110334362B (en) * 2019-07-12 2023-04-07 北京百奥知信息科技有限公司 Method for solving and generating untranslated words based on medical neural machine translation
CN110598222A (en) * 2019-09-12 2019-12-20 北京金山数字娱乐科技有限公司 Language processing method and device, and training method and device of language processing system
CN110598222B (en) * 2019-09-12 2023-05-30 北京金山数字娱乐科技有限公司 Language processing method and device, training method and device of language processing system
CN117540755A (en) * 2023-11-13 2024-02-09 北京云上曲率科技有限公司 Method and system for enhancing data by neural machine translation model

Similar Documents

Publication Publication Date Title
CN101770458A (en) Mechanical translation method based on example phrases
CN105957518B (en) A kind of method of Mongol large vocabulary continuous speech recognition
US8131539B2 (en) Search-based word segmentation method and device for language without word boundary tag
CN107038158B (en) Method and apparatus for creating translation corpus, recording medium, and machine translation system
US8407040B2 (en) Information processing device, method and program
CN106383818A (en) Machine translation method and device
Kobayashi et al. Top-down RST parsing utilizing granularity levels in documents
US8874433B2 (en) Syntax-based augmentation of statistical machine translation phrase tables
Li et al. Language modeling with functional head constraint for code switching speech recognition
Alqudsi et al. A hybrid rules and statistical method for Arabic to English machine translation
CN106649289A (en) Realization method and realization system for simultaneously identifying bilingual terms and word alignment
Abumalloh et al. Arabic part-of-speech tagging
Álvarez et al. Towards customized automatic segmentation of subtitles
Mermer Unsupervised search for the optimal segmentation for statistical machine translation
Gugliotta et al. Tarc: Tunisian arabish corpus first complete release
CN107861953B (en) Automatic name translation system and method
Chan et al. Automatic speech recognition of Cantonese-English code-mixing utterances
Kuo et al. A phonetic similarity model for automatic extraction of transliteration pairs
Yeong et al. Language identification of code switching sentences and multilingual sentences of under-resourced languages by using multi structural word information
CN112765977A (en) Word segmentation method and device based on cross-language data enhancement
CN108255818B (en) Combined machine translation method using segmentation technology
JP5298834B2 (en) Example sentence matching translation apparatus, program, and phrase translation apparatus including the translation apparatus
Lambrecht et al. Machine translation from standard German to alemannic dialects
Janssen et al. The CPLP Corpus: A pluricentric corpus for the common Portuguese spelling dictionary (VOC)
Núñez et al. Phonetic normalization for machine translation of user generated content

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100707