JP2006024114A

JP2006024114A - Machine translation device and machine translation computer program

Info

Publication number: JP2006024114A
Application number: JP2004203382A
Authority: JP
Inventors: Kenji Imamura; 賢治今村; Hideo Okuma; 英男大熊; Taro Watanabe; 太郎渡辺; Eiichiro Sumida; 英一郎隅田
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2004-07-09
Filing date: 2004-07-09
Publication date: 2006-01-26

Abstract

PROBLEM TO BE SOLVED: To provide a machine translation device that, in example-based machine translation, can verify the translated sentences it obtains and output a translated sentence determined to be correct according to a predetermined condition.

SOLUTION: The machine translation device 130 comprises an example base 106 containing a plurality of parallel sentence pairs, each consisting of an English sentence and a Japanese sentence; an example-based translation section 140 that receives an English input sentence 120, retrieves parallel sentence pairs from the example base 106, and generates a plurality of candidates 132A to 132M for the Japanese translation of the input sentence 120; and a statistical selection section 150 that selects, as the translation of the input sentence 120, the candidate among 132A to 132M whose statistical probability score, calculated using a language model 102 and a translation model 104, satisfies the predetermined condition.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a machine translation apparatus that performs example-based translation using a bilingual corpus, and more particularly to a machine translation apparatus that uses statistical models to overcome the weaknesses of example-based translation and produce high-quality translations.

Translation methods that use a bilingual corpus are currently achieving strong results in machine translation. A representative method of this kind is example-based translation (see Non-Patent Document 1). Example-based translation performs machine translation by using a bilingual corpus as a kind of database (hereinafter referred to as an “example base”).

FIG. 11 schematically shows the configuration of a typical conventional machine translation apparatus that performs example-based translation. Referring to FIG. 11, a conventional machine translation apparatus 300 translates an input sentence 310 in a first language (hereinafter the “source language”) into an output sentence 312 in a second language (hereinafter the “target language”). It includes: an example base 304 consisting of a bilingual corpus 302 that contains a large number of source language sentences and their target language translations; a search unit 320 for retrieving from the example base 304 the parallel sentence pair most similar to the input sentence 310 (hereinafter an “example”); a difference identification unit 321 for identifying the word strings that differ between the source language word string of the example retrieved by the search unit 320 and the input sentence 310; a bilingual dictionary 306 between the source language and the target language; and a correction unit 322 that refers to the bilingual dictionary 306 and, based on the differences identified by the difference identification unit 321 between the source language sentence of the retrieved example and the input sentence 310, corrects the target language sentence of the example to generate the output sentence 312.

When the input sentence 310 is given to the machine translation apparatus 300, the search unit 320 searches the example base 304 for the example most similar to the input sentence 310. The retrieved example is given to the difference identification unit 321, which also receives the input sentence 310. The difference identification unit 321 identifies the places where the input sentence 310 differs from the source language sentence of the retrieved example, and gives the example to the correction unit 322 together with information specifying those places. The correction unit 322 refers to the bilingual dictionary 306, corrects the specified places in the target language sentence of the example using the translations of the corresponding words of the input sentence 310, and outputs the corrected sentence as the output sentence 312.

For example, consider the case where the source language is English and the target language is Japanese. In this case, the example base 304 contains a large number of English-Japanese parallel translations. Assume that the English sentence “where is the cheapest hotel” is input as the input sentence 310. The search unit 320 searches the example base 304 for the example whose English sentence is most similar to the input sentence 310. Assume that the example “where is the cheapest restaurant / 一番安いレストランはどこですか” is retrieved from the example base 304. The search unit 320 gives this example and the input sentence 310 to the difference identification unit 321.

The input sentence 310 and the example differ in that the input sentence contains the word “hotel” where the English sentence of the example contains the word “restaurant”. The difference identification unit 321 identifies this difference between the English sentence of the example and the input sentence 310. It further identifies, in the Japanese sentence of the example, the word “レストラン” (restaurant) corresponding to the word “restaurant” that was identified as differing from the input sentence 310. The difference identification unit 321 gives the correction unit 322 the retrieved example and the differing word “hotel” in the input sentence 310, and at the same time indicates to the correction unit 322 the word “レストラン” corresponding to the word “restaurant” in the example. The correction unit 322 refers to the bilingual dictionary 306 and obtains “ホテル” (hotel), the translation of the differing word “hotel” in the input sentence 310. The correction unit 322 then replaces the indicated word “レストラン” with the word “ホテル” obtained from the bilingual dictionary 306. Through this series of processes, the machine translation apparatus 300 outputs the Japanese sentence “一番安いホテルはどこですか” (“Where is the cheapest hotel?”) as the output sentence 312.
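
The retrieve, identify, and substitute flow of the conventional apparatus 300 can be sketched roughly as follows. This is a minimal illustration, not the apparatus itself: the data layout, the names `EXAMPLE_BASE`, `DICTIONARY`, and `translate`, and the restriction to examples of the same length as the input are all assumptions made for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy example base: (source words, target words, word alignment).
Example = Tuple[List[str], List[str], Dict[str, str]]
EXAMPLE_BASE: List[Example] = [
    (["where", "is", "the", "cheapest", "restaurant"],
     ["一番安い", "レストラン", "は", "どこ", "です", "か"],
     {"where": "どこ", "cheapest": "一番安い", "restaurant": "レストラン"}),
]
# Hypothetical bilingual dictionary (one translation per word in this sketch).
DICTIONARY: Dict[str, str] = {"hotel": "ホテル"}


def differing_positions(a: List[str], b: List[str]) -> List[int]:
    """Positions at which two equal-length word lists differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def translate(input_words: List[str]) -> List[str]:
    # Search step: among same-length examples (a simplifying assumption),
    # pick the one with the fewest differing words.
    same_length = [ex for ex in EXAMPLE_BASE if len(ex[0]) == len(input_words)]
    src, tgt, align = min(
        same_length, key=lambda ex: len(differing_positions(ex[0], input_words))
    )
    output = list(tgt)
    # Difference identification and correction steps.
    for i in differing_positions(src, input_words):
        old_target = align[src[i]]               # e.g. "レストラン"
        new_target = DICTIONARY[input_words[i]]  # e.g. "ホテル"
        output[output.index(old_target)] = new_target
    return output


result = translate("where is the cheapest hotel".split())
```

Here `result` holds the word list for the corrected Japanese sentence. Note that nothing in this flow checks whether the corrected sentence is actually well-formed.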

The examples used in example-based translation are often parallel translations in units of phrases or sentences. Example-based translation therefore has the advantage of being able to translate idiomatic expressions and the like appropriately.

Nagao, M. (1984). “A framework of mechanical translation between Japanese and English by analogy principle”. In Artificial and Human Intelligence, pages 173-180, Amsterdam: North-Holland.
Imamura, K. (2002). “Application of translation knowledge acquired by hierarchical phrase alignment for pattern-based MT”. In Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation (TMI-2002), pages 74-84.

Example-based translation presupposes that, when similarity is measured by, for example, the number of differing words, the example with the fewest words differing from the input sentence can be retrieved from the example base 304. However, such an example cannot always be retrieved uniquely; several examples with a comparable number of differing words may compete. When several examples compete, conventional example-based translation measures the similarity between the input sentence and the source language side of each example and selects the example with the highest similarity. Even when the most similar example is selected, however, an appropriate output sentence is not necessarily generated from it.

For example, assume that the sentence “where is the nearest restaurant” is input, and that two examples similar to this input sentence compete: “where is the nearest subway / 最寄りの地下鉄の駅はどこですか” and “where is the cheapest restaurant / 一番安いレストランはどこですか”. Suppose the former is selected as the example most similar to the input sentence. In this case, because “subway” in the example differs from “restaurant” in the input sentence, “地下鉄” (subway) is replaced with “レストラン”, the translation of “restaurant”. The output sentence 312 is therefore “最寄りのレストランの駅はどこですか” (“Where is the nearest restaurant station?”). This output sentence is a mistranslation.

Suppose instead that the latter is selected. Then “cheapest” in the example differs from “nearest” in the input sentence, so the output sentence 312 is generated by replacing “一番安い” (cheapest) in the example with the translation of “nearest”. If “一番近い” is obtained as the translation of “nearest” by referring to the bilingual dictionary 306, the output sentence 312 becomes “一番近いレストランはどこですか” (“Where is the nearest restaurant?”), which is a correct translation. This is not guaranteed, however. If “最寄り” is obtained as the translation of “nearest”, the output sentence 312 becomes “最寄りレストランはどこですか”, which is grammatically incorrect.

Thus, in example-based translation, the example is corrected without considering what result the translation will yield. Consequently, it cannot be verified whether the final output sentence is a correct translation of the input sentence, and a mistranslation, or a sentence that is not well-formed in the target language, may be output.

Therefore, an object of the present invention is to provide a machine translation apparatus that, in example-based machine translation, can verify the translated sentences it obtains and output a translated sentence determined to be correct according to a predetermined condition.

Another object of the present invention is to provide a machine translation apparatus that, in example-based machine translation, can determine whether a translated sentence is a correct sentence in the target language and output a translated sentence that can be determined to be correct.

Still another object of the present invention is to provide a machine translation apparatus that, in example-based machine translation, can determine whether a translated sentence is a correct sentence in the target language by using the statistical models used in statistical machine translation, and output a translated sentence that can be determined to be statistically correct.

A machine translation apparatus according to a first aspect of the present invention includes: a predetermined example base containing a plurality of examples, each consisting of a pair of a sentence in a first language and a sentence in a second language; example-based translation means for receiving an input sentence in the first language and, with reference to the example base, generating a plurality of candidates for a second-language translation of the input sentence; and statistical selection means for selecting and outputting, from among the plurality of candidates generated by the example-based translation means, a candidate whose probability score, calculated using a predetermined probabilistic statistical model, satisfies a predetermined condition.

Preferably, the example-based translation means includes: search means for receiving the input sentence, searching the example base for examples whose first-language sentence satisfies a predetermined similarity condition with respect to the input sentence, and extracting a plurality of examples each containing a retrieved first-language sentence; and correction means for correcting the second-language sentence of each of the plurality of examples so as to generate, from each example, a candidate translation of the input sentence.

More preferably, the search means includes means for receiving the input sentence, searching the example base for a plurality of first-language sentences having the fewest words differing from the input sentence, and acquiring a plurality of examples each containing a retrieved first-language sentence.

More preferably still, the machine translation apparatus further includes a bilingual dictionary between the first language and the second language. The correction means includes: difference identification means for comparing the input sentence with the first-language sentence of each of the plurality of examples and identifying, for each example, its differences from the input sentence; and candidate generation means for correcting the second-language sentence of each of the plurality of examples with reference to the bilingual dictionary, based on the differences identified by the difference identification means, so as to generate, from each example, a candidate translation of the input sentence.

The bilingual dictionary may contain, for a single word of the first language, a plurality of words of the second language as translations. The candidate generation means includes means for generating one or more candidate translations of the input sentence by correcting the second-language sentence of each of the plurality of examples using the one or more second-language words obtained by referring to the bilingual dictionary based on the differences identified by the difference identification means.

The means for generating may include means for generating one or more candidate translations of the input sentence by correcting the second-language sentence of each of the plurality of examples according to every possible combination of the one or more second-language words obtained by referring to the bilingual dictionary for each of the differences identified by the difference identification means.
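
Generating candidates for every combination of dictionary translations can be illustrated with a Cartesian product. The following is a minimal sketch under assumed data structures: the function name `generate_candidates`, the representation of a difference as a `(target_index, source_word)` pair, and the toy dictionary are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def generate_candidates(
    target_words: List[str],
    differences: List[Tuple[int, str]],
    dictionary: Dict[str, List[str]],
) -> List[List[str]]:
    """Return one candidate per combination of dictionary translations.

    differences: (index into target_words, differing input word) pairs.
    dictionary: maps a source word to one or more target-language words.
    """
    options = [dictionary[source_word] for _, source_word in differences]
    candidates = []
    for choice in product(*options):
        candidate = list(target_words)
        for (index, _), replacement in zip(differences, choice):
            candidate[index] = replacement
        candidates.append(candidate)
    return candidates


# Toy data (assumed): "nearest" has two possible translations.
example_target = ["一番安い", "レストラン", "は", "どこ", "です", "か"]
toy_dictionary = {"nearest": ["一番近い", "最寄り"], "hotel": ["ホテル", "旅館"]}

one_difference = generate_candidates(example_target, [(0, "nearest")], toy_dictionary)
two_differences = generate_candidates(
    example_target, [(0, "nearest"), (1, "hotel")], toy_dictionary
)
```

With one differing word that has two translations, two candidates result; with two differing words that each have two translations, four candidates result. The number of candidates grows multiplicatively with the number of differences.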

The search means may include means for receiving the input sentence, searching the example base for a plurality of first-language sentences whose edit distance from the input sentence is smallest, and acquiring a plurality of examples each containing a retrieved first-language sentence.
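
Retrieval by minimum edit distance can be sketched as follows. This is an illustration only: it assumes a word-level Levenshtein distance with unit costs for insertion, deletion, and substitution, and the names `edit_distance` and `retrieve_closest` are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[List[str], List[str]]  # (source words, target words)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance with unit costs (an assumption)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def retrieve_closest(input_words: Sequence[str],
                     example_base: List[Example]) -> List[Example]:
    """Return every example whose source side is at the minimum distance."""
    scored = [(edit_distance(input_words, src), (src, tgt))
              for src, tgt in example_base]
    best = min(distance for distance, _ in scored)
    return [example for distance, example in scored if distance == best]


toy_base: List[Example] = [
    ("where is the nearest subway".split(),
     ["最寄り", "の", "地下鉄", "の", "駅", "は", "どこ", "です", "か"]),
    ("where is the cheapest restaurant".split(),
     ["一番安い", "レストラン", "は", "どこ", "です", "か"]),
    ("how much is this".split(), ["これ", "は", "いくら", "です", "か"]),
]
closest = retrieve_closest("where is the nearest restaurant".split(), toy_base)
```

In this toy data the first two examples are both one substitution away from the input, so both are returned; this is the competing-examples situation in which several candidates must be generated.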

The search means may include means for receiving the input sentence, searching the example base for a plurality of first-language sentences whose edit distance from the input sentence, calculated taking into account the semantic distance between words, is smallest, and acquiring a plurality of examples each containing a retrieved first-language sentence.

Preferably, the statistical selection means includes means for selecting and outputting, from among the plurality of candidates generated by the example-based translation means, the candidate with the highest probability score calculated using the predetermined probabilistic statistical model.

More preferably, the machine translation apparatus further includes language model storage means for storing a language model of the second language. The means for outputting includes: language probability calculation means for calculating, for each of the plurality of candidates, a language probability using the language model stored in the language model storage means; and means for selecting and outputting the candidate with the highest language probability calculated by the language probability calculation means.

More preferably, the machine translation apparatus further includes translation model storage means for storing a translation model from the second language to the first language. The means for outputting includes: translation probability calculation means for calculating, for each of the plurality of candidates, a translation probability using the translation model stored in the translation model storage means; and means for selecting and outputting the candidate with the highest translation probability calculated by the translation probability calculation means.

More preferably, the machine translation apparatus further includes language model storage means for storing a language model of the second language, and translation model storage means for storing a translation model from the second language to the first language. The means for outputting includes: language probability calculation means for calculating, for each of the plurality of candidates, a language probability using the language model stored in the language model storage means; translation probability calculation means for calculating, for each of the plurality of candidates, a translation probability using the translation model stored in the translation model storage means; score calculation means for calculating a predetermined probability score as a function of the language probability calculated by the language probability calculation means and the translation probability calculated by the translation probability calculation means; and means for selecting and outputting the candidate with the highest probability score calculated by the score calculation means.
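
Selecting the candidate with the highest combined score might look like the sketch below. The text only states that the score is a function of the language probability and the translation probability; taking their product (a sum in the log domain) is an assumption here, as are the names `select_best`, `lm_prob`, and `tm_prob` and the toy probability values.

```python
import math
from typing import Callable, List, Sequence


def safe_log(p: float) -> float:
    """Log of a probability; zero probability maps to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def select_best(
    candidates: List[Sequence[str]],
    input_words: Sequence[str],
    lm_prob: Callable[[Sequence[str]], float],
    tm_prob: Callable[[Sequence[str], Sequence[str]], float],
) -> Sequence[str]:
    """Return the candidate E maximizing log P(E) + log P(F|E) (assumed score)."""
    def score(candidate: Sequence[str]) -> float:
        return safe_log(lm_prob(candidate)) + safe_log(tm_prob(input_words, candidate))
    return max(candidates, key=score)


# Toy stand-ins for the two models (values are invented for illustration).
good = ("一番近い", "レストラン", "は", "どこ", "です", "か")
bad = ("最寄り", "レストラン", "は", "どこ", "です", "か")
TOY_LM = {good: 0.02, bad: 0.001}

best = select_best(
    [bad, good],
    "where is the nearest restaurant".split(),
    lm_prob=lambda e: TOY_LM.get(tuple(e), 0.0),
    tm_prob=lambda f, e: 0.1,  # both candidates assumed equally faithful
)
```

With equal translation probabilities, the language model decides, and the fluent candidate is chosen over the ungrammatical one.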

A machine translation computer program according to a second aspect of the present invention, when executed by a computer, causes the computer to operate as the machine translation apparatus according to the first aspect of the present invention.

The machine translation system according to the specific embodiment of the present invention described below is realized by computer hardware, a program executed by that computer hardware, and data stored in the computer hardware. FIG. 1 shows the external appearance of a computer system 30 that implements this machine translation system, and FIG. 2 shows the internal configuration of the computer system 30.

Referring to FIG. 1, the computer system 30 includes a computer 40 having an FD (flexible disk) drive 52 and a CD-ROM (compact disc read-only memory) drive 50, a keyboard 46, a mouse 48, and a monitor 42.

Referring to FIG. 2, the computer 40 includes, in addition to the FD drive 52 and the CD-ROM drive 50, a CPU (central processing unit) 56, a bus 66 connected to the CPU 56, the FD drive 52, and the CD-ROM drive 50, a read-only memory (ROM) 58 that stores a boot-up program and the like, and a random access memory (RAM) 60 that is connected to the bus 66 and stores program instructions, a system program, work data, and the like. The computer system 30 further includes a printer 44.

Although not shown here, the computer 40 may further include a network adapter board that provides a connection to a local area network (LAN).

The computer program that causes the computer system 30 to realize the functions of the machine translation system according to the present embodiment is stored on a CD-ROM 62 or an FD 64 inserted into the CD-ROM drive 50 or the FD drive 52, and is then transferred to a hard disk 54. Alternatively, the program may be transmitted to the computer 40 through a network (not shown) and stored on the hard disk 54. The program is loaded into the RAM 60 at the time of execution. The program may also be loaded into the RAM 60 directly from the CD-ROM 62, from the FD 64, or via the network.

The machine translation program described below includes a plurality of instructions that cause the computer 40 to realize the functions of the machine translation apparatus according to the present embodiment. Some of the basic functions needed to realize this apparatus are provided by the operating system (OS) running on the computer 40, by third-party programs, or by modules of various toolkits installed on the computer 40. This program therefore need not include every function required to realize the machine translation apparatus according to this embodiment. It need only include the instructions that realize the functions of the machine translation apparatus by calling the appropriate functions or “tools” in a controlled manner so that the desired results are obtained. The operation of the computer system 30 itself is well known and is not described again here.

-Overview-
Besides example-based translation, another method of machine translation that uses a bilingual corpus is statistical translation. This method formulates the problem of translating a sentence in one language into a sentence in another language as a problem of maximizing a conditional probability. Let F be the source language sentence and E the target language sentence. Given the source language sentence F, statistical translation searches for the sentence that satisfies the following expression.

$$\hat{E} = \arg\max_{E} P(E \mid F) = \arg\max_{E} \frac{P(E)\,P(F \mid E)}{P(F)}$$

Here, P(E) and P(F) represent the likelihood of the target language sentence E and the likelihood of the source language sentence F, respectively, and serve as indicators of the fluency of those sentences. P(F|E) represents the probability that the source language sentence F is obtained from the target language sentence E, and serves as an indicator of translation accuracy. In the expression given above, P(F) is a constant. The problem therefore reduces to searching for the sentence E that satisfies the following expression.

$$\hat{E} = \arg\max_{E} P(E)\,P(F \mid E)$$

In statistical translation, P(E) is called the language model probability of language E. The language model probability P(E) is calculated from the bilingual corpus. The collection of language model probabilities calculated for all the words of language E contained in a corpus is called a language model. P(F|E) is called the translation model probability, and it too is calculated from the bilingual corpus. Note that the translation model used here runs from the target language to the source language, which is the reverse of the direction of translation.

The machine translation apparatus according to the present embodiment selects and outputs an optimum output sentence by introducing into example-based translation a sentence search that uses the language model and the translation model.

-Configuration-
FIG. 3 shows a block diagram of a machine translation system 90 according to the present embodiment. Referring to FIG. 3, the machine translation system 90 includes: a bilingual corpus 100 containing a large number of parallel sentence pairs, each consisting of a source language (English) sentence and its target language (Japanese) translation; a language model creation device 110 for creating, from the bilingual corpus 100, a language model 102 of Japanese, the target language; a translation model creation device 112 for creating, from the bilingual corpus 100, a translation model 104 from the target language to the source language; and a machine translation apparatus 130 that translates a source language input sentence 120 using an example base 106 consisting of the bilingual corpus 100, the language model 102, and the translation model 104, and generates a target language output sentence 122.

FIG. 4 shows the configuration of the example base 106. Referring to FIG. 4, the example base 106 contains, for each parallel sentence pair, its number, the word string of its source language (English) side (the source language word string), the word string of the target language translation of the source language sentence (the target language word string), and information indicating the word correspondences within the pair. In FIG. 4, “/” indicates a word boundary. A word correspondence is a pairing of a word in the source language word string with the corresponding word or word string in the target language word string. The bilingual corpus 100 normally contains this information, so the bilingual corpus 100 can be used as the example base 106.
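
One entry of such an example base could be represented as below. This is only a sketch: the class name `ExampleEntry`, its field names, and the sample values are assumptions, since the actual contents of FIG. 4 are not reproduced in this text. Only the “/” word delimiter is taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


def parse_words(text: str) -> List[str]:
    """Split a '/'-delimited word string into a list of words."""
    return [word for word in text.split("/") if word]


@dataclass
class ExampleEntry:
    number: int                      # number of the parallel sentence pair
    source_words: List[str]          # source language word string
    target_words: List[str]          # target language word string
    alignment: Dict[str, List[str]]  # source word -> target word or word string


# Hypothetical entry, built from the example sentences used earlier.
entry = ExampleEntry(
    number=1,
    source_words=parse_words("where/is/the/cheapest/restaurant"),
    target_words=parse_words("一番安い/レストラン/は/どこ/です/か"),
    alignment={
        "where": ["どこ"],
        "cheapest": ["一番安い"],
        "restaurant": ["レストラン"],
    },
)
```

Mapping a source word to a list allows one source word to correspond to a target word string, as the description permits.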

FIG. 5 shows the configuration of the language model 102. The language model 102 shown in FIG. 5 is a word bigram model. The language model 102 contains a large number of entries, each consisting of an ordered pair of words e_{i-1} and e_i from the target language sentences of the bilingual corpus 100 and the probability p(e_i | e_{i-1}) that the word e_i occurs given that the word e_{i-1} immediately precedes it. In the language model 102 shown in FIG. 5, "<s>" is a symbol indicating the start of a sentence, and "</s>" is a symbol indicating the end of a sentence.

FIG. 6 shows the configuration of the translation model 104. The translation model 104 shown in FIG. 6 is a Lexicon model. The translation model 104 contains pairs of a target language (Japanese) word e_i and a source language word f_j from the bilingual corpus 100, each with a value t(f_j | e_i) representing the probability that the word e_i is translated into the source language word f_j in the translation pairs of the bilingual corpus 100. In the translation model 104 shown in FIG. 6, "NULL" is a special word indicating that a source language word is not translated. The translation model represents the accuracy of word translations between the source language and the target language.

Referring to FIG. 3, in addition to the example base 106, the machine translation device 130 includes: a bilingual dictionary 108 containing a large number of entries, each pairing a source language word with one or more target language words or word strings; an example translation unit 140 that, given the input sentence 120, uses the example base 106 and the bilingual dictionary 108 to generate and output a plurality of Japanese output sentence candidates 132A, ..., 132M (collectively referred to as output sentence candidates 132) by example translation of the conventional kind; and a statistical selection unit 150 that selects, from among the output sentence candidates 132A, ..., 132M, the output sentence 122 judged statistically optimal using the language model 102 and the translation model 104, and outputs it.

FIG. 7 shows the configuration of the bilingual dictionary 108. Referring to FIG. 7, the bilingual dictionary 108 contains a large number of entries, each pairing a source language word with a target language word string that translates it. When a source language word has several translations, all of them are listed in the dictionary. For example, for the source language word "nearest", the target language column lists the translations "最寄り" and "一番／近い". In the bilingual dictionary 108 shown in FIG. 7, as in the example base 106 (see FIG. 4), "／" marks a word boundary.

FIG. 8 shows a detailed block diagram of the example translation unit 140 (see FIG. 3). Referring to FIG. 8, the example translation unit 140 includes: a search unit 200 for retrieving from the example base 106 a predetermined number (N) of examples whose source language sentences (English word strings) are similar to the input sentence 120; a difference identification unit 202 for identifying the words that differ between the input sentence 120 and the source language word string of each of the N examples retrieved by the search unit 200; and a correction unit 204 for generating the output sentence candidates 132A, ..., 132M by locating, in the target language word string of each example, the translation corresponding to each differing source language word identified by the difference identification unit 202, using the word correspondence information held in the example base 106, and replacing or correcting it with the translation retrieved from the bilingual dictionary 108 for the differing word of the input sentence 120. Because the bilingual dictionary 108 may return several translations, M ≥ N.

FIG. 9 shows a detailed block diagram of the statistical selection unit 150 (see FIG. 3). Referring to FIG. 9, the statistical selection unit 150 includes a language model probability calculation unit 220 that calculates the language model probability of each of the output sentence candidates 132A, ..., 132M based on the language model 102, and a translation model probability calculation unit 222 that calculates the translation model probability between the input sentence 120 and each of the output sentence candidates 132A, ..., 132M based on the translation model 104. The statistical selection unit 150 further includes a multiplication unit 224 that multiplies the language model probability and the translation model probability of each of the output sentence candidates 132A, ..., 132M and assigns the resulting value to that candidate as its statistical probability score, and a selection unit 226 that selects, from among the output sentence candidates 132A, ..., 132M, the one with the highest statistical probability score assigned by the multiplication unit 224 and outputs it as the output sentence 122.

The language model probability calculation unit 220 uses the language model 102, which is the word bigram model shown in FIG. 5, to calculate the language model probability P(E) of an output sentence candidate E consisting of W_E words e_i (1 ≤ i ≤ W_E) by the following equation.

Here, p(e_i | e_{i-1}) represents the probability that the word e_i occurs given that the word e_{i-1} immediately precedes it.

The translation model probability calculation unit 222 uses the translation model 104, which is the Lexicon model shown in FIG. 6, to calculate the translation model probability P(F | E) for the conversion between an output sentence candidate E consisting of W_E words e_i (1 ≤ i ≤ W_E) and the input sentence F (120) consisting of W_F words f_j (1 ≤ j ≤ W_F) by the following equation.

Here, t(f_j | e_i) represents the probability that the target language word e_i is translated into the source language word f_j, and t(f_j | NULL) represents the probability that the source language word f_j is not translated. The translation model represents the accuracy of word translations between the source language and the target language. Accordingly, the higher the translation model probability P(F | E) calculated by the above equation, the more accurate the translation is considered to be.

-Processing structure-
FIG. 10 is a flowchart showing the structure of the processing executed in the machine translation device 130. Referring to FIG. 10, when the input sentence 120 (see FIG. 3) is given to the machine translation device 130, in step S252 (hereinafter "step" is abbreviated to "S"), the N examples with the fewest words differing from the input sentence 120 are retrieved from the example base 106.

Next, the processing of S256 and S258, enclosed between S254A and S254B, is executed for all N examples retrieved in S252. In S256, the input sentence 120 is compared with the source language word string of the retrieved example, and the difference between the two, that is, the differing words, is identified. In S258, the words in the example's target language word string that correspond to the differing words of the source language word string are located using the word correspondence information in the example base 106; the translations of the differing words of the input sentence are then obtained from the bilingual dictionary 108, and the located words in the example's target language word string are replaced with those translations. The target language word string corrected in this way becomes an output sentence candidate 132. When the bilingual dictionary lists several translations, an output sentence candidate 132 is generated for each of them. If there are several differing words, all possible translations are looked up for each of them, and output sentence candidates 132 are generated for every possible combination. When the output sentence candidates 132 have been generated for all examples, the process proceeds to S260A.

The processing of S262, S264, and S266, enclosed between S260A and S260B, is executed for all output sentence candidates 132 generated in S254A to S254B. In S262, the language model probability is calculated for each output sentence candidate 132. In S264, the translation model probability between each output sentence candidate 132 and the input sentence is calculated. In S266, a statistical probability score, which is the product of the language model probability calculated in S262 and the translation model probability calculated in S264, is calculated for each output sentence candidate 132. When this processing has been performed for every output sentence candidate 132, the process proceeds to S268.

In S268, the output sentence candidate 132 with the largest statistical probability score calculated in S266 is selected as the output sentence and output, and the process ends.

-Operation-
The machine translation system 90 operates as follows. Referring to FIG. 3, the bilingual corpus 100 contains a large number of translation pairs, each consisting of a source language sentence and its target language translation, and is assumed to have been prepared in advance in a form usable as the example base 106 (see FIG. 4). The bilingual dictionary 108 (see FIG. 7) is likewise assumed to have been prepared in advance by some means. The language model creation device 110 creates the language model 102 (see FIG. 5) from the bilingual corpus 100 and supplies it to the machine translation device 130 in advance. Similarly, the translation model creation device 112 creates the translation model 104 (see FIG. 6) from the bilingual corpus 100 and supplies it to the machine translation device 130 in advance.

Referring to FIG. 8, the input sentence 120 is given to the search unit 200 of the example translation unit 140. The search unit 200 compares the source language word strings in the example base 106 (see FIG. 4) with the input sentence 120, retrieves N examples in ascending order of the number of words that differ from the input sentence 120, and passes them to the difference identification unit 202. The input sentence 120 is also given to the difference identification unit 202.
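The retrieval step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the count of differing words is assumed to mean the number of words in an example's source language word string that have no counterpart in the input sentence, which agrees with the counts given for the three examples in the translation example of this document.

```python
from collections import Counter


def differing_words(example_source, input_words):
    """Words of the example's source string with no counterpart in the input."""
    remaining = Counter(input_words)
    diff = []
    for word in example_source:
        if remaining[word] > 0:
            remaining[word] -= 1   # consume one matching input word
        else:
            diff.append(word)
    return diff


def retrieve(example_sources, input_words, n):
    """Return the n examples with the fewest differing words (ties keep order)."""
    ranked = sorted(example_sources,
                    key=lambda src: len(differing_words(src, input_words)))
    return ranked[:n]


examples = [
    "where is the nearest subway".split(),
    "where is the cheapest restaurant".split(),
    "today is free".split(),
]
query = "where is the nearest restaurant".split()
top_two = retrieve(examples, query, 2)
```

With N = 2, the first two examples are returned, each differing from the input in one word.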

The difference identification unit 202 compares the source language word string of each example retrieved by the search unit 200 with the input sentence 120 and identifies the words in the source language word string that differ from the input sentence 120. Then, based on the example's word correspondences, it identifies within the example's target language word string the word string corresponding to each word identified as differing from the input sentence 120. The difference identification unit 202 passes each retrieved example, together with the differing words of the input sentence 120 for that example, to the correction unit 204. At the same time, it indicates to the correction unit 204 which word or word string in the target language word string of the example was identified as corresponding to the difference.
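The difference identification can be sketched as below. The function names and the index-based alignment dictionary are assumptions made for illustration; the example data (example number 2 and the correspondence of "cheapest" to "一番／安い") comes from the translation example in this document.

```python
from collections import Counter


def diff_indices(words, other):
    """Indices of the words in `words` that have no counterpart in `other`."""
    remaining = Counter(other)
    indices = []
    for i, word in enumerate(words):
        if remaining[word] > 0:
            remaining[word] -= 1
        else:
            indices.append(i)
    return indices


def identify_difference(example_source, alignment, input_words):
    """Return (differing input words, target index spans to be replaced)."""
    source_diff = diff_indices(example_source, input_words)
    input_diff = [input_words[i] for i in diff_indices(input_words, example_source)]
    # alignment maps a source word index to the aligned target word indices.
    spans = [alignment.get(i, ()) for i in source_diff]
    return input_diff, spans


source_2 = "where is the cheapest restaurant".split()
alignment_2 = {3: (0, 1)}   # "cheapest" -> 一番／安い (target words 0 and 1)
query = "where is the nearest restaurant".split()
result = identify_difference(source_2, alignment_2, query)
```

Here `result` pairs the input word "nearest" with the target positions of "一番／安い", which is the information handed to the correction unit.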

The correction unit 204 consults the bilingual dictionary 108 and obtains the translations of the words given to it as the differing words of the input sentence 120. When the bilingual dictionary 108 lists several translations, it obtains all of them. The correction unit 204 then generates an output sentence candidate 132 by replacing the differing word string on the target language side of the given example with the translation obtained from the bilingual dictionary 108. When several translations are retrieved from the bilingual dictionary 108, an output sentence candidate 132 is generated for each of them. When there are several differing words, all of their translations are used, and output sentence candidates 132 are generated for every possible combination.
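The combination step can be sketched with `itertools.product`. This is an illustration under assumptions: the function name, the half-open `(start, end)` span encoding, and the requirement that spans be sorted and non-overlapping are all hypothetical. The dictionary translations of "nearest" are the ones listed for FIG. 7.

```python
from itertools import product


def generate_candidates(target, spans, translations):
    """Build one candidate per combination of translation choices.

    target:       target language word list of the example
    spans:        sorted, non-overlapping (start, end) slices to replace
    translations: for each span, a list of alternative word lists
    """
    candidates = []
    for choice in product(*translations):
        out, pos = [], 0
        for (start, end), words in zip(spans, choice):
            out.extend(target[pos:start])   # keep the unchanged words
            out.extend(words)               # substitute the dictionary translation
            pos = end
        out.extend(target[pos:])
        candidates.append(out)
    return candidates


target_2 = ["一番", "安い", "レストラン", "は", "どこ", "です", "か"]
nearest = [["最寄り"], ["一番", "近い"]]   # dictionary entries for "nearest"
candidates = generate_candidates(target_2, [(0, 2)], [nearest])
```

One differing word with two dictionary translations yields two candidates; with several differing words the number of candidates is the product of the numbers of alternatives.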

In this way, the example translation unit 140 generates M (M ≥ N) output sentence candidates 132A, ..., 132M for one input sentence 120. The generated output sentence candidates 132A, ..., 132M are given to the statistical selection unit 150.

Referring to FIG. 9, one of the output sentence candidates 132A, ..., 132M, for example the output sentence candidate 132A, is given to the language model probability calculation unit 220, the translation model probability calculation unit 222, and the multiplication unit 224. The language model probability calculation unit 220 uses the language model 102, which is the word bigram model shown in FIG. 5, to calculate the language model probability of the given output sentence candidate 132A according to the following equation.

If the output sentence candidate 132A contains a bigram that has no entry in the language model 102, the language model probability is calculated with p(e_i | e_{i-1}) = 1.0 × 10^-7 for that bigram.
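The bigram calculation with the 1.0 × 10^-7 fallback can be sketched as follows. Several points are assumptions rather than statements of the patent: the equation itself is not reproduced in this text, so the sketch assumes the usual bigram product over the sentence including the boundary symbols "<s>" and "</s>"; it works in log space to avoid underflow; and the toy probabilities are invented for illustration.

```python
import math

UNSEEN = 1.0e-7  # value used when a bigram has no entry in the model


def lm_log_prob(words, bigram):
    """Log of the bigram language model probability of a word sequence.

    bigram maps (previous_word, word) to p(word | previous_word).
    Assumption: the product runs from "<s>" through "</s>".
    """
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(bigram.get((prev, cur), UNSEEN))
               for prev, cur in zip(seq, seq[1:]))


# Hypothetical toy model, not the contents of FIG. 5.
toy_bigram = {("<s>", "どこ"): 0.5, ("どこ", "</s>"): 0.5}
seen = lm_log_prob(["どこ"], toy_bigram)
unseen = lm_log_prob(["駅"], toy_bigram)
```

A single unseen bigram lowers the score by a factor of 10^-7 relative to a certain transition, so candidates containing word sequences never observed in the corpus are strongly penalized.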

The translation model probability calculation unit 222 receives the input sentence 120 and uses the translation model 104, which is the Lexicon model shown in FIG. 6, to calculate the translation model probability between the input sentence 120 and the given output sentence candidate 132A according to the following equation.

If a word correspondence between the output sentence candidate 132A and the input sentence 120 has no entry in the translation model 104, the translation model probability is calculated with t(f_j | e_i) = 1.0 × 10^-7 for that word pair (e_i, f_j).
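A lexicon-based translation score with the same fallback can be sketched as below. The exact equation is not reproduced in this text, so the form used here is an assumption: for each source word f_j, the values t(f_j | e_i) are summed over the candidate's words plus the special word NULL, and the sums are multiplied over all source words, with any length normalization omitted. The patent's actual equation may differ, and the toy table is invented.

```python
import math

UNSEEN = 1.0e-7  # value used when a word pair has no entry in the model


def tm_log_prob(source_words, target_words, t_table):
    """Log translation score of source sentence F given candidate E.

    t_table maps (f, e) to t(f | e).  Assumed form:
    product over f_j of the sum over e_i (including NULL) of t(f_j | e_i).
    """
    targets = ["NULL"] + list(target_words)
    total = 0.0
    for f in source_words:
        total += math.log(sum(t_table.get((f, e), UNSEEN) for e in targets))
    return total


# Hypothetical toy table, not the contents of FIG. 6.
toy_table = {("restaurant", "レストラン"): 0.8, ("the", "NULL"): 0.3}
score_word = tm_log_prob(["restaurant"], ["レストラン"], toy_table)
score_null = tm_log_prob(["the"], ["レストラン"], toy_table)
```

The NULL entry lets a source word such as "the" be accounted for even when no word of the candidate translates it.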

The multiplication unit 224 multiplies the language model probability P(E) calculated by the language model probability calculation unit 220 by the translation model probability P(F | E) calculated by the translation model probability calculation unit 222, and passes the product P(E)P(F | E), together with the given output sentence candidate 132A, to the selection unit 226.

The language model probability calculation unit 220, the translation model probability calculation unit 222, and the multiplication unit 224 then perform the same processing for each of the output sentence candidates 132B, ..., 132M, assign to each of them a statistical probability score that is the product of its language model probability and translation model probability, and pass them to the selection unit 226.

The selection unit 226 selects, from among the output sentence candidates 132A, ..., 132M with their assigned statistical probability scores, the one with the largest score. The selection unit 226 outputs the selected output sentence candidate as the output sentence 122.
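The multiplication and selection steps amount to taking the candidate that maximizes P(E)P(F | E). A minimal sketch follows; the function name and the tuple layout are hypothetical, and the probabilities are invented placeholders, not the values of the patent's table.

```python
def select_output(scored_candidates):
    """Return the candidate whose product P(E) * P(F|E) is largest.

    scored_candidates: list of (candidate, lm_prob, tm_prob) tuples.
    """
    best = max(scored_candidates, key=lambda item: item[1] * item[2])
    return best[0]


# Placeholder scores for illustration only.
scored = [
    ("candidate_a", 1.0e-9, 2.0e-4),
    ("candidate_b", 3.0e-6, 1.0e-4),
    ("candidate_c", 5.0e-8, 1.5e-4),
]
chosen = select_output(scored)
```

For long sentences the raw product can underflow, so an implementation might compare sums of log probabilities instead; the ranking is the same.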

-Translation example-
A concrete example of translation by the machine translation system 90 is given below for the case of translating an English sentence into Japanese. Suppose that the English sentence "where is the nearest restaurant" is given to the example translation unit 140 as the input sentence 120, and let N = 2.

Referring to FIG. 4, the English sentence of example number 1 in the example base 106, "where is the nearest subway", contains one word (subway) that differs from the input sentence 120. The English sentence of example number 2, "where is the cheapest restaurant", also contains one word (cheapest) that differs from the input sentence 120. Example number 3, "today is free", contains two words that differ from the input sentence 120. In this case, the search unit 200 shown in FIG. 8 retrieves examples number 1 and number 2 from the example base 106.

The target language word string of example number 1 is "最寄り／の／地下鉄／の／駅／は／どこ／です／か". The target language word string of example number 2 is "一番／安い／レストラン／は／どこ／です／か".

In example number 1, the differing word "subway" in the source language word string corresponds to the word "地下鉄" in the target language word string. The correction unit 204 shown in FIG. 8 retrieves "レストラン", the translation of "restaurant" in the input sentence 120, from the bilingual dictionary 108 shown in FIG. 7. The correction unit 204 replaces the word "地下鉄" in the target language word string of example number 1 with the retrieved "レストラン", and generates and outputs an output sentence candidate.

In example number 2, the differing word "cheapest" in the source language word string corresponds to the word string "一番／安い" in the target language word string. The correction unit 204 retrieves "最寄り" and "一番／近い" from the bilingual dictionary 108 shown in FIG. 7 as translations of "nearest" in the input sentence 120. The example translation unit 140 generates and outputs one output sentence candidate in which "一番／安い" in the target language word string of example number 2 is replaced with "最寄り", and another in which it is replaced with "一番／近い".

In this way, the following three output sentence candidates are created:
(a) "最寄り／の／レストラン／の／駅／は／どこ／です／か"
(b) "一番／近い／レストラン／は／どこ／です／か"
(c) "最寄り／レストラン／は／どこ／です／か"
Hereinafter, these output sentence candidates are referred to as E_a, E_b, and E_c, respectively.

The table below shows the language model probability P(E) of each of the output sentence candidates E_a, E_b, and E_c calculated from the language model 102 shown in FIG. 5, the translation model probability P(F | E) between each of the output sentence candidates E_a, E_b, and E_c and the input sentence 120 calculated from the translation model 104 shown in FIG. 6, and the statistical probability score P(E)P(F | E) formed by their product.

According to this table, the output sentence candidate with the largest statistical probability score P(E)P(F | E) is E_b. The output sentence candidate E_b, that is, the Japanese sentence "一番近いレストランはどこですか" ("Where is the nearest restaurant?"), is therefore selected by the selection unit 226 and output as the output sentence 122.

As described above, unlike conventional example translation, the machine translation device 130 according to the present embodiment does not settle on a single example based only on the similarity between the input sentence and the source language side of the examples. Instead, it retrieves several examples similar to the input sentence and translates on the basis of all of them. As a result, even when several examples compete, the problems caused by having to choose only one of them do not arise. The machine translation device 130 also generates a plurality of output sentence candidates from the examples retrieved in this way. The probability that an appropriate output sentence is among these candidates is higher than when only a single output sentence candidate is generated. Selecting an appropriate one from the plurality of output sentence candidates therefore raises the likelihood that a high-quality output sentence is output.

Furthermore, the machine translation device 130 selects the output sentence from the output sentence candidates using the language model and translation model employed in statistical translation. The language model represents the fluency of a sentence, and the translation model represents the accuracy of word translations between the source language and the target language. By using these models and weighing the language model probability and the translation model probability together, the device can select from the output sentence candidates one that is both natural as a target language sentence and accurate as a translation of the input sentence.

In the embodiment described above, the search unit 200 (see FIG. 8) retrieves examples similar to the input sentence using the number of words differing from the input sentence as the criterion. However, the present invention is not limited to such an embodiment. Edit distance may be used as the criterion for retrieving examples similar to the input sentence. In addition, when calculating similarity from the number of differing words or the edit distance, a semantic distance obtained from a thesaurus may be taken into account, so that semantically close words contribute a smaller count or a smaller edit distance when similar examples are retrieved. In this case, it is desirable that the difference identification unit 202 and the correction unit 204 respectively identify the difference between the input sentence and the retrieved example, and correct the retrieved example, by a method suited to the retrieval criterion.
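As an illustration of the edit distance criterion, the following sketch computes the word-level Levenshtein distance between two sentences, with unit costs for insertion, deletion, and substitution. The unit costs are an assumption; a thesaurus-based variant would replace the substitution cost with a semantic distance, which is not shown here.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between word lists a and b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, word_a in enumerate(a, 1):
        cur = [i]
        for j, word_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # delete word_a
                           cur[j - 1] + 1,                     # insert word_b
                           prev[j - 1] + (word_a != word_b)))  # substitute or match
        prev = cur
    return prev[-1]


query = "where is the nearest restaurant".split()
d_subway = edit_distance(query, "where is the nearest subway".split())
d_today = edit_distance(query, "today is free".split())
```

Unlike a plain count of differing words, edit distance is sensitive to word order and to differences in sentence length.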

The number N of examples retrieved by the search unit 200 may be a fixed value given in advance, or may be adjusted according to the situation. For example, when the search unit 200 succeeds in retrieving an example whose source language sentence matches the input sentence exactly, it may respond by setting the number of examples to 1 and terminating the search at that point.
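The exact-match shortcut can be sketched as follows. The function name is hypothetical, and the fallback ranking uses a simplified set-based count of differing words as an assumption; any of the similarity criteria discussed in this document could be substituted.

```python
def retrieve_with_early_stop(example_sources, input_words, n):
    """Return one exact match if it exists, otherwise the n closest examples."""
    wanted = list(input_words)
    for source in example_sources:
        if list(source) == wanted:
            return [source]                 # exact match: N is reduced to 1
    known = set(wanted)
    # Simplified criterion: number of example words absent from the input.
    ranked = sorted(example_sources,
                    key=lambda source: sum(word not in known for word in source))
    return ranked[:n]


examples = [
    "where is the nearest subway".split(),
    "today is free".split(),
    "where is the cheapest restaurant".split(),
]
exact = retrieve_with_early_stop(examples, "today is free".split(), 2)
closest = retrieve_with_early_stop(examples, "where is the nearest restaurant".split(), 2)
```

When an exact match exists, its target sentence can be used without any correction, so generating further candidates adds nothing.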

The translation method is not limited to the one described above, in which the differing words in an example are located using the word correspondence information held in the example base. A method may also be used in which the syntax tree of the input sentence is mapped to a syntax tree of the target language by the syntactic transfer method (Non-Patent Document 2) and, when a non-terminal symbol remains at a leaf of the resulting tree, translation candidates for that leaf are obtained from the bilingual dictionary. More generally, any form of example translation may be used as long as it can output a plurality of output sentence candidates.

In the embodiment described above, the language model 102 is a word bigram model, and the language model probability calculation unit 220 calculates the language model probability using the word bigram model. However, the present invention is not limited to such an embodiment. N-gram models such as a word trigram model or a part-of-speech N-gram model, combinations of such models, or other language models used in statistical translation may also be used. These language models may be created from the bilingual corpus 100 used for example translation, as in the present embodiment, or from a different corpus.

In the embodiment described above, the translation model is a Lexicon model, but the present invention is not limited to such an embodiment. Other translation models may be used, for example a translation model combined with a Fertility model, a NULL generation model, and a Distortion model. These translation models may be created from the bilingual corpus 100 used for example translation, as in the present embodiment, or from a corpus other than the bilingual corpus 100.

The embodiment disclosed herein is merely illustrative, and the present invention is not limited to the embodiment described above. The scope of the present invention is indicated by each claim of the appended claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

FIG. 1 is an external view showing an example of a computer system 30 that implements the functions of a machine translation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the internal configuration of the computer system 30.
FIG. 3 is a block diagram showing the functional configuration of the machine translation device 130 according to the embodiment of the present invention.
FIG. 4 is a diagram showing the configuration of the example base 106.
FIG. 5 is a diagram showing the configuration of the language model 102.
FIG. 6 is a diagram showing the configuration of the translation model 104.
FIG. 7 is a diagram showing the configuration of the bilingual dictionary 108.
FIG. 8 is a block diagram showing the functional configuration of the example translation unit 140.
FIG. 9 is a block diagram showing the functional configuration of the statistical selection unit 150.
FIG. 10 is a flowchart showing the structure of the processing executed in the machine translation device 130.
FIG. 11 is a schematic diagram showing the configuration of a typical conventional machine translation device that performs example translation.

Explanation of symbols

30 computer system, 40 computer, 42 monitor, 44 printer, 46 keyboard, 48 mouse, 50 CD-ROM drive, 52 FD drive, 54 hard disk, 56 CPU, 58 ROM, 60 RAM, 62 CD-ROM, 64 FD, 66 bus, 90 machine translation system, 100 bilingual corpus, 102 language model, 104 translation model, 106 example base, 108 bilingual dictionary, 110 language model creation device, 112 translation model creation device, 120 input sentence, 122 output sentence, 130 machine translation device, 132A, ..., 132M output sentence candidates, 140 example translation unit, 150 statistical selection unit, 200 search unit, 202 difference identification unit, 204 correction unit, 220 language model probability calculation unit, 222 translation model probability calculation unit, 224 multiplication unit, 226 selection unit

Claims

A machine translation device comprising:
a predetermined example base including a plurality of examples, each being a pair of a sentence in a first language and a sentence in a second language;
example translation means for receiving an input sentence in the first language and generating, with reference to the example base, a plurality of translation sentence candidates in the second language for the input sentence; and
statistical selection means for selecting and outputting, from among the plurality of candidates generated by the example translation means, a candidate whose probability score calculated using a predetermined statistical probability model satisfies a predetermined condition.

The machine translation device according to claim 1, wherein the example translation means includes:
search means for receiving the input sentence, searching the example base for examples whose sentence in the first language satisfies a predetermined similarity condition with the input sentence, and extracting a plurality of examples each including a retrieved sentence in the first language; and
correction means for correcting the sentence in the second language of each of the plurality of examples and generating a translation sentence candidate for the input sentence from each of the plurality of examples.

The machine translation apparatus according to claim 2, wherein the search means includes means for receiving the input sentence, searching the example base for a plurality of sentences in the first language having the smallest number of words that differ from the input sentence, and acquiring a plurality of examples each including one of the retrieved sentences in the first language.
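
The retrieval in the preceding claim can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: it assumes whitespace tokenization and counts "differing words" as the multiset symmetric difference between the two sentences. The function names and the toy example base (with romanized Japanese) are hypothetical.

```python
from collections import Counter

def differing_word_count(a, b):
    """Count words present in one sentence but not the other (multiset difference)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    return sum(((ca - cb) + (cb - ca)).values())

def retrieve_least_different(input_sentence, example_base):
    """Return every example whose source sentence ties for the fewest differing words."""
    scored = [(differing_word_count(input_sentence, src), (src, tgt))
              for src, tgt in example_base]
    best = min(score for score, _ in scored)
    return [example for score, example in scored if score == best]

# Toy example base: (first-language sentence, second-language sentence) pairs.
example_base = [
    ("i want a ticket", "kippu ga hoshii"),
    ("i want a coffee", "koohii ga hoshii"),
    ("where is the station", "eki wa doko desu ka"),
]
nearest = retrieve_least_different("i want a map", example_base)
```

Returning all tied examples, rather than a single best match, is what lets the later stages produce several candidates for statistical selection.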

The machine translation apparatus according to claim 3, further comprising a bilingual dictionary between the first language and the second language, wherein the correcting means includes:
difference identifying means for comparing the input sentence with the sentence in the first language of each of the plurality of examples and identifying, for each of the plurality of examples, a difference from the input sentence; and
candidate generation means for correcting the sentence in the second language of each of the plurality of examples by referring to the bilingual dictionary on the basis of the difference identified by the difference identifying means, and generating a translation sentence candidate for the input sentence from each of the plurality of examples.
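
One way to picture the difference identification and dictionary-based correction in the preceding claim is the sketch below. It is an assumption-laden illustration: differences are restricted to one-to-one word substitutions found with Python's `difflib.SequenceMatcher`, the dictionary maps each first-language word to a list of second-language words, and only the first listed translation is used. The function names and toy data are hypothetical.

```python
from difflib import SequenceMatcher

def identify_differences(input_words, example_words):
    """Return (example_word, input_word) pairs for one-to-one word substitutions."""
    diffs = []
    matcher = SequenceMatcher(a=example_words, b=input_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only handle replaced spans of equal length; other edits are ignored here.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            diffs.extend(zip(example_words[i1:i2], input_words[j1:j2]))
    return diffs

def correct_translation(target_words, diffs, dictionary):
    """Swap the translation of each example-side word for that of the input-side word."""
    corrected = list(target_words)
    for old_src, new_src in diffs:
        old_tgt = dictionary[old_src][0]  # first listed translation (assumption)
        new_tgt = dictionary[new_src][0]
        corrected = [new_tgt if word == old_tgt else word for word in corrected]
    return corrected

# Toy data with romanized Japanese; not taken from the specification.
dictionary = {"ticket": ["kippu"], "map": ["chizu"]}
diffs = identify_differences("i want a map".split(), "i want a ticket".split())
candidate = correct_translation("kippu ga hoshii".split(), diffs, dictionary)
```

A fuller version would need to handle words missing from the dictionary as well as insertions and deletions, which this sketch leaves out.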

The machine translation apparatus according to claim 4, wherein the bilingual dictionary may include a plurality of words in the second language as translations for one word in the first language, and
the candidate generating means includes means for generating one or a plurality of translation sentence candidates for the input sentence by correcting the sentence in the second language of each of the plurality of examples using each of the one or more words in the second language obtained by referring to the bilingual dictionary on the basis of the difference identified by the difference identifying means.

The means for generating includes means for generating one or more translation sentence candidates for the input sentence by modifying the sentence in the second language of each of the plurality of examples according to each possible combination of the plurality of words in the second language obtained by referring to the bilingual dictionary on the basis of each of the differences identified by the difference identification means. The machine translation device described in 1.
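
When a difference has several dictionary translations, generating a candidate for every combination can be sketched with `itertools.product`. This illustration assumes each difference has already been resolved to a position in the second-language sentence. The `slots` representation, the function name, and the toy data are hypothetical.

```python
from itertools import product

def generate_candidates(target_words, slots, dictionary):
    """Build one corrected sentence per combination of dictionary translations.

    slots: list of (position in target_words, first-language word from the input).
    """
    options = [dictionary[src_word] for _, src_word in slots]
    candidates = []
    for combo in product(*options):
        sentence = list(target_words)
        for (position, _), replacement in zip(slots, combo):
            sentence[position] = replacement
        candidates.append(sentence)
    return candidates

# Toy data: "map" has two translations, "saw" has one.
dictionary = {"map": ["chizu", "mappu"], "saw": ["mita"]}
candidates = generate_candidates(["kippu", "o", "katta"],
                                 [(0, "map"), (2, "saw")],
                                 dictionary)
```

The number of candidates is the product of the numbers of translations per difference, so it grows quickly with the number of differences. This is one reason to restrict retrieval to examples that differ little from the input.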

The machine translation device according to claim 2, wherein the search means includes means for receiving the input sentence, searching the example base for a plurality of sentences in the first language having the smallest edit distance from the input sentence, and acquiring a plurality of examples each including one of the retrieved sentences in the first language.
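
A word-level edit distance of the kind this claim refers to can be sketched as below. The sketch assumes the standard Levenshtein definition with unit costs for insertion, deletion, and substitution, and it interprets "a plurality of sentences having the smallest edit distance" as the `n` nearest examples. Both are assumptions, and the function names are hypothetical.

```python
import heapq

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def nearest_examples(input_sentence, example_base, n=2):
    """Return the n examples whose source sentence is closest to the input."""
    input_words = input_sentence.split()
    return heapq.nsmallest(
        n, example_base,
        key=lambda example: edit_distance(input_words, example[0].split()))

example_base = [
    ("i want a ticket", "kippu ga hoshii"),
    ("i want a coffee", "koohii ga hoshii"),
    ("where is the station", "eki wa doko desu ka"),
]
nearest = nearest_examples("i want a map", example_base, n=2)
```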

The machine translation device according to claim 2, wherein the search means includes means for receiving the input sentence, searching the example base for a plurality of sentences in the first language having the smallest edit distance from the input sentence, the edit distance being calculated in consideration of a semantic distance between words, and acquiring a plurality of examples each including one of the retrieved sentences in the first language.
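
One possible reading of an edit distance "calculated in consideration of a semantic distance between words" is a weighted Levenshtein distance in which the substitution cost is a semantic distance in the range 0 to 1. The sketch below follows that reading. The toy semantic classes stand in for a thesaurus and are purely hypothetical, as are the function names.

```python
def semantic_edit_distance(a, b, semantic_distance):
    """Levenshtein distance whose substitution cost is a semantic distance in [0, 1]."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, word_a in enumerate(a, start=1):
        curr = [float(i)]
        for j, word_b in enumerate(b, start=1):
            sub = 0.0 if word_a == word_b else semantic_distance(word_a, word_b)
            curr.append(min(prev[j] + 1.0,       # deletion
                            curr[j - 1] + 1.0,   # insertion
                            prev[j - 1] + sub))  # semantically weighted substitution
        prev = curr
    return prev[-1]

# Hypothetical stand-in for a thesaurus: words in the same class are "closer".
SEMANTIC_CLASS = {"map": "object", "ticket": "object", "coffee": "drink"}

def toy_semantic_distance(word_a, word_b):
    class_a, class_b = SEMANTIC_CLASS.get(word_a), SEMANTIC_CLASS.get(word_b)
    return 0.5 if class_a is not None and class_a == class_b else 1.0

close = semantic_edit_distance("i want a map".split(),
                               "i want a ticket".split(),
                               toy_semantic_distance)
far = semantic_edit_distance("i want a map".split(),
                             "i want a coffee".split(),
                             toy_semantic_distance)
```

With this weighting, an example that differs only by a semantically similar word ranks ahead of one that differs by an unrelated word, even though both are one substitution away.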

The machine translation device according to any one of claims 1 to 8, wherein the statistical selection means includes means for selecting and outputting, from among the plurality of candidates generated by the example translation means, the candidate having the highest probability score calculated using the predetermined statistical probability model.

The machine translation apparatus according to claim 9, further comprising language model storage means for storing a language model of the second language, wherein the means for outputting includes:
language probability calculating means for calculating, for each of the plurality of candidates, a language probability using the language model stored in the language model storage means; and
means for selecting and outputting the candidate having the highest language probability calculated by the language probability calculating means.
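
The language probability of a candidate can be illustrated with a bigram model. This is an assumption, since the claim does not fix the form of the language model. The probability table, the sentence boundary markers `<s>` and `</s>`, and the `floor` value for unseen bigrams are all hypothetical.

```python
def bigram_probability(words, bigram_probs, floor=1e-6):
    """Probability of a sentence as a product of bigram probabilities."""
    prob = 1.0
    for prev, curr in zip(["<s>"] + words, words + ["</s>"]):
        # Unseen bigrams get a small floor value instead of zero (crude smoothing).
        prob *= bigram_probs.get((prev, curr), floor)
    return prob

# Toy second-language bigram table (romanized Japanese).
bigram_probs = {
    ("<s>", "chizu"): 0.5,
    ("chizu", "ga"): 0.5,
    ("ga", "hoshii"): 0.5,
    ("hoshii", "</s>"): 1.0,
}
fluent = bigram_probability(["chizu", "ga", "hoshii"], bigram_probs)
odd = bigram_probability(["ga", "chizu", "hoshii"], bigram_probs)
```

A candidate produced by a poor substitution tends to contain unseen word sequences and so receives a much lower language probability.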

The machine translation apparatus according to claim 9, further comprising translation model storage means for storing a translation model from the second language to the first language, wherein the means for outputting includes:
translation probability calculating means for calculating, for each of the plurality of candidates, a translation probability using the translation model stored in the translation model storage means; and
means for selecting and outputting the candidate having the highest translation probability calculated by the translation probability calculating means.
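
As an illustration of a translation probability from the second language to the first, the sketch below uses a lexical model in the style of IBM Model 1. This choice is an assumption made for the example and is not stated in the claim. The table `t_table`, the `NULL` token, and the `floor` value are hypothetical.

```python
def model1_probability(input_words, candidate_words, t_table, floor=1e-6):
    """Probability of the first-language input given a second-language candidate.

    t_table maps (input_word, candidate_word) to a lexical translation probability.
    """
    generators = ["NULL"] + list(candidate_words)  # NULL accounts for unaligned words
    prob = 1.0
    for input_word in input_words:
        total = sum(t_table.get((input_word, cand_word), floor)
                    for cand_word in generators)
        prob *= total / len(generators)
    return prob

# Toy lexical table: "map" is most likely generated by "chizu".
t_table = {("map", "chizu"): 0.8, ("map", "NULL"): 0.2}
good = model1_probability(["map"], ["chizu"], t_table)
bad = model1_probability(["map"], ["kippu"], t_table)
```

Scoring in this direction checks that every word of the input is accounted for by the candidate, which complements a language model that looks only at the candidate itself.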

The machine translation apparatus according to claim 9, further comprising:
language model storage means for storing a language model of the second language; and
translation model storage means for storing a translation model from the second language to the first language,
wherein the means for outputting includes:
language probability calculating means for calculating, for each of the plurality of candidates, a language probability using the language model stored in the language model storage means;
translation probability calculating means for calculating, for each of the plurality of candidates, a translation probability using the translation model stored in the translation model storage means;
score calculating means for calculating a predetermined probability score as a function of the language probability calculated by the language probability calculating means and the translation probability calculated by the translation probability calculating means; and
means for selecting and outputting the candidate having the highest probability score calculated by the score calculating means.
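
The combined selection can be sketched as follows, taking the score to be the product of the language probability and the translation probability, in line with the multiplication unit 224 in the list of symbols. The two probability functions are passed in as callables. The lookup tables standing in for the models, and their values, are invented for illustration.

```python
def select_best(candidates, input_sentence, language_prob, translation_prob):
    """Return the candidate maximizing language probability times translation probability."""
    def score(candidate):
        # translation_prob(input, candidate): probability of the input given the candidate.
        return language_prob(candidate) * translation_prob(input_sentence, candidate)
    return max(candidates, key=score)

# Toy stand-ins for the language model and the translation model.
lm_table = {"chizu ga hoshii": 0.02, "mappu ga hoshii": 0.001}
tm_table = {"chizu ga hoshii": 0.3, "mappu ga hoshii": 0.4}

best = select_best(
    ["chizu ga hoshii", "mappu ga hoshii"],
    "i want a map",
    language_prob=lambda candidate: lm_table[candidate],
    translation_prob=lambda source, candidate: tm_table[candidate],
)
```

For long sentences, a practical implementation would likely sum log probabilities instead of multiplying raw probabilities, to avoid numerical underflow. The ranking is the same.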

A machine translation computer program that, when executed by a computer, causes the computer to operate as the machine translation device according to any one of claims 1 to 12.