JP3437782B2

JP3437782B2 - Machine translation method and apparatus, and medium storing machine translation program

Info

Publication number: JP3437782B2
Application number: JP06616699A
Authority: JP
Inventors: Naoki Asanoma (麻野間直樹); Hiromi Nakaiwa (中岩浩巳)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-03-12
Filing date: 1999-03-12
Publication date: 2003-08-18
Anticipated expiration: 2019-03-12
Also published as: JP2000259630A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a machine translation method and apparatus, and a medium storing a machine translation program. When an untranslated character string, left in the source language, appears in the sentence output by a machine translation system, the invention uses word co-occurrence information of the target language to output translation candidates for that string.

[0002]

2. Description of the Related Art

A rule-based machine translation system that uses dictionaries and rules normally works in two stages. It first analyzes the structure of the input sentence with the dictionaries and rules, and then generates the translated sentence. In practice, once the input sentence has been analyzed, some source-language words turn out to have no translation registered in the dictionary (unknown words). Such a word is output unchanged in the translation, still in the source language (i.e., untranslated), and this significantly degrades translation quality.

[0003] A common countermeasure against unknown words encountered during machine translation is manual registration. A person takes each source-language word that the system judged to be unknown and registers it in the dictionary, one by one, together with a translation chosen by human judgment.

[0004] One way to reduce this manual work is to enrich the dictionary by coining derivatives of words already registered in it, using affixes and the derivation patterns of their translations (Japanese Patent Laid-Open No. 5-257969, "Machine translation method and apparatus"). Methods therefore exist that enrich the dictionary automatically according to certain dictionary-construction rules. Once an unknown word has been registered in the dictionary, whether manually or automatically, it can be translated.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

The manual dictionary registration described above costs time and money for an update every time a word is judged to be unregistered. It is therefore not a fundamental solution to the unknown-word problem.

[0006] In addition, machine translation systems sometimes fail when analyzing the structure of a source-language input sentence. This can happen even when the dictionary and rules are enriched as described above. Such an analysis failure can cause incorrect word segmentation or incorrect part-of-speech assignment, so that no target-language translation candidate is found for the source-language string containing them. That string is then output untranslated, which markedly lowers translation quality.

[0007] A further problem is that suitable translations may never have been registered in the system's translation dictionary, which leaves too few target-language translation candidates to choose from. Moreover, translations generated with dictionaries and rules are chosen without sufficient regard to their suitability, so word sequences that are unnatural in the target language tend to appear. To choose translations that suit the target language, it is desirable to take into account the co-occurrence relations involving the corresponding source-language words.

[0008] In view of the above, an object of the present invention is to solve the problem of untranslated character strings remaining in the translation output. To that end, the invention provides a machine translation method and apparatus, and a medium storing a machine translation program, that need no manual dictionary updates. They rely on a co-occurrence information database holding a set of entries, each consisting of a target-language word pair and its co-occurrence frequency information. Using this database, they output either of two things:

- a sequence of translation candidates chosen for their suitability in the target language; or
- word-segmentation candidates of the input source-language sentence for use in source-language morphological analysis.

[0009]

MEANS FOR SOLVING THE PROBLEMS

FIG. 1 is a flowchart outlining the method of the present invention.

[0010] To achieve the above object, the machine translation method of the present invention performs the following steps:

- (S1) It automatically translates an input source-language sentence into a target-language machine translation, that is, a sentence in the target language. This uses a translation dictionary holding a set of bilingual correspondences between source-language words and target-language translation candidates.
- (S2) It detects untranslated character strings, for which no target-language translation could be extracted.
- (S4) It searches the translation dictionary for untranslated-string candidates, that is, target-language translation candidates of the untranslated character strings.
- (S5) It computes a translation-candidate co-occurrence strength for each target-language candidate pair. A pair consists of one target-language candidate from the untranslated-string candidates and one already-translated target-language word of the input source-language sentence. The strength is computed with a target-language co-occurrence information database, which holds a set of entries each consisting of a target-language word pair and its co-occurrence frequency information.
- (S6) It uses these strengths to select the target-language translation candidate for each untranslated character string.

[0011] The method of the present invention also handles the rest of the sentence:

- (S4) It searches the translation dictionary for translated-word candidates, that is, target-language translation candidates of the words in the input source-language sentence other than the untranslated character strings.
- (S5) It also computes the translation-candidate co-occurrence strength of each target-language candidate pair formed by combining a translated-word candidate with a target-language candidate from the untranslated-string candidates.
- (S6) It uses these strengths to select a sequence of target-language translation candidates for the input source-language sentence.
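Sentence-level selection can be sketched as an exhaustive search over the Cartesian product of each word's candidate list. Each sequence is scored by summing the co-occurrence frequencies of every candidate pair in it. Both the pairwise-sum score and the exhaustive search are assumptions made for illustration. For long sentences, a practical system would need pruning or dynamic programming. Names and data are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Sequence, Tuple

CooccurrenceDB = Dict[FrozenSet[str], int]


def sequence_score(seq: Sequence[str], db: CooccurrenceDB) -> int:
    """Sum co-occurrence frequencies over every unordered pair in a sequence."""
    return sum(db.get(frozenset((a, b)), 0) for a, b in combinations(seq, 2))


def best_sequence(candidate_lists: List[List[str]],
                  db: CooccurrenceDB) -> Tuple[Tuple[str, ...], int]:
    """Exhaustively search all candidate sequences and return the best one.

    candidate_lists[i] holds the target-language candidates for the i-th
    source word; translated words and untranslated strings are treated alike.
    """
    best = max(product(*candidate_lists), key=lambda s: sequence_score(s, db))
    return best, sequence_score(best, db)


if __name__ == "__main__":
    db = {
        frozenset(("spring", "commercialization")): 4,
        frozenset(("spring", "schedule")): 10,
        frozenset(("season", "marketing")): 2,
    }
    words = [["spring", "season"],
             ["commercialization", "marketing"],
             ["schedule", "plan"]]
    print(best_sequence(words, db))
    # -> (('spring', 'commercialization', 'schedule'), 14)
```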

[0012] The method of the present invention can also choose among word segmentations:

- (S3) It performs morphological analysis of the input source-language sentence to generate word-segmentation candidates.
- (S4) It searches the translation dictionary for segmented-word candidates, that is, target-language translation candidates of the source-language words in the word-segmentation candidates.
- (S5) It computes translation-candidate co-occurrence strengths, counting the segmented-word candidates from the candidate generation step among the target-language translation candidates.
- (S7) From these strengths it computes a word-segmentation likelihood for each candidate produced by the morphological analysis step, then selects and outputs the candidate with the maximum likelihood.
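Step S7 can be sketched by scoring each word-segmentation candidate with the best mean pairwise co-occurrence reachable from its words' candidates together with the already-translated context, then keeping the maximum. This particular "likelihood" is an assumption: the text does not give the formula. Averaging over pairs is chosen so that segmentations with more words do not win merely by contributing more pairs. The dictionary, segmentations, and counts are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List

CooccurrenceDB = Dict[FrozenSet[str], int]
Dictionary = Dict[str, List[str]]


def best_mean_pair_score(candidate_lists: List[List[str]],
                         db: CooccurrenceDB) -> float:
    """Best mean pairwise co-occurrence over all candidate sequences."""
    n = len(candidate_lists)
    n_pairs = n * (n - 1) // 2
    if n_pairs == 0:
        return 0.0
    return max(
        sum(db.get(frozenset((a, b)), 0) for a, b in combinations(seq, 2)) / n_pairs
        for seq in product(*candidate_lists)
    )


def segmentation_likelihood(segmentation: List[str], dictionary: Dictionary,
                            context: List[str], db: CooccurrenceDB) -> float:
    """Assumed likelihood of one segmentation given fixed context words."""
    # Source words with no dictionary entry (or no candidates) contribute nothing.
    lists = [dictionary[w] for w in segmentation if dictionary.get(w)]
    lists += [[w] for w in context]  # already-translated words are fixed
    return best_mean_pair_score(lists, db)


def select_segmentation(segmentations: List[List[str]], dictionary: Dictionary,
                        context: List[str], db: CooccurrenceDB) -> List[str]:
    """Return the maximum-likelihood segmentation (first one wins ties)."""
    return max(segmentations,
               key=lambda s: segmentation_likelihood(s, dictionary, context, db))


if __name__ == "__main__":
    dictionary = {"今春": ["this spring", "spring"],
                  "今": ["now"], "春": ["spring", "springtime"]}
    db = {frozenset(("spring", "schedule")): 10,
          frozenset(("now", "schedule")): 1}
    print(select_segmentation([["今春"], ["今", "春"]], dictionary,
                              ["schedule"], db))
    # -> ['今春']
```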

[0013] When searching for the target-language translation candidates of each source-language word in the source-language word string (S4), the method of the present invention consults another translation dictionary as well as the one used in the machine translation step. That additional dictionary also holds a set of bilingual correspondences between source-language words and target-language translation candidates.

[0014] The method of the present invention also weights each translation-candidate co-occurrence strength (S5). The weight depends on the source-language co-occurrence strength of the corresponding source-language word pair. That strength comes from a source-language co-occurrence information database, which holds a set of entries each consisting of a source-language word pair and its co-occurrence frequency information.

[0015] FIG. 2 is a block diagram outlining the apparatus of the present invention.

[0016] To achieve the above object, the machine translation apparatus of the present invention comprises:

- a translation dictionary 10 holding a set of bilingual correspondences between source-language words and target-language translation candidates;
- machine translation means 1 for automatically translating an input source-language sentence into a target-language machine translation, that is, a sentence in the target language, using the translation dictionary 10;
- untranslated-string detection means 2 for detecting untranslated character strings, that is, substrings for which the machine translation means 1 could not extract a target-language translation;
- translation candidate generation means 4 for searching the translation dictionary 10 for untranslated-string candidates, that is, target-language translation candidates of the untranslated character strings;
- a target-language co-occurrence information database (DB) 20 holding a set of entries, each consisting of a target-language word pair and its co-occurrence frequency information;
- co-occurrence strength detection means 5 for computing, with the target-language co-occurrence information DB 20, the translation-candidate co-occurrence strength of each target-language candidate pair, where a pair consists of one target-language candidate from the untranslated-string candidates and one already-translated target-language word of the input source-language sentence; and
- translation determination means 6 for selecting a target-language translation candidate for each untranslated character string using the translation-candidate co-occurrence strengths.

[0017] In the machine translation apparatus of the present invention, the following means are also included:

- The translation candidate generation means 4 includes means for searching the translation dictionary 10 for translated-word candidates, that is, target-language translation candidates of the words in the input source-language sentence other than the untranslated character strings.
- The co-occurrence strength detection means 5 includes means for also computing the translation-candidate co-occurrence strength of each target-language candidate pair formed by combining a translated-word candidate with a target-language candidate from the untranslated-string candidates.
- The translation determination means 6 includes means for selecting a sequence of target-language translation candidates for the input source-language sentence using these strengths.

[0018] The machine translation apparatus of the present invention further comprises the following:

- Morphological analysis means 3 performs morphological analysis of the input source-language sentence to generate word-segmentation candidates.
- The translation candidate generation means 4 includes means for searching the translation dictionary 10 for segmented-word candidates, that is, target-language translation candidates of the source-language words in the word-segmentation candidates.
- The co-occurrence strength detection means 5 includes means for computing translation-candidate co-occurrence strengths, counting the segmented-word candidates from the translation candidate generation means among the target-language translation candidates.
- Word segmentation selection means 7 computes, from these strengths, a word-segmentation likelihood for each word-segmentation candidate obtained from the morphological analysis means 3, and selects and outputs the candidate with the maximum likelihood.

[0019] In the machine translation apparatus of the present invention, the translation candidate generation means 4 also includes means for searching for the target-language translation candidates of each source-language word in the source-language word string. It consults another translation dictionary 10 as well as the translation dictionary 10 used by the machine translation means 1. That other dictionary also holds a set of bilingual correspondences between source-language words and target-language translation candidates.

[0020] The machine translation apparatus of the present invention further comprises a source-language co-occurrence information database (DB) 30. It holds a set of entries, each consisting of a source-language word pair and its co-occurrence frequency information. The co-occurrence strength detection means 5 includes means for using the source-language co-occurrence information DB 30 to weight each translation-candidate co-occurrence strength according to the source-language co-occurrence strength of the corresponding source-language word pair.

[0021] The medium storing the machine translation program of the present invention stores one of two programs. The first causes a computer to execute each step of the machine translation method described above. The second causes a computer to function as each means of the machine translation apparatus described above.

[0022] The machine translation method, apparatus, and medium of the present invention output translation candidates for untranslated character strings appearing in a machine translation, using the following steps or means.

[0023] The machine translation step or means receives an input source-language sentence. It then automatically translates it into a target-language machine translation, that is, a sentence in the target language. The translation uses a translation dictionary holding a set of bilingual correspondences between source-language words and target-language translation candidates.

[0024] The untranslated-string detection step or means detects the untranslated character strings, for which the machine translation step or means could not extract a target-language translation.

[0025] The translation candidate generation step or means searches the translation dictionary for untranslated-string candidates, that is, target-language translation candidates of the untranslated character strings.

[0026] The co-occurrence strength detection step or means computes the translation-candidate co-occurrence strength of each target-language candidate pair. A pair consists of one target-language candidate from the untranslated-string candidates and one already-translated target-language word of the input source-language sentence. The computation uses the target-language co-occurrence information DB, which holds a set of entries each consisting of a target-language word pair and its co-occurrence frequency information.

[0027] The translation determination step or means selects the target-language translation candidate for each untranslated character string using the translation-candidate co-occurrence strengths.

[0028] Untranslated character strings are what degrade translation quality here. This processing translates them into target-language words while taking into account how suitable those words are in the target language.

[0029] The processing extends to the rest of the sentence as follows:

- The translation candidate generation step or means searches the translation dictionary for translated-word candidates, that is, target-language translation candidates of the words in the input source-language sentence other than the untranslated character strings.
- The co-occurrence strength detection step or means also computes the translation-candidate co-occurrence strength of each target-language candidate pair formed by combining a translated-word candidate with a target-language candidate from the untranslated-string candidates.
- The translation determination step or means uses these strengths to select a sequence of target-language translation candidates for the input source-language sentence.

[0030] This makes it possible to select translation candidates for the source-language words other than the untranslated character strings as well. The selection takes into account how suitable the candidates are for the input source-language sentence as a whole.

[0031] The morphological analysis step or means performs morphological analysis of the input source-language sentence to generate word-segmentation candidates. The translation candidate generation step or means searches the translation dictionary for segmented-word candidates, that is, target-language translation candidates of the source-language words in the word-segmentation candidates.

[0032] The co-occurrence strength detection step or means computes translation-candidate co-occurrence strengths, counting the segmented-word candidates from the translation candidate generation step or means among the target-language translation candidates. The word segmentation selection step or means then uses these strengths to compute a word-segmentation likelihood for each word-segmentation candidate obtained from the morphological analysis step or means. Finally, it selects and outputs the candidate with the maximum likelihood.

[0033] The input source-language sentence can then be re-analyzed using the maximum-likelihood word-segmentation candidate. In addition, the machine translation system's target-language sentence generation can use the sequence of translation candidates output by this processing. Untranslated output caused by a failure to analyze the input source-language sentence can therefore be resolved.

[0034] In addition, the translation candidate generation step or means searches for the target-language translation candidates of each source-language word in the source-language word string. It consults another translation dictionary as well as the one used in the machine translation step or means. That other dictionary also holds a set of bilingual correspondences between source-language words and target-language translation candidates.

[0035] This increases the number of target-language translation candidates and raises the chance that a suitable translation is selected.

[0036] In addition, the co-occurrence strength detection step or means weights each translation-candidate co-occurrence strength according to the source-language co-occurrence strength of the corresponding source-language word pair. It obtains that strength from the source-language co-occurrence information DB, which holds a set of entries each consisting of a source-language word pair and its co-occurrence frequency information.

[0037] Translation selection can then give more weight to co-occurrence between candidates whose source words tend to co-occur in the source language, which further improves the accuracy of translation selection.

[0038] Accordingly, executing the above steps or using the above means makes it possible to output translation candidates for the untranslated character strings that appear in machine-translated text.

[0039]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention are described below with reference to the drawings. In these embodiments, the source language is Japanese and the target language is English.

[0040] FIG. 3 is a basic block diagram showing an example embodiment of the machine translation apparatus of the present invention. In the figure:

- 10 is a translation dictionary;
- 20 is a target-language co-occurrence information database (DB);
- 30 is a source-language co-occurrence information database (DB);
- 11 is a machine translation unit;
- 12 is an untranslated-string detection unit;
- 13 is a morphological analysis unit;
- 14 is a translation candidate generation unit;
- 15 is a co-occurrence strength detection unit;
- 16 is a translation determination unit; and
- 17 is a word segmentation selection unit.

[0041] The machine translation unit 11 automatically translates the input source-language sentence into a target-language machine translation using the translation dictionary 10. The untranslated-string detection unit 12 then finds the untranslated character strings in this machine translation, that is, the strings for which no target-language translation could be extracted.

[0042] The morphological analysis unit 13 performs morphological analysis of the input source-language sentence to generate word-segmentation candidates.

[0043] The translation candidate generation unit 14 searches the translation dictionary 10 for target-language translation candidates. It does so for three kinds of input: the untranslated character strings, the source-language words that have already been translated, and the words in the word-segmentation candidates.

[0044] The co-occurrence strength detection unit 15 uses the target-language co-occurrence information DB 20 and the source-language co-occurrence information DB 30. With them it computes the translation-candidate co-occurrence strength of each target-language candidate pair formed by combining target-language translation candidates.

[0045] The translation determination unit 16 uses these translation-candidate co-occurrence strengths to select a target-language translation candidate for each untranslated character string. The word segmentation selection unit 17 computes a word-segmentation likelihood for each word-segmentation candidate and selects the candidate with the maximum likelihood.

[0046] This machine translation apparatus can also be implemented by a computer together with a medium storing a machine translation program. The computer comprises a CPU, memory, input/output devices, external storage, and the like. When read by the computer, the program causes it to function as each of the means described above.

[0047] Next, the machine translation procedure for the basic block configuration of FIG. 3 is described. The input example is the information shown in FIG. 4, which is part of a target-language machine translation output by the machine translation system.

[0048] First, the machine translation unit 11 receives the source-language sentence to be translated, "〜が今春から商用化を予定している〜" ("... is scheduled to be commercialized starting this spring ..."). It translates this sentence into a target-language machine translation using the translation dictionary 10. It then generates translated-sentence structure information, made up of the set of translations corresponding to the original source-language words, as shown in FIG. 4.

[0049] Next, the untranslated-string detection unit 12 extracts the untranslated character string "今春" ("this spring") from the translated-sentence structure information. Extraction of a target-language translation failed for this string because, for example, a part-of-speech condition was not satisfied. The untranslated-string detection unit 12 can also detect and extract such source-language strings at an earlier stage of translation processing, before the translated-sentence structure information is output.

[0050] The translation word candidate generation unit 14 searches the translation dictionary 10 for target language translation candidates for the untranslated string "今春". For example, translation candidates such as those shown in FIG. 5 are obtained for this string. The number of target language translation candidates can be increased by also referring to another translation dictionary in addition to the translation dictionary 10 of the machine translation system.
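The following Python sketch shows the candidate lookup, including the optional second dictionary (see also claim 3). It rests on assumptions. Dictionaries are modeled as plain `dict` objects mapping a source word to a list of candidates, and both dictionaries' contents are hypothetical. Only the candidates "this spring" and "present spring" come from the example in [0056].

```python
from typing import Dict, Iterable, List

Dictionary = Dict[str, List[str]]


def lookup_candidates(word: str, dictionaries: Iterable[Dictionary]) -> List[str]:
    """Collect target language candidates for `word` from each dictionary in turn,
    keeping first-seen order and dropping duplicates."""
    merged: List[str] = []
    for dictionary in dictionaries:
        for candidate in dictionary.get(word, []):
            if candidate not in merged:
                merged.append(candidate)
    return merged


main_dictionary = {"今春": ["this spring"]}                      # hypothetical
extra_dictionary = {"今春": ["present spring", "this spring"]}   # hypothetical
print(lookup_candidates("今春", [main_dictionary, extra_dictionary]))
```

The main dictionary is consulted first, so its candidates keep priority in the merged list.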

[0051] Next, the co-occurrence strength detection unit 15 enumerates a set of target language translation candidate pairs. Each pair consists of a target language candidate obtained by the translation word candidate generation unit 14 and an already-translated target language word in the translated sentence structure information (FIG. 4). In this embodiment, the pairs are formed without considering target language prepositions. FIG. 6 shows an example of the set of target language translation candidate pairs.
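The pair enumeration is a Cartesian product, as this minimal Python sketch shows. The inputs are assumed to be plain strings with prepositions already removed. The four words come from the example in [0056]. The function name `enumerate_candidate_pairs` is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def enumerate_candidate_pairs(untranslated_candidates: List[str],
                              translated_words: List[str]) -> List[Tuple[str, str]]:
    """Pair each candidate for the untranslated string with each
    already-translated target language word."""
    return list(product(untranslated_candidates, translated_words))


pairs = enumerate_candidate_pairs(["this spring", "present spring"],
                                  ["schedule", "commercialization"])
for pair in pairs:
    print(pair)
```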

[0052] FIG. 7 shows example contents of the target language co-occurrence information DB 20 used by the co-occurrence strength detection unit 15. For example, one entry in FIG. 7 indicates that the words "schedule" and "spring" co-occur 10 times within the range defined when the co-occurrence information was collected (for example, within one sentence).
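One way such a database could be populated is by counting sentence-level co-occurrences in a target language corpus. The sketch below assumes a one-sentence window, whitespace tokenization and a two-sentence toy corpus; none of these details come from the patent. Pairs are stored as sorted tuples, so ("schedule", "spring") and ("spring", "schedule") share one entry.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def build_cooccurrence_db(sentences: Iterable[str]) -> Counter:
    """Count, for every unordered word pair, the number of sentences
    in which both words occur (window = one sentence)."""
    db: Counter = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):  # words are sorted, so each pair has one key
            db[pair] += 1
    return db


def cooccurrence(db: Counter, a: str, b: str) -> int:
    """Order-insensitive lookup; unseen pairs count as 0."""
    return db[tuple(sorted((a, b)))]


corpus = [
    "We schedule the launch for spring",  # toy corpus, for illustration only
    "The spring schedule is tight",
]
db = build_cooccurrence_db(corpus)
print(cooccurrence(db, "spring", "schedule"))  # 2
```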

[0053] Next, the target language co-occurrence information DB 20 is used to calculate the translation candidate co-occurrence strength for each target language translation candidate pair. FIG. 8 shows an example of the set of translation candidate co-occurrence strengths calculated for these pairs.

[0054] The translation word determination unit 16 then uses the translation candidate co-occurrence strength of each target language translation candidate pair to select a translation candidate for the untranslated string "今春".

[0055] The reference value for deciding among translation candidates can be calculated as follows, for example. Once a translation candidate has been chosen for each source language word in the input source language sentence, the co-occurrence strength of the entire input sentence can be approximated. The approximation is the product of the translation candidate co-occurrence strengths of the resulting combination of target language translation candidate pairs.
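The approximation can be written compactly as follows. The symbols are notation introduced here, not in the original. $c$ is one choice of target language candidates for the sentence. $P(c)$ is the set of target language translation candidate pairs that choice induces. $s(u, v)$ is the translation candidate co-occurrence strength of a pair. $S(c)$ is the approximated whole-sentence strength, and the selected choice $\hat{c}$ maximizes it.

$$
S(c) \;\approx\; \prod_{(u,\,v)\,\in\,P(c)} s(u, v), \qquad \hat{c} \;=\; \arg\max_{c}\, S(c)
$$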

[0056] In this example, the co-occurrence strength of the entire input source language sentence can therefore be calculated for two combinations:
(A) (co-occurrence strength of "schedule" and "this spring") × (co-occurrence strength of "commercialization" and "this spring"); and
(B) (co-occurrence strength of "schedule" and "present spring") × (co-occurrence strength of "commercialization" and "present spring").

[0057] Finally, combination (A), which gives the highest co-occurrence strength for the entire input source language sentence, is selected as the target language translation candidate string and output. FIG. 9 shows the output target language translation candidates.
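The comparison of (A) and (B) can be sketched in Python as follows. The strength values below are hypothetical, not those of FIG. 8. They were chosen so that the sketch reproduces the outcome stated above, with (A) selected. Strengths are keyed by (translated word, candidate), and a missing pair counts as 0.

```python
from math import prod
from typing import Dict, List, Tuple

Strengths = Dict[Tuple[str, str], float]


def sentence_strength(candidate: str, translated_words: List[str], strength: Strengths) -> float:
    """Whole-sentence strength approximated as a product of pair strengths."""
    return prod(strength.get((word, candidate), 0.0) for word in translated_words)


def select_candidate(candidates: List[str], translated_words: List[str], strength: Strengths) -> str:
    """Return the candidate whose whole-sentence strength is largest."""
    return max(candidates, key=lambda c: sentence_strength(c, translated_words, strength))


strength = {  # hypothetical values, not taken from FIG. 8
    ("schedule", "this spring"): 0.4,
    ("commercialization", "this spring"): 0.3,
    ("schedule", "present spring"): 0.1,
    ("commercialization", "present spring"): 0.2,
}
translated = ["schedule", "commercialization"]
print(select_candidate(["this spring", "present spring"], translated, strength))  # this spring
```

Here (A) scores 0.4 × 0.3 = 0.12 and (B) scores 0.1 × 0.2 = 0.02.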

[0058] The translation word candidate generation unit 14 can also use the translation dictionary 10 to generate target language translation candidates for words other than the untranslated string. Translation candidates such as those shown in FIG. 10 are obtained for these words. In this embodiment, the translation of Japanese particles and auxiliary verbs is omitted.

[0059] The co-occurrence strength detection unit 15 then enumerates the set of target language translation candidate pairs formed from combinations of all the target language translation candidates obtained by the translation word candidate generation unit 14. FIG. 11 shows an example of the pairs obtained here.

[0060] Next, the target language co-occurrence information DB 20 is used to calculate the translation candidate co-occurrence strength for each target language translation candidate pair.

[0061] The translation word determination unit 16 then uses these translation candidate co-occurrence strengths to find the maximum co-occurrence strength of the entire input source language sentence, in the same manner as above. This allows the optimal target language translation candidate to be selected for each word of the input source language sentence.
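The joint selection can be sketched as an exhaustive search over one candidate per source word. Each assignment is scored by the product of the strengths of all candidate pairs it contains. This is a sketch under assumptions. The candidate lists and strength values are hypothetical: the word "plan" and all numbers are invented, not taken from FIGS. 10 and 11. Strengths are stored under order-insensitive `frozenset` keys, and missing pairs count as 0. The patent does not specify a search strategy, and brute force grows exponentially with sentence length.

```python
from itertools import combinations, product
from math import prod
from typing import Dict, FrozenSet, List, Tuple

Strengths = Dict[FrozenSet[str], float]


def pair_strength(a: str, b: str, strength: Strengths) -> float:
    """Order-insensitive lookup; unseen pairs get strength 0."""
    return strength.get(frozenset((a, b)), 0.0)


def best_translation(candidates_per_word: Dict[str, List[str]],
                     strength: Strengths) -> Tuple[Dict[str, str], float]:
    """Try every choice of one candidate per source word and keep the one
    whose product of pairwise strengths is largest."""
    best_choice, best_score = None, -1.0
    for choice in product(*candidates_per_word.values()):
        score = prod(pair_strength(a, b, strength) for a, b in combinations(choice, 2))
        if score > best_score:
            best_choice, best_score = choice, score
    if best_choice is None:
        raise ValueError("every source word needs at least one candidate")
    return dict(zip(candidates_per_word, best_choice)), best_score


candidates = {  # hypothetical candidate lists
    "今春": ["this spring", "present spring"],
    "商用化": ["commercialization"],
    "予定": ["schedule", "plan"],
}
strength = {  # hypothetical strengths
    frozenset({"this spring", "commercialization"}): 0.3,
    frozenset({"this spring", "schedule"}): 0.4,
    frozenset({"this spring", "plan"}): 0.2,
    frozenset({"present spring", "commercialization"}): 0.2,
    frozenset({"present spring", "schedule"}): 0.1,
    frozenset({"present spring", "plan"}): 0.1,
    frozenset({"commercialization", "schedule"}): 0.5,
    frozenset({"commercialization", "plan"}): 0.3,
}
choice, score = best_translation(candidates, strength)
print(choice)  # {'今春': 'this spring', '商用化': 'commercialization', '予定': 'schedule'}
```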

[0062] An example of a preferred-translation selection method is described below. It uses source language co-occurrence information obtained from the source language co-occurrence information DB 30, which holds a set of entries consisting of co-occurrence frequency information in the source language. FIG. 12 shows an example of the source language co-occurrence information DB 30.

[0063] The co-occurrence strength of a source language word pair in the source language is defined as a ratio. The numerator is the co-occurrence frequency of that pair. The denominator is the sum of the co-occurrence strengths of the word combinations in the source language word string. The co-occurrence strength detection unit 15 multiplies every translation candidate co-occurrence strength by the source language co-occurrence strength of the corresponding source language word pair. The translation word determination unit 16 then selects the final translation using the translation candidate co-occurrence strengths weighted in this way.
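The source language weighting can be sketched as below. Two points are assumptions. The denominator is taken to be the sum of the raw frequencies of all word pairs in the source word string. Target strengths are keyed by (source pair, target pair), so that each one can be matched to the source pair it translates. The database contents are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Pair = FrozenSet[str]


def source_pair_weights(source_words: List[str], source_db: Dict[Pair, int]) -> Dict[Pair, float]:
    """Weight of each source language pair = its co-occurrence frequency divided
    by the summed frequency of all word pairs in the source word string."""
    pairs = [frozenset(p) for p in combinations(source_words, 2)]
    total = sum(source_db.get(p, 0) for p in pairs)
    if total == 0:
        # Assumption: with no source-side evidence, leave target strengths unchanged.
        return {p: 1.0 for p in pairs}
    return {p: source_db.get(p, 0) / total for p in pairs}


def weight_strengths(target_strengths: Dict[Tuple[Pair, Pair], float],
                     weights: Dict[Pair, float]) -> Dict[Tuple[Pair, Pair], float]:
    """Multiply each target candidate-pair strength by the weight of the
    source language pair it translates."""
    return {(src, tgt): s * weights.get(src, 0.0)
            for (src, tgt), s in target_strengths.items()}


source_db = {  # hypothetical source language co-occurrence frequencies
    frozenset({"今春", "商用化"}): 6,
    frozenset({"商用化", "予定"}): 3,
    frozenset({"今春", "予定"}): 1,
}
weights = source_pair_weights(["今春", "商用化", "予定"], source_db)
target = {(frozenset({"今春", "予定"}), frozenset({"this spring", "schedule"})): 0.4}
weighted = weight_strengths(target, weights)  # 0.4 scaled by the weight 0.1
```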

[0064] FIG. 13 shows the result of modifying the translation candidate co-occurrence strengths of FIG. 8 by the above procedure.

[0065] The following embodiment is an extension for the case where the untranslated string "今春" cannot be found in the translation dictionary 10.

[0066] The morphological analysis unit 13 performs morphological analysis of the input source language sentence and creates a set of word division candidates. FIG. 14 shows an example of the set of word division candidates for the input source language sentence of FIG. 4.

[0067] The translation word candidate generation unit 14 searches the translation dictionary 10 for target language translation candidates for each source language word in each word division candidate obtained from the morphological analysis unit 13. For example, the translation candidates of FIG. 15 are obtained for the words of FIG. 14.

[0068] Next, for each word division candidate, the co-occurrence strength detection unit 15 creates one of two sets of target language translation candidate pairs:
(1) pairs each consisting of a target language candidate obtained by the translation word candidate generation unit 14 and an already-translated target language word in the translated sentence structure information (FIG. 4); or
(2) pairs formed from combinations of all the target language candidates obtained by the translation word candidate generation unit 14.
The unit then uses the target language co-occurrence information DB 20 to obtain the translation candidate co-occurrence strengths of all these pairs. FIG. 16 shows an example of the set of pairs and their translation candidate co-occurrence strengths obtained here.

[0069] The word division selection unit 17 uses the translation candidate co-occurrence strengths calculated by the co-occurrence strength detection unit 15 to calculate, in the same manner as above, the co-occurrence strength of the entire input source language sentence for each word division candidate. This strength can be regarded as the word division likelihood.

[0070] Finally, the word division candidate that maximizes the co-occurrence strength of the entire input source language sentence is selected. At the same time, the optimal target language translation candidate is selected for each word of the input sentence. Here, the translation candidate co-occurrence strengths of FIG. 16 are used to obtain the whole-sentence co-occurrence strength of each of the word division candidates (a) and (b) of FIG. 14. The candidate with the larger value, for example (b), is selected. Accordingly, the target language translation candidate string for the words of (b), which gave the maximum co-occurrence strength, is output.
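The word division selection can be sketched as follows. Each division candidate is scored by its best whole-sentence product over candidate choices, and the division with the maximum score is kept. The sketch uses pair option (2) from [0068], all combinations of candidates. The divisions, dictionary entries ("now", "spring") and strengths are hypothetical and do not reproduce FIGS. 14 to 16.

```python
from itertools import combinations, product
from math import prod
from typing import Dict, FrozenSet, List, Optional, Tuple

Strengths = Dict[FrozenSet[str], float]


def division_likelihood(words: List[str], dictionary: Dict[str, List[str]],
                        strength: Strengths) -> Tuple[float, Optional[Tuple[str, ...]]]:
    """Best whole-sentence strength over all candidate choices for one word
    division candidate; (0.0, None) if some word has no dictionary entry."""
    best_score, best_choice = 0.0, None
    for choice in product(*(dictionary.get(w, []) for w in words)):
        score = prod(strength.get(frozenset(p), 0.0) for p in combinations(choice, 2))
        if best_choice is None or score > best_score:
            best_score, best_choice = score, choice
    return best_score, best_choice


def select_division(divisions: List[List[str]], dictionary: Dict[str, List[str]],
                    strength: Strengths):
    """Return (division, chosen candidates, likelihood) with maximum likelihood."""
    scored = [(division_likelihood(d, dictionary, strength), d) for d in divisions]
    (score, choice), division = max(scored, key=lambda item: item[0][0])
    return division, choice, score


dictionary = {"今春": ["this spring"], "今": ["now"], "春": ["spring"],
              "商用化": ["commercialization"], "予定": ["schedule"]}  # hypothetical
divisions = [["今", "春", "商用化", "予定"], ["今春", "商用化", "予定"]]
strength = {frozenset(k): v for k, v in [  # hypothetical strengths
    (("this spring", "commercialization"), 0.3), (("this spring", "schedule"), 0.4),
    (("commercialization", "schedule"), 0.5), (("now", "spring"), 0.2),
    (("now", "commercialization"), 0.1), (("now", "schedule"), 0.1),
    (("spring", "commercialization"), 0.2), (("spring", "schedule"), 0.3)]}
division, choice, score = select_division(divisions, dictionary, strength)
print(division, choice)
```

When strengths are below 1, every extra word adds more factors, so this literal product tends to favor divisions with fewer words. The patent does not discuss normalizing for this.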

[0071] The present invention is not limited to the above embodiments and can be modified and applied within the scope of the claims.

[0072]

[Effects of the Invention] As described above, the present invention automatically translates an input source language sentence into a target language machine-translated sentence using a translation dictionary. It detects any untranslated character string for which no target language translation could be extracted, and it searches the translation dictionary for target language translation candidates for the untranslated string or for the translated source language words. It then uses a co-occurrence information DB holding entries of target language word pairs and their co-occurrence frequency information. With this DB, it calculates the translation candidate co-occurrence strength of each target language translation candidate pair formed from combinations of target language candidates. Finally, it uses these strengths to select a target language translation candidate for the untranslated string. Untranslated strings, which degrade the quality of the translated text, can thus be translated into target language words while taking their appropriateness in the target language into account.

[0073] In addition, the input source language sentence is morphologically analyzed to generate word division candidates, and the translation dictionary is searched for divided-word translation candidates for the source language words in those candidates. Translation candidate co-occurrence strengths are then calculated with the divided-word translation candidates included among the target language translation candidates. Next, the word division likelihood of each word division candidate is calculated, and the maximum-likelihood candidate is selected. This makes it possible to redo the analysis of the input sentence and generate a translation candidate string. It thereby solves the problem of untranslated strings caused by failed analysis of the input source language sentence.

[0074] Furthermore, using multiple translation dictionaries increases the number of target language translation candidates. Using source language word co-occurrence information improves the accuracy of selecting an appropriate translation.

[0075] In this way, translation candidates can be output for untranslated character strings that appear in machine-translated text.

[Brief description of drawings]

FIG. 1 is a flow chart showing an outline of the method of the present invention.

FIG. 2 is a configuration diagram showing an outline of the device of the present invention.

FIG. 3 is a basic block configuration diagram showing an example of an embodiment of the machine translation device of the present invention.

FIG. 4 is a diagram showing an example of translated sentence structure information.

FIG. 5 is a diagram showing target language translation candidates for an untranslated character string.

FIG. 6 is a diagram showing a set of target language translation candidate pairs.

FIG. 7 is a diagram showing example contents of the target language co-occurrence information database.

FIG. 8 is a diagram showing an example of a set of translation candidate co-occurrence strengths.

FIG. 9 is a diagram showing the correspondence between the input source language sentence and the target language translation candidates.

FIG. 10 is a diagram showing target language translation candidates for each source language word.

FIG. 11 is a diagram showing a set of target language translation candidate pairs that includes translation candidates for source language words other than the untranslated character string.

FIG. 12 is a diagram showing example contents of the source language co-occurrence information database.

FIG. 13 is a diagram showing an example of a set of translation candidate co-occurrence strengths that take the source language co-occurrence strengths into account.

FIG. 14 is a diagram showing an example of a set of word division candidates.

FIG. 15 is a diagram showing an example of a set of target language translation candidates for each word in the word division candidates.

FIG. 16 is a diagram showing an example of a set of translation candidate co-occurrence strengths when word division candidates are used.

[Explanation of symbols]

11: machine translation unit, 12: untranslated detection unit, 13: morphological analysis unit, 14: translation word candidate generation unit, 15: co-occurrence strength detection unit, 16: translation word determination unit, 17: word division selection unit, 10: translation dictionary, 20: target language co-occurrence information database, 30: source language co-occurrence information database.

Continuation of the front page: (58) Fields searched (Int.Cl.⁷, DB name): G06F 17/21 - 17/28

Claims

(57) [Claims]

1. A machine translation method for automatically translating a source language sentence into a target language sentence by using a machine translation device comprising a translation dictionary that holds a set of bilingual relations between source language words and target language translation candidates, a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information, machine translation means, untranslated detection means, morphological analysis means, translation word candidate generation means, co-occurrence strength detection means, translation word determination means, and word division selection means, the method comprising: a machine translation step in which the machine translation means uses the translation dictionary to automatically translate an input source language sentence into a target language machine-translated sentence, which is a sentence in the target language; an untranslated detection step in which the untranslated detection means detects an untranslated character string for which a target language translation could not be extracted in the machine translation step; a morphological analysis step in which the morphological analysis means performs morphological analysis of the input source language sentence to generate word division candidates; a translation word candidate generation step in which the translation word candidate generation means uses the translation dictionary to search for untranslated-string translation candidates, which are target language translation candidates for the untranslated character string, and uses the translation dictionary to search for divided-word translation candidates, which are target language translation candidates for the source language words in the word division candidates; a co-occurrence strength detection step in which the co-occurrence strength detection means uses the target language co-occurrence information database to calculate the translation candidate co-occurrence strength of each target language translation candidate pair, each pair consisting of a target language translation candidate, including the untranslated-string translation candidates and the divided-word translation candidates obtained by the translation word candidate generation means, and an already-translated target language word in the input source language sentence; a translation word determination step in which the translation word determination means uses the translation candidate co-occurrence strength to select a target language translation candidate for the untranslated character string; and a word division selection step in which the word division selection means uses the translation candidate co-occurrence strength to calculate a word division likelihood for each word division candidate obtained in the morphological analysis step, and uses the word division likelihood to select and output the word division candidate with the maximum likelihood.

2. The machine translation method according to claim 1, wherein: the translation word candidate generation step comprises a step in which the translation word candidate generation means uses the translation dictionary to search for translated-word translation candidates, which are target language translation candidates for words of the input source language sentence other than the untranslated character string; the co-occurrence strength detection step comprises a step in which the co-occurrence strength detection means calculates the translation candidate co-occurrence strength of each target language translation candidate pair consisting of a combination of the translated-word translation candidates and the untranslated-string translation candidates; and the translation word determination step comprises a step in which the translation word determination means uses the translation candidate co-occurrence strength to select a target language translation candidate string for the input source language sentence.

3. The machine translation method according to claim 1 or 2, wherein the translation word candidate generation step comprises a step in which the translation word candidate generation means searches for a target language translation candidate for each source language word in the source language word string by referring, in addition to the translation dictionary used in the machine translation step, to another translation dictionary that holds a set of bilingual relations between source language words and target language translation candidates.

4. The machine translation method according to any one of claims 1 to 3, further using a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, wherein, in the co-occurrence strength detection step, the co-occurrence strength detection means uses the source language co-occurrence information database to weight the corresponding translation candidate co-occurrence strength on the basis of the co-occurrence strength of the source language word pair in the source language.

5. A machine translation device for automatically translating a source language sentence into a target language sentence, comprising: a translation dictionary that holds a set of bilingual relations between source language words and target language translation candidates; machine translation means for automatically translating an input source language sentence into a target language machine-translated sentence, which is a sentence in the target language, using the translation dictionary; untranslated detection means for detecting an untranslated character string for which a target language translation could not be extracted by the machine translation means; morphological analysis means for performing morphological analysis of the input source language sentence to generate word division candidates; translation word candidate generation means for using the translation dictionary to search for untranslated-string translation candidates, which are target language translation candidates for the untranslated character string, and for using the translation dictionary to search for divided-word translation candidates, which are target language translation candidates for the source language words in the word division candidates; a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information; co-occurrence strength detection means for using the target language co-occurrence information database to calculate the translation candidate co-occurrence strength of each target language translation candidate pair, each pair consisting of a target language translation candidate, including the untranslated-string translation candidates and the divided-word translation candidates obtained by the translation word candidate generation means, and an already-translated target language word in the input source language sentence; translation word determination means for using the translation candidate co-occurrence strength to select a target language translation candidate for the untranslated character string; and word division selection means for using the translation candidate co-occurrence strength to calculate a word division likelihood for each word division candidate obtained from the morphological analysis means, and for using the word division likelihood to select and output the word division candidate with the maximum likelihood.

6. The machine translation device according to claim 5, wherein: the translation word candidate generation means comprises means for using the translation dictionary to search for translated-word translation candidates, which are target language translation candidates for words of the input source language sentence other than the untranslated character string; the co-occurrence strength detection means comprises means for calculating the translation candidate co-occurrence strength of each target language translation candidate pair consisting of a combination of the translated-word translation candidates and the untranslated-string translation candidates; and the translation word determination means comprises means for using the translation candidate co-occurrence strength to select a target language translation candidate string for the input source language sentence.

7. The machine translation device according to claim 5, wherein the translation word candidate generation means comprises means for searching for a target language translation candidate for each source language word in the source language word string by referring, in addition to the translation dictionary used by the machine translation means, to another translation dictionary that holds a set of bilingual relations between source language words and target language translation candidates.

8. The machine translation device according to claim 5, further comprising a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, wherein the co-occurrence strength detection means comprises means for using the source language co-occurrence information database to weight the corresponding translation candidate co-occurrence strength on the basis of the co-occurrence strength of the source language word pair in the source language.

9. A computer-readable medium storing a machine translation program for causing a computer to execute a machine translation method for automatically translating a source language sentence into a target language sentence. The computer comprises a translation dictionary that holds a set of bilingual relations between source language words and target language translation candidates, a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information, machine translation means, untranslated detection means, morphological analysis means, translation word candidate generation means, co-occurrence strength detection means, translation word determination means, and word division selection means. When the program is read by the computer, it causes the computer to execute: a machine translation step in which the machine translation means uses the translation dictionary to automatically translate an input source language sentence into a target language machine-translated sentence, which is a sentence in the target language; an untranslated detection step in which the untranslated detection means detects an untranslated character string for which a target language translation could not be extracted in the machine translation step; a morphological analysis step in which the morphological analysis means performs morphological analysis of the input source language sentence to generate word division candidates; a translation word candidate generation step in which the translation word candidate generation means uses the translation dictionary to search for untranslated-string translation candidates, which are target language translation candidates for the untranslated character string, and uses the translation dictionary to search for divided-word translation candidates, which are target language translation candidates for the source language words in the word division candidates; a co-occurrence strength detection step in which the co-occurrence strength detection means uses the target language co-occurrence information database to calculate the translation candidate co-occurrence strength of each target language translation candidate pair, each pair consisting of a target language translation candidate, including the untranslated-string translation candidates and the divided-word translation candidates obtained by the translation word candidate generation means, and an already-translated target language word in the input source language sentence; a translation word determination step in which the translation word determination means uses the translation candidate co-occurrence strength to select a target language translation candidate for the untranslated character string; and a word division selection step in which the word division selection means uses the translation candidate co-occurrence strength to calculate a word division likelihood for each word division candidate obtained in the morphological analysis step, and uses the word division likelihood to select and output the word division candidate with the maximum likelihood.

10. The computer-readable medium storing the machine translation program according to claim 9, wherein, when the program is read by the computer, it causes the computer to execute: in the translation word candidate generation step, a step in which the translation word candidate generation means uses the translation dictionary to search for translated-word translation candidates, which are target language translation candidates for words of the input source language sentence other than the untranslated character string; in the co-occurrence strength detection step, a step in which the co-occurrence strength detection means calculates the translation candidate co-occurrence strength of each target language translation candidate pair consisting of a combination of the translated-word translation candidates and the untranslated-string translation candidates; and, in the translation word determination step, a step in which the translation word determination means uses the translation candidate co-occurrence strength to select a target language translation candidate string for the input source language sentence.

11. A computer-readable medium storing the machine translation program according to claim 9 or 10, wherein, when the program is read by a computer, the program causes the computer to execute, in the translation word candidate generation step, a step in which the translation word candidate generation means refers not only to the translation dictionary used in the machine translation step but also to another translation dictionary that holds a set of bilingual relations between source language words and target language translation word candidates, and searches for the target language translation candidates of each source language word in the source language word string.
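
Claim 11 consults a second bilingual dictionary in addition to the one used by the machine translation step. One plausible reading, sketched in Python below as an assumption, is to take the union of both dictionaries' candidates for each source word, keeping the main dictionary's entries first and dropping duplicates. The function name and dictionary contents are hypothetical.

```python
def lookup_candidates(word: str,
                      main_dict: dict[str, list[str]],
                      extra_dict: dict[str, list[str]]) -> list[str]:
    """Union of candidates from both dictionaries, order-preserving, no duplicates."""
    merged = (c for d in (main_dict, extra_dict) for c in d.get(word, []))
    return list(dict.fromkeys(merged))


MAIN = {"kikai": ["machine"]}
EXTRA = {"kikai": ["machine", "opportunity"], "honyaku": ["translation"]}

print(lookup_candidates("kikai", MAIN, EXTRA))    # ['machine', 'opportunity']
print(lookup_candidates("honyaku", MAIN, EXTRA))  # ['translation']
```

In this toy data, the second lookup is what gives "honyaku" a candidate even though the main dictionary has no entry for it.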

12. A computer-readable medium storing the machine translation program according to any one of claims 9 to 11, wherein, when the program is read by a computer, the computer includes a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, and the program causes the computer to execute, in the co-occurrence strength detection step, a step in which the co-occurrence strength detection means uses the source language co-occurrence information database to weight the corresponding translation candidate co-occurrence strength based on the co-occurrence strength of the source language word pair in the source language.
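
Claim 12 weights each target-language pair strength by how strongly the corresponding source-language words co-occur, without fixing the weighting function. The Python sketch below assumes a linear boost, `target * (1 + alpha * source)`, with a hypothetical parameter `alpha`. The romanized source words (dote "embankment", kawa "river", umi "sea") and all frequencies are illustrative only.

```python
CoocDB = dict[frozenset[str], int]  # hypothetical: unordered pair -> frequency


def weighted_strength(tgt_pair: tuple[str, str], src_pair: tuple[str, str],
                      tgt_db: CoocDB, src_db: CoocDB, alpha: float = 0.5) -> float:
    """Assumed weighting: boost the target-language strength linearly by the
    source-language co-occurrence frequency of the aligned source words."""
    target = tgt_db.get(frozenset(tgt_pair), 0)
    source = src_db.get(frozenset(src_pair), 0)
    return target * (1.0 + alpha * source)


TGT = {frozenset({"embankment", "river"}): 4}
SRC = {frozenset({"dote", "kawa"}): 2}

print(weighted_strength(("embankment", "river"), ("dote", "kawa"), TGT, SRC))  # 8.0
print(weighted_strength(("embankment", "river"), ("dote", "umi"), TGT, SRC))   # 4.0
```

With an unseen source pair the weight is 1, so the target-language strength passes through unchanged.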

13. A computer-readable medium storing a machine translation program for automatically translating a source language sentence into a target language sentence, wherein, when the program is read by a computer, the program causes the computer to function as: a translation dictionary that holds a set of bilingual relations between source language words and target language translation candidates; machine translation means for automatically translating, using the translation dictionary, an input source language sentence into a target language machine translation, which is a sentence in the target language; untranslated detection means for detecting an untranslated character string for which the machine translation means could not extract a target language translated word; morphological analysis means for morphologically analyzing the input source language sentence and generating word division candidates; translation word candidate generation means for searching, using the translation dictionary, for untranslated translation word candidates, which are target language translation word candidates of the untranslated character string, and for searching, using the translation dictionary, for segmented word translation candidates, which are target language translation candidates of the source language words in the word division candidates; a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information; co-occurrence strength detection means for calculating, using the target language co-occurrence information database, the translation word candidate co-occurrence strength of each target language translation word candidate pair, each pair consisting of a target language translation candidate among the untranslated translation word candidates (including candidates synthesized from the segmented word translation candidates of the word division candidates) and a translated target language word in the input source language sentence; translation word determination means for selecting, using the translation word candidate co-occurrence strength, a target language translation candidate for the untranslated character string; and word division selection means for calculating, using the translation word candidate co-occurrence strength, a word division likelihood for each word division candidate obtained from the morphological analysis means, and for selecting and outputting, using the word division likelihood, the word division candidate having the maximum likelihood.
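
Claim 13 also forms translation candidates for the untranslated string from the segmented word translation candidates of each word division candidate. The claim does not state the combination rule; the Python sketch below assumes that synthesis means taking one candidate per segment, in source order, and joining them with spaces. The function name and dictionary entries are hypothetical.

```python
from itertools import product


def synthesize_candidates(division: list[str],
                          dictionary: dict[str, list[str]]) -> list[str]:
    """Combine one translation candidate per segment into a phrase candidate.
    Returns [] if the division is empty or any segment has no dictionary entry."""
    per_segment = [dictionary.get(seg, []) for seg in division]
    if not division or not all(per_segment):
        return []
    return [" ".join(words) for words in product(*per_segment)]


DICTIONARY = {"kikai": ["machine", "opportunity"], "honyaku": ["translation"]}
print(synthesize_candidates(["kikai", "honyaku"], DICTIONARY))
# ['machine translation', 'opportunity translation']
```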

14. A computer-readable medium storing the machine translation program according to claim 13, wherein, when the program is read by a computer, the program causes the computer to function such that: the translation word candidate generation means includes means for searching, using the translation dictionary, for translation word translation candidates, which are target language translation candidates of the words of the input source language sentence other than the untranslated character string; the co-occurrence strength detection means includes means for calculating the translation candidate co-occurrence strength of each target language translation candidate pair consisting of a combination of a translation word translation candidate and a target language translation candidate among the untranslated translation word candidates; and the translation word determination means includes means for selecting, using the translation candidate co-occurrence strength, a target language translation candidate sequence for the input source language sentence.

15. A computer-readable medium storing the machine translation program according to claim 13 or 14, wherein, when the program is read by a computer, the program causes the translation word candidate generation means to function as means including means for referring not only to the translation dictionary used by the machine translation means but also to another translation dictionary that holds a set of bilingual relations between source language words and target language translation word candidates, and for searching for the target language translation candidates of each source language word in the source language word string.

16. A computer-readable medium storing the machine translation program according to any one of claims 13 to 15, wherein, when the program is read by a computer, the program causes the computer to function as a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, and causes the co-occurrence strength detection means to function as means including means for weighting, using the source language co-occurrence information database, the corresponding translation candidate co-occurrence strength based on the co-occurrence strength of the source language word pair in the source language.