JP3437782B2 - Machine translation method and apparatus, and medium storing machine translation program

Machine translation method and apparatus, and medium storing machine translation program

Info

Publication number
JP3437782B2
Authority
JP
Japan
Prior art keywords
translation
word
candidate
target language
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP06616699A
Other languages
Japanese (ja)
Other versions
JP2000259630A (en)
Inventor
直樹 麻野間
浩巳 中岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Machine Translation (AREA)

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a machine translation method and apparatus, and to a medium storing a machine translation program, that, when an untranslated character string left in the source language appears in the translation result sentence of a machine translation system, use word co-occurrence information of the target language to output translation candidates for the untranslated character string.

[0002]

[Prior Art] In a rule-based machine translation system using dictionaries and rules, the structure of the input sentence is normally analyzed with the dictionaries and rules, after which the translated sentence is generated. In practical machine translation systems, source-language words whose translations are not registered in the dictionary (unknown words) may appear after the input sentence has been analyzed. A word for which no translation can be found in the translation dictionary is output in the translation result as the source-language word itself (untranslated), which greatly degrades translation quality.

[0003] A common countermeasure against unknown words that appear during machine translation is for a person to register in the dictionary, one by one and based on human judgment, each source-language word that the machine translation system has judged to be unknown, together with its translation.

[0004] Methods for reducing this manual work include, for example, coining derivatives of words already registered in the dictionary from affixes and the derivation patterns of their translations, thereby enriching the dictionary (Japanese Patent Laid-Open No. 5-257969, "Machine translation method and apparatus"). In this way, a dictionary can be enriched automatically according to certain dictionary construction rules. Words that have been registered in the dictionary as described above, whether manually or automatically, can then be translated.

[0005]

[Problems to Be Solved by the Invention] The manual dictionary registration described above incurs cost and time for the update work every time a word is judged to be unregistered, and so is not a fundamental solution to the unknown-word problem.

[0006] In addition, machine translation systems sometimes fail when analyzing the structure of the source-language input sentence. Even if dictionaries and rules are enriched as described above, such an analysis failure can cause incorrect word segmentation or incorrect part-of-speech determination, so that no target-language translation candidate is found for the source-language character string containing the error. The string is then output untranslated in the translation result, which significantly lowers translation quality.

[0007] Furthermore, an appropriate translation may not have been registered in the translation dictionary of the machine translation system in the first place, leaving an insufficient choice of target-language translation candidates. Moreover, because the appropriateness of a translation is not sufficiently considered when it is generated from dictionaries and rules, translations that are unnatural in the target language tend to be produced. When choosing translations with target-language appropriateness in mind, it is desirable to consider the co-occurrence relations of the corresponding source-language words as well.

[0008] In view of the above, an object of the present invention is to solve the problem of untranslated character strings remaining in the translation result, and to provide a machine translation method and apparatus, and a medium storing a machine translation program, that require no manual dictionary updates. Using a co-occurrence information database that holds a set of entries, each consisting of a target-language word pair and its co-occurrence frequency information, they output either a sequence of translation candidates chosen for their appropriateness in the target language, or word segmentation candidates of the input source-language sentence for use in source-language morphological analysis.

[0009]

[Means for Solving the Problems] FIG. 1 is a flowchart outlining the method of the present invention.

[0010] To achieve the above object, the machine translation method of the invention automatically translates an input source-language sentence into a target-language machine-translated sentence, using a translation dictionary that holds a set of translation correspondences between source-language words and target-language translation candidates (S1); detects an untranslated character string for which no target-language translation could be extracted (S2); searches the translation dictionary for untranslated-string translation candidates, which are the target-language translation candidates for the untranslated character string (S4); uses a target-language co-occurrence information database, which holds a set of entries each consisting of a target-language word pair and its co-occurrence frequency information, to calculate the translation candidate co-occurrence strength of each target-language translation candidate pair formed from a target-language translation candidate among the untranslated-string translation candidates and a target-language word already translated from the input source-language sentence (S5); and uses the translation candidate co-occurrence strengths to select a target-language translation candidate for the untranslated character string (S6).
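Steps S5 and S6 can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: the co-occurrence strength of a pair is taken to be its raw frequency in the database, a candidate's score is the sum of its strengths with the already-translated words, and the names `COOC_DB`, `cooc_strength`, and `select_candidate` as well as all data values are hypothetical.

```python
# Toy target-language co-occurrence DB: unordered word pair -> frequency.
# All entries are invented for illustration only.
COOC_DB = {
    frozenset(("spring", "commercialize")): 12,
    frozenset(("spring", "plan")): 30,
    frozenset(("fountain", "plan")): 2,
}


def cooc_strength(db, w1, w2):
    """Co-occurrence strength of a word pair (assumption: raw frequency)."""
    return db.get(frozenset((w1, w2)), 0)


def select_candidate(db, candidates, translated_words):
    """S5-S6: score each candidate against the translated words, pick the best."""
    def score(candidate):
        return sum(cooc_strength(db, candidate, w) for w in translated_words)
    return max(candidates, key=score)


best = select_candidate(COOC_DB, ["fountain", "spring"], ["plan", "commercialize"])
print(best)
```

Summing raw frequencies is only one possible choice; a normalized association measure could be substituted inside `cooc_strength` without changing the selection logic.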

[0011] The machine translation method of the invention also searches the translation dictionary for translated-word translation candidates, which are target-language translation candidates for the words in the input source-language sentence other than the untranslated character string (S4); additionally calculates the translation candidate co-occurrence strength of each target-language translation candidate pair formed by combining a translated-word translation candidate with a target-language translation candidate among the untranslated-string translation candidates (S5); and uses the translation candidate co-occurrence strengths to select a sequence of target-language translation candidates for the input source-language sentence (S6).
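Selecting a candidate sequence for the whole sentence can be sketched as an exhaustive search over one candidate per source word. This is an illustrative assumption, not the procedure specified in the text: the sequence score is taken to be the sum of pairwise co-occurrence frequencies, and `select_sequence` and the toy data are hypothetical.

```python
from itertools import combinations, product

# Toy target-language co-occurrence DB; all values are invented.
COOC_DB = {
    frozenset(("plan", "spring")): 5,
    frozenset(("plan", "commercialize")): 3,
    frozenset(("spring", "commercialize")): 4,
    frozenset(("schedule", "commercialize")): 6,
    frozenset(("schedule", "fountain")): 1,
}


def sequence_score(db, sequence):
    """Sum of pairwise co-occurrence frequencies within one candidate sequence."""
    return sum(db.get(frozenset(pair), 0) for pair in combinations(sequence, 2))


def select_sequence(db, candidate_lists):
    """Try every combination of one candidate per word and keep the best one."""
    return max(product(*candidate_lists), key=lambda seq: sequence_score(db, seq))


candidate_lists = [["plan", "schedule"], ["spring", "fountain"], ["commercialize"]]
print(select_sequence(COOC_DB, candidate_lists))
```

The exhaustive product grows exponentially with sentence length, so a practical system would likely restrict the pairs considered or use a beam; the sketch only shows the scoring idea.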

[0012] The machine translation method of the invention also performs morphological analysis of the input source-language sentence to generate word segmentation candidates (S3); searches the translation dictionary for segmented-word translation candidates, which are target-language translation candidates for the source-language words in the word segmentation candidates (S4); calculates translation candidate co-occurrence strengths with the segmented-word translation candidates obtained in the translation candidate generation step included among the target-language translation candidates (S5); and uses the translation candidate co-occurrence strengths to calculate a word segmentation likelihood for each word segmentation candidate obtained from the morphological analysis step, then selects and outputs the word segmentation candidate with the maximum likelihood (S7).
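The word segmentation selection of step S7 might look like the following sketch. The text does not say how the likelihood is derived from the co-occurrence strengths, so the formula here (the best average pairwise frequency over the translation candidates of a segmentation) is purely an assumption, as are the dictionary entries, the database values, and the function names.

```python
from itertools import combinations, product

# Hypothetical translation dictionary and co-occurrence DB (invented values).
DICTIONARY = {
    "今春": ["this spring"],
    "今": ["now"],
    "春": ["spring"],
    "商用化": ["commercialization"],
}
COOC_DB = {
    frozenset(("this spring", "commercialization")): 8,
    frozenset(("now", "spring")): 1,
    frozenset(("now", "commercialization")): 2,
    frozenset(("spring", "commercialization")): 3,
}


def segmentation_likelihood(db, dictionary, segmentation):
    """Assumed likelihood: best average pairwise strength over candidate sequences."""
    candidate_lists = [dictionary.get(word, []) for word in segmentation]
    if any(not candidates for candidates in candidate_lists):
        return 0.0  # some word has no translation candidate at all
    best = 0.0
    for seq in product(*candidate_lists):
        pairs = list(combinations(seq, 2))
        total = sum(db.get(frozenset(p), 0) for p in pairs)
        best = max(best, total / max(1, len(pairs)))
    return best


def select_segmentation(db, dictionary, segmentations):
    """S7: return the segmentation candidate with the maximum likelihood."""
    return max(segmentations, key=lambda s: segmentation_likelihood(db, dictionary, s))


segmentations = [["今春", "商用化"], ["今", "春", "商用化"]]
print(select_segmentation(COOC_DB, DICTIONARY, segmentations))
```

Averaging over pairs is used so that segmentations with more words are not favored simply because they contribute more pairs.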

[0013] The machine translation method of the invention also searches for target-language translation candidates of each source-language word in the source-language word string by referring, in addition to the translation dictionary used in the machine translation step, to another translation dictionary that holds a set of translation correspondences between source-language words and target-language translation candidates (S4).

[0014] The machine translation method of the invention also uses a source-language co-occurrence information database, which holds a set of entries each consisting of a source-language word pair and its co-occurrence frequency information, to weight the corresponding translation candidate co-occurrence strength according to the source-language co-occurrence strength of the source-language word pair (S5).
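The weighting in step S5 can be sketched as follows. The text only states that the target-language strength is weighted according to the source-language co-occurrence strength; the specific weight `1 + log(1 + f_src)` used here is an assumption chosen so that a source pair with no recorded co-occurrence leaves the target strength unchanged. All names and values are hypothetical.

```python
import math

# Invented toy databases: unordered word pair -> co-occurrence frequency.
SOURCE_DB = {frozenset(("今春", "商用化")): 20}
TARGET_DB = {
    frozenset(("this spring", "commercialization")): 4,
    frozenset(("now", "commercialization")): 4,
}


def weighted_strength(source_db, target_db, source_pair, target_pair):
    """Target-language strength scaled by an assumed source-language weight."""
    f_src = source_db.get(frozenset(source_pair), 0)
    f_tgt = target_db.get(frozenset(target_pair), 0)
    return f_tgt * (1.0 + math.log1p(f_src))


strong = weighted_strength(SOURCE_DB, TARGET_DB,
                           ("今春", "商用化"), ("this spring", "commercialization"))
weak = weighted_strength(SOURCE_DB, TARGET_DB,
                         ("今", "商用化"), ("now", "commercialization"))
print(strong > weak)
```

With equal target-language frequencies, the candidate pair whose source words co-occur more often receives the higher weighted strength, which matches the intent described in the text.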

[0015] FIG. 2 is a block diagram outlining the apparatus of the present invention.

[0016] To achieve the above object, the machine translation apparatus of the invention comprises: a translation dictionary 10 that holds a set of translation correspondences between source-language words and target-language translation candidates; machine translation means 1 that uses the translation dictionary 10 to automatically translate an input source-language sentence into a target-language machine-translated sentence; untranslated-string detection means 2 that detects an untranslated character string, which is a substring for which the machine translation means 1 could not extract a target-language translation; translation candidate generation means 4 that searches the translation dictionary 10 for untranslated-string translation candidates, which are target-language translation candidates for the untranslated character string; a target-language co-occurrence information database (DB) 20 that holds a set of entries each consisting of a target-language word pair and its co-occurrence frequency information; co-occurrence strength detection means 5 that uses the target-language co-occurrence information DB 20 to calculate the translation candidate co-occurrence strength of each target-language translation candidate pair formed from a target-language translation candidate among the untranslated-string translation candidates and a target-language word already translated from the input source-language sentence; and translation determination means 6 that uses the translation candidate co-occurrence strengths to select a target-language translation candidate for the untranslated character string.

[0017] In the machine translation apparatus of the invention, the translation candidate generation means 4 also includes means for searching the translation dictionary 10 for translated-word translation candidates, which are target-language translation candidates for the words in the input source-language sentence other than the untranslated character string; the co-occurrence strength detection means 5 includes means for additionally calculating the translation candidate co-occurrence strength of each target-language translation candidate pair formed by combining a translated-word translation candidate with a target-language translation candidate among the untranslated-string translation candidates; and the translation determination means 6 includes means for using the translation candidate co-occurrence strengths to select a sequence of target-language translation candidates for the input source-language sentence.

[0018] The machine translation apparatus of the invention also has morphological analysis means 3 that performs morphological analysis of the input source-language sentence to generate word segmentation candidates. The translation candidate generation means 4 includes means for searching the translation dictionary 10 for segmented-word translation candidates, which are target-language translation candidates for the source-language words in the word segmentation candidates. The co-occurrence strength detection means 5 includes means for calculating translation candidate co-occurrence strengths with the segmented-word translation candidates obtained by the translation candidate generation means included among the target-language translation candidates. The apparatus further has word segmentation selection means 7 that uses the translation candidate co-occurrence strengths to calculate a word segmentation likelihood for each word segmentation candidate obtained from the morphological analysis means 3, and selects and outputs the word segmentation candidate with the maximum likelihood.

[0019] In the machine translation apparatus of the invention, the translation candidate generation means 4 also includes means for searching for target-language translation candidates of each source-language word in the source-language word string by referring, in addition to the translation dictionary 10 used by the machine translation means 1, to another translation dictionary 10 that holds a set of translation correspondences between source-language words and target-language translation candidates.

[0020] The machine translation apparatus of the invention also has a source-language co-occurrence information database (DB) 30 that holds a set of entries each consisting of a source-language word pair and its co-occurrence frequency information, and the co-occurrence strength detection means 5 includes means for using the source-language co-occurrence information DB 30 to weight the corresponding translation candidate co-occurrence strength according to the source-language co-occurrence strength of the source-language word pair.

[0021] The medium storing the machine translation program of the invention stores a program for causing a computer to execute each step of the machine translation method described above, or a program for causing a computer to function as each means of the machine translation apparatus described above.

[0022] In the machine translation method and apparatus of the present invention, and the medium storing the machine translation program, translation candidates for untranslated character strings appearing in a machine-translated sentence are output by the following steps or means.

[0023] The machine translation step or machine translation means receives an input source-language sentence and, using a translation dictionary that holds a set of translation correspondences between source-language words and target-language translation candidates, automatically translates it into a target-language machine-translated sentence.

[0024] The untranslated-string detection step or untranslated-string detection means detects an untranslated character string for which the machine translation step or machine translation means could not extract a target-language translation.

[0025] The translation candidate generation step or translation candidate generation means searches the translation dictionary for untranslated-string translation candidates, which are target-language translation candidates for the untranslated character string.

[0026] The co-occurrence strength detection step or co-occurrence strength detection means uses the target-language co-occurrence information DB, which holds a set of entries each consisting of a target-language word pair and its co-occurrence frequency information, to calculate the translation candidate co-occurrence strength of each target-language translation candidate pair formed from a target-language translation candidate among the untranslated-string translation candidates and a target-language word already translated from the input source-language sentence.

[0027] The translation determination step or translation determination means uses the translation candidate co-occurrence strengths to select a target-language translation candidate for the untranslated character string.

[0028] As a result, an untranslated character string, which would otherwise degrade translation quality, can be rendered as a target-language word while taking appropriateness in the target language into account.

[0029] In addition, the translation candidate generation step or translation candidate generation means searches the translation dictionary for translated-word translation candidates, which are target-language translation candidates for the words in the input source-language sentence other than the untranslated character string. The co-occurrence strength detection step or co-occurrence strength detection means additionally calculates the translation candidate co-occurrence strength of each target-language translation candidate pair formed by combining a translated-word translation candidate with a target-language translation candidate among the untranslated-string translation candidates. The translation determination step or translation determination means uses the translation candidate co-occurrence strengths to select a sequence of target-language translation candidates for the input source-language sentence.

[0030] As a result, translation candidates can also be selected for source-language words other than the untranslated character string, while taking into account the appropriateness of the translation candidates for the input source-language sentence as a whole.

[0031] In addition, the morphological analysis step or morphological analysis means performs morphological analysis of the input source-language sentence to generate word segmentation candidates. The translation candidate generation step or translation candidate generation means searches the translation dictionary for segmented-word translation candidates, which are target-language translation candidates for the source-language words in the word segmentation candidates.

[0032] The co-occurrence strength detection step or co-occurrence strength detection means calculates translation candidate co-occurrence strengths with the segmented-word translation candidates obtained by the translation candidate generation step or translation candidate generation means included among the target-language translation candidates. The word segmentation selection step or word segmentation selection means uses the translation candidate co-occurrence strengths to calculate a word segmentation likelihood for each word segmentation candidate obtained from the morphological analysis step or morphological analysis means, and selects and outputs the word segmentation candidate with the maximum likelihood.

[0033] This makes it possible to redo the analysis of the input source-language sentence using the maximum-likelihood word segmentation candidate, and the translation candidate sequence output along with this processing can be used in the target-language sentence generation of the machine translation system. Untranslated output caused by a failed analysis of the input source-language sentence can therefore be resolved.

[0034] In addition, the translation candidate generation step or translation candidate generation means searches for target-language translation candidates of each source-language word in the source-language word string by referring, in addition to the translation dictionary used in the machine translation step or machine translation means, to another translation dictionary that holds a set of translation correspondences between source-language words and target-language translation candidates.

[0035] This increases the number of target-language translation candidates and raises the likelihood that an appropriate translation is selected.

[0036] In addition, the co-occurrence strength detection step or co-occurrence strength detection means uses the source-language co-occurrence information DB, which holds a set of entries each consisting of a source-language word pair and its co-occurrence frequency information, to weight the corresponding translation candidate co-occurrence strength according to the source-language co-occurrence strength of the source-language word pair.

[0037] This makes it possible to select translations with emphasis on the co-occurrence relations among the translation candidates of words that readily co-occur in the source language, further improving translation selection accuracy.

[0038] Accordingly, by executing the above steps or using the above means, translation candidates can be output for untranslated character strings that appear in a machine-translated sentence.

[0039]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. In the embodiments below, the source language is Japanese and the target language is English.

[0040] FIG. 3 is a basic block diagram showing an example of an embodiment of the machine translation apparatus of the present invention. In the figure, 10 is a translation dictionary, 20 is a target-language co-occurrence information database (DB), 30 is a source-language co-occurrence information database (DB), 11 is a machine translation unit, 12 is an untranslated-string detection unit, 13 is a morphological analysis unit, 14 is a translation candidate generation unit, 15 is a co-occurrence strength detection unit, 16 is a translation determination unit, and 17 is a word segmentation selection unit.

[0041] The machine translation unit 11 automatically translates the input source-language sentence into a target-language machine-translated sentence using the translation dictionary 10. The untranslated-string detection unit 12 detects, from this target-language machine-translated sentence, untranslated character strings for which no target-language translation could be extracted.

[0042] The morphological analysis unit 13 performs morphological analysis of the input source-language sentence to generate word segmentation candidates.

[0043] The translation candidate generation unit 14 searches the translation dictionary 10 for target-language translation candidates for the untranslated character string, for the source-language words that were translated, or for the words in the word segmentation candidates.

[0044] The co-occurrence strength detection unit 15 uses the target-language co-occurrence information DB 20 and the source-language co-occurrence information DB 30 to calculate the translation candidate co-occurrence strength of each target-language translation candidate pair formed by combining target-language translation candidates.

[0045] The translation determination unit 16 uses these translation candidate co-occurrence strengths to select a target-language translation candidate for the untranslated character string. The word segmentation selection unit 17 calculates a word segmentation likelihood for each word segmentation candidate and selects the candidate with the maximum likelihood.

[0046] This machine translation apparatus can also be realized by a computer comprising a CPU, memory, input/output devices, an external storage device, and the like, together with a medium storing a machine translation program that, when read by the computer, causes the computer to function as each of the above means.

[0047] Next, the machine translation procedure of the basic block configuration in FIG. 3 is described. Here, the information shown in FIG. 4, which is part of a target-language machine-translated sentence from the machine translation system, is used as the example input.

【0048】まず、機械翻訳部11は、翻訳の対象とす
る原言語文”〜が今春から商用化を予定している〜”を
入力し、翻訳辞書10を用いて原言語文を目的言語機械
訳文に翻訳し、図4に示すように、元の原言語単語が対
応した訳語の集合で構成される翻訳文構造情報を生成す
る。
First, the machine translation unit 11 receives the source language sentence to be translated, "〜が今春から商用化を予定している〜" ("... is scheduled to be commercialized from this spring ..."), translates it into a target language machine translated sentence using the translation dictionary 10, and generates translated sentence structure information consisting of a set of translated words, each associated with its original source language word, as shown in FIG. 4.

【0049】次に、未訳出検出部12で、この翻訳文構
造情報から、品詞条件が合わなかった等の原因で目的言
語の訳語の抽出に失敗してしまった未訳出文字列″今
春″を抽出する。未訳出検出部12は、翻訳文構造情報
を出力する前の早い翻訳処理段階で、訳語を抽出できな
い原言語文字列を検出・抽出することも可能である。
Next, the untranslated detection unit 12 extracts from this translated sentence structure information the untranslated character string "今春" ("this spring"), for which extraction of a target language translation failed, for example because a part-of-speech condition was not met. The untranslated detection unit 12 can also detect and extract a source language character string for which no translated word can be extracted at an early stage of translation processing, before the translated sentence structure information is output.
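The detection step can be sketched as follows. This is a minimal illustration, assuming the translated sentence structure information is a list of (source word, translation) pairs in which a missing translation is recorded as `None`; that representation and the name `find_untranslated` are hypothetical and are not taken from FIG. 4.

```python
# Hypothetical representation of translated sentence structure information:
# each entry pairs a source language word with its translation, or None
# when no target language translation could be extracted.
structure_info = [
    ("今春", None),
    ("商用化", "commercialization"),
    ("予定", "schedule"),
]


def find_untranslated(structure):
    """Return the source language strings that have no translation."""
    return [source for source, target in structure if target is None]


untranslated = find_untranslated(structure_info)
```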

【0050】訳語候補生成部14は、この未訳出文字
列″今春″に対する目的言語訳語候補を、翻訳辞書10
を用いて検索する。例えば、未訳出文字列″今春″に対
して、図5に示すような訳語候補が得られる。この時、
機械翻訳システムが持つ翻訳辞書10に加えて、別の翻
訳辞書10を参照することで、目的言語訳語候補を増や
すことができる。
The translation word candidate generation unit 14 searches the translation dictionary 10 for target language translation word candidates for this untranslated character string "今春". For example, translation word candidates such as those shown in FIG. 5 are obtained for "今春". At this point, the number of target language translation word candidates can be increased by consulting another translation dictionary 10 in addition to the translation dictionary 10 of the machine translation system.
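A minimal sketch of candidate lookup across more than one dictionary is given below. The dictionaries are modelled as plain Python mappings and the function name `lookup_candidates` is hypothetical; the entries are illustrative and are not the contents of FIG. 5.

```python
def lookup_candidates(word, dictionaries):
    """Collect translation candidates for `word` from several dictionaries.

    Candidates keep the order in which they are first found, and a
    candidate that appears in more than one dictionary is listed once.
    """
    found = []
    for dictionary in dictionaries:
        for candidate in dictionary.get(word, []):
            if candidate not in found:
                found.append(candidate)
    return found


# Illustrative entries only.
system_dictionary = {"今春": ["this spring"]}
extra_dictionary = {"今春": ["present spring", "this spring"]}

candidates = lookup_candidates("今春", [system_dictionary, extra_dictionary])
```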

【0051】次に、共起強度検出部15は、訳語候補生
成部14で得られた目的言語訳語候補と、翻訳文構造情
報(図4)中の訳出済みの目的言語単語との組で構成さ
れる各目的言語訳語候補対の集合を列挙する。本実施の
形態では目的言語の前置詞は考慮せずに目的言語訳語候
補対を設定する。目的言語訳語候補対の集合の例を図6
に示す。
Next, the co-occurrence strength detection unit 15 enumerates the set of target language translation word candidate pairs, each formed by pairing a target language translation word candidate obtained by the translation word candidate generation unit 14 with an already-translated target language word in the translated sentence structure information (FIG. 4). In the present embodiment, the pairs are formed without considering target language prepositions. FIG. 6 shows an example of the set of target language translation word candidate pairs.
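The pair enumeration can be sketched as below. The word lists, the preposition stop list, and the name `enumerate_pairs` are assumptions made for illustration; the actual pairs are those shown in FIG. 6.

```python
from itertools import product

# Illustrative stop list; the patent only says prepositions are not considered.
PREPOSITIONS = {"from", "in", "of", "to", "at", "on"}


def enumerate_pairs(candidates, translated_words, skip=PREPOSITIONS):
    """Pair every candidate with every translated word that is not skipped."""
    content_words = [w for w in translated_words if w not in skip]
    return [(word, cand) for cand, word in product(candidates, content_words)]


pairs = enumerate_pairs(
    ["this spring", "present spring"],
    ["schedule", "from", "commercialization"],
)
```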

【0052】ここで、図7に共起強度検出部15で利用
する目的言語共起情報DB20の内容例を示す。目的言
語共起情報DB20のエントリの内容は、例えば図7か
ら、単語″schedule″と単語″spring″
が、共起情報を収集する際に定めた範囲内(例えば、一
文内)で、同時に共起する頻度が10であることを示し
ている。
FIG. 7 shows an example of the contents of the target language co-occurrence information DB 20 used by the co-occurrence strength detection unit 15. An entry in the target language co-occurrence information DB 20 indicates, for example in FIG. 7, that the word "schedule" and the word "spring" co-occur with a frequency of 10 within the range defined when the co-occurrence information was collected (for example, within one sentence).
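One possible in-memory form of such a database is sketched below. The class name `CooccurrenceDB` and its methods are hypothetical. Only the entry for "schedule" and "spring" with frequency 10 comes from the text; treating the pair as unordered is an assumption.

```python
class CooccurrenceDB:
    """Stores co-occurrence frequencies for unordered word pairs."""

    def __init__(self):
        self._freq = {}

    def add(self, word1, word2, count):
        # A frozenset key makes (a, b) and (b, a) the same entry.
        self._freq[frozenset((word1, word2))] = count

    def frequency(self, word1, word2):
        # Pairs that were never observed have frequency 0.
        return self._freq.get(frozenset((word1, word2)), 0)


db = CooccurrenceDB()
db.add("schedule", "spring", 10)
```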

【0053】次に、この目的言語共起情報DB20を用
いて、各目的言語訳語候補対に対する訳語候補共起強度
を計算する。図8に、各目的言語訳語候補対に対して計
算された訳語候補共起強度の集合の例を示す。
Next, using this target language co-occurrence information DB 20, the translation word candidate co-occurrence strength for each target language translation word candidate pair is calculated. FIG. 8 shows an example of a set of translation word candidate co-occurrence intensities calculated for each target language translation word candidate pair.

【0054】さらに、訳語決定部16は各目的言語訳語
候補対に対する訳語候補共起強度を用いて、未訳出文字
列″今春″に対する訳語候補を選択する。
Further, the translated word determination unit 16 selects a translated word candidate for the untranslated character string "this spring" using the translated word candidate co-occurrence strength for each target language translated word candidate pair.

【0055】訳語候補を決定する基準値の算出方法の一
例としては、入力原言語文中の原言語単語に対する訳語
候補をそれぞれ決定した時、その入力原言語文全体の共
起強度は、目的言語訳語候補対の組み合わせの各訳語候
補共起強度の積と近似できる。
As one example of how to calculate the reference value for choosing translation word candidates: once a translation word candidate has been fixed for each source language word in the input source language sentence, the co-occurrence strength of the entire input source language sentence can be approximated by the product of the translation word candidate co-occurrence strengths of the resulting target language translation word candidate pairs.

【0056】即ち、ここでは(A)(″schedul
e″と″this spring″の共起強度)×(″
commercialization″と″this
spring″の共起強度)、(B)(″schedu
le″と″presentspring″の共起強度)
×(″commercialization″と″pr
esent spring″の共起強度)、の2つの組
み合わせで入力原言語文全体の共起強度を計算できる。
That is, the co-occurrence strength of the entire input source language sentence can be calculated here for two combinations: (A) (co-occurrence strength of "schedule" and "this spring") × (co-occurrence strength of "commercialization" and "this spring"), and (B) (co-occurrence strength of "schedule" and "present spring") × (co-occurrence strength of "commercialization" and "present spring").

【0057】最後に、入力原言語文全体の共起強度が最
も高いものをとる目的言語訳語候補列の組(A)が選択
され、出力される。図9に目的言語訳語候補の出力結果
を示す。
Finally, the set (A) of target language translation word candidates, which gives the highest co-occurrence strength for the entire input source language sentence, is selected and output. FIG. 9 shows the output result of the target language translation word candidates.
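The product-based selection between combinations (A) and (B) can be sketched as follows. The strength values are invented for illustration and are not the values of FIG. 8, and the function names are hypothetical. Giving an unseen pair a strength of 0 is also an assumption, and it makes the whole product 0 for that candidate.

```python
from math import prod

# Invented strengths for illustration only (not the values of FIG. 8).
strength = {
    ("schedule", "this spring"): 0.6,
    ("commercialization", "this spring"): 0.5,
    ("schedule", "present spring"): 0.2,
    ("commercialization", "present spring"): 0.1,
}


def sentence_strength(candidate, context_words, table):
    """Product of the pair strengths between one candidate and each context word."""
    return prod(table.get((word, candidate), 0.0) for word in context_words)


def select_candidate(candidates, context_words, table):
    """Return the candidate with the highest whole-sentence strength."""
    return max(candidates, key=lambda c: sentence_strength(c, context_words, table))


context = ["schedule", "commercialization"]
best = select_candidate(["this spring", "present spring"], context, strength)
```

With these invented numbers, combination (A) scores 0.6 × 0.5 and combination (B) scores 0.2 × 0.1, so "this spring" is chosen.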

【0058】また、前記訳語候補生成部14では、未訳
出文字列以外の単語についても翻訳辞書10を用いて目
的言語訳語候補を生成できる。未訳出文字列以外の単語
に対して、図10に示すような訳語候補が得られる。本
実施の形態では、日本語の助詞、助動詞の訳出処理は省
略する。
The translated word candidate generation unit 14 can also generate target language translated word candidates for the words other than the untranslated character strings by using the translation dictionary 10. For words other than the untranslated character strings, translated word candidates as shown in FIG. 10 are obtained. In this embodiment, the process of translating Japanese particles and auxiliary verbs is omitted.

【0059】その後、共起強度検出部15において、そ
の訳語候補生成部14で得られた全ての目的言語訳語候
補の組み合わせからなる各目的言語訳語候補対の集合を
列挙する。ここで得られる目的言語訳語候補対の例を図
11に示す。
After that, the co-occurrence strength detection unit 15 enumerates the set of target language translation word candidate pairs formed from combinations of all the target language translation word candidates obtained by the translation word candidate generation unit 14. FIG. 11 shows an example of the target language translation word candidate pairs obtained here.

【0060】続いて、目的言語共起情報DB20を用い
て、各目的言語訳語候補対に対する訳語候補共起強度を
計算する。
Subsequently, the target language co-occurrence information DB 20 is used to calculate the translation word candidate co-occurrence strength for each target language translation word candidate pair.

【0061】次に、訳語決定部16において、前記訳語
候補共起強度を用いて、上記と同様の方法で入力原言語
文全体の共起強度の最大値を求め、入力原言語文中の各
単語に対して最適な目的言語訳語候補を選択することが
できる。
Next, the translated word determination unit 16 uses the translation word candidate co-occurrence strengths to find, in the same manner as described above, the maximum co-occurrence strength of the entire input source language sentence, and can thereby select the optimal target language translation word candidate for each word in the input source language sentence.
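A sketch of selecting candidates for every word at once is given below. It assumes that the whole-sentence strength is the product over all pairs of chosen candidates and that all combinations are enumerated exhaustively; the candidate lists and strength values are invented and are not those of FIGS. 10 and 11.

```python
from itertools import combinations, product
from math import prod

# Invented candidate lists and strengths for illustration only.
candidates_per_word = {
    "予定": ["schedule", "plan"],
    "商用化": ["commercialization"],
    "今春": ["this spring", "present spring"],
}
strength = {
    frozenset(("schedule", "commercialization")): 0.4,
    frozenset(("plan", "commercialization")): 0.3,
    frozenset(("schedule", "this spring")): 0.6,
    frozenset(("plan", "this spring")): 0.5,
    frozenset(("commercialization", "this spring")): 0.5,
    frozenset(("schedule", "present spring")): 0.2,
    frozenset(("plan", "present spring")): 0.2,
    frozenset(("commercialization", "present spring")): 0.1,
}


def score(assignment, table):
    """Product of pair strengths over every pair of chosen candidates."""
    return prod(table.get(frozenset(p), 0.0) for p in combinations(assignment, 2))


def best_assignment(per_word, table):
    """Try every combination of candidates and keep the highest scoring one."""
    words = list(per_word)
    options = product(*(per_word[w] for w in words))
    best = max(options, key=lambda a: score(a, table))
    return dict(zip(words, best))


result = best_assignment(candidates_per_word, strength)
```

Exhaustive enumeration grows exponentially with sentence length, so this form is only practical for short inputs.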

【0062】また、原言語における共起頻度情報からな
るエントリの集合を保持する原言語共起情報DB30か
ら得られる原言語共起情報を用いた優先訳語選択方法の
一例を以下に説明する。原言語共起情報DB30の例を
図12に示す。
Further, an example of a priority translation word selection method using the source language co-occurrence information obtained from the source language co-occurrence information DB 30 which holds a set of entries composed of the co-occurrence frequency information in the source language will be described below. An example of the source language co-occurrence information DB 30 is shown in FIG.

【0063】原言語単語対の原言語における共起強度
を、原言語単語列中の単語組み合わせの共起強度の和に
対する該原言語単語対の共起頻度の割合とする。共起強
度検出部15において、全ての訳語候補共起強度に、そ
れと対応する原言語単語対の原言語における共起強度を
掛ける。このようにして重み付けされた訳語候補共起強
度を用いて訳語決定部6で最終的な訳語を選択する。
The co-occurrence strength of a source language word pair in the source language is defined as the ratio of the co-occurrence frequency of that source language word pair to the sum of the co-occurrence strengths of the word combinations in the source language word string. In the co-occurrence strength detection unit 15, every translation word candidate co-occurrence strength is multiplied by the source language co-occurrence strength of the corresponding source language word pair. The translated word determination unit 16 then selects the final translation using the translation word candidate co-occurrence strengths weighted in this way.

【0064】図8に示した訳語候補共起強度に、以上の
手順によって変更を加えた結果を図13に示す。
FIG. 13 shows the result of changing the translation candidate co-occurrence strength shown in FIG. 8 by the above procedure.
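The weighting step can be sketched as follows. The sketch reads the denominator as the sum of the co-occurrence frequencies of all word pairs in the source language word string, which is an interpretation of the text. The frequencies, the alignment mapping, and the function names are invented for illustration and are not the contents of FIG. 12 or FIG. 13.

```python
from itertools import combinations


def source_strengths(source_words, source_freq):
    """Share of each source word pair in the total pair frequency of the sentence."""
    pairs = [frozenset(p) for p in combinations(source_words, 2)]
    total = sum(source_freq.get(p, 0) for p in pairs)
    if total == 0:
        return {p: 0.0 for p in pairs}
    return {p: source_freq.get(p, 0) / total for p in pairs}


def apply_weights(target_strength, alignment, src_strength):
    """Multiply each target pair strength by the strength of its source pair.

    `alignment` maps a target language candidate to its source language word.
    """
    weighted = {}
    for (t1, t2), value in target_strength.items():
        source_pair = frozenset((alignment[t1], alignment[t2]))
        weighted[(t1, t2)] = value * src_strength.get(source_pair, 0.0)
    return weighted


# Invented numbers for illustration only.
source_freq = {
    frozenset(("今春", "予定")): 6,
    frozenset(("今春", "商用化")): 2,
    frozenset(("予定", "商用化")): 2,
}
alignment = {"this spring": "今春", "schedule": "予定"}
src = source_strengths(["今春", "商用化", "予定"], source_freq)
weighted = apply_weights({("schedule", "this spring"): 0.5}, alignment, src)
```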

【0065】また、未訳出文字列″今春″が翻訳辞書1
0によって検索できなかった場合の拡張方法として次の
実施の形態を示す。
The following embodiment is described as an extension for the case where the untranslated character string "今春" cannot be found in the translation dictionary 10.

【0066】形態素解析部13は入力原言語文の形態素
解析を行い、単語分割候補の集合を作成する。図4中の
入力原言語文の単語分割候補の集合の一例を図14に示
す。
The morphological analysis unit 13 performs a morphological analysis of the input source language sentence and creates a set of word division candidates. FIG. 14 shows an example of a set of word division candidates of the input source language sentence in FIG.

【0067】訳語候補生成部14は、形態素解析部13
から得られる各単語分割候補に含まれる各原言語単語に
対する目的言語訳語候補を翻訳辞書10を用いて検索す
る。例えば、図14の各単語に対しては、図15の訳語
候補が得られる。
The translation word candidate generation unit 14 uses the translation dictionary 10 to search for target language translation word candidates for each source language word contained in each word division candidate obtained from the morphological analysis unit 13. For example, the translation word candidates of FIG. 15 are obtained for the words in FIG. 14.

【0068】次に、共起強度検出部15は、各単語分割
候補について、上記の訳語候補生成部14で得られた目
的言語訳語候補と、翻訳文構造情報(図4)中の訳出済
みの目的言語単語との組で構成される各目的言語訳語候
補対の集合、または上記の訳語候補生成部14で得られ
た全ての目的言語訳語候補の組み合わせからなる各目的
言語訳語候補対の集合を作成する。続いて、目的言語共
起情報DB20を用いて、全ての目的言語訳語候補対に
ついての訳語候補共起強度を求める。ここで得られる目
的言語訳語候補対及び対応する訳語候補共起強度の集合
の例を図16に示す。
Next, for each word division candidate, the co-occurrence strength detection unit 15 creates either the set of target language translation word candidate pairs formed by pairing the target language translation word candidates obtained by the translation word candidate generation unit 14 with the already-translated target language words in the translated sentence structure information (FIG. 4), or the set of target language translation word candidate pairs formed from combinations of all the target language translation word candidates obtained by the translation word candidate generation unit 14. Then, using the target language co-occurrence information DB 20, it obtains the translation word candidate co-occurrence strength of every target language translation word candidate pair. FIG. 16 shows an example of the resulting set of target language translation word candidate pairs and their translation word candidate co-occurrence strengths.

【0069】単語分割選択部17では、共起強度検出部
15で計算した訳語候補共起強度を用いて、個々の単語
分割候補について上記と同様の方法で入力原言語文全体
の共起強度(ここでは、単語分割尤度とみなせる。)を
それぞれ計算する。
The word division selection unit 17 uses the translation word candidate co-occurrence strengths calculated by the co-occurrence strength detection unit 15 to calculate, for each word division candidate and in the same manner as described above, the co-occurrence strength of the entire input source language sentence (which can be regarded here as the word division likelihood).

【0070】最後に、入力原言語文全体の共起強度が最
大となる単語分割候補を選択し、同時に入力原言語文の
各単語に対して最適な目的言語訳語候補を選択すること
が可能となる。ここでは、図16の訳語候補共起強度を
用いて図14の各単語分割候補の(ア),(イ)それぞ
れの文全体の共起強度を求め、最大となる方、例えば
(イ)の単語分割候補が選択される。これに伴って、共
起強度最大となった(イ)の各単語に対する目的言語訳
語候補列が出力される。
Finally, the word division candidate that maximizes the co-occurrence strength of the entire input source language sentence is selected, which at the same time makes it possible to select the optimal target language translation word candidate for each word of the input source language sentence. Here, the translation word candidate co-occurrence strengths of FIG. 16 are used to obtain the whole-sentence co-occurrence strength of each of the word division candidates (ア) and (イ) of FIG. 14, and the candidate with the larger value, for example (イ), is selected. The target language translation word candidates for the words of (イ), which has the maximum co-occurrence strength, are output accordingly.
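The selection among word division candidates can be sketched as below. The two segmentations, their candidate lists, and the strength values are invented and are not the contents of FIGS. 14 to 16; the function names are hypothetical. The score follows the product rule described earlier, so a segmentation with more words has more factors in its product, and any length normalization would be an extra assumption that is not applied here.

```python
from itertools import product
from math import prod


def sentence_strength(chosen, context_words, table):
    """Product of pair strengths between the chosen candidates and the context words."""
    return prod(table.get((w, c), 0.0) for c in chosen for w in context_words)


def best_for_segmentation(word_candidates, context_words, table):
    """Best candidate choice and its score for one word division candidate."""
    options = product(*word_candidates)
    best = max(options, key=lambda ch: sentence_strength(ch, context_words, table))
    return best, sentence_strength(best, context_words, table)


def select_segmentation(segmentations, context_words, table):
    """Pick the word division candidate whose best score (likelihood) is highest."""
    scored = {
        name: best_for_segmentation(cands, context_words, table)
        for name, cands in segmentations.items()
    }
    winner = max(scored, key=lambda name: scored[name][1])
    return winner, scored[winner][0]


# Invented data: (ア) splits the string into two words, (イ) keeps it as one.
segmentations = {
    "ア": [["now", "present"], ["spring"]],
    "イ": [["this spring"]],
}
strength = {
    ("schedule", "now"): 0.3, ("commercialization", "now"): 0.2,
    ("schedule", "present"): 0.1, ("commercialization", "present"): 0.1,
    ("schedule", "spring"): 0.5, ("commercialization", "spring"): 0.4,
    ("schedule", "this spring"): 0.6, ("commercialization", "this spring"): 0.5,
}
context = ["schedule", "commercialization"]
winner, chosen = select_segmentation(segmentations, context, strength)
```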

【0071】なお、本発明は、上記の実施の形態に限定
されることなく、特許請求の範囲内で変更、応用が可能
である。
The present invention is not limited to the above-mentioned embodiments, but can be modified and applied within the scope of the claims.

【0072】[0072]

【発明の効果】上述のように、本発明によれば、入力原
言語文を、翻訳辞書を用いて目的言語機械訳文へ自動的
に翻訳し、この目的言語機械訳文から目的言語の訳語が
抽出できなかった未訳出文字列を検出し、翻訳辞書を用
いて未訳出文字列または訳出されている原言語単語の目
的言語訳語候補を検索する。そして、目的言語の単語対
とその共起頻度情報からなるエントリの集合を保持する
共起情報DBを用いて、目的言語訳語候補の組み合わせ
からなる各目的言語訳語候補対の訳語候補共起強度を計
算し、この訳語候補共起強度を用いて未訳出文字列に対
する目的言語訳語候補を選択する。これにより、目的言
語での適切性を考慮しながら、訳文品質低下の原因とな
る未訳出文字列を目的言語の単語に訳出することができ
る。
As described above, according to the present invention, an input source language sentence is automatically translated into a target language machine translated sentence using a translation dictionary, an untranslated character string for which no target language translation could be extracted is detected in this target language machine translated sentence, and the translation dictionary is used to search for target language translation word candidates for the untranslated character string or for the translated source language words. Then, using a co-occurrence information DB that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information, the translation word candidate co-occurrence strength of each target language translation word candidate pair formed from combinations of target language translation word candidates is calculated, and a target language translation word candidate for the untranslated character string is selected using this translation word candidate co-occurrence strength. This makes it possible to translate an untranslated character string, which degrades the quality of the translated text, into a target language word while taking into account its suitability in the target language.

【0073】また、入力原言語文を形態素解析して単語
分割候補を生成し、翻訳辞書を用いて単語分割候補中の
原言語単語に対する分割単語訳語候補を検索する。次
に、この分割単語訳語候補も含めて目的言語訳語候補と
して訳語候補共起強度を計算し、各単語分割候補に対す
る単語分割尤度を計算し、最尤となる単語分割候補を選
択することで、入力原言語文の解析のやり直しや訳語候
補列の生成が可能となり、入力原言語文の解析失敗によ
る未訳出の問題を解決することができる。
Further, the input source language sentence is subjected to morphological analysis to generate word division candidates, and the translation dictionary is used to search for divided word translation word candidates for the source language words in the word division candidates. Next, the translation word candidate co-occurrence strength is calculated with these divided word translation word candidates included among the target language translation word candidates, the word division likelihood of each word division candidate is calculated, and the word division candidate with the maximum likelihood is selected. This makes it possible to redo the analysis of the input source language sentence and to generate a translation word candidate string, and thus to solve the problem of untranslated strings caused by a failure in analyzing the input source language sentence.

【0074】さらに、複数の翻訳辞書を利用することに
よる目的言語訳語候補の増加及び原言語の単語共起情報
の利用によって適切な訳語を選択する精度を高めること
ができる。
Further, by using a plurality of translation dictionaries, it is possible to increase the number of target language translation word candidates and to increase the accuracy of selecting an appropriate translation word by using the word co-occurrence information of the source language.

【0075】以上のようにして、機械翻訳の訳文中に出
現する未訳出文字列に対する訳語候補を出力することが
可能となる。
As described above, it is possible to output translation word candidates for an untranslated character string appearing in a translated text of machine translation.

【図面の簡単な説明】[Brief description of drawings]

【図1】本発明方法の概要を示す流れ図FIG. 1 is a flow chart showing an outline of the method of the present invention.

【図2】本発明装置の概要を示す構成図FIG. 2 is a configuration diagram showing an outline of the device of the present invention.

【図3】本発明の機械翻訳装置の実施の形態の一例を示
す基本ブロック構成図
FIG. 3 is a basic block configuration diagram showing an example of an embodiment of a machine translation device of the present invention.

【図4】翻訳文構造情報の例を示す図FIG. 4 is a diagram showing an example of translated sentence structure information.

【図5】未訳出文字列に対する目的言語訳語候補を示す
FIG. 5 is a diagram showing target language translated word candidates for untranslated character strings.

【図6】目的言語訳語候補対の集合を示す図FIG. 6 is a diagram showing a set of target language translation candidate pairs.

【図7】目的言語共起情報データベースの内容例を示す
FIG. 7 is a diagram showing an example of contents of a target language co-occurrence information database.

【図8】訳語候補共起強度の集合の例を示す図FIG. 8 is a diagram showing an example of a set of translation word candidate co-occurrence strengths.

【図9】入力原言語文と目的言語訳語候補の対応を示す
FIG. 9 is a diagram showing correspondence between input source language sentences and target language translated word candidates.

【図10】各原言語単語に対する目的言語訳語候補を示
す図
FIG. 10 is a diagram showing target language translated word candidates for each source language word.

【図11】未訳出文字列以外の原言語単語の訳語候補を
含めた目的言語訳語候補対の集合を示す図
FIG. 11 is a diagram showing a set of target language translation word candidate pairs including translation word candidates of source language words other than untranslated character strings.

【図12】原言語共起情報データベースの内容例を示す
FIG. 12 is a diagram showing an example of contents of a source language co-occurrence information database.

【図13】原言語の共起強度を加味した訳語候補共起強
度の集合の例を示す図
FIG. 13 is a diagram showing an example of a set of translation word candidate co-occurrence strengths in which the co-occurrence strengths of the source language are added.

【図14】単語分割候補の集合の一例を示す図FIG. 14 is a diagram showing an example of a set of word division candidates.

【図15】単語分割候補中の各単語に対する目的言語訳
語候補の集合の一例を示す図
FIG. 15 is a diagram showing an example of a set of target language translated word candidates for each word in the word division candidates.

【図16】単語分割候補を用いた時の訳語候補共起強度
の集合の例を示す図
FIG. 16 is a diagram showing an example of a set of translation word candidate co-occurrence strengths when word division candidates are used.

【符号の説明】[Explanation of symbols]

11:機械翻訳部、12:未訳出検出部、13:形態素
解析部、14:訳語候補生成部、15:共起強度検出
部、16:訳語決定部、17:単語分割選択部、10:
翻訳辞書、20:目的言語共起情報データベース、3
0:原言語共起情報データベース。
11: Machine translation unit, 12: Untranslated detection unit, 13: Morphological analysis unit, 14: Translation word candidate generation unit, 15: Co-occurrence strength detection unit, 16: Translated word determination unit, 17: Word division selection unit, 10: Translation dictionary, 20: Target language co-occurrence information database, 30: Source language co-occurrence information database.

フロントページの続き (58)調査した分野(Int.Cl.7,DB名) G06F 17/21 - 17/28
Continuation of the front page (58) Fields surveyed (Int. Cl.7, DB name): G06F 17/21 - 17/28

Claims (16)

(57)【特許請求の範囲】(57) [Claims] 【請求項1】 原言語単語と目的言語訳語候補の対訳関
係の集合を保持する翻訳辞書と、目的言語の単語対とそ
の共起頻度情報からなるエントリの集合を保持する目的
言語共起情報データベースと、機械翻訳手段と、未訳出
検出手段と、形態素解析手段と、訳語候補生成手段と、
共起強度検出手段と、訳語決定手段と、単語分割選択手
段とを構成する機械翻訳装置を用いて、原言語文から目
的言語文へ自動的に翻訳を行う機械翻訳方法において、機械翻訳手段により、前記 翻訳辞書を用いて、入力原言
語文を目的言語の文である目的言語機械訳文へ自動的に
翻訳を行う機械翻訳ステップと、未訳出検出手段により、 前記機械翻訳ステップにおいて
目的言語の訳語を抽出できなかった未訳出文字列を検出
する未訳出検出ステップと、形態素解析手段により、前記入力原言語文の形態素解析
を行い、単語分割候補を生成する形態素解析ステップ
と、 訳語候補生成手段により、 前記翻訳辞書を用いて、前記
未訳出文字列の目的言語訳語候補である未訳出訳語候補
を検索するとともに、前記翻訳辞書を用いて、前記単語
分割候補中の原言語単語に対する目的言語訳語候補であ
る分割単語訳語候補を検索する訳語候補生成ステップ
と、共起強度検出手段により、前記 目的言語共起情報データ
ベースを用いて、前記未訳出訳語候補中の目的言語訳語
候補とともに前記訳語候補生成手段で得られる前記分割
単語訳語候補も含む目的言語訳語候補と、前記入力原言
語文中の訳出済みの目的言語単語との組で構成される各
目的言語訳語候補対の訳語候補共起強度を計算する共起
強度検出ステップと、訳語決定手段により、 前記訳語候補共起強度を用いて、
前記未訳出文字列に対する目的言語訳語候補を選択する
訳語決定ステップと 単語分割選択手段により、前記訳語候補共起強度を用い
て、前記形態素解析ステップから得られる各単語分割候
補に対する単語分割尤度を計算し、該単語分割尤度を用
いて、最尤となる単語分割候補を選択して出力する単語
分割選択ステップと を含むことを特徴とする機械翻訳方
法。
1. A machine translation method for automatically translating a source language sentence into a target language sentence using a machine translation apparatus comprising a translation dictionary that holds a set of translation relations between source language words and target language translation word candidates, a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information, machine translation means, untranslated detection means, morphological analysis means, translation word candidate generation means, co-occurrence strength detection means, translated word determining means, and word division selection means, the method comprising: a machine translation step in which the machine translation means uses the translation dictionary to automatically translate an input source language sentence into a target language machine translated sentence, which is a sentence in the target language; an untranslated detection step in which the untranslated detection means detects an untranslated character string for which a target language translation could not be extracted in the machine translation step; a morphological analysis step in which the morphological analysis means performs morphological analysis of the input source language sentence and generates word division candidates; a translation word candidate generation step in which the translation word candidate generation means uses the translation dictionary to search for untranslated translation word candidates, which are target language translation word candidates for the untranslated character string, and also uses the translation dictionary to search for divided word translation word candidates, which are target language translation word candidates for the source language words in the word division candidates; a co-occurrence strength detection step in which the co-occurrence strength detection means uses the target language co-occurrence information database to calculate the translation word candidate co-occurrence strength of each target language translation word candidate pair formed by pairing a target language translation word candidate, including the divided word translation word candidates obtained by the translation word candidate generation means as well as the target language translation word candidates among the untranslated translation word candidates, with an already-translated target language word in the input source language sentence; a translated word determination step in which the translated word determining means uses the translation word candidate co-occurrence strength to select a target language translation word candidate for the untranslated character string; and a word division selection step in which the word division selection means uses the translation word candidate co-occurrence strength to calculate a word division likelihood for each word division candidate obtained from the morphological analysis step, and uses the word division likelihood to select and output the word division candidate with the maximum likelihood.
【請求項2】 前記訳語候補生成ステップにおいて、
語候補生成手段により、前記翻訳辞書を用いて、前記未
訳出文字列以外の入力原言語文中の単語の目的言語訳語
候補である訳出単語訳語候補を検索するステップを設
け、 前記共起強度検出ステップにおいて、共起強度検出手段
により、前記訳出単語訳語候補と前記未訳出訳語候補中
の目的言語訳語候補の組み合わせからなる各目的言語訳
語候補対の訳語候補共起強度も計算するステップを設
け、 前記訳語決定ステップにおいて、訳語決定手段により、
前記訳語候補共起強度を用いて、前記入力原言語文に対
する目的言語訳語候補列を選択するステップを設けたこ
とを特徴とする請求項1記載の機械翻訳方法。
2. The machine translation method according to claim 1, wherein the translation word candidate generation step includes a step in which the translation word candidate generation means uses the translation dictionary to search for translated word translation word candidates, which are target language translation word candidates for words in the input source language sentence other than the untranslated character string; the co-occurrence strength detection step includes a step in which the co-occurrence strength detection means also calculates the translation word candidate co-occurrence strength of each target language translation word candidate pair formed from a combination of the translated word translation word candidates and the target language translation word candidates among the untranslated translation word candidates; and the translated word determination step includes a step in which the translated word determining means uses the translation word candidate co-occurrence strength to select a target language translation word candidate string for the input source language sentence.
【請求項3】 前記訳語候補生成ステップにおいて、
語候補生成手段により、前記機械翻訳ステップで用いる
前記翻訳辞書に加えて、原言語単語と目的言語訳語候補
の対訳関係の集合を保持する別の翻訳辞書も参照して、
前記原言語単語列中の各原言語単語の目的言語訳語候補
を検索するステップを設けたことを特徴とする請求項1
又は2記載の機械翻訳方法。
3. The machine translation method according to claim 1 or 2, wherein the translation word candidate generation step includes a step in which the translation word candidate generation means searches for a target language translation word candidate for each source language word in the source language word string by referring, in addition to the translation dictionary used in the machine translation step, to another translation dictionary that holds a set of translation relations between source language words and target language translation word candidates.
【請求項4】 原言語の単語対とその共起頻度情報から
なるエントリの集合を保持する原言語共起情報データベ
ースを含み、前記共起強度検出ステップにおいて、共起
強度検出手段により、前記原言語共起情報データベース
を用いて、原言語単語対に対する原言語における共起強
度をもとに、対応する訳語候補共起強度に重み付けを行
うステップを設けたことを特徴とする請求項1乃至3い
ずれか記載の機械翻訳方法。
4. The machine translation method according to any one of claims 1 to 3, further using a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, wherein the co-occurrence strength detection step includes a step in which the co-occurrence strength detection means uses the source language co-occurrence information database to weight the corresponding translation word candidate co-occurrence strength on the basis of the source language co-occurrence strength of the source language word pair.
【請求項5】 原言語文から目的言語文へ自動的に翻訳
を行う機械翻訳装置において、 原言語単語と目的言語訳語候補の対訳関係の集合を保持
する翻訳辞書と、 該翻訳辞書を用いて、入力原言語文を目的言語の文であ
る目的言語機械訳文へ自動的に翻訳を行う機械翻訳手段
と、 前記機械翻訳手段において目的言語の訳語を抽出できな
かった未訳出文字列を検出する未訳出検出手段と、前記入力原言語文の形態素解析を行い、単語分割候補を
生成する形態素解析手 段と、 前記翻訳辞書を用いて、前記未訳出文字列の目的言語訳
語候補である未訳出訳語候補を検索するとともに、前記
翻訳辞書を用いて、前記単語分割候補中の原言語単語に
対する目的言語訳語候補である分割単語訳語候補を検索
する訳語候補生成手段と、 目的言語の単語対とその共起頻度情報からなるエントリ
の集合を保持する目的言語共起情報データベースと、 該目的言語共起情報データベースを用いて、前記未訳出
訳語候補中の目的言語訳語候補とともに前記訳語候補生
成手段で得られる前記分割単語訳語候補も含む目的言語
訳語候補と、前記入力原言語文中の訳出済みの目的言語
単語との組で構成される各目的言語訳語候補対の訳語候
補共起強度を計算する共起強度検出手段と、 前記訳語候補共起強度を用いて、前記未訳出文字列に対
する目的言語訳語候補を選択する訳語決定手段と 前記訳語候補共起強度を用いて、前記形態素解析手段か
ら得られる各単語分割候補に対する単語分割尤度を計算
し、該単語分割尤度を用いて、最尤となる単語分割候補
を選択して出力する単語分割選択手段と を有することを
特徴とする機械翻訳装置。
5. A machine translation apparatus for automatically translating a source language sentence into a target language sentence, comprising: a translation dictionary that holds a set of translation relations between source language words and target language translation word candidates; machine translation means for automatically translating, using the translation dictionary, an input source language sentence into a target language machine translated sentence, which is a sentence in the target language; untranslated detection means for detecting an untranslated character string for which a target language translation could not be extracted by the machine translation means; morphological analysis means for performing morphological analysis of the input source language sentence and generating word division candidates; translation word candidate generation means for searching, using the translation dictionary, for untranslated translation word candidates, which are target language translation word candidates for the untranslated character string, and for searching, using the translation dictionary, for divided word translation word candidates, which are target language translation word candidates for the source language words in the word division candidates; a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information; co-occurrence strength detection means for calculating, using the target language co-occurrence information database, the translation word candidate co-occurrence strength of each target language translation word candidate pair formed by pairing a target language translation word candidate, including the divided word translation word candidates obtained by the translation word candidate generation means as well as the target language translation word candidates among the untranslated translation word candidates, with an already-translated target language word in the input source language sentence; translated word determining means for selecting, using the translation word candidate co-occurrence strength, a target language translation word candidate for the untranslated character string; and word division selection means for calculating, using the translation word candidate co-occurrence strength, a word division likelihood for each word division candidate obtained from the morphological analysis means, and for selecting and outputting, using the word division likelihood, the word division candidate with the maximum likelihood.
【請求項6】 前記訳語候補生成手段は、前記翻訳辞書
を用いて、前記未訳出文字列以外の入力原言語文中の単
語の目的言語訳語候補である訳出単語訳語候補を検索す
る手段を含み、 前記共起強度検出手段は、前記訳出単語訳語候補と前記
未訳出訳語候補中の目的言語訳語候補の組み合わせから
なる各目的言語訳語候補対の訳語候補共起強度も計算す
る手段を含み、 前記訳語決定手段は、前記訳語候補共起強度を用いて、
前記入力原言語文に対する目的言語訳語候補列を選択す
る手段を含むことを特徴とする請求項5記載の機械翻訳
装置。
6. The translated word candidate generation means includes means for searching for a translated word translated word candidate that is a target language translated word candidate of a word in an input source language sentence other than the untranslated character string, using the translation dictionary. The co-occurrence intensity detection means includes means for calculating translation word candidate co-occurrence strength of each target language translation word candidate pair consisting of a combination of the translation word translation word candidate and the target language translation word candidate in the untranslated translation word candidate, and the translation word The determining means uses the translation word candidate co-occurrence strength,
6. The machine translation apparatus according to claim 5, further comprising means for selecting a target language translation candidate string for the input source language sentence.
【請求項7】 前記訳語候補生成手段は、前記機械翻訳
手段で用いる前記翻訳辞書に加えて、原言語単語と目的
言語訳語候補の対訳関係の集合を保持する別の翻訳辞書
も参照して、前記原言語単語列中の各原言語単語の目的
言語訳語候補を検索する手段を含むことを特徴とする請
求項5又は6記載の機械翻訳装置。
7. The translation word candidate generation means refers to, in addition to the translation dictionary used by the machine translation means, another translation dictionary that holds a set of bilingual relations between source language words and target language translation word candidates, 7. The machine translation device according to claim 5, further comprising means for searching a target language translation candidate of each source language word in the source language word string.
【請求項8】 原言語の単語対とその共起頻度情報から
なるエントリの集合を保持する原言語共起情報データベ
ースを有し、前記共起強度検出手段は、前記原言語共起
情報データベースを用いて、原言語単語対に対する原言
語における共起強度をもとに、対応する訳語候補共起強
度に重み付けを行う手段を含むことを特徴とする請求項
5乃至7いずれか記載の機械翻訳装置。
8. A source language co-occurrence information database that holds a set of entries consisting of source language word pairs and their co-occurrence frequency information, wherein the co-occurrence strength detection means stores the source language co-occurrence information database. 8. The machine translation device according to claim 5, further comprising means for weighting the corresponding translation word candidate co-occurrence strength based on the co-occurrence strength in the source language for the source language word pair. .
9. A computer-readable medium storing a program for causing a computer to execute a machine translation method that automatically translates a source language sentence into a target language sentence, the computer constituting a translation dictionary that holds a set of translation correspondences between source language words and target language translation candidates, a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information, machine translation means, untranslated string detection means, morphological analysis means, translation candidate generation means, co-occurrence strength detection means, translation determination means, and word segmentation selection means, wherein the program, when read by the computer, causes the computer to execute:
a machine translation step of automatically translating, by the machine translation means and using the translation dictionary, an input source language sentence into a target language machine-translated sentence, which is a sentence in the target language;
an untranslated string detection step of detecting, by the untranslated string detection means, an untranslated character string for which no target language translation could be extracted in the machine translation step;
a morphological analysis step of performing, by the morphological analysis means, morphological analysis of the input source language sentence to generate word segmentation candidates;
a translation candidate generation step of searching, by the translation candidate generation means and using the translation dictionary, for untranslated-string translation candidates, which are target language translation candidates for the untranslated character string, and for segmented-word translation candidates, which are target language translation candidates for the source language words in the word segmentation candidates;
a co-occurrence strength detection step of calculating, by the co-occurrence strength detection means and using the target language co-occurrence information database, the translation candidate co-occurrence strength of each target language translation candidate pair, each pair consisting of a target language translation candidate (drawn from the untranslated-string translation candidates together with the segmented-word translation candidates obtained by the translation candidate generation means) and a target language word already translated from the input source language sentence;
a translation determination step of selecting, by the translation determination means and using the translation candidate co-occurrence strength, a target language translation candidate for the untranslated character string; and
a word segmentation selection step of calculating, by the word segmentation selection means and using the translation candidate co-occurrence strength, a word segmentation likelihood for each word segmentation candidate obtained in the morphological analysis step, and selecting and outputting the word segmentation candidate with the maximum likelihood.
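The co-occurrence scoring, translation selection, and word segmentation selection steps recited in claim 9 can be sketched as follows. The claim does not define the strength or likelihood formulas, so this sketch assumes that strength is a sum of pair counts against the already translated words and that segmentation likelihood is the mean of each word's best candidate strength. All names, dictionary entries, and counts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cooc = Dict[Tuple[str, str], int]

def strength(candidate: str, context: List[str], cooc: Cooc) -> int:
    """Assumed strength: summed co-occurrence counts with translated words."""
    return sum(cooc.get((candidate, w), 0) + cooc.get((w, candidate), 0)
               for w in context)

def best_translation(candidates: List[str], context: List[str],
                     cooc: Cooc) -> Tuple[Optional[str], int]:
    """Pick the candidate with the highest strength (None if no candidates)."""
    if not candidates:
        return None, 0
    scored = [(c, strength(c, context, cooc)) for c in candidates]
    return max(scored, key=lambda item: item[1])

def select_segmentation(segmentations: List[List[str]],
                        dictionary: Dict[str, List[str]],
                        context: List[str], cooc: Cooc) -> List[str]:
    """Assumed likelihood: mean best-candidate strength over the words.

    Each segmentation is expected to contain at least one word.
    """
    def likelihood(words: List[str]) -> float:
        scores = [best_translation(dictionary.get(w, []), context, cooc)[1]
                  for w in words]
        return sum(scores) / len(scores)
    return max(segmentations, key=likelihood)

# Hypothetical data: two segmentations of one untranslated string.
dictionary = {"外国人": ["foreigner"], "参政権": ["suffrage"],
              "外国": ["foreign country"], "人参": ["carrot"],
              "政権": ["administration", "regime"]}
cooc = {("suffrage", "vote"): 12, ("foreigner", "election"): 3,
        ("administration", "election"): 5}
context = ["vote", "election"]  # words already translated in the sentence
segmentations = [["外国人", "参政権"], ["外国", "人参", "政権"]]

print(select_segmentation(segmentations, dictionary, context, cooc))
```

With these counts the first segmentation scores (3 + 12) / 2 = 7.5 and the second 5 / 3, so the first is selected. The same strengths also choose between candidate translations of a single word.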
10. The computer-readable medium storing the machine translation program according to claim 9, wherein the program, when read by the computer, causes the computer to execute: in the translation candidate generation step, a step of searching, by the translation candidate generation means and using the translation dictionary, for translated-word translation candidates, which are target language translation candidates for words in the input source language sentence other than the untranslated character string; in the co-occurrence strength detection step, a step of also calculating, by the co-occurrence strength detection means, the translation candidate co-occurrence strength of each target language translation candidate pair formed by combining a translated-word translation candidate with a target language translation candidate among the untranslated-string translation candidates; and in the translation determination step, a step of selecting, by the translation determination means and using the translation candidate co-occurrence strength, a target language translation candidate sequence for the input source language sentence.
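Claim 10 extends the selection from a single untranslated string to a candidate sequence for the whole sentence. A minimal sketch follows. It assumes an exhaustive search over all combinations and scores every pair of words in a sequence, which is a simplification of the pairing recited in the claim. The names and counts are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

Cooc = Dict[Tuple[str, str], int]

def pair_strength(a: str, b: str, cooc: Cooc) -> int:
    """Order-insensitive lookup of a target-language pair count."""
    return cooc.get((a, b), 0) + cooc.get((b, a), 0)

def best_sequence(candidate_lists: List[List[str]],
                  cooc: Cooc) -> Tuple[str, ...]:
    """Return the candidate combination with the largest total pair strength.

    Assumption: exhaustive search, which is practical only for short
    sentences because the number of combinations grows exponentially.
    """
    def total(sequence: Tuple[str, ...]) -> int:
        return sum(pair_strength(a, b, cooc)
                   for a, b in combinations(sequence, 2))
    return max(product(*candidate_lists), key=total)

# Hypothetical candidates for two source words and hypothetical counts.
candidate_lists = [["bank", "embankment"], ["interest", "concern"]]
cooc = {("bank", "interest"): 9, ("embankment", "concern"): 1}

print(best_sequence(candidate_lists, cooc))
```

Choosing the whole sequence jointly lets a strongly associated pair such as "bank" and "interest" settle both ambiguous words at once.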
11. The computer-readable medium storing the machine translation program according to claim 9 or 10, wherein the program, when read by the computer, causes the computer to execute, in the translation candidate generation step, a step of searching, by the translation candidate generation means, for a target language translation candidate for each source language word in the source language word string by referring, in addition to the translation dictionary used in the machine translation step, to another translation dictionary that holds a set of translation correspondences between source language words and target language translation candidates.
12. The computer-readable medium storing the machine translation program according to any one of claims 9 to 11, wherein the computer includes a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, and the program, when read by the computer, causes the computer to execute, in the co-occurrence strength detection step, a step of weighting, by the co-occurrence strength detection means and using the source language co-occurrence information database, the corresponding translation candidate co-occurrence strength on the basis of the source language co-occurrence strength of a source language word pair.
13. A computer-readable medium storing a machine translation program for automatically translating a source language sentence into a target language sentence, wherein the program, when read by a computer, causes the computer to function as:
a translation dictionary that holds a set of translation correspondences between source language words and target language translation candidates;
machine translation means for automatically translating, using the translation dictionary, an input source language sentence into a target language machine-translated sentence, which is a sentence in the target language;
untranslated string detection means for detecting an untranslated character string for which the machine translation means could not extract a target language translation;
morphological analysis means for performing morphological analysis of the input source language sentence to generate word segmentation candidates;
translation candidate generation means for searching, using the translation dictionary, for untranslated-string translation candidates, which are target language translation candidates for the untranslated character string, and for segmented-word translation candidates, which are target language translation candidates for the source language words in the word segmentation candidates;
a target language co-occurrence information database that holds a set of entries each consisting of a target language word pair and its co-occurrence frequency information;
co-occurrence strength detection means for calculating, using the target language co-occurrence information database, the translation candidate co-occurrence strength of each target language translation candidate pair, each pair consisting of a target language translation candidate (drawn from the untranslated-string translation candidates together with the segmented-word translation candidates obtained by the translation candidate generation means) and a target language word already translated from the input source language sentence;
translation determination means for selecting, using the translation candidate co-occurrence strength, a target language translation candidate for the untranslated character string; and
word segmentation selection means for calculating, using the translation candidate co-occurrence strength, a word segmentation likelihood for each word segmentation candidate obtained from the morphological analysis means, and selecting and outputting the word segmentation candidate with the maximum likelihood.
14. The computer-readable medium storing the machine translation program according to claim 13, wherein the program, when read by the computer, causes the computer to function such that: the translation candidate generation means includes means for searching, using the translation dictionary, for translated-word translation candidates, which are target language translation candidates for words in the input source language sentence other than the untranslated character string; the co-occurrence strength detection means includes means for also calculating the translation candidate co-occurrence strength of each target language translation candidate pair formed by combining a translated-word translation candidate with a target language translation candidate among the untranslated-string translation candidates; and the translation determination means includes means for selecting, using the translation candidate co-occurrence strength, a target language translation candidate sequence for the input source language sentence.
15. The computer-readable medium storing the machine translation program according to claim 13 or 14, wherein the program, when read by the computer, causes the computer to function such that the translation candidate generation means includes means for searching for a target language translation candidate for each source language word in the source language word string by referring, in addition to the translation dictionary used by the machine translation means, to another translation dictionary that holds a set of translation correspondences between source language words and target language translation candidates.
16. The computer-readable medium storing the machine translation program according to any one of claims 13 to 15, wherein the program, when read by the computer, causes the computer to function as a source language co-occurrence information database that holds a set of entries each consisting of a source language word pair and its co-occurrence frequency information, and such that the co-occurrence strength detection means includes means for weighting, using the source language co-occurrence information database, the corresponding translation candidate co-occurrence strength on the basis of the source language co-occurrence strength of a source language word pair.
JP06616699A 1999-03-12 1999-03-12 Machine translation method and apparatus, and medium storing machine translation program Expired - Fee Related JP3437782B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP06616699A JP3437782B2 (en) 1999-03-12 1999-03-12 Machine translation method and apparatus, and medium storing machine translation program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP06616699A JP3437782B2 (en) 1999-03-12 1999-03-12 Machine translation method and apparatus, and medium storing machine translation program

Publications (2)

Publication Number Publication Date
JP2000259630A JP2000259630A (en) 2000-09-22
JP3437782B2 true JP3437782B2 (en) 2003-08-18

Family

ID=13308006

Family Applications (1)

Application Number Title Priority Date Filing Date
JP06616699A Expired - Fee Related JP3437782B2 (en) 1999-03-12 1999-03-12 Machine translation method and apparatus, and medium storing machine translation program

Country Status (1)

Country Link
JP (1) JP3437782B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5115239B2 (en) * 2008-03-03 2013-01-09 富士ゼロックス株式会社 Character processing device
JP5194920B2 (en) * 2008-03-24 2013-05-08 富士ゼロックス株式会社 Example sentence set-based translation device, method and program, and phrase translation device including the translation device
CN112085090A (en) * 2020-09-07 2020-12-15 百度在线网络技术(北京)有限公司 Translation method and device and electronic equipment

Also Published As

Publication number Publication date
JP2000259630A (en) 2000-09-22

Similar Documents

Publication Publication Date Title
JPH0242572A (en) Preparation/maintenance method for co-occurrence relation dictionary
JPH083815B2 (en) Natural language co-occurrence relation dictionary maintenance method
JPH05314166A (en) Electronic dictionary and dictionary retrieval device
KR100792203B1 (en) Apparatus and Method of Construction for Single Noun Korean-English Technical Word Dictionary Using Compound Noun's Target Word Notation in Patent Documents
JP3437782B2 (en) Machine translation method and apparatus, and medium storing machine translation program
JP2883153B2 (en) Keyword extraction device
JPH1139313A (en) Automatic document classification system, document classification oriented knowledge base creating method and record medium recording its program
JP2000285122A (en) Device and method for generating thesaurus and storage medium recording thesaurus generation program
JP2838984B2 (en) General-purpose reference device
JP2812511B2 (en) Keyword extraction device
JP4812811B2 (en) Machine translation apparatus and machine translation program
JP2006190226A (en) Declinable word automatic paraphrasing apparatus, declinable word paraphrasing method and declinable word paraphrasing processing program
JP3388393B2 (en) Translation device for tense, aspect or modality using database
JP4001605B2 (en) Translation pattern creation device
JP4417967B2 (en) Example database and example search system
JP4087829B2 (en) Valency dictionary expansion device, method, and program
JP2840258B2 (en) Method of creating bilingual dictionary and co-occurrence dictionary for machine translation system
JP3244286B2 (en) Translation processing device
JPH0561902A (en) Mechanical translation system
JP3197110B2 (en) Natural language analyzer and machine translator
JP3907106B2 (en) Translation rule creation device and program
JPH05225232A (en) Automatic text pre-editor
JPH11282839A (en) Machine translation system and computer readable recording medium recording machine translation processing program
JP5032453B2 (en) Machine translation apparatus and machine translation program
JPH0320866A (en) Text base retrieval system

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090606

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100606

Year of fee payment: 7

LAPS Cancellation because of no payment of annual fees