JP2002328920A

JP2002328920A - Method for associating words in parallel translation

Info

Publication number: JP2002328920A
Application number: JP2001136463A
Authority: JP
Inventors: Eiichiro Sumita; 英一郎隅田
Original assignee: ATR ONSEI GENGO TSUSHIN KENKYU; ATR Spoken Language Translation Research Laboratories
Current assignee: ATR ONSEI GENGO TSUSHIN KENKYU; ATR Spoken Language Translation Research Laboratories
Priority date: 2001-05-07
Filing date: 2001-05-07
Publication date: 2002-11-15

Abstract

PROBLEM TO BE SOLVED: To provide a method for associating words in bilingual sentence pairs that achieves both high recall and high precision. SOLUTION: In a method for associating words in a bilingual sentence pair consisting of a source-language sentence and the corresponding target-language sentence, a first step calculates a first word correspondence degree for each one-word-to-one-word correspondence between the source-language sentence and the target-language sentence, using a bilingual dictionary for the two languages, and a second step finds the combination of word correspondences that maximizes the sum of the first word correspondence degrees over the sentence pair.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for associating words in bilingual sentence pairs.

[0002]

[Description of the Related Art] Machine translation systems are currently being developed with the aim of higher accuracy, and there is active research on automatically creating translation dictionaries from bilingual sentence pairs in the target domain.

[0003] Here we focus on word association in bilingual sentence pairs, which is the starting point for automating the construction of machine translation systems. Word association in a bilingual sentence pair means determining the translation of each word in a pair of source-language and target-language sentences. Previously proposed methods for this task (see References 1 to 7) relied mainly on statistical information. Unfortunately, sufficient evidence cannot be obtained for low-frequency words, so the proportion of words that could be associated (the recall) was low.

[0004]
Reference 1: Gale, W. A. and Church, K. W. (1991) A program for aligning sentences in bilingual corpora, Proc. of 29th ACL, pp. 177-184.
Reference 2: Brown, P. F., Della Pietra, S. A., Della Pietra, V. J. and Mercer, R. L. (1993) The mathematics of statistical machine translation: Parameter estimation, Computational Linguistics, 19, 2, pp. 263-311.
Reference 3: Dagan, I., Church, K. W. and Gale, W. A. (1994) Robust bilingual word alignment for Machine-Aided Translation, Proc. of 4th ANLP, pp. 34-40.

[0005]
Reference 4: Wu, D. (1994) Aligning a parallel English-Chinese corpus statistically with lexical criteria, Proc. of 32nd ACL, pp. 80-87.
Reference 5: Smadja, F., McKeown, K. and Hatzivassiloglou (1996) Translating collocations for bilingual lexicons: A statistical approach, Computational Linguistics, 21(4), pp. 1-38.
Reference 6: Tanaka, T. and Matsuo, Y. (1999) Extracting Translation Equivalents from Non-parallel Corpora, Proc. of 8th TMI, pp. 88-97.
Reference 7: Sumita, E. (2000) Word alignment using matrix, Proc. of 6th PRICAI, pp. 821.

[0006]

[Problem to Be Solved by the Invention] An object of the present invention is to provide a method for associating words in bilingual sentence pairs that has both high recall and high precision.

[0007]

[Means for Solving the Problem] The invention of claim 1 is a method for associating words in a bilingual sentence pair consisting of a source-language sentence and the corresponding target-language sentence, characterized by comprising: a first step of calculating a first word correspondence degree for each one-word-to-one-word correspondence between the source-language sentence and the target-language sentence, using a bilingual dictionary between the two languages; and a second step of finding, in the bilingual sentence pair, the combination of word correspondences that maximizes the sum of the first word correspondence degrees.

[0008] The invention of claim 2 is characterized in that, in the invention of claim 1, the first word correspondence degree between any source-language word and any target-language word is determined based on whether or not the target-language word matches any of the translations of the source-language word in the bilingual dictionary.

[0009] The invention of claim 3 is characterized in that the invention of claim 1 or 2 further comprises: a third step of calculating, for the part of the bilingual sentence pair that has not yet been associated by the second step, a second word correspondence degree for each one-word-to-one-word correspondence between the two languages, using a thesaurus for each of the two languages; and a fourth step of finding, in the part of the bilingual sentence pair that has not yet been associated by the second step, the combination of word correspondences that maximizes the sum of the second word correspondence degrees.

[0010] The invention of claim 4 is characterized in that, in the invention of claim 3, the second word correspondence degree between any source-language word and any target-language word is calculated based on the set of semantic classes to which the source-language word belongs and the set of semantic classes to which the target-language word belongs, both obtained from the thesauri.

[0011] The invention of claim 5 is characterized in that, in the invention of claim 4, the second word correspondence degree between any source-language word and any target-language word is expressed as the ratio of the size of the intersection to the size of the union of the set of semantic classes to which the source-language word belongs and the set of semantic classes to which the target-language word belongs, both obtained from the thesauri.

[0012] The invention of claim 6 is characterized in that the invention of any one of claims 3 to 5 further comprises: a fifth step of calculating, for the part of the bilingual sentence pair that has not yet been associated by the second and fourth steps, a third word correspondence degree for each one-word-to-one-word correspondence between the two languages, using a part-of-speech table that represents the correspondence between parts of speech in the two languages; and a sixth step of finding, in the part of the bilingual sentence pair that has not yet been associated by the second and fourth steps, the combination of word correspondences that maximizes the sum of the third word correspondence degrees.

[0013] The invention of claim 7 is characterized in that, in the invention of claim 6, the third word correspondence degree between any source-language word and any target-language word is determined based on whether or not the part of speech of the target-language word matches any of the parts of speech that correspond, in the part-of-speech table, to the part of speech of the source-language word.

[0014] The invention of claim 8 is characterized in that the invention of claim 6 or 7 further comprises a seventh step of associating one word with a compound word, or a compound word with a compound word, using a compound-word dictionary between the two languages, in the bilingual sentence pair that has been partially associated in the second, fourth and sixth steps.

[0015] The invention of claim 9 is characterized in that the invention of claim 8 further comprises an eighth step of associating one word with a compound word, or a compound word with a compound word, using a compound-word part-of-speech table that represents the correspondence between the parts of speech of compound words in the two languages, in the bilingual sentence pair that has been partially associated in the second, fourth, sixth and seventh steps.

[0016] The invention of claim 10 is characterized in that the invention of claim 9 further comprises a ninth step of associating, on the assumption that the word order of the source-language sentence and the target-language sentence of the bilingual sentence pair is almost the same, the words that correspond in position within the part of the bilingual sentence pair that has not yet been associated by the second, fourth, sixth, seventh and eighth steps.

[0017] The invention of claim 11 is characterized in that the invention of claim 1 or 2 further comprises a tenth step of associating one word with a compound word, or a compound word with a compound word, using a compound-word dictionary between the two languages, in the bilingual sentence pair that has been partially associated in the second step.

[0018] The invention of claim 12 is characterized in that the invention of claim 11 further comprises an eleventh step of associating, on the assumption that the word order of the source-language sentence and the target-language sentence of the bilingual sentence pair is almost the same, the words that correspond in position within the part of the bilingual sentence pair that has not yet been associated by the second and tenth steps.

[0019] The invention of claim 13 is characterized in that the invention of claim 11 further comprises a twelfth step of associating one word with a compound word, or a compound word with a compound word, using a compound-word part-of-speech table that represents the correspondence between the parts of speech of compound words in the two languages, in the bilingual sentence pair that has been partially associated in the second and tenth steps.

[0020] The invention of claim 14 is characterized in that the invention of claim 13 further comprises a thirteenth step of associating, on the assumption that the word order of the source-language sentence and the target-language sentence of the bilingual sentence pair is almost the same, the words that correspond in position within the part of the bilingual sentence pair that has not yet been associated by the second, tenth and twelfth steps.

[0021]

[Embodiments of the Invention] [1] Basic Concept of the Method for Associating Words in Bilingual Sentence Pairs According to the Present Invention

[0022] Recently, to overcome this problem of low recall, Reference 8 proposed using thesauri for the two languages. Reference 9 proposed using a dictionary together with similarities that exist between the two languages from several linguistic viewpoints.

[0023]
Reference 8: Ker, S. J. and Chang, J. S. (1997) A Class-Based Approach to Word Alignment, Computational Linguistics, 23, 2, pp. 313-343.
Reference 9: Huang, J. and Choi, K. (2000) Chinese-Korean Word-Alignment Based on Linguistic Comparison, Proc. of 38th ACL, pp. 392-399.

[0024] Along these lines, we use lexical knowledge about words, semantic classes and parts of speech (POS). First, using a bilingual dictionary between the source language and the target language, a thesaurus for each of the two languages, and a part-of-speech table that represents the correspondence between parts of speech in the two languages, we calculate a score that expresses the word correspondence degree for each one-word-to-one-word pair in a bilingual sentence pair. Then, using dynamic programming, we search for the combination of word correspondences that maximizes the sum of the scores in the sentence pair.

[0025] Next, the word correspondences are revised by finding compound words (idioms and set phrases) based on a compound-word dictionary between the two languages and a compound-word part-of-speech table that represents the correspondence between the parts of speech of compound words in the two languages. The bilingual dictionary, thesauri, compound-word dictionary, part-of-speech table and compound-word part-of-speech table are all used in electronic (database) form.

[0026] Our method is well suited to bilingual sentence pairs in two languages that are considered similar to each other, for example Japanese and Korean. It is not limited, however, to sentence pairs in Altaic languages such as Korean, Japanese, Uruguayan (sic), Turkish and Mongolian; it can also handle sentence pairs in closely related languages of other language families, for example Slavic languages such as Czech, Slovak and Polish, or Italic languages such as Italian and Spanish.

[0027] [2] Description of the Embodiment of the Invention
[2-1] Description of the Overall Processing Procedure

[0028] FIG. 1 shows the overall processing procedure of the method for associating words in bilingual sentence pairs.

[0029] The method is described here taking as an example the case in which the source language is Korean and the target language is Japanese.

[0030] The source-language sentence of the bilingual sentence pair is segmented into words, and each word is tagged with its part of speech (step 1). Similarly, the target-language sentence of the pair is segmented into words, and each word is tagged with its part of speech (step 2).

[0031] Next, one-word-to-one-word association is performed using the bilingual dictionary between the source language and the target language, the thesaurus for each of the two languages, and the part-of-speech table that represents the correspondence between parts of speech in the two languages (step 3).

[0032] After that, compound-word association is performed using the compound-word dictionary between the two languages and the compound-word part-of-speech table that represents the correspondence between the parts of speech of compound words in the two languages (step 4). Finally, association is performed for the words that remain unassociated (step 5).

[0033] [2-2] Description of the One-Word-to-One-Word Association Process
The scores used in the one-word-to-one-word association process are described first. There are three kinds of score: WORD_SCORE (the first word correspondence degree), SEM_SCORE (the second word correspondence degree) and POS_SCORE (the third word correspondence degree).

[0034] (i) WORD_SCORE
The bilingual dictionary between the source language and the target language can be regarded as a set of pairs, each consisting of a source-language word and a list of its possible translations. If, in the bilingual dictionary, a target-language word (target word) j is equal to one of the translations of a source-language word (source word) k, the value 1 is assigned to WORD_SCORE(k, j) between the source word k and the target word j. Otherwise, the value 0 is assigned to WORD_SCORE.
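The binary rule for WORD_SCORE can be sketched in Python as follows. This is only an illustration: the function name `word_score`, the representation of the dictionary as a mapping from a source word to a collection of translations, and the two sample entries are assumptions made for the example, not data from the patent.

```python
def word_score(source_word, target_word, bilingual_dict):
    """Return 1 if target_word is a listed translation of source_word, else 0."""
    # A source word missing from the dictionary has no known translations.
    translations = bilingual_dict.get(source_word, ())
    return 1 if target_word in translations else 0


# Hypothetical Korean-to-Japanese entries, used only for illustration.
sample_dict = {
    "학교": ["学校"],
    "가다": ["行く", "去る"],
}

print(word_score("가다", "行く", sample_dict))
print(word_score("가다", "学校", sample_dict))
```

Because the score is either 0 or 1, a word pair contributes to the total only when the dictionary lists the translation, which is why low-coverage dictionaries leave words for the later passes.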

[0035] (ii) SEM_SCORE
Here it is assumed that the source-language thesaurus and the target-language thesaurus share the same hierarchy, and that each word belongs to the semantic class or classes that match its meanings. In other words, a polysemous word belongs to more than one semantic class. The set of semantic classes to which a word x belongs is denoted SEM(x).

[0036] SEM_SCORE can be calculated with, for example, the Tanimoto coefficient (see Reference 10), which is often used in information retrieval to calculate the similarity between documents. The Tanimoto coefficient is also called the Jaccard coefficient. It is the ratio of the number of common elements (semantic classes) to the total number of elements (semantic classes), and takes a value between 0 and 1.

[0037] Reference 10: Kohonen, T. (1989) Self-organization and Associative Memory, Springer-Verlag.

[0038] As shown in the following equation (1), SEM_SCORE(k, j) between a source word k and a target word j is expressed as the ratio of the size of the intersection of the semantic class sets SEM(k) and SEM(j) to the size of the union of the semantic class set SEM(k) of the source word k and the semantic class set SEM(j) of the target word j.

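[0039]

[Equation 1]

$$\mathrm{SEM\_SCORE}(k, j) = \frac{|\mathrm{SEM}(k) \cap \mathrm{SEM}(j)|}{|\mathrm{SEM}(k) \cup \mathrm{SEM}(j)|} \qquad (1)$$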
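The intersection-over-union ratio described in paragraph [0038] can be sketched in Python as follows. The function name `sem_score`, the use of Python sets of class labels, and the labels themselves are assumptions made for illustration; returning 0 when both sets are empty is also an assumption, since the text does not cover that case.

```python
def sem_score(source_classes, target_classes):
    """Jaccard (Tanimoto) ratio of two sets of semantic-class labels."""
    union = source_classes | target_classes
    if not union:
        # Assumption: words with no known semantic class score 0.
        return 0.0
    return len(source_classes & target_classes) / len(union)


# Hypothetical class labels: one shared class out of three distinct classes.
print(sem_score({"c1", "c2"}, {"c2", "c3"}))
```

Identical class sets give 1 and disjoint sets give 0, so the value stays within the range from 0 to 1 stated for the Tanimoto coefficient.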

[0040] SEM_SCORE(k, j) may instead be calculated based on any one of the following equations (2) to (5).

[0041]

[Equation 2]

[0042] (iii) POS_SCORE
First, the part-of-speech table that represents the correspondence between parts of speech (POS) in the two languages is described. This table gives, for each part of speech of a source word, the parts of speech that its translations can take.

[0043] Table 1 shows part of the part-of-speech table for Korean and Japanese that was created experimentally.

[0044]

[Table 1]

[0045] For example, as shown in the first row of Table 1, a Korean common noun is translated into Japanese as a common noun or an adverb. Words are broadly divided into content words (nouns, verbs, adjectives, adverbs and the like) and function words (particles, auxiliary verbs and the like), but function words are excluded from the part-of-speech table. This is because the correspondence of function words depends largely on the words themselves, so it is inappropriate to handle them at the part-of-speech level.

[0046] If, in the part-of-speech table, the part of speech of a target word j matches one of the parts of speech that correspond to the part of speech of a source word k, the value 1 is assigned to POS_SCORE(k, j) between the source word k and the target word j. Otherwise, the value 0 is assigned to POS_SCORE.
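The part-of-speech lookup can be sketched in the same way. The function name `pos_score` and the table representation (a mapping from a source part of speech to a set of target parts of speech) are assumptions; the single entry shown is the first row of Table 1 as described in the text, and the rest of the table is not reproduced here.

```python
def pos_score(source_pos, target_pos, pos_table):
    """Return 1 if target_pos is an allowed counterpart of source_pos, else 0."""
    # Parts of speech absent from the table (for example function words,
    # which the table excludes) never match.
    return 1 if target_pos in pos_table.get(source_pos, set()) else 0


# Only the row stated in the text: Korean common noun -> Japanese common noun or adverb.
pos_table = {"common noun": {"common noun", "adverb"}}

print(pos_score("common noun", "adverb", pos_table))
print(pos_score("common noun", "verb", pos_table))
```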

[0047] FIG. 2 shows the procedure of the one-word-to-one-word association process.

[0048] In the one-word-to-one-word association process, word association using the bilingual dictionary between the source language and the target language, that is, word association using WORD_SCORE, is performed first (step 11).

[0049] Next, for the words that were not associated by the processing of step 11, word association using the thesaurus for each of the two languages, that is, word association using SEM_SCORE, is performed (step 12).

[0050] Finally, for the words that were not associated by the processing of steps 11 and 12, word association using the part-of-speech table that represents the correspondence between parts of speech in the two languages, that is, word association using POS_SCORE, is performed (step 13).

[0051] Steps 11, 12 and 13 use different kinds of score, but they use the same algorithm to associate words, so this common algorithm is described below.

[0052] We assume that in closely related languages such as Japanese and Korean the basic word order is almost the same. Suppose that a morphologically analyzed bilingual sentence pair is given. The pair consists of a source-language sentence (source sentence) of m words, K = <k_1, k_2, k_3, ..., k_m>, and a target-language sentence (target sentence) of n words, J = <j_1, j_2, j_3, ..., j_n>.

[0053] First, an m * n matrix c is created. As shown in the following equation (6), its element c[x, y] is the partial score of the subsequences <k_1, k_2, k_3, ..., k_x> and <j_1, j_2, j_3, ..., j_y>.

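[0054]

[Equation 3]

$$c[x, y] = \begin{cases} 0 & \text{if } x = 0 \text{ or } y = 0 \\ c[x-1, y-1] + \mathrm{SCORE}(k_x, j_y) & \text{if } x, y > 0 \text{ and } \mathrm{SCORE}(k_x, j_y) > 0 \\ \max\left(c[x-1, y],\ c[x, y-1]\right) & \text{otherwise} \end{cases} \qquad (6)$$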

[0055] FIG. 3 shows the value of each element c[x, y] of the 4 x 5 matrix for the case K = <A, B, Y, E> and J = <a, b, c, d, e>. Here, "a" corresponds to "A", "b" to "B", and "e" to "E".

[0056] Next, as shown in FIG. 4, the path with the maximum total score is searched for. Then, as shown in FIG. 5, words are associated based on the path found.

[0057] FIG. 6 shows the algorithm for associating words described above.

[0058] First, the length of the source sentence K (the number of words that make up K) is assigned to the variable m, and the length of the target sentence J (the number of words that make up J) is assigned to the variable n (step 101).

[0059] With the variable x taking the values 1 to m, the element c[x, 0] is set to 0. That is, x is first set to 1 (step 102) and c[x, 0] is set to 0 (step 103). It is then determined whether x is greater than or equal to m (step 104); if x is less than m, x is incremented by 1 (x = x + 1) (step 105) and the process returns to step 103. In this way, steps 103, 104 and 105 are repeated.

[0060] When x is determined to be greater than or equal to m in step 104, the element c[0, y] is set to 0 with the variable y taking the values 0 to n. That is, y is first set to 0 (step 106) and c[0, y] is set to 0 (step 107). It is then determined whether y is greater than or equal to n (step 108); if y is less than n, y is incremented by 1 (y = y + 1) (step 109) and the process returns to step 107. In this way, steps 107, 108 and 109 are repeated.

[0061] When y is determined to be greater than or equal to n in step 108, the partial score c[x, y] and the arrow type b[x, y], which indicates the path, are computed for every combination [x, y] in the range x = 1, 2, ..., m and y = 1, 2, ..., n.

[0062] That is, x is first set to 1 (step 110) and y is set to 1 (step 111). It is then determined whether the first condition, SCORE(K[x], J[y]) > 0, is satisfied (step 112). If the first condition is satisfied, the element c[x, y] is set to {c[x-1, y-1] + SCORE(K[x], J[y])} and b[x, y] is set to "11", which represents a diagonal arrow (step 113). The process then proceeds to step 117.

[0063] If the first condition is not satisfied in step 112, it is determined whether the second condition, c[x-1, y] >= c[x, y-1], is satisfied (step 114). If the second condition is satisfied, the element c[x, y] is set to {c[x-1, y]} and b[x, y] is set to "01", which represents an upward arrow (step 115). The process then proceeds to step 117.

[0064] If the second condition is not satisfied either in step 114, the element c[x, y] is set to {c[x, y-1]} and b[x, y] is set to "10", which represents a left arrow (step 116). The process then proceeds to step 117.

[0065] In step 117, it is determined whether y is greater than or equal to n; if y is less than n, y is incremented by 1 (y = y + 1) (step 118) and the process returns to step 112. In this way, steps 112 to 118 are repeated.

[0066] When y is determined to be greater than or equal to n in step 117, it is determined whether x is greater than or equal to m (step 119); if x is less than m, x is incremented by 1 (x = x + 1) (step 120) and the process returns to step 111. Steps 111 to 120 are then repeated.

[0067] When x is determined to be greater than or equal to m in step 119, this processing ends. The path with the maximum total score is then output by tracing matrix b back from b[m, n]. Along this path, a word K[x] and a word J[y] for which b[x, y] is "11" (a diagonal arrow) correspond to each other.
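The procedure of steps 101 to 120 and the trace-back can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `align`, the 0-based index pairs it returns, and the toy scoring function are made up for the example, while the arrow codes "11", "01" and "10" are taken from the text.

```python
def align(source, target, score):
    """Fill matrices c and b as in steps 101-120, then trace back from b[m][n].

    Returns (total_score, pairs), where pairs are 0-based (source, target) indices.
    """
    m, n = len(source), len(target)
    # c[x][y]: partial score of the first x source words and first y target words.
    # Row 0 and column 0 stay 0 (steps 102-109).
    c = [[0.0] * (n + 1) for _ in range(m + 1)]
    # b[x][y]: arrow code; "11" diagonal, "01" upward, "10" left.
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for x in range(1, m + 1):
        for y in range(1, n + 1):
            s = score(source[x - 1], target[y - 1])
            if s > 0:  # first condition (step 112)
                c[x][y] = c[x - 1][y - 1] + s
                b[x][y] = "11"
            elif c[x - 1][y] >= c[x][y - 1]:  # second condition (step 114)
                c[x][y] = c[x - 1][y]
                b[x][y] = "01"
            else:
                c[x][y] = c[x][y - 1]
                b[x][y] = "10"
    # Trace back: every diagonal arrow on the path is one word correspondence.
    pairs = []
    x, y = m, n
    while x > 0 and y > 0:
        if b[x][y] == "11":
            pairs.append((x - 1, y - 1))
            x, y = x - 1, y - 1
        elif b[x][y] == "01":
            x -= 1
        else:
            y -= 1
    pairs.reverse()
    return c[m][n], pairs


# Toy score for the FIG. 3 example: a pair matches when the letters agree.
def toy_score(k, j):
    return 1 if k.lower() == j else 0


total, pairs = align(list("ABYE"), list("abcde"), toy_score)
print(total, pairs)
```

With the sentences of FIG. 3, K = <A, B, Y, E> and J = <a, b, c, d, e>, the sketch gives a total score of 3 and links A with a, B with b, and E with e, leaving Y, c and d unassociated. Because links are read only from diagonal arrows on a single path, they never cross, which reflects the assumption that the two languages share almost the same word order.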

[0068] In the word association process using WORD_SCORE in step 11 of FIG. 2, word association is performed with the above algorithm, with WORD_SCORE used as the value of SCORE(K[x], J[y]).

[0069] In the word association process using SEM_SCORE in step 12 of FIG. 2, the above algorithm is applied to K and J after the words associated in step 11 have been removed. SEM_SCORE is used as the value of SCORE(K[x], J[y]).

[0070] In the word association process using POS_SCORE in step 13 of FIG. 2, the above algorithm is applied to K and J after the words associated in steps 11 and 12 have been removed. POS_SCORE is used as the value of SCORE(K[x], J[y]).
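The three passes of steps 11 to 13 can be sketched as a cascade in which each pass only sees the words left over from the previous one. The sketch below is hedged: the 1.0/0.0 values for WORD_SCORE and POS_SCORE, the intersection-over-union form of SEM_SCORE (taken from the wording of the claims), and all function and parameter names are illustrative assumptions. The `align` parameter stands for any aligner that returns index pairs into the lists it is given.

```python
def word_score(k, j, bilingual_dict):
    # Assumed 1.0/0.0 scoring: is j listed as a translation of k?
    return 1.0 if j in bilingual_dict.get(k, ()) else 0.0

def sem_score(k, j, classes_k, classes_j):
    # Size of the intersection over size of the union of semantic class sets.
    a, b = classes_k.get(k, set()), classes_j.get(j, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pos_score(k, j, pos_of, pos_table):
    # Assumed 1.0/0.0 scoring: does the table map the POS of k to the POS of j?
    return 1.0 if pos_of.get(j) in pos_table.get(pos_of.get(k), ()) else 0.0

def cascade(K, J, scorers, align):
    """Run one alignment pass per scorer on the words still unassociated."""
    rest_k = list(range(len(K)))   # original positions still free in K
    rest_j = list(range(len(J)))   # original positions still free in J
    links = []
    for score in scorers:
        sub_k = [K[i] for i in rest_k]
        sub_j = [J[i] for i in rest_j]
        found = align(sub_k, sub_j, score)   # index pairs into sub_k / sub_j
        links += [(rest_k[a], rest_j[b]) for a, b in found]
        used_k = {a for a, _ in found}
        used_j = {b for _, b in found}
        rest_k = [i for n, i in enumerate(rest_k) if n not in used_k]
        rest_j = [i for n, i in enumerate(rest_j) if n not in used_j]
    return sorted(links)
```

Tracking original positions rather than word strings keeps the links unambiguous when the same word occurs twice in a sentence.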

[0071] [2-3] Description of the compound word association process

[0072] Word correspondences in a bilingual sentence range from single-word correspondences to correspondences spanning several words. Therefore, the compound word association process is performed after the one-to-one word association process.

[0073] FIG. 7 shows the procedure of the compound word association process. In this process, an association process using a compound word dictionary between the two languages is performed first (step 21).

[0074] Next, an association process is performed using a compound word part-of-speech table, which represents the correspondence between the parts of speech of compound words in the two languages (step 22).

[0075] Table 2 shows part of the bilingual compound word dictionary used in the processing of step 21.

[0076]

[Table 2]

[0077] Table 3 shows part of the compound word part-of-speech table used in the processing of step 22.

[0078]

[Table 3]

[0079] Next, the algorithm of the association process in step 21, which uses the compound word dictionary between the two languages, is described.

[0080] In a partially associated bilingual sentence, a word that has not been associated is defined as a processing target word of the compound word association process.

[0081] First, the source sentence (the Korean sentence) is searched for processing target words, and one processing target word P is extracted. The search proceeds from the left end of the source sentence. Next, the Korean column of the bilingual compound word dictionary is checked for a word or word string (hereinafter Q) that begins with the extracted word P.

[0082] If Q exists in the Korean column of the bilingual compound word dictionary, the word string corresponding to Q (hereinafter R) is extracted from the corresponding Japanese column of the dictionary. It is then checked whether Q, beginning with the processing target word P, occurs in the source sentence of the partially associated bilingual sentence and whether R, corresponding to Q, occurs in the target sentence.

[0083] If Q occurs in the source sentence and R occurs in the target sentence, Q and R are associated with each other. In this case, at least one of Q and R is a compound word. This processing is performed for each processing target word of the source sentence. If several compound word associations partially overlap, the longest compound word is adopted.
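The left-to-right dictionary lookup described above can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary is modeled as a mapping from a source word tuple Q to a target word tuple R, every word of Q and R is required to be still unassociated, and "longest" is resolved greedily among the entries that start at the same word P. The names `find_span` and `match_compounds` are hypothetical, and the subsequent right-to-left pass is not shown.

```python
def find_span(tokens, phrase, free):
    """Return the first start index where phrase occurs on free positions, else None."""
    n = len(phrase)
    for s in range(len(tokens) - n + 1):
        if tuple(tokens[s:s + n]) == tuple(phrase) and all(i in free for i in range(s, s + n)):
            return s
    return None

def match_compounds(src, tgt, free_src, free_tgt, compound_dict):
    """Associate source span Q with target span R using a compound word dictionary."""
    free_src, free_tgt = set(free_src), set(free_tgt)
    links = []
    for p in range(len(src)):                 # scan for P from the left end
        if p not in free_src:
            continue
        best = None
        for q, r in compound_dict.items():
            span = range(p, p + len(q))
            # Q must start at P and consist of unassociated words (assumption).
            if tuple(src[p:p + len(q)]) != tuple(q) or not all(i in free_src for i in span):
                continue
            t = find_span(tgt, r, free_tgt)
            # Keep the longest Q among the candidates starting at P.
            if t is not None and (best is None or len(q) > len(best[0])):
                best = (q, r, t)
        if best:
            q, r, t = best
            links.append((tuple(range(p, p + len(q))), tuple(range(t, t + len(r)))))
            free_src -= set(range(p, p + len(q)))
            free_tgt -= set(range(t, t + len(r)))
    return links
```

Each returned link is a pair of position tuples, so a one-word-to-compound link and a compound-to-compound link share the same shape.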

[0084] After the association process based on searching for the processing target word P from the left end of the source sentence (the association process using the compound word dictionary), an association process based on searching for P from the right end of the source sentence is performed.

[0085] In the above example, the source sentence (the Korean sentence) is searched for the processing target word P, one P is extracted, and the prescribed processing is performed based on it. Alternatively, the target sentence (the Japanese sentence) may be searched for P, one P extracted, and the prescribed processing performed based on it.

[0086] Next, the algorithm of the association process in step 22, which uses the compound word part-of-speech table representing the correspondence between the parts of speech of compound words in the two languages, is described.

[0087] In the bilingual sentence partially associated by the processing up to step 21, a word that has not been associated is defined as a processing target word of the compound word association process.

[0088] First, the source sentence (the Korean sentence) is searched for processing target words, and one processing target word P is extracted. The search proceeds from the left end of the source sentence. Next, the Korean column of the compound word part-of-speech table is checked for a part-of-speech sequence S that begins with the part of speech of the extracted word P.

[0089] If the part-of-speech sequence S exists in the Korean column of the compound word part-of-speech table, the part-of-speech sequence T corresponding to S is extracted from the corresponding Japanese column of the table. It is then checked whether S, beginning with the processing target word P, occurs in the source sentence of the partially associated bilingual sentence and whether T occurs in the target sentence.

[0090] If the part-of-speech sequence S occurs in the source sentence and the part-of-speech sequence T occurs in the target sentence, S and T are associated with each other. This processing is performed for each processing target word of the source sentence. If several compound word associations partially overlap, the longest compound word is adopted.
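The part-of-speech variant can be sketched in the same spirit, this time over tag sequences and with the scan direction as a parameter so that the same routine can serve a left-end or a right-end search. The tag names, the table layout (a list of (S, T) tuples), the requirement that all covered words be unassociated, and the name `pos_sequence_links` are assumptions made for illustration only.

```python
def pos_sequence_links(src_tags, tgt_tags, free_src, free_tgt, table, from_left=True):
    """Associate a source tag sequence S with a target tag sequence T from the table."""
    src_tags, tgt_tags = list(src_tags), list(tgt_tags)
    free_src, free_tgt = set(free_src), set(free_tgt)
    links = []
    positions = range(len(src_tags))
    for p in (positions if from_left else reversed(positions)):
        if p not in free_src:
            continue
        # Rows whose source-side sequence S starts with the tag of P, longest first.
        rows = sorted((row for row in table if row[0][0] == src_tags[p]),
                      key=lambda row: -len(row[0]))
        for s_seq, t_seq in rows:
            s_span = list(range(p, p + len(s_seq)))
            if src_tags[p:p + len(s_seq)] != list(s_seq) or not free_src.issuperset(s_span):
                continue
            # First occurrence of T on unassociated target positions, if any.
            t_start = next((i for i in range(len(tgt_tags) - len(t_seq) + 1)
                            if tgt_tags[i:i + len(t_seq)] == list(t_seq)
                            and free_tgt.issuperset(range(i, i + len(t_seq)))), None)
            if t_start is None:
                continue
            t_span = list(range(t_start, t_start + len(t_seq)))
            links.append((s_span, t_span))
            free_src.difference_update(s_span)
            free_tgt.difference_update(t_span)
            break
    return links
```

Sorting the candidate rows longest-first is one simple way to prefer the longest compound when several rows match at the same position.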

[0091] After the association process based on searching for the processing target word P from the left end of the source sentence (the association process using the compound word part-of-speech table), an association process based on searching for P from the right end of the source sentence is performed.

[0092] In the above example, the source sentence (the Korean sentence) is searched for the processing target word P, one P is extracted, and the prescribed processing is performed based on it. Alternatively, the target sentence (the Japanese sentence) may be searched for P, one P extracted, and the prescribed processing performed based on it.

[0093] [2-4] Description of the association process for unassociated words (step 5 of FIG. 1)

[0094] The association process for unassociated words (step 5 of FIG. 1) assumes that the source sentence and the target sentence of the bilingual sentence have substantially the same word order. Within the parts of the bilingual sentence that have not yet been associated, items that correspond in position are associated with each other.
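One possible reading of "correspond in position" is sketched below: the links found so far act as anchors that split both sentences into gaps, and the leftover words inside matching gaps are paired in order. This interpretation, the restriction to single-position, non-crossing links, and the name `align_leftovers` are all assumptions; the text above does not spell out the exact rule.

```python
def align_leftovers(m, n, links):
    """Pair unassociated positions gap by gap.

    m, n  : lengths of the source and target sentences
    links : existing (source_index, target_index) pairs, assumed non-crossing
    """
    out = []
    bounds = [(-1, -1)] + sorted(links) + [(m, n)]
    for (s0, t0), (s1, t1) in zip(bounds, bounds[1:]):
        gap_src = range(s0 + 1, s1)   # free source positions between two anchors
        gap_tgt = range(t0 + 1, t1)   # free target positions between the same anchors
        # Pair by position inside the gap; surplus words stay unassociated.
        out.extend(zip(gap_src, gap_tgt))
    return out
```

For example, with four words on each side and a single existing link (1, 1), the sketch pairs position 0 with 0, 2 with 2, and 3 with 3.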

[0095] [3] Experiment

To demonstrate the feasibility of the present invention, a preliminary experiment was conducted between Korean and Japanese.

[0096] [3-1] Experimental conditions

[3-1-1] Bilingual lexical resources

Table 4 shows statistics for the five resources we used in this experiment.

[0097]

[Table 4]

[0098] We used the resources of our previous translation project (see Reference 11), namely (i) a bilingual dictionary, (iv) a bilingual compound word dictionary, and (ii) a thesaurus. Resources (i) and (iv) were generalized to the part-of-speech level and checked manually to obtain (iii) and (v).

[0099] Reference 11: Sumita, E., Yamada, S., Yamamoto, K., Paul, M., Kashioka, H., Ishikawa, K., and Shirai, S. (1999). Solutions to Problems Inherent in Spoken-language Translation: The ATR-MATRIX Approach. Proc. of 7th MT Summit, pp. 229-235.

[0100] [3-1-2] Bilingual sentences

We used a Korean-Japanese phrasebook for foreign travelers (a collection of pairs of typical Korean sentences and their Japanese translations) as the bilingual sentences. Statistics for the bilingual sentences are summarized in Table 5. The sentences are relatively short, averaging about 8.1 words in Korean.

[0101]

[Table 5]

[0102] [3-2] Accuracy

We randomly selected 100 sentence pairs from the above bilingual sentences and associated their words by the method described in the above embodiment. A bilingual speaker evaluated the result for each association. Table 6 shows the evaluation results. For the 100 sentence pairs, 695 associations were obtained, of which 594 were correct.

[0103]

[Table 6]

[0104] We did not distinguish between the following two kinds of unassociated cases: (1) incorrect, where an association should have been made but the system failed to make it, and (2) correct, where the sentence pair contained nothing to associate.

[0105] If all unassociated cases are regarded as incorrect, recall is underestimated. Recall in this case is defined by equation (7).

[0106]

[Equation 4]

[0107] In our experiment, as Table 7 shows, a high recall (quasi_recall, 93.7%) and a high precision (91.2%) were achieved.

[0108]

[Table 7]

[0109] Table 8 shows how much each step contributes to the result: step 3 of FIG. 1 (steps 11, 12, and 13 of FIG. 2), step 4 of FIG. 1 (steps 21 and 22 of FIG. 7), and step 5 of FIG. 1.

[0110]

[Table 8]

[0111] The word association process using WORD_SCORE (step 11 of FIG. 2) is the most effective and most accurate step. The remaining steps lower precision but contribute significantly to recall.

[0112]

[Effects of the Invention] According to the present invention, a method for associating words in a bilingual sentence with high recall and high precision is obtained.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the overall processing procedure of the method for associating words in a bilingual sentence.

FIG. 2 is a flowchart showing the procedure of the one-to-one word association process.

FIG. 3 is a schematic diagram showing example values of the partial scores c[x, y] in the 4×5 matrix for K = <A, B, Y, E> and J = <a, b, c, d, e>.

FIG. 4 is a schematic diagram showing the path with the maximum total score in FIG. 3.

FIG. 5 is a schematic diagram showing the words associated on the basis of FIG. 4.

FIG. 6 is a flowchart showing the algorithm used in the one-to-one word association process.

FIG. 7 is a flowchart showing the procedure of the compound word association process.

Claims

[Claims]

1. A method for associating words in a bilingual sentence consisting of a source language sentence and a corresponding target language sentence, the method comprising: a first step of calculating, for each one-to-one word correspondence between the source language sentence and the target language sentence, a first word correspondence degree using a bilingual dictionary between the two languages; and a second step of finding the combination of word correspondences that maximizes the sum of the first word correspondence degrees in the bilingual sentence.

2. The method for associating words in a bilingual sentence according to claim 1, wherein the first word correspondence degree between an arbitrary source language word and an arbitrary target language word is determined based on whether the target language word matches one of the translations of the source language word in the bilingual dictionary.

3. The method for associating words in a bilingual sentence according to claim 1, further comprising: a third step of calculating, for the part of the bilingual sentence that has not yet been associated by the second step, a second word correspondence degree for each one-to-one word correspondence between the two languages using a thesaurus for each of the two languages; and a fourth step of finding, in the part of the bilingual sentence that has not yet been associated by the second step, the combination of word correspondences that maximizes the sum of the second word correspondence degrees.

4. The method for associating words in a bilingual sentence according to claim 3, wherein the second word correspondence degree between an arbitrary source language word and an arbitrary target language word is calculated based on the set of semantic classes to which the source language word belongs and the set of semantic classes to which the target language word belongs, both obtained from the thesaurus.

5. The method for associating words in a bilingual sentence according to claim 4, wherein the second word correspondence degree between an arbitrary source language word and an arbitrary target language word is expressed as the ratio of the size of the intersection to the size of the union of the set of semantic classes to which the source language word belongs and the set of semantic classes to which the target language word belongs, both obtained from the thesaurus.

6. The method for associating words in a bilingual sentence according to any one of claims 3, 4 and 5, further comprising: a fifth step of calculating, for the part of the bilingual sentence that has not yet been associated by the second and fourth steps, a third word correspondence degree for each one-to-one word correspondence between the two languages using a part-of-speech table representing the correspondence between parts of speech in the two languages; and a sixth step of finding, in the part of the bilingual sentence that has not yet been associated by the second and fourth steps, the combination of word correspondences that maximizes the sum of the third word correspondence degrees.

7. The method for associating words in a bilingual sentence described in [...], wherein the third word correspondence degree between an arbitrary source language word and an arbitrary target language word is determined based on whether the part of speech of the target language word matches any of the parts of speech that correspond, in the part-of-speech table, to the part of speech of the source language word.

8. The method for associating words in a bilingual sentence according to claim 6, further comprising a seventh step of associating, in the bilingual sentence partially associated by the second, fourth, and sixth steps, one word with a compound word or a compound word with a compound word using a compound word dictionary between the two languages.

9. The method for associating words in a bilingual sentence according to claim 8, further comprising an eighth step of associating, in the bilingual sentence partially associated by the second, fourth, sixth, and seventh steps, one word with a compound word or a compound word with a compound word using a compound word part-of-speech table representing the correspondence between the parts of speech of compound words in the two languages.

10. The method for associating words in a bilingual sentence according to claim 9, further comprising a ninth step of associating, on the assumption that the source language sentence and the target language sentence forming the bilingual sentence have substantially the same word order, positionally corresponding items in the part of the bilingual sentence that has not yet been associated by the second, fourth, sixth, seventh, and eighth steps.

11. The method for associating words in a bilingual sentence according to claim 1, further comprising a tenth step of associating, in the bilingual sentence partially associated by the second step, one word with a compound word or a compound word with a compound word using a compound word dictionary between the two languages.

12. The method for associating words in a bilingual sentence according to claim 11, further comprising an eleventh step of associating, on the assumption that the source language sentence and the target language sentence forming the bilingual sentence have substantially the same word order, positionally corresponding items in the part of the bilingual sentence that has not yet been associated by the second and tenth steps.

13. The method for associating words in a bilingual sentence according to claim 11, further comprising a twelfth step of associating, in the bilingual sentence partially associated by the second and tenth steps, one word with a compound word or a compound word with a compound word using a compound word part-of-speech table representing the correspondence between the parts of speech of compound words in the two languages.

14. The method for associating words in a bilingual sentence according to claim 13, further comprising a thirteenth step of associating, on the assumption that the source language sentence and the target language sentence forming the bilingual sentence have substantially the same word order, positionally corresponding items in the part of the bilingual sentence that has not yet been associated by the second, tenth, and twelfth steps.