JP4708682B2

JP4708682B2 - Bilingual word pair learning method, apparatus, and recording medium on which parallel word pair learning program is recorded

Info

Publication number: JP4708682B2
Application number: JP2003099007A
Authority: JP
Inventors: 節夫山田; 昌明永田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-04-02
Filing date: 2003-04-02
Publication date: 2011-06-22
Anticipated expiration: 2023-04-02
Also published as: JP2004310170A

Description

【０００１】
【発明の属する技術分野】
本発明は、対訳関係にある対訳単語対を抽出する学習方法に係わり、特に対訳関係にある自然言語文からの自動的な対訳単語対を抽出する学習方法に関する。
【０００２】
【従来の技術】
統計情報を利用して、対訳関係にある対訳文対から対訳単語対を自動学習する方法（特許文献１参照）が知られているが、例えば日英間のように言語構造が大きく違う単語対の場合、対訳単語対の抽出精度に問題があった。また、このような言語構造が大きく違う場合、片側の言語の構文情報を利用して単語対の抽出精度を向上させる学習方法（非特許文献１参照）が知られている。
【０００３】
【特許文献１】
特開平５−１８９４８１号公報
ピーターフィトシューブラウン「翻訳用コンピュータ操作方法、字句モデル生成方法、モデル生成方法、翻訳用コンピュータシステム、字句モデル生成コンピュータシステム及びモデル生成コンピュータシステム」
【非特許文献１】
Kenji Yamada and Kevin Knight,「A Syntax-based Statistical Translation Model」,39th Annual Meeting of the Association for Computation Linguistics(ACL-01),pp.523-530,2001
【０００４】
【発明が解決しようとする課題】
上記の従来の特許文献１や非特許文献１における対訳単語対の学習方法では、対訳文の２言語のうち対応付ける基になる言語が変わると得られる単語対が変わり、安定した単語対を抽出できない問題がある。また、非特許文献１の学習方法において、構文情報を利用しない方法より抽出精度は向上しているものの、まだ十分な精度とはなっていない。
本発明は、上記の点に鑑みなされたもので、入力される対訳文の構文情報を片言語ずつ利用して得られた対訳単語対から、同じ対訳単語対を抽出することで安定した対訳単語対を抽出し、また、抽出された共通の対訳単語対を入力された対訳文に加え、対訳単語対の学習を繰り返すことで対訳単語対の抽出精度を向上させる、対訳単語対の学習方法、装置、及び、対訳単語対の学習プログラムを記録した記録媒体を提供することを目的とする。
【０００５】
【課題を解決するための手段】
図１および図４は、本発明を説明するための対訳単語対の学習方法の概要フローチャートである。
本発明は、第１の自然言語文とその対訳である第２の自然言語文を入力して対訳関係にある単語対を学習する装置において、第１の自然言語文とその対訳である第２の自然言語文を入力し（ステップ１）、第１の自然言語文の要素である単語または単語列に第２の自然言語文の要素である単語または単語列を対応付けて対訳単語対の集合を獲得し（ステップ２）、第２の自然言語文の要素である単語または単語列に第１の自然言語文の要素である単語または単語列を対応付けて対訳単語対の集合を獲得し（ステップ３）、これら２種類の対訳単語対の集合のうち同じ対訳対を抽出し、当該抽出された共通単語対が、既に共通対訳単語対抽出手段により記憶装置９に記憶されている共通対訳単語対の全てと一致するか否かを判断し、全てとは一致しないと判断すれば当該一致しない共通単語対を共通対訳単語対として前記記憶装置９へ記憶し、全てと一致すると判断された場合は処理を終了し、
言語別単語抽出手段が、前記記憶装置９に記憶されている共通対訳単語対を第１の自然言語文の要素である単語または単語列と第２の自然言語文の要素である単語または単語列とに分けて前記対訳コーパスに記憶して前記ステップ１に戻る（ステップ４）。
【０００６】
また、本発明は、前記対訳単語対獲得ステップの単語または単語列を対応付ける処理において、構文解析ステップを設けることによって得られる構文木と単語または単語列を対応付けるステップを含む。
【０００７】
また、本発明の他の参考例は、第１の自然言語文の要素である単語または単語列、及び、前記同じ対訳対を抽出するステップによって得られる共通単語対の第１の自然言語文の要素である単語または単語列に対して、第２の自然言語文の要素である単語または単語列、及び、前記同じ対訳対を抽出するステップによって得られる共通単語対の第２の自然言語文の要素である単語または単語列を対応付けるステップと、第２の自然言語文の要素である単語または単語列、及び、前記同じ対訳対を抽出するステップによって得られる共通単語対の第２の自然言語文の要素である単語または単語列に対して、第１の自然言語文の要素である単語または単語列、及び、前記同じ対訳対を抽出するステップによって得られる共通単語対の第１の自然言語文の要素である単語または単語列を対応付けるステップを含む。
また、本発明は、前記同じ対訳対を抽出するステップにおいて得られる共通単語対を用いて、該共通単語対が変化しなくなるまで対訳単語対の学習方法を繰り返すステップを含む。
【０００８】
図２および図３は、本発明の対訳単語対の学習装置の概要構成図である。
対訳単語対の学習装置は、第１の自然言語文とその対訳である第２の自然言語文を読み込む対訳文読み込み手段１と、第１の自然言語文の要素である単語または単語列に第２の自然言語文の要素である単語または単語列を対応付け、第１の自然言語を基に対応付けた対訳単語対を記憶装置７１に格納する、及び、第２の自然言語文の要素である単語または単語列に第１の自然言語文の要素である単語または単語列を対応付け、第２の自然言語を基に対応付けた対訳単語対を記憶装置７２に格納する対訳単語対獲得手段６と、記憶装置７１、７２の中で同じ対訳単語対である共通対訳単語対を抽出し、当該抽出された共通単語対が、既に共通対訳単語対抽出手段により記憶装置９に記憶されている共通対訳単語対の全てと一致するか否かを判断し、全てとは一致しないと判断すれば当該一致しない共通単語対を共通対訳単語対として前記記憶装置９へ記憶し、全てと一致すると判断された場合は処理を終了する共通対訳単語対抽出手段と、
前記共通対訳単語対抽出手段で前記記憶装置９へ記憶されている共通対訳単語対を第１の自然言語文の要素である単語または単語列と第２の自然言語文の要素である単語または単語列とに分けて前記対訳コーパスに記憶して第１の自然言語を基にした対訳単語対獲得手段と第２の自然言語を基にした対訳単語対獲得手段ならびに共通対訳単語対抽出手段８を動作させる言語別単語抽出手段を含む。
【０００９】
また、本発明は対訳単語対獲得手段６において単語切り結果や構文解析結果を利用するために、対訳文読み込み手段１で読み込まれた文を形態素解析する形態素解析手段２と、続いて構文解析する構文解析手段３を含む。
また、本発明の他の実施例は、対訳単語対獲得手段６において、共通対訳単語対抽出手段８によって記憶装置９に格納された共通対訳単語対を利用するために、共通対訳単語対を言語別に分ける言語別単語抽出手段Ａを含む。
また、本発明の他の実施例は、共通対訳単語対抽出手段８によって記憶装置９に格納された共通対訳単語対が変化しなくなるまで対訳単語対の学習を繰り返す手段を含む。
【００１０】
（作用）
上記のように、本発明の対訳単語対の学習方法においては、対訳文読み込み手段１に第１の自然言語文とその対訳である第２の自然言語文を入力し、対訳単語対獲得手段６で第１の自然言語文の要素である単語または単語列に第２の自然言語文の要素である単語または単語列を対応付け、第１の自然言語を基に対応付けた対訳単語対を記憶装置71に格納し、また、対訳単語対獲得手段６で第２の自然言語文の要素である単語または単語列に第１の自然言語文の要素である単語または単語列を対応付け、第２の自然言語を基に対応付けた対訳単語対を記憶装置72に格納し、共通対訳単語対抽出手段８で、記憶装置71、72の中で同じ対訳単語対である共通対訳単語対を抽出し、記憶装置９に格納する。これにより、第１の自然言語文とその対訳である第２の自然言語文から自動的に安定した対訳単語対が抽出できる。
【００１１】
また、形態素解析手段２で対訳文読み込み手段１によって入力された文の形態素解析を行い、形態素解析結果を記憶装置41、42に格納し、さらに、構文解析手段３で構文解析を行い、構文解析結果を記憶装置51、52に格納し、対訳単語対獲得手段６において、形態素解析結果、及び、構文解析結果を利用して、第１の自然言語を基に対応付けた対訳単語対を記憶装置71に格納し、及び、第２の自然言語を基に対応付けた対訳単語対を記憶装置72に格納し、共通対訳単語対抽出手段８で、記憶装置71、72の中で同じ単語対である共通対訳単語対を抽出し、記憶装置９に格納する。これにより、共通対訳単語対をより正確に抽出することができる。
【００１２】
また、言語別単語抽出手段Ａで、共通対訳単語対抽出手段８によって記憶装置９に格納された共通対訳単語対を言語別に分け、対訳文読み込み手段１によって入力された対訳文に加えて、分けられた第１の自然言語文の要素である単語または単語列、及び、第２の自然言語文の要素である単語または単語列を対訳単語対獲得手段６において利用する。これによって、共通対訳単語対抽出手段８では共通対訳単語対をより正確に、より多く抽出することができる。
【００１３】
また、対訳単語対の学習を繰り返す手段で、共通対訳単語対抽出手段８によって記憶装置９に格納された共通対訳単語対が変化しなくなるまで、対訳単語対の学習を行う。これによって、さらに正確に、より多くの共通対訳単語対を抽出することができる。
したがって、上記方法を全て実行する、及び上記手段を用いることにより、対訳単語対が自動的に得られ、対訳単語対の学習が可能となる。
上記の記載は、対訳単語対の学習方法について述べているが、対訳単語対の学習装置及び対訳単語対の学習プログラムについても同様である。
【００１４】
【発明の実施の形態】
以下に、本発明の一実施例について図面により説明する。
図３は、本発明の一実施例である対訳単語対の学習装置基本ブロック構成図である。同図に示す対訳単語対の学習装置は、対訳文読み込み部１、形態素解析部２、構文解析部３、対訳単語対獲得部６、共通対訳単語対抽出部８、言語別単語抽出部Ａ、記憶装置41、42、51、52、71、72、９より構成される。
【００１５】
対訳文読み込み部１に対訳コーパスに格納された第１の自然言語文とその対訳である第２の自然言語文を入力する（読み込む）。
形態素解析部２は、入力された第１の自然言語文を形態素解析した結果を記憶装置41に格納し、入力された第２の自然言語文を形態素解析した結果を記憶装置42に格納する。
構文解析部３は、記憶装置41に格納されている形態素解析結果を利用して、第１の自然言語文の構文解析を行い、その結果を記憶装置51に格納し、記憶装置42に格納されている形態素解析結果を利用して、第２の自然言語文の構文解析を行い、その結果を記憶装置52に格納する。
【００１６】
対訳単語対獲得部６は、第１の自然言語を基にした対訳単語対獲得部61と第２の自然言語を基にした対訳単語対獲得部62より構成される。
第１の自然言語を基にした対訳単語対獲得部61は、例えば、第１の自然言語文を形態素解析した結果である単語列に第２の自然言語文の構文解析結果を対応付けて対訳単語対を抽出し、記憶装置71に格納する。
第２の自然言語を基にした対訳単語対獲得部62は、例えば、第２の自然言語文を形態素解析した結果である単語列に第１の自然言語文の構文解析結果を対応付けて対訳単語対を抽出し、記憶装置72に格納する。
共通対訳単語対抽出部８は、記憶装置71、72に格納されている対訳単語対の集合のうち同じ対訳単語対を抽出し、記憶装置９に格納する。
言語別単語抽出部Ａは、記憶装置９に格納されている共通対訳単語対を第１の自然言語、第２の自然言語の単語に分け、対訳文読み込み部１で読み込まれている対訳コーパスの対訳文に追加する。
【００１７】
図４は、本発明の一実施例である学習の繰り返しを行うフローチャートである。
以下、このフローチャートに基づいて、第１の自然言語が英語、第２の自然言語が日本語であるとした場合の一実施例について説明する。
ステップ101では、対訳文読み込み部１に第１の自然言語文とその対訳である第２の自然言語文を入力する。例えば、入力対訳文が図５に示すように、英文が「The house is somewhere about here」、「Look about」、その対訳である日文が「その家はどこかこのあたりにある」、「あたりを見まわす」を含んでいたとする。
【００１８】
ステップ102では、ステップ101で読み込んだ対訳文を形態素解析部２によってそれぞれ形態素解析し、その結果を記憶装置41、42に格納する。例えば、日文は図６に示すように、「その／家／は／どこ／か／この／あたり／に／ある」、「あたり／を／見／まわす」と単語切りがなされたとする。また、形態素解析部２では、構文解析のために各単語に品詞を付与する。例えば、英文では図７に示すように、「Look」には動詞、「about」には副詞、及び、図９に示す品詞が付与されたとする。また、例えば、日文では、図８に示すように、「あたり」には名詞、「を」には助詞、「見」には動詞、「まわす」には動詞、及び図１０に示す品詞が付与されたとする。
ステップ103では、記憶装置41、42に格納された形態素解析結果を基に、構文解析部３によってそれぞれ構文解析し、その結果を記憶装置51、52に格納する。例えば、英文は図７、９に示す結果が、また、日文は図８、１０に示す結果が得られたとする。
【００１９】
ステップ104、及び、ステップ105では、記憶装置41、42、51、52に格納されている、形態素解析結果、及び、構文解析結果を利用して、第１の自然言語を基にした対訳単語対獲得部61によって第１の自然言語を基にした対訳単語対が記憶装置71に格納され、また、第２の自然言語を基にした対訳単語対獲得部62によって第２の自然言語を基にした対訳単語対が記憶装置72に格納される。
例えば、上記の例では、図11に示すように形態素解析を行った英文を基に日文の構文解析結果を対応付けた場合では、「その」と「the」、「家」と「house」などが対応付けられ、また、形態素解析を行った日文を基に英文の構文解析結果を対応付けた場合では、「the」と「その」、「house」と「家」などが対応付けられたとする。なお、空欄は対応するものがないことを表し、例えば、図11の英文を基に日文の構文解析結果を対応付けた場合の「か」は、対応する英語単語または単語列がなかったことを意味する。また、この例では、構文解析結果の中間ノード単位を超えない範囲で、単語列に構文解析結果をできるだけ合わせるように構文解析結果の語順を入れ替えて対応付けている。ここで、構文解析結果の中間ノードとは、動詞、名詞、動詞句、名詞句といった構文解析結果上の文法的なカテゴリーを示す。中間ノード単位とは、構文解析結果において中間ノード（つまり文法的なカテゴリー）よりも下に属する単語列を指し、例えば、図８に示す構文解析結果では、名詞句単位は、名詞句よりも下に属する単語列なので、「あたり」「を」を指し、動詞句単位は、「見」「まわす」を指す。中間ノード単位を超えない範囲とは、中間ノード単位である単語列内の範囲に限ることを意味する。例えば、図８に示す構文解析結果では、名詞句単位の範囲にある「あたり」と「を」を入れ替えたり、名詞句単位全体の「あたり／を」と動詞句単位全体の「見／まわす」を入れ替えることはできるが、助詞「を」は名詞句の範囲の単語で、動詞「見」は動詞句の範囲の単語なので、名詞句単位と動詞句単位の範囲を超えて、個別に助詞「を」と動詞「見」を入れ替えることはできない。したがって、英文「look about」に日文「あたり／を／見／まわす」の構文解析結果を対応付ける場合は、図11（上図）に示すように、名詞句「あたり／を」と動詞句「見／まわす」の語順を入れ替えることによって、「見」と「look」、「あたり」と「about」が対応付けられている。これら対訳単語対は、記憶装置71、72に格納される。
【００２０】
ステップ106では、共通対訳単語対抽出部８によって、記憶装置71に格納されている対訳単語対と記憶装置72に格納されている対訳単語対のうち同じ対訳単語対を抽出し、既に記憶装置９に保存されている共通対訳単語対と全て同じかどうかを判断し、全て同じなら(yes)、同じでないものがあれば(no)となり、yesが選択されると対訳単語対の学習は終了し、noが選択されると、次のステップ107に進む。上記の例では、図５に示す入力文から共通対訳単語対抽出部８によって抽出される共通対訳単語対は、図12に示す通り、例えば、「その」と「the」、「家」と「house」などが抽出される。記憶装置９にはまだ何も保存されていないのでnoが選択され、次のステップ107へ進む。
【００２１】
ステップ107では、ステップ106で記憶装置９に保存されている共通対訳単語対と一致しなかった対訳単語対を記憶装置９へ格納する。上記の例の場合、図12に示すステップ106で抽出された対訳単語対は全て記憶装置９へ格納する。
ステップ108では、記憶装置９に格納されている共通対訳単語対をそれぞれの言語別に分け、それぞれ入力対訳に加える。すなわち、共通対訳単語対を対訳コーパスに保存する。上記の例の場合、言語別に分けられ、入力された英文に、「the」、「house」、「is」、「about」が、また、その対訳として入力された日文に「その」、「家」、「は」、「あたり」が、加えられる。
【００２２】
上記例では、再度ステップ101に進むので、以下では上記の例についてさらにステップ毎に説明する。
ステップ101では、対訳文読み込み部１に元の対訳文とステップ108で加えられた対訳単語対と両方が入力される。
ステップ102では、形態素解析部２でステップ101によって入力された対訳単語対の品詞を付与し、結果を記憶装置41、42に格納する。
ステップ103では、構文解析部３でステップ101によって入力された対訳単語対について記憶装置41、42に格納された形態素解析結果を利用し、構文解析をし、結果を記憶装置51、52に格納する。
【００２３】
ステップ104、及び、ステップ105では、記憶装置41、42、51、52に格納されている、入力文及び共通単語対の単語または単語列の形態素結果、構文解析結果を利用して、第１の自然言語を基にした対訳単語対獲得部61によって第１の自然言語を基にした対訳単語対が記憶装置71に格納され、また、第２の自然言語を基にした対訳単語対獲得部62によって第２の自然言語を基にした対訳単語対が記憶装置72に格納される。上記の例の場合、共通対訳単語対である「あたり」と「about」が入力文に加わったため（図１２参照）、構文解析結果を単語列に対応する時に、「あたり」と「about」が対応付くことが考慮される。例えば、英文「The house is somewhere about here」に日文「その／家／は／どこ／か／この／あたり／に／ある」の構文解析結果を合わせる場合、図10で示した構文解析結果から動詞句「ある」と副詞句「この／あたり／に」が入れ替わり、さらに、副詞句の中では、連体詞「この」と助詞「に」が入れ替わり、図13に示すような語順となる。この結果、例えば、英文との対応は図15に示す通りとなり、「この」と「here」が対応付く。一方、同じ例文に対して、日文に英文の構文解析結果を合わせる場合、図９で示した構文解析結果から副詞句の中の副詞「about」と副詞「here」が入れ替わり、図14に示すような語順となる。この結果、例えば、日文との対応は図15に示す通りとなり、「here」と「この」が対応付く。これら図15に示す対訳単語対は、記憶装置71、72に格納される。
【００２４】
ステップ106では、共通対訳単語対抽出部８によって、記憶装置71に格納されている対訳単語対と記憶装置72に格納されている対訳単語対のうち同じ対訳単語対を抽出すると、上記例では、新たに「この」と「here」の共通対訳単語対が抽出される。これは、現在記憶装置９に保存されている図12の共通対訳単語対と一致しないものがあるので、このステップの判定は、noとなり、ステップ107に進む。
【００２５】
ステップ107では、記憶装置９に保存されている図12と一致しない「この」と「here」を追加し、図16に示す共通対訳単語対が記憶装置９に格納される。
ステップ108では、記憶装置９に格納されている共通対訳単語対をそれぞれの言語別に分け、それぞれ入力対訳に加える。
この後、さらにステップ101へと処理は進むが、共通対訳単語対が変化しなくなると、ステップ106の判定がyesとなり、対訳単語対の学習は終了する。
なお、上記の例では、図３に示す構成図に基づいて説明したが、この例に限定されることなく特許請求の範囲内で種々の変更・応用が可能である。
【００２６】
本発明の対訳単語対の学習装置は、ＣＰＵやメモリ等を有するコンピュータと利用者端末とCD-ROM、磁気ディスク装置、半導体メモリ等の機械読み取り可能な記録媒体とから構成することができる。
記録媒体に記録された対訳単語対の学習プログラム、あるいは通信回線を介して伝送された対訳単語対の学習プログラムはコンピュータに読み取られ、コンピュータの動作を制御し、コンピュータ上に前述した各構成要素と各処理を実現する。
【００２７】
【発明の効果】
以上説明したように、本発明によれば、第１の自然言語文とその対訳である第２の自然言語文から、第１の自然言語文を第２の自然言語文に対応付けた対訳単語対と、第２の自然言語文を第１の自然言語文に対応付けた対訳単語対と比較して共通する対訳単語対を抽出することにより、自動的に安定した第１の自然言語と第２の自然言語の対訳単語対が抽出できる。
また、対応付けるステップで、形態素解析手段や構文解析手段を利用した結果を用いることにより、より正確に第１の自然言語と第２の自然言語の対訳単語対が抽出できる。
また、一度抽出された対訳単語対を、入力された対訳文に追加することにより、より多く第１の自然言語と第２の自然言語の対訳単語対が抽出できる。
また、抽出される対訳単語対が変化しなくなるまで、上記対訳単語対の学習を繰り返すことにより、さらに多く、より正確に第１の自然言語と第２の自然言語の対訳単語対が抽出できる。
このようにして抽出された対訳単語対は、例えば、電子化対訳辞書の構築または拡充に利用できたり、機械翻訳システムの対訳辞書として利用することが可能である。
【図面の簡単な説明】
【図１】本発明の原理を説明するための対訳単語対の学習方法の概要フローチャート。
【図２】本発明の対訳単語対の学習装置の概要構成図。
【図３】本発明の一実施例である対訳単語対の学習装置の基本ブロック構成図。
【図４】本発明の一実施例である学習の繰り返しを行うフローチャート。
【図５】本発明の一実施例である入力対訳文の例を示す図。
【図６】本発明の一実施例である日文の形態素解析結果（単語切り）の例を示す図。
【図７】本発明の一実施例である英文の構文解析結果の例（その１）を示す図。
【図８】本発明の一実施例である日文の構文解析結果の例（その１）を示す図。
【図９】本発明の一実施例である英文の構文解析結果の例（その２）を示す図。
【図１０】本発明の一実施例である日文の構文解析結果の例（その２）を示す図。
【図１１】本発明の一実施例である単語対応結果の例（その１）を示す図。
【図１２】本発明の一実施例である共通の単語対応の例（その１）を示す図。
【図１３】本発明の一実施例である日文の構文解析結果の語順を入れ替えた例を示す図。
【図１４】本発明の一実施例である英文の構文解析結果の語順を入れ替えた例を示す図。
【図１５】本発明の一実施例である単語対応結果の例（その２）を示す図。
【図１６】本発明の一実施例である共通の単語対応の例（その２）を示す図。
【符号の説明】
１・・・対訳文読み込み部、２・・・形態素解析部、３・・・構文解析部、６・・・対訳単語対獲得部、８・・・共通対訳単語対抽出部、９・・・共通対訳単語対、71・・・第１の自然言語を基に対応付けた対訳単語対記憶装置、72・・・第２の自然言語を基に対応付けた対訳単語対記憶装置、Ａ・・・言語別単語抽出部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a learning method for extracting bilingual word pairs having a bilingual relationship, and more particularly to a learning method for automatically extracting bilingual word pairs from a natural language sentence having a bilingual relationship.
[0002]
[Prior art]
A method is known for automatically learning bilingual word pairs from bilingual sentence pairs using statistical information (see Patent Document 1). However, for language pairs whose structures differ greatly, such as Japanese and English, the accuracy of bilingual word pair extraction was a problem. For such cases, a learning method is also known that uses the syntactic information of one of the two languages to improve word pair extraction accuracy (see Non-Patent Document 1).
[0003]
[Patent Document 1]
Japanese Patent Application Laid-Open No. Hei 5-189481, Peter Fitzhugh Brown, "Computer Operation Method for Translation, Lexical Model Generation Method, Model Generation Method, Translation Computer System, Lexical Model Generation Computer System and Model Generation Computer System"
[Non-Patent Document 1]
Kenji Yamada and Kevin Knight, "A Syntax-based Statistical Translation Model", 39th Annual Meeting of the Association for Computational Linguistics (ACL-01), pp. 523-530, 2001
[0004]
[Problems to be solved by the invention]
In the conventional bilingual word pair learning methods of Patent Document 1 and Non-Patent Document 1, the word pairs obtained change when the language used as the basis for the association is switched between the two languages of the bilingual sentences, so stable word pairs cannot be extracted. Moreover, although the learning method of Non-Patent Document 1 improves extraction accuracy over methods that do not use syntactic information, its accuracy is still insufficient.
The present invention has been made in view of the above points. It extracts stable bilingual word pairs by taking, from the bilingual word pairs obtained by using the syntactic information of the input bilingual sentences one language at a time, only the pairs that are the same in both results. It also improves extraction accuracy by adding the extracted common bilingual word pairs to the input bilingual sentences and repeating the learning. The object of the invention is to provide a bilingual word pair learning method and apparatus of this kind, and a recording medium on which a bilingual word pair learning program is recorded.
[0005]
[Means for Solving the Problems]
FIG. 1 and FIG. 4 are schematic flowcharts of a bilingual word pair learning method for explaining the present invention.
The present invention is used in an apparatus that learns word pairs in a translation relationship from an input first natural language sentence and a second natural language sentence that is its translation. The first natural language sentence and the second natural language sentence that is its translation are input (step 1). Words or word strings that are elements of the second natural language sentence are associated with words or word strings that are elements of the first natural language sentence to obtain a set of bilingual word pairs (step 2). Words or word strings that are elements of the first natural language sentence are associated with words or word strings that are elements of the second natural language sentence to obtain a second set of bilingual word pairs (step 3). The pairs that are identical in these two sets are extracted, and it is judged whether the extracted common word pairs all match the common bilingual word pairs already stored in the storage device 9 by the common bilingual word pair extraction means. If they do not all match, the non-matching common word pairs are stored in the storage device 9 as common bilingual word pairs; if they all match, the processing ends.
The language-specific word extraction means divides the common bilingual word pairs stored in the storage device 9 into words or word strings that are elements of the first natural language sentence and words or word strings that are elements of the second natural language sentence, stores them in the bilingual corpus, and the processing returns to step 1 (step 4).
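The loop formed by steps 1 to 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `learn_common_pairs`, `stub_first`, and `stub_second` are hypothetical, the two aligners are stubs that return fixed sets instead of performing the syntax-based association, and adding only the newly found pairs to the corpus as one-word entries is an assumed simplification.

```python
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]                       # (first-language word, second-language word)
Corpus = List[Tuple[List[str], List[str]]]   # aligned (first, second) token lists
Aligner = Callable[[Corpus], Set[Pair]]


def learn_common_pairs(corpus: Corpus, align_first: Aligner,
                       align_second: Aligner, max_rounds: int = 100) -> Set[Pair]:
    """Repeat steps 1 to 4 until no new common pair appears."""
    corpus = list(corpus)          # work on a copy of the bilingual corpus
    stored: Set[Pair] = set()      # plays the role of storage device 9
    for _ in range(max_rounds):
        pairs_71 = align_first(corpus)     # step 2: first language as the basis
        pairs_72 = align_second(corpus)    # step 3: second language as the basis
        new_pairs = (pairs_71 & pairs_72) - stored
        if not new_pairs:                  # every common pair is already stored
            break
        stored |= new_pairs
        # step 4: feed the new pairs back into the corpus as one-word entries
        corpus.extend(([w1], [w2]) for w1, w2 in sorted(new_pairs))
    return stored


# Stub aligners (illustrative only). They disagree on "is", and they propose
# here/この only once about/あたり has been added to the corpus.
def stub_first(corpus: Corpus) -> Set[Pair]:
    pairs = {("the", "その"), ("house", "家"), ("about", "あたり"), ("is", "ある")}
    if (["about"], ["あたり"]) in corpus:
        pairs.add(("here", "この"))
    return pairs


def stub_second(corpus: Corpus) -> Set[Pair]:
    pairs = {("the", "その"), ("house", "家"), ("about", "あたり"), ("is", "は")}
    if (["about"], ["あたり"]) in corpus:
        pairs.add(("here", "この"))
    return pairs


corpus = [("The house is somewhere about here".split(),
           "その 家 は どこ か この あたり に ある".split())]
result = learn_common_pairs(corpus, stub_first, stub_second)
print(sorted(result))
```

With these stubs the loop stops in the third round: the pair on which the two directions disagree is never stored, and the pair that only appears after feedback is picked up in the second round.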
[0006]
Further, the present invention includes a step of associating a word or word string with a syntax tree obtained by providing a syntax analysis step in the process of associating a word or word string in the bilingual word pair acquisition step.
[0007]
Another reference example of the present invention includes two steps. The first associates the words or word strings that are elements of the second natural language sentence, together with the second-natural-language words or word strings of the common word pairs obtained by the step of extracting the same translation pairs, with the words or word strings that are elements of the first natural language sentence, together with the first-natural-language words or word strings of those common word pairs. The second makes the association in the opposite direction, associating the first-natural-language elements and common-pair words with the second-natural-language elements and common-pair words.
In addition, the present invention includes a step of repeating the bilingual word pair learning method, using the common word pairs obtained in the step of extracting the same translation pairs, until those common word pairs no longer change.
[0008]
FIG. 2 and FIG. 3 are schematic configuration diagrams of the bilingual word pair learning apparatus of the present invention.
The bilingual word pair learning device includes: bilingual sentence reading means 1 that reads a first natural language sentence and a second natural language sentence that is its translation; bilingual word pair acquisition means 6 that associates words or word strings of the second natural language sentence with words or word strings of the first natural language sentence and stores the bilingual word pairs associated on the basis of the first natural language in the storage device 71, and that associates words or word strings of the first natural language sentence with words or word strings of the second natural language sentence and stores the bilingual word pairs associated on the basis of the second natural language in the storage device 72; common bilingual word pair extraction means that extracts the common bilingual word pairs, that is, the pairs that are identical in the storage devices 71 and 72, judges whether the extracted common word pairs all match the common bilingual word pairs already stored in the storage device 9, stores any non-matching common word pairs in the storage device 9 as common bilingual word pairs if they do not all match, and ends the processing if they all match;
and language-specific word extraction means that divides the common bilingual word pairs stored in the storage device 9 by the common bilingual word pair extraction means into words or word strings of the first natural language sentence and words or word strings of the second natural language sentence, stores them in the bilingual corpus, and operates the bilingual word pair acquisition means based on the first natural language, the bilingual word pair acquisition means based on the second natural language, and the common bilingual word pair extraction means 8.
[0009]
The present invention also includes morphological analysis means 2 that performs morphological analysis on the sentences read by the bilingual sentence reading means 1, and syntax analysis means 3 that subsequently parses them, so that the word segmentation results and syntax analysis results can be used in the bilingual word pair acquisition means 6.
Another embodiment of the present invention includes language-specific word extraction means A that divides the common bilingual word pairs by language, so that the bilingual word pair acquisition means 6 can use the common bilingual word pairs stored in the storage device 9 by the common bilingual word pair extraction means 8.
In addition, another embodiment of the present invention includes means for repeating learning of parallel word pairs until the common parallel word pairs stored in the storage device 9 by the common parallel word pair extracting means 8 do not change.
[0010]
(Function)
As described above, in the bilingual word pair learning method of the present invention, a first natural language sentence and a second natural language sentence that is its translation are input to the bilingual sentence reading means 1. The bilingual word pair acquisition means 6 associates words or word strings of the second natural language sentence with words or word strings of the first natural language sentence and stores the bilingual word pairs associated on the basis of the first natural language in the storage device 71. It likewise associates words or word strings of the first natural language sentence with words or word strings of the second natural language sentence and stores the bilingual word pairs associated on the basis of the second natural language in the storage device 72. The common bilingual word pair extraction means 8 extracts the common bilingual word pairs, that is, the pairs that are identical in the storage devices 71 and 72, and stores them in the storage device 9. As a result, stable bilingual word pairs can be extracted automatically from the first natural language sentence and the second natural language sentence that is its translation.
[0011]
Also, the morphological analysis means 2 performs morphological analysis of the sentences input by the bilingual sentence reading means 1 and stores the results in the storage devices 41 and 42, and the syntax analysis means 3 then performs syntax analysis and stores the results in the storage devices 51 and 52. Using the morphological analysis results and the syntax analysis results, the bilingual word pair acquisition means 6 stores the bilingual word pairs associated on the basis of the first natural language in the storage device 71 and those associated on the basis of the second natural language in the storage device 72. The common bilingual word pair extraction means 8 extracts the common bilingual word pairs, which are the word pairs that are the same in the storage devices 71 and 72, and stores them in the storage device 9. As a result, common bilingual word pairs can be extracted more accurately.
[0012]
In addition, the language-specific word extraction means A divides the common bilingual word pairs stored in the storage device 9 by the common bilingual word pair extraction means 8 by language, and the bilingual word pair acquisition means 6 uses the resulting words or word strings of the first natural language and of the second natural language in addition to the bilingual sentences input by the bilingual sentence reading means 1. As a result, the common bilingual word pair extraction means 8 can extract common bilingual word pairs more accurately and in greater number.
[0013]
Further, the means for repeating the learning performs bilingual word pair learning until the common bilingual word pairs stored in the storage device 9 by the common bilingual word pair extraction means 8 no longer change. As a result, even more common bilingual word pairs can be extracted, and with greater accuracy.
Therefore, by executing all the above methods and using the above means, a parallel word pair is automatically obtained, and a parallel word pair can be learned.
The above description describes the learning method of the parallel word pair, but the same applies to the parallel word pair learning device and the parallel word pair learning program.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below with reference to the drawings.
FIG. 3 is a basic block configuration diagram of a bilingual word pair learning device according to an embodiment of the present invention. The bilingual word pair learning device shown in the figure comprises a bilingual sentence reading unit 1, a morphological analysis unit 2, a syntax analysis unit 3, a bilingual word pair acquisition unit 6, a common bilingual word pair extraction unit 8, a language-specific word extraction unit A, and storage devices 41, 42, 51, 52, 71, 72, and 9.
[0015]
The bilingual sentence reading unit 1 inputs (reads) the first natural language sentence stored in the bilingual corpus and the second natural language sentence that is the parallel translation.
The morpheme analysis unit 2 stores the result of morphological analysis of the input first natural language sentence in the storage device 41 and stores the result of morphological analysis of the input second natural language sentence in the storage device 42.
The syntax analysis unit 3 uses the morphological analysis result stored in the storage device 41 to perform syntax analysis of the first natural language sentence and stores the result in the storage device 51. It likewise uses the morphological analysis result stored in the storage device 42 to perform syntax analysis of the second natural language sentence and stores the result in the storage device 52.
[0016]
The parallel word pair acquisition unit 6 includes a parallel word pair acquisition unit 61 based on the first natural language and a parallel word pair acquisition unit 62 based on the second natural language.
The bilingual word pair acquisition unit 61 based on the first natural language extracts bilingual word pairs by, for example, associating the syntax analysis result of the second natural language sentence with the word string obtained by morphological analysis of the first natural language sentence, and stores them in the storage device 71.
The bilingual word pair acquisition unit 62 based on the second natural language extracts bilingual word pairs by, for example, associating the syntax analysis result of the first natural language sentence with the word string obtained by morphological analysis of the second natural language sentence, and stores them in the storage device 72.
The common bilingual word pair extraction unit 8 extracts the pairs that are the same in the sets of bilingual word pairs stored in the storage devices 71 and 72 and stores them in the storage device 9.
The language-specific word extraction unit A divides the common bilingual word pairs stored in the storage device 9 into words of the first natural language and words of the second natural language, and adds them to the bilingual sentences of the bilingual corpus read by the bilingual sentence reading unit 1.
[0017]
FIG. 4 is a flowchart for repeating learning according to an embodiment of the present invention.
Hereinafter, based on this flowchart, an embodiment in the case where the first natural language is English and the second natural language is Japanese will be described.
In step 101, a first natural language sentence and a second natural language sentence that is its translation are input to the bilingual sentence reading unit 1. For example, as shown in FIG. 5, suppose the input bilingual text contains the English sentences "The house is somewhere about here" and "Look about", and the Japanese sentences that are their translations, "その家はどこかこのあたりにある" and "あたりを見まわす".
[0018]
In step 102, the bilingual sentences read in step 101 are each subjected to morphological analysis by the morphological analysis unit 2, and the results are stored in the storage devices 41 and 42. For example, as shown in FIG. 6, suppose the Japanese sentences are segmented into words as "その／家／は／どこ／か／この／あたり／に／ある" and "あたり／を／見／まわす". The morphological analysis unit 2 also assigns a part of speech to each word for use in syntax analysis. For example, suppose that in the English sentences, as shown in FIG. 7, "Look" is tagged as a verb and "about" as an adverb, and that the parts of speech shown in FIG. 9 are assigned. Likewise, suppose that in the Japanese sentences, as shown in FIG. 8, "あたり" is tagged as a noun, "を" as a particle, "見" as a verb, and "まわす" as a verb, and that the parts of speech shown in FIG. 10 are assigned.
In step 103, the syntax analysis unit 3 performs syntax analysis based on the morphological analysis results stored in the storage devices 41 and 42, and stores the results in the storage devices 51 and 52, respectively. For example, it is assumed that the results shown in FIGS. 7 and 9 are obtained for English and the results shown in FIGS. 8 and 10 are obtained for Japanese.
[0019]
In steps 104 and 105, using the morphological analysis results and syntax analysis results stored in the storage devices 41, 42, 51, and 52, the bilingual word pair acquisition unit 61 based on the first natural language stores bilingual word pairs based on the first natural language in the storage device 71, and the bilingual word pair acquisition unit 62 based on the second natural language stores bilingual word pairs based on the second natural language in the storage device 72.
For example, in the above case, as shown in FIG. 11, when the syntax analysis result of the Japanese sentence is associated on the basis of the morphologically analyzed English sentence, "その" is associated with "the", "家" with "house", and so on; and when the syntax analysis result of the English sentence is associated on the basis of the morphologically analyzed Japanese sentence, "the" is associated with "その", "house" with "家", and so on. A blank indicates that there is no counterpart. For example, the blank for "か" in FIG. 11, where the Japanese syntax analysis result is associated on the basis of the English sentence, means that there was no corresponding English word or word string. In this example, the word order of the syntax analysis result is rearranged so that it matches the word string as closely as possible, within a range that does not cross intermediate node units of the syntax analysis result. Here, an intermediate node of the syntax analysis result is a grammatical category in that result, such as verb, noun, verb phrase, or noun phrase. An intermediate node unit is the word string that belongs under an intermediate node (that is, under a grammatical category) in the syntax analysis result. For example, in the syntax analysis result shown in FIG. 8, the noun phrase unit is the word string under the noun phrase, namely "あたり" and "を", and the verb phrase unit is "見" and "まわす". "Within a range that does not cross intermediate node units" means that rearrangement is limited to the word string that forms an intermediate node unit. For example, in the syntax analysis result shown in FIG. 8, "あたり" and "を" within the noun phrase unit can be swapped, and the whole noun phrase unit "あたり／を" can be swapped with the whole verb phrase unit "見／まわす".
However, because the particle "を" is a word within the noun phrase and the verb "見" is a word within the verb phrase, the particle "を" and the verb "見" cannot be swapped individually across the boundary between the noun phrase unit and the verb phrase unit. Therefore, when the syntax analysis result of the Japanese sentence "あたり／を／見／まわす" is associated with the English sentence "look about", the noun phrase "あたり／を" and the verb phrase "見／まわす" are swapped as shown in FIG. 11 (upper part), so that "見" is associated with "look" and "あたり" with "about". These bilingual word pairs are stored in the storage devices 71 and 72.
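The reordering constraint can be illustrated with a small sketch that enumerates every word order reachable by permuting the children of each node in a parse tree. The tree encoding (a word as a string, a node as a `(label, children)` tuple) and the function name `reorderings` are assumptions made for illustration; the tree mirrors the FIG. 8 description, with a noun phrase over あたり and を and a verb phrase over 見 and まわす.

```python
from itertools import permutations, product

# A parse tree is either a word (str) or a tuple (label, [children]).
def reorderings(node):
    """Yield each word order obtained by permuting children within every node."""
    if isinstance(node, str):
        yield [node]
        return
    _label, children = node
    # All admissible orders of each child subtree, computed independently.
    child_orders = [list(reorderings(child)) for child in children]
    for perm in permutations(range(len(children))):
        for choice in product(*(child_orders[i] for i in perm)):
            yield [word for part in choice for word in part]


tree = ("S", [("NP", ["あたり", "を"]), ("VP", ["見", "まわす"])])
orders = {tuple(order) for order in reorderings(tree)}

print(len(orders))
print(("見", "まわす", "あたり", "を") in orders)   # whole phrases swapped
print(("あたり", "見", "を", "まわす") in orders)   # を and 見 swapped across phrases
```

Swapping the two phrases as wholes is among the generated orders, while swapping を and 見 across the phrase boundary is not, which is the behaviour the paragraph above describes. Enumerating all orders is exponential in general, so a practical aligner would search rather than enumerate.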
[0020]
In step 106, the common bilingual word pair extraction unit 8 extracts the pairs that are identical between the bilingual word pairs stored in the storage device 71 and those stored in the storage device 72, and judges whether they are all the same as the common bilingual word pairs already saved in the storage device 9. The result is yes if they are all the same and no if any differ. If yes, the bilingual word pair learning ends; if no, the process proceeds to step 107. In the above example, the common bilingual word pairs extracted by the common bilingual word pair extraction unit 8 from the input sentences shown in FIG. 5 are as shown in FIG. 12, for example "その" and "the", "家" and "house", and so on. Since nothing has yet been saved in the storage device 9, no is selected and the process proceeds to step 107.
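The decision in step 106 amounts to a set intersection followed by a comparison with what is already stored. The sketch below assumes that storage device 71 holds pairs as (Japanese, English) and storage device 72 as (English, Japanese), so one side is flipped before comparing; that orientation, the function name `step_106`, and the sample data are illustrative assumptions and are not taken from the figures.

```python
def step_106(pairs_71, pairs_72, stored):
    """Return (finished, new_pairs) for one pass of the yes/no decision."""
    # Bring both result sets to the same (English, Japanese) orientation.
    normalised_71 = {(en, ja) for ja, en in pairs_71}
    common = normalised_71 & set(pairs_72)
    # Pairs that are common to both directions but not yet stored.
    new_pairs = common - set(stored)
    finished = not new_pairs          # "yes" when nothing new was found
    return finished, new_pairs


pairs_71 = {("その", "the"), ("家", "house"), ("見", "look")}
pairs_72 = {("the", "その"), ("house", "家"), ("look", "まわす")}

finished, new_pairs = step_106(pairs_71, pairs_72, stored=set())
print(finished, sorted(new_pairs))

# A second pass with the same inputs finds nothing new, so learning would end.
finished_again, _ = step_106(pairs_71, pairs_72, stored=new_pairs)
print(finished_again)
```

The pair on which the two directions disagree (look) is dropped by the intersection, which is how the method keeps only stable pairs.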
[0021]
In step 107, the common bilingual word pairs extracted in step 106 that do not match the common bilingual word pairs already stored in the storage device 9 are stored in the storage device 9. In the above example, this applies to all the bilingual word pairs extracted in step 106.
In step 108, the common bilingual word pairs stored in the storage device 9 are divided by language and added to the input bilingual sentences. That is, the common bilingual word pairs are stored in the bilingual corpus. In the above example, "the", "house", "is", and "about" are added to the English sentences, and "that", "house", "ha", and "around" are added to the Japanese sentences.
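Step 108 can be sketched as follows, assuming the bilingual corpus is held as two lists that are aligned entry by entry, one per language. The function name `add_pairs_to_corpus`, the list representation, and the placeholder sentence text are assumptions made for illustration only.

```python
def add_pairs_to_corpus(japanese_side, english_side, common_pairs):
    """Return copies of both corpus sides with each common pair appended."""
    new_japanese = list(japanese_side)
    new_english = list(english_side)
    for japanese_word, english_word in common_pairs:
        # Each half of the pair goes to the corpus side of its own language,
        # at the same position, so the two sides stay aligned.
        new_japanese.append(japanese_word)
        new_english.append(english_word)
    return new_japanese, new_english

pairs = [("that", "the"), ("house", "house"), ("ha", "is"), ("around", "about")]
ja, en = add_pairs_to_corpus(["<Japanese sentence>"],
                             ["The house is somewhere about here"],
                             pairs)
```

Returning new lists rather than modifying the originals is a choice made here so that repeating the step from the original sentences does not add the same pair twice.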
[0022]
In the above example, the process then returns to step 101, so the example is described further, step by step.
In step 101, both the original bilingual sentences and the bilingual word pairs added in step 108 are input to the bilingual sentence reading unit 1.
In step 102, the morphological analysis unit 2 assigns parts of speech to the bilingual word pairs input in step 101, and the results are stored in the storage devices 41 and 42.
In step 103, the syntax analysis unit 3 parses the bilingual word pairs input in step 101 using the morphological analysis results stored in the storage devices 41 and 42, and the results are stored in the storage devices 51 and 52.
[0023]
In steps 104 and 105, the morphological analysis results and syntactic analysis results for the words or word strings of the input sentences and the common word pairs, stored in the storage devices 41, 42, 51, and 52, are used. The bilingual word pair acquisition unit 61 based on the first natural language stores bilingual word pairs based on the first natural language in the storage device 71, and the bilingual word pair acquisition unit 62 based on the second natural language stores bilingual word pairs based on the second natural language in the storage device 72. In the above example, since the common bilingual word pair "around" and "about" has been added to the input sentences (see FIG. 12), "around" and "about" can be expected to correspond when the parsing result is matched against the word string. For example, when the Japanese sentence "that / house / ha / where / ka / this / around / ni / aru" is matched to the English sentence "The house is somewhere about here", the verb phrase "aru" and the adverb phrase "this / around / ni" are interchanged in the Japanese parsing result. Further, within the adverb phrase, the word "this" and the particle "ni" are interchanged, which gives the rearranged word order. As a result, the correspondence with the English sentence is, for example, as shown in FIG. 15, and "this" is associated with "here". Conversely, for the same example sentence, when the English syntactic analysis result is matched to the Japanese sentence, the adverb "about" and the adverb "here" within the adverb phrase are interchanged in the English parsing result, which gives the rearranged word order. As a result, the correspondence with the Japanese sentence is, for example, as shown in FIG. 15, and "here" is associated with "this". The bilingual word pairs shown in FIG. 15 are stored in the storage devices 71 and 72.
[0024]
In step 106, the common bilingual word pair extraction unit 8 extracts the identical bilingual word pairs from those stored in the storage device 71 and those stored in the storage device 72. In the above example, the common bilingual word pair "this" and "here" is newly extracted. This pair does not match the common bilingual word pairs of FIG. 12 currently stored in the storage device 9, so the determination in this step is no and the process proceeds to step 107.
[0025]
In step 107, the pair "this" and "here", which does not match the pairs of FIG. 12 stored in the storage device 9, is added, so that the common bilingual word pairs shown in FIG. 16 are stored in the storage device 9.
In step 108, the common bilingual word pairs stored in the storage device 9 are divided by language and added to the input bilingual sentences.
Thereafter, the process returns to step 101 again. When the common bilingual word pairs no longer change, the determination in step 106 becomes yes, and the bilingual word pair learning ends.
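The repetition of steps 101 to 108 until the common bilingual word pairs stop changing can be sketched as a fixed-point loop. The sketch below is only an outline under stated assumptions: the two association directions are passed in as caller-supplied functions (`associate_1`, `associate_2`, both hypothetical) that take the corpus and return a set of word pairs, the corpus is a list of pairs, and `max_rounds` is a safety limit that does not appear in the description.

```python
def learn_pairs(sentence_pairs, associate_1, associate_2, max_rounds=10):
    """Repeat association and common-pair extraction until nothing new appears."""
    stored = set()                                   # role of storage device 9
    for _ in range(max_rounds):
        # Step 108 / 101: input sentences plus the pairs learned so far.
        corpus = list(sentence_pairs) + sorted(stored)
        pairs_71 = associate_1(corpus)               # step 104
        pairs_72 = associate_2(corpus)               # step 105
        new_pairs = (pairs_71 & pairs_72) - stored   # step 106
        if not new_pairs:                            # "yes": learning ends
            break
        stored |= new_pairs                          # step 107
    return stored
```

How each association function uses the added pairs is left open here; in the description they act as additional input that helps further words, such as "this" and "here", become associated in a later round.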
Although the above example has been described based on the configuration diagram shown in FIG. 3, the present invention is not limited to this example, and various modifications and applications can be made within the scope of the claims.
[0026]
The bilingual word pair learning apparatus of the present invention can be composed of a computer having a CPU, a memory, and the like, a user terminal, and a machine-readable recording medium such as a CD-ROM, a magnetic disk device, and a semiconductor memory.
A bilingual word pair learning program recorded on a recording medium, or a bilingual word pair learning program transmitted via a communication line, is read by the computer and controls the operation of the computer so as to implement each of the processes described above.
[0027]
[Effect of the invention]
As described above, according to the present invention, from a first natural language sentence and a second natural language sentence that is its translation, the bilingual word pairs obtained by associating the second natural language sentence with the first natural language sentence are compared with the bilingual word pairs obtained by associating the first natural language sentence with the second natural language sentence, and the common bilingual word pairs are extracted. This makes it possible to extract stable bilingual word pairs of the first natural language and the second natural language.
Also, by using the results of the morphological analysis means and the syntax analysis means in the associating step, the bilingual word pairs of the first natural language and the second natural language can be extracted more accurately.
Further, by adding the bilingual word pairs extracted once to the input bilingual sentence, more bilingual word pairs of the first natural language and the second natural language can be extracted.
Further, by repeating the learning of the above-described bilingual word pairs until the extracted bilingual word pairs no longer change, the bilingual word pairs of the first natural language and the second natural language can be extracted more accurately.
The bilingual word pairs extracted in this way can be used for, for example, construction or expansion of an electronic bilingual dictionary, or can be used as a bilingual dictionary of a machine translation system.
[Brief description of the drawings]
FIG. 1 is a schematic flowchart of a bilingual word pair learning method for explaining the principle of the present invention.
FIG. 2 is a schematic configuration diagram of a bilingual word pair learning apparatus according to the present invention.
FIG. 3 is a basic block configuration diagram of a bilingual word pair learning apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart for repeating learning according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of an input parallel translation sentence according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of a Japanese sentence morphological analysis result (word segmentation) according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example (part 1) of the English sentence syntactic analysis result according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example (part 1) of the Japanese sentence syntactic analysis result according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example (part 2) of the English sentence syntactic analysis result according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating an example (part 2) of a Japanese sentence syntactic analysis result according to an embodiment of the present invention;
FIG. 11 is a diagram showing an example (part 1) of the word correspondence result according to the embodiment of the present invention.
FIG. 12 is a diagram showing an example (part 1) of common word correspondence according to an embodiment of the present invention.
FIG. 13 is a diagram showing an example of exchanging the word order of the Japanese sentence syntactic analysis results according to an embodiment of the present invention.
FIG. 14 is a diagram showing an example in which the word order of the English syntax analysis result is switched according to an embodiment of the present invention.
FIG. 15 is a diagram showing an example (part 2) of the word correspondence result according to the embodiment of the present invention.
FIG. 16 is a diagram showing an example (part 2) of common word correspondence that is an embodiment of the present invention.
[Explanation of symbols]
1 ... bilingual sentence reading unit, 2 ... morphological analysis unit, 3 ... syntax analysis unit, 6 ... bilingual word pair acquisition unit, 8 ... common bilingual word pair extraction unit, 9 ... common bilingual word pairs, 71 ... storage device for bilingual word pairs associated based on the first natural language, 72 ... storage device for bilingual word pairs associated based on the second natural language, A ... language-specific word extraction unit

Claims

A bilingual word pair learning method for extracting word pairs in a translation relationship from a first natural language sentence and a second natural language sentence that is its translation, the sentences being stored in a bilingual corpus separately for each natural language, the method comprising:
a step 1 in which bilingual word pair acquisition means based on the first natural language generates and stores bilingual word pairs, each being a pair of words or word strings in a translation relationship obtained by associating a word or word string that is an element of the second natural language sentence stored in the bilingual corpus with a word or word string that is an element of the first natural language sentence stored in the bilingual corpus;
a step 2 in which bilingual word pair acquisition means based on the second natural language generates and stores bilingual word pairs, each being a pair of words or word strings in a translation relationship obtained by associating a word or word string that is an element of the first natural language sentence stored in the bilingual corpus with a word or word string that is an element of the second natural language sentence stored in the bilingual corpus;
a step 3 in which common bilingual word pair extraction means compares the bilingual word pairs stored in step 1 and step 2, extracts common word pairs that are identical bilingual word pairs, determines whether the extracted common word pairs all match the common bilingual word pairs already stored in a storage device (9) by the common bilingual word pair extraction means, stores any common word pair that does not match in the storage device (9) as a common bilingual word pair if they do not all match, and ends the process if they all match; and
a step 4 in which language-specific word extraction means divides the common bilingual word pairs stored in the storage device (9) into words or word strings that are elements of the first natural language sentence and words or word strings that are elements of the second natural language sentence, stores them in the bilingual corpus, and returns to step 1.

The bilingual word pair learning method according to claim 1,
wherein, in steps 1 and 2, the bilingual word pair acquisition means based on the first natural language and the bilingual word pair acquisition means based on the second natural language associate a syntax tree, obtained by parsing the input natural language sentence, with a word or a word string.

A bilingual word pair learning device for extracting word pairs in a translation relationship from a first natural language sentence and a second natural language sentence that is its translation, the sentences being stored in a bilingual corpus separately for each natural language, the device comprising:
bilingual word pair acquisition means based on the first natural language for generating and storing bilingual word pairs, each being a pair of words or word strings in a translation relationship obtained by associating a word or word string that is an element of the second natural language sentence stored in the bilingual corpus with a word or word string that is an element of the first natural language sentence stored in the bilingual corpus;
bilingual word pair acquisition means based on the second natural language for generating and storing bilingual word pairs, each being a pair of words or word strings in a translation relationship obtained by associating a word or word string that is an element of the first natural language sentence stored in the bilingual corpus with a word or word string that is an element of the second natural language sentence stored in the bilingual corpus;
common bilingual word pair extraction means for comparing the bilingual word pairs stored by the bilingual word pair acquisition means based on the first natural language with the bilingual word pairs stored by the bilingual word pair acquisition means based on the second natural language, extracting common word pairs that are identical bilingual word pairs, determining whether the extracted common word pairs all match the common bilingual word pairs already stored in a storage device (9) by the common bilingual word pair extraction means, storing any common word pair that does not match in the storage device (9) as a common bilingual word pair if they do not all match, and ending the process if they all match; and
language-specific word extraction means for dividing the common bilingual word pairs stored in the storage device (9) by the common bilingual word pair extraction means into words or word strings that are elements of the first natural language sentence and words or word strings that are elements of the second natural language sentence, storing them in the bilingual corpus, and causing the bilingual word pair acquisition means based on the first natural language, the bilingual word pair acquisition means based on the second natural language, and the common bilingual word pair extraction means to operate.

The bilingual word pair learning device according to claim 3,
wherein the bilingual word pair acquisition means based on the first natural language and the bilingual word pair acquisition means based on the second natural language associate a syntax tree, obtained by parsing the input natural language sentence, with a word or a word string.

A computer-readable recording medium storing a program for causing a computer to execute each process of the bilingual word pair learning device according to claim 3.