JP3766406B2 - Machine translation device

Info

Publication number: JP3766406B2
Application number: JP2003200944A
Authority: JP
Inventors: 裕美子吉村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-07-24
Filing date: 2003-07-24
Publication date: 2006-04-12
Anticipated expiration: 2023-07-24
Also published as: JP2005044020A

Description

【０００１】
【発明の属する技術分野】
本発明は、第１の言語文を第２の言語文に翻訳する機械翻訳装置に関する。
【０００２】
【従来の技術】
近年、社会のグローバル化に伴い、第１の言語文（原文）を第２の言語文に自動翻訳する機械翻訳技術への関心が高まっており、インターネット上で提供する翻訳サービスやコンピュータに搭載して翻訳処理を行う機械翻訳ソフトウエア製品が数多く登場してきている。
【０００３】
ところで、機械翻訳にとって最も重要な点は翻訳の精度であると言える。年々、コンピュータ性能の向上、翻訳技術の向上、さらには翻訳知識の蓄積等により、翻訳の高精度化が進んできているが、自然言語の表現は極めて多様であることから、単に翻訳技術の向上や翻訳知識の蓄積等だけではユーザにとって満足できる翻訳ができないのが現状である。
【０００４】
そこで、以上のような翻訳精度上の問題を克服するために、従来の枠組みとは視点を変えた全く新たな枠組みのもとに翻訳処理を行うパターン翻訳技術が提案されている。
【０００５】
その１つは、対訳用例集や過去の翻訳結果などを格納する基本用例データベースから所要とする文を取り出して当該文の一部を変数とする穴明き例文を作成して変数用例データベースに登録し、入力される第１の言語文と変数用例データベースの穴明き例文とを照合し翻訳処理を行う翻訳方式が実現されている（特許文献１）。
【０００６】
他の１つは、定数部分（穴明き例文のうち、変数として指定しない部分）が同一の複数の翻訳パターンが存在するとき、より望ましいパターンを選択可能とした翻訳装置が実現されている（特許文献２）。この翻訳装置は、具体的には、変数部分に句構造に関する情報をもたせることにより、第１言語の入力文がどのパターンの変数部分の構造情報に近いかを測定し、その測定値から望ましい翻訳パターンを選択する翻訳方式である。例えば変数相当部分が動詞句であるか、名詞句であるか、単純な名詞句であるか、さらには連体修飾節をもつ名詞句であるかの条件を付け、入力文の構造と等価の構造を条件付けした翻訳パターンを選択する方式である。
【０００７】
しかし、以上のようなパターン翻訳技術を用いた機械翻訳装置では、ユーザが翻訳パターンの蓄積を進めていくにつれ、ユーザによる変数の与え方により、一つの入力文が複数の翻訳パターンに当てはまるケースが避けられない。その理由は、一般ユーザが変数の条件として指定可能な情報には、次に述べるように限りがある為である。
【０００８】
＜ユーザが指定可能な情報の例＞
変数部分の文法的カテゴリー情報、句構造情報、表層単語の文字列、意味情報（特定の選択肢から選択する）等である。
【０００９】
ここで、意味情報といってもユーザが指定できるのは、「人」、「場所」、「動作」、…などの分別容易な大区分に属する程度のものであり、ユーザにそれ以上の詳細な情報を指定させるのは大きな負担をかけるものであり、またユーザ自身が適切な情報を選択し指定できるとは限らない。
【００１０】
次に、１つの入力文に対して、複数の翻訳パターンが当てはまる例について説明する。なお、Ｐ１、Ｐ２は英日翻訳用の翻訳パターンを示し、Ｐ３、Ｐ４は日英翻訳用の翻訳パターンを示す。Eは英語用パターン、Jは日本語用パターンである。これらパターン中の$１、$２、$３は変数部分を示している。また、ここでは説明簡略化のために変数の条件は省略しているが、全て名詞句を変数とする例である。
【００１１】
＜変数範囲は異なるが、１つの入力文が複数のパターンに当てはまる例＞
（Ｐ１） E：$１ introduces you to $２．
J：$１は、$２の入門書です。
【００１２】
（元となる原文：This book introduces you to UNIX．）
（Ｐ２） E：$１ introduces $２ to $３．
J：$１では、$２に$３を紹介しています。
【００１３】
（元となる原文：Our center introduces foreign students to japanese culture．）
＜定数部分を同じくする同一パターンを複数登録してしまう例＞
（Ｐ３）Ｊ：$１は、正式には$２という。
【００１４】
E：The official name for $１ is $２．
（元となる原文：中国は、正式には中華人民共和国という。）
（Ｐ４）Ｊ：$１は、正式には$２という。
【００１５】
E：$１ is technically called $２．
（元となる原文：この物質は、正式には脂肪親和性アルカロイドという。）従って、以上のように複数の翻訳パターンが候補として挙がってきた場合、これら候補間の優先度に従って選択する。一例を述べれば、以下のような優先度の基準に従って調整している。
【００１６】
（１）定数部分の文字列の長いものを優先する。
【００１７】
（２）前記（１）で決まらない場合、定数の文字列の数が少ないものを優先する。
【００１８】
（３）前記（２）で決まらない場合、最長の定数どうしを比較して長い方を優先する。
【００１９】
（４）前記（３）で決まらない場合、時系列的に後に登録された方を優先する。
【００２０】
【特許文献１】
特開平０６−６８１３４号
【００２１】
【特許文献２】
特開平０６−２９０２１０号
【００２２】
【発明が解決しようとする課題】
従って、以上のようなパターン翻訳技術を用いた翻訳方式及び翻訳装置では、何れも１つの入力原文に対して複数の翻訳パターン候補が挙がってきた場合、何れも適用条件を満たすことになり、最も望ましい翻訳パターンを選択することができない問題がある。
【００２３】
また、ユーザが翻訳パターンを登録する際、常に過去の登録内容まで意識しながら登録しなければ似通った数多くの翻訳パターンを存在させることになり、益々ユーザの負担が大きくなる。
【００２４】
本発明は上記事情にかんがみてなされたもので、翻訳パターンの利用効率を高めるとともに、原文に対して最も相応しい翻訳パターンを選択し精度の高い翻訳を実現する機械翻訳装置を提供することを目的とする。
【００２５】
また、本発明の他の目的は、翻訳パターンを利用して第１の言語文を第２の言語文に翻訳するに際し、ユーザの負担を大幅に軽減する機械翻訳装置を提供することにある。
【００２６】
【課題を解決するための手段】
（１）上記課題を解決するために、本発明に係る機械翻訳装置は、第１言語文を第２言語文に翻訳するために必要な知識情報の他、変更可能な変数部分と固定の表現形式である定数部分とで構成される文字列パターン、条件及び参照原文をそれぞれ対応付けた複数の翻訳パターンが記憶された翻訳辞書部と、入力部から入力される前記第１言語文である入力原文及び翻訳処理指示を判断し、翻訳処理開始命令を出力する制御手段と、この制御手段から出力される翻訳処理命令に基づき、前記知識情報のもとに入力原文を構成する各単語の翻訳処理に必要な各種情報を取得する手段と、前記入力原文と前記各翻訳パターンの文字列とを比較照合し、翻訳パターン候補を抽出する手段と、この抽出された各翻訳パターン候補の各変数部分が前記各種情報のもとに該当翻訳パターンの条件に適合するかをチェックする変数条件チェック手段と、この変数条件チェック手段により条件に適合する複数の翻訳パターンが存在する場合、前記入力原文の変数相当部分と各翻訳パターンの参照原文の変数相当部分との類似度を算出し、類似度の最も高い翻訳パターンを選択する類似度算出手段とを設け、この選択された翻訳パターンを適用して第２言語文に翻訳する構成である。
【００２７】
本発明は以上のような構成とすることにより、入力原文の文字列とパターン辞書部の複数の翻訳パターンの原文側定数部分の文字列とを比較参照し、翻訳パターン候補を検索する。この検索によって複数の翻訳パターン候補が存在する場合、各翻訳パターン候補の変数部分がパターン辞書部の変数条件に適合するか否かを判断する。適合すると判断した時、入力原文に対するパターン辞書部の各翻訳パターンの参照原文を用いて、翻訳パターン適用候補間の類似度を算出し、最も類似度の高い翻訳パターンを用いて、入力原文を翻訳するので、複数の翻訳パターン候補が存在した場合でも、より適切な翻訳パターンを選択することが可能となる。また、最適な翻訳パターンを選択するに際し、ユーザが従来のように過去の登録パターン全体を意識しながら選択する必要が無くなり、ユーザの負担が大幅に軽減することが可能となる。なお、前述する翻訳パターン候補の検索処理と変数条件の適合処理は並行的に処理することも可能である。
【００２８】
なお、翻訳パターンが適用して得られる翻訳結果に対し、入力部から入力される翻訳結果を確定する指示を検出するとか、或いは翻訳結果の文から後方の翻訳対象文にカーソルが移動した際に翻訳確定したことを検出する確定検出手段と、この確定検出手段により翻訳確定と判断された場合、入力原文を、前記適用された翻訳パターンの参照原文に対応付けて記憶するパターン登録手段とをさらに追加すれば、入力原文の翻訳を繰り返すごとに各翻訳パターンの参照原文を蓄積するので、次以降の入力原文の翻訳に対し、最適な翻訳パターンを検索でき、より精度の高い翻訳処理が可能となる。
【００３１】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
【００３２】
図１は本発明に係る機械翻訳装置の第１の実施の形態を示す構成図である。
【００３３】
この機械翻訳装置は、翻訳対象となる第１の言語文（以下、入力原文、翻訳対象文と呼ぶ）や各種コマンドを入力する入力手段としての入力部１と、この入力部１から入力される入力原文や各種コマンドを判断し、各構成要素に必要な命令を出力する処理制御部２と、第１言語文を第２言語文に翻訳処理するための各種の規則や辞書を格納する翻訳辞書部３と、処理制御部２から翻訳命令を受けたとき、翻訳辞書部３に格納される各種の規則、辞書を用いて、第１言語文を第２言語文に翻訳する翻訳処理部４と、出力部５とによって構成されている。
【００３４】
なお、処理制御部２及び翻訳処理部４がＣＰＵで構成されている場合、処理制御部２に各構成要素を統括制御するためのプログラムを記録する記録媒体６、また翻訳処理部４に翻訳処理に関する処理手順を規定する翻訳処理用プログラムを記録する記録媒体７を用いて、それぞれ一連の処理を実行することも可能である。
【００３５】
前記入力部１は、一般にキーボード、マウスなどが用いられ、前述するように各種のコマンドを入力する他、キーボードの入力操作によって作成される入力原文の入力、マウスによる文書中の特定領域の指定により選択される入力原文の入力の他、印刷又は手書きの文書を読み取るＯＣＲ（Optical Character Reader）による読み取りデータの入力、フロッピーディスク、磁気テープ、磁気ディスクなどに保存される入力原文の入力、さらにはインターネット上から取り込んだ入力原文の入力、或いはディクテーション・ソフトウエア（Speech Dictation Software）を用いて、マイクから入力される会話文を自然言語の文字列に変換して得られる入力原文の入力などがある。すなわち、この入力部１としては、一般的な入力原文の入力だけでなく、異なる種々の入力形態による入力原文の入力を含むものである。
【００３６】
処理制御部２は、入力部１から入力される各種コマンドの指示内容を判断する指示判断手段２ａ及びこの指示判断手段２ａによる判断結果に従って所要とする処理命令を出力する処理制御手段２ｂが設けられている。
【００３７】
翻訳辞書部３には、語彙部３ａ、形態素解析規則部３ｂ、構文・意味解析規則部３ｃ、言語変換に関係する規則を有する変換規則部３ｄ、構文生成規則部３ｅ、形態素生成規則部３ｆ、パターン辞書部３ｇ及び概念辞書部３ｈその他翻訳処理に必要な規則，辞書が格納されている。なお、パターン辞書部３ｇ及び概念辞書部３ｈを除く他の規則等３ａ〜３ｆは、入力原文を第２言語文に翻訳処理するために使用される知識情報であると言える。
【００３８】
翻訳処理部４は、入力原文に対し、翻訳辞書部３の中の語彙部３ａ及び形態素解析規則部３ｂを用い、入力原文を構成する単語の全ての品詞、活用、意味情報他の種々の属性の候補をリストアップした後、さらにパターン辞書部３ｇの中の各翻訳パターンの原文側パターンと順次照合し、原文側パターン中の定数文字列が適合する翻訳パターンの候補を抽出するパターン照合抽出手段４ａと、このパターン照合抽出手段４ａによって抽出された翻訳パターン候補に存在する変数部分が予め定める翻訳パターンの条件に適合するかをチェックする変数条件チェック手段４ｂと、入力原文とそれぞれの翻訳パターン候補の参照原文とを比較し類似度を算出する類似度算出手段４ｃと、この類似度算出手段４ｃで算出された類似度が最も高い翻訳パターン候補を選択し、翻訳辞書部３の知識情報を用いて翻訳処理を行う訳文生成処理手段４ｄとが設けられている。
【００３９】
前記出力部５は、翻訳処理部４の出力である翻訳結果を出力したり、入力部１から入力される各種の指示に対する処理制御部２からの応答を表示する機能を有するものであって、通常，各種ディスプレイなどの表示手段が用いられるが、その他、例えばプリンタなどの印字手段、或いはフロッピーディスク、磁気テープ、磁気ディスクへの書き込み登録手段、さらには他のメディアに対して送信する送信手段その他ユーザの所望する各種の出力形態が挙げられる。
【００４０】
図２は、本実施の形態で用いられるパターン辞書部３ｇに記憶される翻訳パターンの一例を示す図である。
【００４１】
このパターン辞書部３ｇの各翻訳パターンは、次に述べるように５種類の情報を１単位として記憶されている。すなわち、各々の翻訳パターンは、
１．原文側パターン
２．訳文側パターン
３．原文側条件
４．訳文側条件
５．参照原文
によって構成されている。
【００４２】
この原文側パターン及び訳文側パターンの中に挿入されている$１，$２，…は変数部分を示し、その他の文字列は定数部分を示している。変数ごとに「$」の次の番号「１」、「２」を変えることで、原文側パターンの変数部分が訳文側パターン中のどの変数部分に対応するかを示している。
【００４３】
一方、原文側条件は、入力原文と照合する際、変数相当部分の文字列が満たすべき条件を指定する部分であって、図２に示す例では、変数相当部分の文字列の構文上のカテゴリーが何であるかを指定する。因みに、図２では、NPは名詞句、VPは動詞句であるが、その他、例えば原文中の語の表層文字列、品詞、意味情報などを指定できる。また、訳文側条件は、訳文生成時に変数部分の句をどういう条件で出力したいかを指定する部分であって、図２に示す例では、第２変数の動詞句は訳文中では節（clause）の形態で生成することを指定している。その他、例えば数情報、時制、他の句との数の呼応などを指定できる。なお、条件で何を指定するかは、本発明で限定するものでなく、趣旨を逸脱しない範囲で自由に設定できる。
【００４４】
さらに、参照原文は、当該翻訳パターンを登録した時に登録する入力原文を格納する部分である。
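The five-part record described above could be modelled as in the following minimal sketch. It is not the patent's storage format: the class name `TranslationPattern`, its field names, and the `variable_numbers` helper are hypothetical, and the sample entry is reconstructed from pattern P3 and the worked examples in the text, because Figure 2 itself is not reproduced here.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranslationPattern:
    """One unit of the pattern dictionary (field names are assumptions)."""
    number: int
    source_pattern: str                 # 1. source-side pattern
    target_pattern: str                 # 2. target-side pattern
    source_conditions: Dict[int, str]   # 3. source-side condition per variable
    target_conditions: Dict[int, str] = field(default_factory=dict)  # 4. target-side conditions
    reference_sources: List[str] = field(default_factory=list)       # 5. reference source sentences

    def variable_numbers(self) -> List[int]:
        """Return the variable numbers ($1, $2, ...) in source-side order."""
        return [int(n) for n in re.findall(r"\$(\d+)", self.source_pattern)]


# Sample entry assumed to correspond to pattern No. 2 of the worked examples.
pattern_no_2 = TranslationPattern(
    number=2,
    source_pattern="$1は、正式には$2という。",
    target_pattern="The official name for $1 is $2.",
    source_conditions={1: "NP", 2: "NP"},
    reference_sources=["中国は、正式には中華人民共和国という。"],
)
print(pattern_no_2.variable_numbers())
```

Keeping the reference source sentences as a list leaves room for the later embodiments, which add further sentences to the same pattern.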
【００４５】
次に、以上のような機械翻訳装置のうち、特に処理制御部２におけるユーザとの対話的な処理の一例について図３を参照して説明する。なお、ユーザとの対話的な処理は、ハード的な論理回路構成によって実行することも可能であるが、ここでは例えば記録媒体６に格納されるプログラムに従って一連の処理を実行する例について説明する。
【００４６】
装置の動作が開始すると、処理制御部２は、記録媒体６に格納されるプログラムに従い、入力部１から入力原文が入力されたか否かを判断し（Ｓ１１）、入力原文が入力されたと判断された場合には図示されていない適宜なメモリに一時格納した後、出力部５に表示命令を出し、入力原文を表示する（Ｓ１２）。
【００４７】
ここで、処理制御部２は、入力原文を表示した後、またはステップＳ１１で入力原文が入力されていないと判断された場合、入力部１から翻訳指示命令が入力されたか否かを判断し（Ｓ１３）、翻訳指示が入力されている場合にはメモリに一時格納されている入力原文を翻訳処理部４に送り、翻訳処理開始命令を送る。この翻訳処理部４は、処理制御部２からの翻訳処理開始命令に基づき、翻訳辞書部３の規則・辞書等を用いて、入力原文を第２言語文に翻訳する翻訳処理を実行し（Ｓ１４）、その翻訳結果を出力部５に表示する（Ｓ１５）。なお、翻訳処理部４による翻訳処理の一連の処理例は後記する。
【００４８】
ステップＳ１３において、翻訳指示命令が入力されていないと判断された場合、引き続き、入力部１からパターン登録を起動する命令が入力されたか否かを判断し（Ｓ１６）、パターン登録起動命令が入力されている場合には後記するパターン登録部８にパターン登録処理の起動を指示する。このパターン登録部８は、ユーザの指示に従ってパターン登録処理を実行する（Ｓ１７）。
【００４９】
なお、入力部１から全体の処理終了の指示が入力された場合、全ての処理を終了する（Ｓ１８）。また、処理制御部２は、入力部１から以上の指示命令以外の指示命令が入力される場合には、その指示命令内容を判断し（Ｓ１９）、その指示命令に従った処理を行う（Ｓ２０）。
【００５０】
従って、処理制御部２は、機能的には、以上の処理のうち、入力部１から入力される各種の指示を判断するステップＳ１１、Ｓ１３、Ｓ１６、Ｓ１８、Ｓ１９等が指示判断手段２ａに相当し、それ以外のステップ処理が処理制御手段２ｂに相当する。
【００５１】
なお、終了指示命令の判断はその他の指示命令の判断の前で行っているが、その他の指示命令の判断の後で行ってもよく、さらに他の処理順序についても特に限定するものではない。
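The command handling of Figure 3 can be pictured as a simple dispatcher. The sketch below is only an illustration under stated assumptions: the class name `Controller`, the command strings (`"input"`, `"translate"`, `"register"`, `"quit"`), and the injected `translate` and `register_pattern` callables are all hypothetical stand-ins for the translation processing unit 4 and the pattern registration unit 8.

```python
from typing import Any, Callable, List, Optional, Tuple


class Controller:
    """Toy counterpart of the processing control unit 2 (names are assumptions)."""

    def __init__(self, translate: Callable[[str], str],
                 register_pattern: Callable[[Any], None]) -> None:
        self.translate = translate
        self.register_pattern = register_pattern
        self.source: Optional[str] = None
        self.log: List[Tuple[str, Any]] = []   # what would be sent to the output unit

    def handle(self, command: str, payload: Any = None) -> bool:
        """Process one command; return False when all processing should end."""
        if command == "input":            # S11 -> S12: store and display the source
            self.source = payload
            self.log.append(("show_source", payload))
        elif command == "translate":      # S13 -> S14, S15: translate and display
            self.log.append(("show_translation", self.translate(self.source)))
        elif command == "register":       # S16 -> S17: start pattern registration
            self.register_pattern(payload)
            self.log.append(("registered", payload))
        elif command == "quit":           # S18: end of all processing
            return False
        else:                             # S19 -> S20: any other instruction
            self.log.append(("other", command))
        return True


registered: List[Any] = []
demo = Controller(translate=str.upper, register_pattern=registered.append)
demo.handle("input", "hello")
demo.handle("translate")
print(demo.log)
```

As the text notes, the order in which the commands are tested is not essential, so the branch order here is one choice among several.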
【００５２】
次に、以上のような一連の処理のうち、処理制御部２が翻訳処理指示であると判断し（Ｓ１３）、翻訳処理開始命令を送出したときの翻訳処理部４の翻訳処理（Ｓ１４）について図４を参照して説明する。なお、図４は入力部１から１文の入力原文（第１言語文）が入力された後、第２言語文に翻訳し出力するまでの一連の処理例を示す図である。
【００５３】
翻訳処理部４は、処理制御部２から入力原文とともに翻訳処理開始命令を受けたとき、記録媒体７に記録される図４に示す翻訳処理用プログラムに従い、パターン照合抽出手段４ａを実行する。このパターン照合抽出手段４ａは、入力原文に対し、翻訳辞書部３の中の語彙部３ａ及び形態素解析規則部３ｂを用い、形態素解析・辞書引き処理を行い、当該入力原文の品詞、活用、意味情報他の翻訳処理に必要な各種情報を取得する（Ｓ２１）。従って、この処理では、入力原文を構成する単語の全ての品詞、活用、意味情報等の候補をリストアップできる。
【００５４】
以上のようにして入力原文を構成する単語の品詞、活用の候補をリストアップした後、パターン辞書部３ｇを用いて、入力原文に対して各翻訳パターンの原文側パターンと順次照合し、原文側パターン中の定数文字列が適合する翻訳パターンの候補を全て抽出し、これら抽出した翻訳パターンの番号［例えばN０．１、N０．３、…］をパターン候補格納レジスタPcand［ｎ］（図示せず）に順次配列し記録する（Ｓ２２）。なお、入力原文に対して、各翻訳パターンの原文側パターンを照合した結果、翻訳パターン候補が存在しない場合、パターン候補格納レジスタPcand［０］は空の状態となる。この段階では、変数部分の条件の適合性評価は行わない。なお、これらステップＳ２１，Ｓ２２はパターン照合抽出機能、パターン照合ステップに相当する。この後、抽出した翻訳パターンの配列順位を表す配列カウンタｎに０をセットする（Ｓ２３）。
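The constant-string matching of step S22 can be approximated on surface strings with regular expressions. This is a simplified sketch, not the device's method: the real device matches after morphological analysis, whereas the code below works on raw text only. The function names, the `SOURCE_PATTERNS` table, and the assignment of pattern numbers 2 to 5 are assumptions reconstructed from the examples in the description.

```python
import re
from typing import Dict, Optional

VARIABLE = re.compile(r"\$(\d+)")


def compile_source_pattern(pattern: str) -> "re.Pattern":
    """Turn 'A$1B$2C' into a regex: constants are escaped, variables become groups."""
    parts = []
    pos = 0
    for m in VARIABLE.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))   # constant part
        parts.append(f"(?P<v{m.group(1)}>.+?)")            # variable part
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def match_constants(pattern: str, sentence: str) -> Optional[Dict[int, str]]:
    """Return {variable number: matched text}, or None if the constants do not fit."""
    m = compile_source_pattern(pattern).fullmatch(sentence)
    if m is None:
        return None
    return {int(name[1:]): text for name, text in m.groupdict().items()}


def extract_candidates(patterns: Dict[int, str], sentence: str) -> Dict[int, Dict[int, str]]:
    """Counterpart of the candidate register Pcand: pattern number -> variable bindings."""
    candidates = {}
    for number, pattern in patterns.items():
        bindings = match_constants(pattern, sentence)
        if bindings is not None:       # variable conditions are NOT checked at this stage
            candidates[number] = bindings
    return candidates


SOURCE_PATTERNS = {
    2: "$1は、正式には$2という。",
    3: "$1は、正式には$2という。",
    4: "$1 introduces you to $2.",
    5: "$1 introduces $2 to $3.",
}
sentence = "This historical museum introduces you to the history of the Renaissance."
print(extract_candidates(SOURCE_PATTERNS, sentence))
```

For this sentence both No. 4 and No. 5 survive, with `you` absorbed by a constant in No. 4 and bound to `$2` in No. 5, which is exactly the ambiguity the similarity step has to resolve.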
【００５５】
しかる後、ステップＳ２４に移行し、翻訳パターン候補を格納するパターン候補格納レジスタPcand［ｎ］の格納内容から未解析パターン候補が存在するか否かを判断する。ここで、カウンタｎ＝０であり、かつ、格納レジスタPcand［ｎ］の該当エリアが空である場合、つまり翻訳パターン候補が存在しない場合、構文・意味解析規則部３ｃを用いて、翻訳パターンを使用しない通常の構文・意味解析処理を実行し（Ｓ２５）、さらに後記するステップＳ３６〜Ｓ３８による一般的な翻訳処理を実行する。これらステップＳ２５、Ｓ３６〜Ｓ３８は訳文生成処理手段、訳文生成処理機能、訳文生成処理ステップに相当する。
【００５６】
また、翻訳処理部４は、ステップＳ２４において、ｎが０でなく、かつ、レジスタPcand［ｎ］が空でない場合、変数条件チェック手段４ｂを実行する。すなわち、この変数条件チェック手段４ｂは、レジスタPcand［ｎ］が空でない場合、当該未解析の翻訳パターン候補が存在すると判断し、当該翻訳パターン候補に存在する変数部分の数だけ図２に示す変数条件等の適合性チェックを行う必要があるので、最初に変数番号ｍに１をセットし（Ｓ２６）、第１の変数部分から順次変数解析が行われたか否かを判断する（Ｓ２７）。つまり、ステップＳ２７において、ｍ番目の変数に対する適合性チェックの解析が行われたかを判断し、未解析である場合には該当変数相当部分の入力原文の文字列に関し、構文・意味解析規則部３ｃを用いて、構文・意味解析処理を実行し（Ｓ２８）、入力原文の変数相当部分の構造、意味情報その他様々な文法情報を取得する。そして、取得された前述する各種情報と当該変数の原文側条件の記載内容とを照合し、条件に適合しているか否かを判断する（Ｓ２９）。ここで、変数の原文側条件に適合している場合、変数番号ｍに＋１をインクリメントした後（Ｓ３０）、ステップＳ２７に戻り、次の変数部分について同様の処理を繰り返し実行する。
【００５７】
ところで、前述するステップＳ２７において、ある特定の翻訳パターンに対する未解析変数がないと判断された場合、すなわち既に全ての変数部分について条件等のチェックが行われている場合、定数部分と変数部分との構造を繋ぎ合わせて一つの構造文とする（Ｓ３１）。
【００５８】
なお、図５は定数部分と変数部分との構造を連結させた構造連結図である。この図では、入力原文中の生起順に変数部分と定数部分との構造を連結したものであって、「（パターン属性Ｖ１）」は１番目の変数相当、「（パターン属性Ｖ２）」は２番目の変数相当の構造であることを示し、「（パターン属性Ｃ１）」は１番目の定数要素、「（パターン属性Ｃ２）」は２番目の定数要素であることを示している。
【００５９】
以上のようにして変数部分と定数部分の構造を連結した連結構造を作り上げた後、翻訳パターン候補の配列順位を表すカウンタｎに＋１をインクリメントし（Ｓ３２）、ステップＳ２４に戻り、同様の処理を繰り返し実行する。
【００６０】
また、ステップＳ２９において、前述する取得された各種情報と当該変数の原文側条件の記載内容とを照合し、変数条件に適合していないと判断された場合、当該翻訳パターンは不適合であることから棄却し、前述同様にステップＳ３２を経て次の翻訳パターン候補の処理（Ｓ２４以降）に進んでいく。
【００６１】
これら一連の処理ステップＳ２４、Ｓ２６〜Ｓ３２は変数条件チェック機能ないし変数条件チェックステップに相当する。
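The candidate loop of steps S24 and S26 to S32 can be sketched as two nested loops: the outer one walks the candidates, the inner one checks each variable against its source-side condition and rejects the pattern at the first mismatch. The analyzer below is a deliberately tiny stand-in: `TOY_CATEGORIES` and the function names are hypothetical, and a real system would run syntactic and semantic analysis (rule unit 3c) instead of a table lookup.

```python
from typing import Callable, Dict, List, Optional

Bindings = Dict[int, str]          # variable number -> text from the input sentence
Conditions = Dict[int, str]        # variable number -> required category


def check_candidates(candidates: Dict[int, Bindings],
                     conditions: Dict[int, Conditions],
                     analyze: Callable[[str], Optional[str]]) -> List[int]:
    """Return the pattern numbers whose variables all satisfy their conditions."""
    accepted = []
    for number, bindings in candidates.items():      # S24 / S32: next candidate
        ok = True
        for m in sorted(bindings):                    # S26, S27, S30: next variable
            category = analyze(bindings[m])           # S28: analyze the variable text
            if category != conditions[number][m]:     # S29: compare with the condition
                ok = False                            # mismatch: reject this pattern
                break
        if ok:
            accepted.append(number)                   # S31 would join the structures here
    return accepted


# Toy analyzer (assumption): a lookup table instead of real parsing.
TOY_CATEGORIES = {"韓国": "NP", "大韓民国": "NP", "走る": "VP"}

candidates = {
    2: {1: "韓国", 2: "大韓民国"},
    3: {1: "韓国", 2: "大韓民国"},
    9: {1: "走る"},                 # hypothetical pattern that requires an NP
}
conditions = {2: {1: "NP", 2: "NP"}, 3: {1: "NP", 2: "NP"}, 9: {1: "NP"}}
print(check_candidates(candidates, conditions, TOY_CATEGORIES.get))
```

Patterns No. 2 and No. 3 both pass, so the condition check alone cannot choose between them.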
【００６２】
翻訳処理部４は、以上のようにして全ての翻訳パターン候補について変数の条件チェック、変数部分と定数部分との構造連結等が完了し、ステップＳ２４にて未解析パターンがもう残っていないと判断されたとき、類似度算出手段４ｃを実行する。
【００６３】
この類似度算出手段４ｃでは、ステップＳ２４にて変数部分の未解析なしと判断されたとき、次のステップＳ３３に移行し、１つの入力原文に対し、変数の条件チェックに適合した翻訳パターン候補が幾つ残ったかを判断する。この判断の結果、一つの翻訳パターンだけ残っている場合と未だ複数の翻訳パターン候補が残っている場合とがある。前者の場合には、ステップＳ３６に移行し、該当翻訳パターンに基づいて入力原文である第１言語の構造を第２言語の構造に変換処理するが、翻訳パターン候補が複数存在する場合には、入力原文とそれぞれの翻訳パターン候補の参照原文（図２参照）の変数相当部分との類似度を算出する（Ｓ３４：類似度算出機能、類似度算出ステップ）。なお、類似度算出処理については図８を用いて後記する。
【００６４】
以上のようにして類似度を算出すると、その算出された類似度の中から類似度が最も高い翻訳パターン候補を選択し（Ｓ３５）、変換処理に入る（Ｓ３６）。この変換処理は、前述するようにパターン辞書部３ｇと変換規則部３ｄとを用いて、入力原文の構造を第２言語の構造に置き換える処理を行う。具体的には、変換処理への入力構造が翻訳パターン適用結果のものであれば、パターン辞書部３ｇ中の該当する翻訳パターンの訳文側パターン情報を用い、定数部分を訳文側パターンに従って置き換え、各構成要素の訳文中の順番を参照して構造を作り換えていく処理を行う。図６はその置き換え処理結果の一例を示す図である。従って、この段階では、変数部分の構造は第１言語の解析結果のままである。引き続き、翻訳辞書部３の変換規則部３ｄを用いて、変数部分の部分構造を第２言語の構造に置き換える。図７はその置き換え結果の一例を示す図である。
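At the string level, the two-stage conversion described above amounts to taking the constants and the variable order from the target-side pattern and then filling in the separately translated variable parts. The sketch below shows only that idea: the device itself rewrites syntactic structures rather than strings, the function name is hypothetical, and the translated variable texts are illustrative placeholders rather than output of the conversion rule unit 3d.

```python
import re
from typing import Dict


def apply_target_pattern(target_pattern: str, translated: Dict[int, str]) -> str:
    """Fill each $n in the target-side pattern with the translated variable text."""
    return re.sub(r"\$(\d+)", lambda m: translated[int(m.group(1))], target_pattern)


# Japanese-to-English: constants come from the target-side pattern.
english = apply_target_pattern(
    "The official name for $1 is $2.",
    {1: "South Korea", 2: "the Republic of Korea"},   # placeholder translations
)
print(english)

# English-to-Japanese: the target-side pattern also reorders the variables.
japanese = apply_target_pattern(
    "$1では、$2に$3を紹介しています。",
    {1: "当センター", 2: "留学生", 3: "日本文化"},      # placeholder translations
)
print(japanese)
```

Because the variable numbers tie source-side and target-side positions together, reordering falls out of the substitution without extra logic.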
【００６５】
しかる後、前記変換結果に対し、構文生成規則部３ｅを用いて、第２言語の１次元的単語列を生成する処理を行う（Ｓ３７）。さらに、形態素生成規則部３ｆを用い、最終的な語の活用などを施し、入力原文に対する第２言語の訳文を生成し出力する（Ｓ３８）。これらステップＳ３５〜Ｓ３８は訳文生成機能ないし訳文生成ステップに相当する。
【００６６】
次に、翻訳パターン候補間の類似度算出処理例について図８を参照して説明する。
先ず、最初に翻訳パターン候補のうち、最初の候補を着目パターンとするためにカウンタｎ＝1をセットする（Ｓ４１）。このｎは着目パターンが候補中の何番目かを表す変数である。このステップＳ４１にてｎ＝１をセットした後、未処理の翻訳パターン候補が有るか否かを判断し（Ｓ４２）、未処理翻訳パターン候補有りと判断された場合には当該翻訳パターンの類似度を格納するレジスタD［ｎ］（図示せず）を０に初期化するとともに、着目変数のカウンタｍ（図示せず）に１をセットする（Ｓ４３）。これは第１番目の変数を処理することを意味する。
【００６７】
引き続き、未処理の変数が残っているか否かを判断し（Ｓ４４）、ここで未処理の変数が存在する場合には着目変数部分に相当する入力原文中の文字列と、当該翻訳パターンの参照原文中の変数相当部分の文字列とを比較し、類似度を算出する（Ｓ４５）。この類似度の具体的な算出法は後記するが、この類似度算出の結果、小さい数値ほど類似度が高くなるように構成する。この算出された類似度の数値は類似度を格納するレジスタD［ｎ］に加算する（Ｓ４６）。一つの変数に対する類似度が算出されると、着目変数カウンタｍに＋1をインクリメントし（Ｓ４７）、ステップＳ４４に戻り、次の変数に対する類似度算出処理を行う。そして、全変数の処理が終了すると、未処理変数なし、すなわち当該翻訳パターン候補の処理が終了したと判断し、カウンタｎに＋１をインクリメントし（Ｓ４８）、ステップＳ４２に移行し、次の未処理翻訳パターンが有るか否かを判断する。ここで、全ての翻訳パターン候補の処理が終了している場合、未処理パターンなしと判断され、ステップＳ４９に移行する。ここでは、レジスタD［ｎ］に格納されている各翻訳パターン候補の類似度と比較し、最も数値の小さいものが類似度の高い翻訳パターンとし、ステップＳ３５に渡し、類似度算出に関する全処理を終了する。
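The loop of Figure 8 accumulates one value per candidate and picks the smallest. The sketch below follows those steps under stated assumptions: the function name and data layout are hypothetical, and the distance table simply hard-codes the head-word distances quoted in the worked examples later in this description instead of computing them from the concept dictionary.

```python
from typing import Callable, Dict, Tuple

Bindings = Dict[int, str]   # variable number -> head word


def select_most_similar(input_vars: Dict[int, Bindings],
                        reference_vars: Dict[int, Bindings],
                        distance: Callable[[str, str], int]) -> Tuple[int, Dict[int, int]]:
    """Return (selected pattern number, accumulated value D per pattern)."""
    scores: Dict[int, int] = {}
    for n, refs in reference_vars.items():             # S41, S42, S48: each candidate
        d = 0                                           # S43: D[n] = 0
        for m, ref_word in sorted(refs.items()):        # S44, S47: each variable
            d += distance(input_vars[n][m], ref_word)   # S45, S46: add to D[n]
        scores[n] = d
    best = min(scores, key=scores.get)                  # S49: smallest value wins
    return best, scores


# Head-word distances quoted in the worked examples of the description.
DISTANCES = {
    ("韓国", "中国"): 0, ("大韓民国", "中華人民共和国"): 0,
    ("韓国", "物質"): 7, ("大韓民国", "アルカロイド"): 8,
    ("損失", "中国"): 7, ("損", "中華人民共和国"): 8,
    ("損失", "物質"): 4, ("損", "アルカロイド"): 5,
}
REFERENCES = {2: {1: "中国", 2: "中華人民共和国"}, 3: {1: "物質", 2: "アルカロイド"}}


def lookup(a: str, b: str) -> int:
    return DISTANCES[(a, b)]


korea = {n: {1: "韓国", 2: "大韓民国"} for n in REFERENCES}
loss = {n: {1: "損失", 2: "損"} for n in REFERENCES}
print(select_most_similar(korea, REFERENCES, lookup))
print(select_most_similar(loss, REFERENCES, lookup))
```

The input bindings are keyed by pattern number because, as in the English-to-Japanese example, two candidates may split the same sentence into different variables. The description does not say how ties are broken; `min` here simply keeps the first candidate.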
【００６８】
＊類似度の具体的な算出法について。
【００６９】
この類似度算出処理は、具体的には翻訳辞書部３の概念辞書部３ｈを用いて、類似度を算出する。この概念辞書部３ｈの概念辞書は、個々の概念どうしの関係が例えば樹状に構築され、各概念どうしはアークと呼ばれる関係名で結ばれている。各概念においては、次の要素から構成されている。
【００７０】
＊概念名例：国
＊概念ID 例：００１ｆ２２９ｂ
＊同義語セット例：｛日本、中国、中華人民共和国、韓国、アメリカ、アメリカ合衆国、米国、…｝
例えば「国名」という概念名は、概念IDとして００１ｆ２２９ｂが割り当てられ、その要素としては、「日本」、「中国」、「中華人民共和国」、「韓国」、「アメリカ」、「アメリカ合衆国」…といった具体的な語句が定義されている。なお、概念IDとしては、表層語そのものを定義するのではなく、語彙部３ａの中の各登録語に付与される見出し語IDのセットを定義しておくのでもよく、また語彙部３ａ中の個々の語から概念辞書の検索が容易なように、語彙部３ａ中の各見出し語にそれぞれ概念IDを付加させておいてもよい。
【００７１】
次に、概念どうしを結合する関係名の主な事項を挙げると、例えば
＊上位・下位関係（IS−A）
＊反意関係（ANT）などがある。
【００７２】
また、上記の例では、概念に関し、説明を簡易にするために、名詞的な事項を挙げているが、形容詞的な概念、動詞的な概念なども概念辞書部３ｈに関係付けることも有効である。
【００７３】
図９は本実施の形態で用いられる概念辞書の一部を図式的に表した図である。なお、図中の○は個々の概念を表し、これら概念どうしを結ぶ線はアークと呼ぶ。また、ここでは、便宜的に概念名だけを挙げており、概念の一部は省略されている。さらに、この模式図では、便宜的には、「上位・下位関係」を表すアークを例に挙げている。
【００７４】
この実施の形態は、基本的には、樹状構造中の特定の２つの概念間の隔たりをアーク名と関連付けて係数化し、類似度の数値化に変換するものである。具体的には、樹状構造上において、特定の概念から別の特定の概念まで辿るとき、「IS−A」を辿るごとに「１」、「ANT」を辿るごとに「２」を類似度格納レジスタD［ｎ］に加算するといった簡易な類似度算出法を採用する。また、図示されていないが、「ANT」の扱いとしては、品詞によって係数を変えたり、ここでは触れていない他のアークも用いて詳細に設定することも有効である。さらに、「IS−A」には方向性があるので、方向性を考慮しつつ係数化することも可能である。
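The arc-weighted distance described above (1 per IS-A arc, 2 per ANT arc) can be computed as a shortest path over the concept graph. The sketch below uses Dijkstra's algorithm, which is a choice made for this illustration and is not named in the description; arcs are treated as undirected. The concept names and arcs form a hypothetical toy hierarchy, not the contents of Figure 9, so the resulting numbers do not match those of the worked examples.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ARC_COST = {"IS-A": 1, "ANT": 2}   # weights stated in the description


def build_graph(arcs: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Build an undirected weighted graph from (concept, concept, arc name) triples."""
    graph: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for a, b, kind in arcs:
        graph[a].append((b, ARC_COST[kind]))
        graph[b].append((a, ARC_COST[kind]))
    return graph


def concept_distance(graph, start: str, goal: str) -> Optional[int]:
    """Smallest accumulated arc cost between two concepts, or None if unconnected."""
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > best.get(node, float("inf")):
            continue                          # stale heap entry
        for nxt, cost in graph[node]:
            nd = d + cost
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None


# Hypothetical toy hierarchy (NOT Figure 9).
GRAPH = build_graph([
    ("country", "organization", "IS-A"),
    ("organization", "concrete thing", "IS-A"),
    ("substance", "concrete thing", "IS-A"),
    ("alkaloid", "substance", "IS-A"),
    ("loss", "gain", "ANT"),
])
# Words in the same synonym set share one concept, so their distance is 0.
WORD_TO_CONCEPT = {"韓国": "country", "中国": "country", "物質": "substance"}


def word_distance(a: str, b: str) -> Optional[int]:
    return concept_distance(GRAPH, WORD_TO_CONCEPT[a], WORD_TO_CONCEPT[b])


print(word_distance("韓国", "中国"), word_distance("韓国", "物質"))
```

A directed variant, or different weights per part of speech, would only change `ARC_COST` and how `build_graph` adds edges.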
【００７５】
要するに、概念辞書の構成とそれに基づく数値化の具体化は、本発明において制限するものではなく、趣旨を逸脱しない範囲で自由に定めることができる。例えば図９に示すような樹状構造ではなく、例えばニューラルネットワークやＷｏｒｄNｅｔなどのネットワーク構造を使用してもよい。
【００７６】
＊類似度算出の具体例について。
【００７７】
（１）今、具体的に、日英翻訳原文として、「韓国は、正式には大韓民国という。」が入力された時の類似度比較する例について説明する。
【００７８】
先ず、図４に示す一連の翻訳処理において、ステップＳ２２では、翻訳パターン候補を検索し、図２中の翻訳パターンNｏ．２、Nｏ．３の２つの候補が抽出されたとする。この両パターンは、定数範囲が同一であり、また入力原文中の変数相当部分と両パターンの変数相当部分との文字列がそれぞれ共通である。そして、第１変数相当部分は「韓国」、第２変数相当部分は「大韓民国」である。
【００７９】
そこで、これら変数相当部分はステップＳ２８で変数解析を行う。これら２つの変数相当部分は、何れも語彙部３ａの登録内容を用いて、名詞句であり、意味情報は「組織、場所」と解析される。これら変数相当部分は何れも該当パターンの原文側条件を満たしているので、変数解析後にステップＳ３４で類似度比較処理が行われる。類似度の比較は、個々の変数相当部分の類似度をN０．２、N０．３のそれぞれに対して行い、各パターンごとに類似度を合算することにより、どちらの翻訳パターンの類似性が高いかを判断する。
【００８０】
最初に、第１変数相当部分である「韓国」と、翻訳パターンNｏ．２の参照原文中の第１変数相当部分である「中国」との類似度を算出する。
【００８１】
そこで、これら「韓国」及び「中国」の概念辞書部３ｈ中のそれぞれの位置について、語彙部３ａ中の個々の見出し語に付加されている概念IDを用いて検出し、距離を数値化する。この検出の結果、これら２つの変数相当部分は何れも同一のIDの同義語セットに定義されていることが分かるので、類似度格納レジスタD［１］としては０が加算される。すなわち、「韓国」及び「中国」は概念辞書部３ｈ中に規定される位置が同じであると言える。
【００８２】
（２）次に、同じく第１変数相当部分である「韓国」と、翻訳パターンNｏ．３の参照原文中の第１変数相当部分である「この物質」との類似度を算出する。「この物質」は、語彙部３ａ、形態素解析規則部３ｂ、構文・意味解析規則部３ｃを用い解析したとき、句のヘッドが「物質」であると解析し、この「物質」と「韓国」との類似度を算出する。この「物質」に付与されている概念IDは、図９には詳細に表記されていないが、「具体物」の一段下の概念IDと一致する。よって、「韓国」の位置する「国」との距離は７と判断され、この７が類似度格納レジスタD［２］に加算入力される。
【００８３】
引き続き、第２変数相当部分の類似度算出処理に入る。同様に「大韓民国」と「中華人民共和国」は類似度が０と判断され、この類似度０が類似度格納レジスタD［１］に加算入力され、最終的には類似度格納レジスタD［１］は０となる。
一方、第２変数相当部分である「大韓民国」と「脂肪親和性アルカロイド」のヘッドの「アルカロイド」との類似度を算出する。この「アルカロイド」についても同様に図９に表記されていないが、「物質」が属する概念とIS−A関係で結ばれる１段下の概念に属する。よって、類似度は８と判断され、類似度格納レジスタD［２］には８が加算入力され、最終的な類似度格納レジスタD［２］は１５となる。
【００８４】
従って、これらの類似算出により、類似度数値の低い１番目の候補である翻訳パターンNｏ．２が選択される。
【００８５】
次に、「この時の電気エネルギーの損失は、正式には抵抗損という。」なる入力原文が入力されたとし、同様に当該入力原文を日英翻訳処理により類似度比較する例について説明する。
【００８６】
この入力原文についても同様に、第１変数相当部分の「この時の電気エネルギーの損失」と第２変数相当部分の「抵抗損」が解析され、それぞれの翻訳パターンNｏ．２、Nｏ．３の原文側条件を満たすので、両パターンが候補として残る。第１変数相当部分のヘッド語は、「損失」と解析され、この「損失」に付与されている概念IDをもとに、「損失」はそれぞれ「中国」、「物質」と概念辞書の樹状構造上の距離が数値化される。「損失」は「作用」に属し、「中国」との距離は７と判断され、「物質」との距離は４と判断される。
【００８７】
次に、第２変数相当部分のヘッド語は「損」と解析され、これも同様に「作用」の概念IDが付与されているので、「中華人民共和国」との距離は８、「アルカロイド」の距離は５と判断される。最終的には、D［１］は１５、D［２］は９となる。よって、類似度数値の低い２番目の候補である翻訳パターンN０．３が選択される。
【００８８】
なお、以上の例では、概念辞書部３ｈのみを用いて類似度比較を行ったが、その前処理として、例えば構文・意味解析結果を使って候補の足切りを行うような構成にしてもよい。また、構文解析結果である変数部分の句構造の類似も尺度に入れて数値化をアレンジしてもよいものである。
【００８９】
（３）続いて、英日翻訳原文として、「This historical museum introduces you to the history of the Renaissance．」が入力された時の類似度比較について説明する。
【００９０】
この入力原文についても前述と同様に、翻訳パターンの検索を行い、図２中の翻訳パターンNｏ．４、Nｏ．５の２つの候補が抽出される。この２つのパターンは前述する例と異なり、変数の範囲が異なっている。しかし、本実施の形態では、変数範囲が異なる場合でもそれぞれの変数相当部分の類似度を算出し、加算入力する手法によって最終的に類似度を算出する。そして、算出された類似度のうち、数値が小さい翻訳パターンほど類似度が高いという構成をとることにより、例えば入力原文中の「you」は、翻訳パターンNｏ．４では定数と一致し、翻訳パターンNｏ．５では変数と一致する。パターンNｏ．４は、定数と一致する限り、類似度が加算が無いので、変数として認識した場合に類似度０と判断された場合と実質的に同一の扱いとなる。
【００９１】
最初に、入力原文の第１変数相当部分である「This historical museum」とパターンNｏ．４の第１変数相当部分の「This book」との類似度を算出する。ここでは、構文・意味解析の結果、「museum」と「book」との類似度の算出が行われる。「museum」は、図９中の「有空間組織」の概念IDを持ち、「book」は図９中には表記されていないが、「具体物」の２段下に位置する概念の概念IDを持つ。よって、類似度は８と判断され、レジスタD［１］には８が加算される。
【００９２】
次に、入力原文の第２変数相当部分である「the history of the Renaissance」とパターンNｏ．４の参照原文の「UNIX」との類似度算出が行われる。同様に、これは「history」と「UNIX」との類似度算出が行われる。「history」は図９では表記されていないが、「抽象物」の２段下の概念IDを持ち、「UNIX」は「抽象物」からの別のアークから結ばれる３段下の概念の概念IDを持つ。よって、類似度は５と判断され、レジスタD［１］には５が加算されるので、最終的にはレジスタD［１］は１３となる。
【００９３】
次に、翻訳パターンNｏ．５との類似度比較処理に入る。同様に、入力原文の第１変数相当部分である「This historical museum」のヘッド「museum」と翻訳パターンNｏ．５の参照原文の「center」は、何れも図９中の「有空間組織」の中に位置する。よって、この段階では、類似度は０と判断され、レジスタD［２］には０がセットされる。
【００９４】
引き続き、入力原文の第２変数相当部分である「you」と翻訳パターンNｏ．５の参照原文の「foreign students」のヘッド「students」は何れも図９中の「人間」の中に位置する。よって、この段階では、類似度は０と判断され、レジスタD［２］は０のままである。
【００９５】
さらに、入力原文の最後の第３変数相当部分である「the history of the Renaissance」と翻訳パターンNｏ．５の第３変数相当部分である「Japanese culture」のそれぞれのヘッド「history」と「culture」との類似度算出に入る。この「culture」は図９中に表記されていないが、「history」の一段上位概念から別のIS−Aのアークで結ばれている概念の概念IDを持つ。よって、距離は２となり、最終的には、レジスタD［２］には２がセットされる。よって、類似度数値の小さい第２番目の翻訳パターンNｏ．５が選択される。
【００９６】
従って、以上のような実施の形態によれば、入力原文の定数部分の文字列とパターン辞書部３ｇの複数の翻訳パターンの原文側パターン中の定数部分の文字列とを比較参照し、翻訳パターン候補を検索する。その結果、複数の翻訳パターン候補が存在する時、各翻訳パターン候補の変数部分がパターン辞書部３ｇの変数条件に適合するか否かを判断する。そして、適合すると判断した時、入力原文に対するパターン辞書部３ｇの各翻訳パターンの参照原文を用いて、翻訳パターン適用候補間の類似度を算出し、最も類似度の高い翻訳パターンを用いて、入力原文を翻訳するので、複数の翻訳パターン候補が存在した場合でも、より適切な翻訳パターンを選択できる。
【００９７】
また、最適な翻訳パターンを選択するに際し、ユーザが従来のように過去の登録パターン全体を意識しながら選択する必要が無く、ユーザの負担が大幅に軽減化できる。
【００９８】
図１０は本発明に係る機械翻訳装置の第２の実施の形態を示す構成図である。なお、同図において、図１と同一部分には同一符号を付し、その詳しい説明は図１の説明に譲る。
【００９９】
この実施の形態は、ある特定の翻訳パターンが選定され、当該翻訳パターンを用いて訳文を生成した後、訳文生成文にカーソルを設定し、ユーザが入力部１から確定操作が行われればそれを検知するので、この確定操作を検知し、当該翻訳パターンの適用された入力原文を当該翻訳パターンの参照原文に自動的に格納する例である。
【０１００】
具体的には、処理制御部２には、指示判断手段２ａ、処理制御手段２ｂの他に、確定検出手段２ｃないし確定検出機能、確定検出ステップが設けられている。この機械翻訳装置は、翻訳パターンを用いて訳文が生成され、この生成文にカーソルが設定されているが、このとき、入力部１から訳文確定指示が入力された時、確定検出手段２ｃが例えば図３に示すステップＳ１６又はＳ１９の「その他の指示命令」で訳文確定指示有りと判断し、翻訳結果が後の編集で書き換わることを回避するための訳文ロックであると検出し、直接又は処理制御手段２ｂを通して訳文編集ロック処理命令をパターン登録部８に送出する（Ｓ２０）。
【０１０１】
ここで、パターン登録部８は、訳文編集ロック処理命令を受けると、既にカーソルが置かれている訳文が翻訳パターンの適用文であれば、該当入力原文をパターン辞書部３ｇの参照原文に追加する。なお、追加する参照原文は、前述するように複数の翻訳パターン候補が存在する場合、各翻訳パターンの参照原文について各々の類似度を算出し、より類似度の高い翻訳パターンの参照原文に入力原文を格納する。
【０１０２】
また、入力原文を参照原文として格納するタイミングは、図３に示すステップＳ１９で訳文編集ロック処理指示と判断した時としたが、種々のタイミング及び入力部１からの操作により、該当入力原文をパターン辞書部３ｇの参照原文に格納することが可能である。
【０１０３】
従って、以上のような実施の形態によれば、入力原文の翻訳を繰り返すごとに各翻訳パターンの参照原文を蓄積するので、次以降の入力原文の翻訳に対し、最適な翻訳パターンを検索でき、より精度の高い翻訳処理を実現できる。
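The registration step of this second embodiment can be sketched as appending the confirmed input sentence to the reference list of the applied pattern. Both function names are hypothetical. The description does not state how a pattern with several reference sentences is scored, so taking the best (smallest) value over its references in `pattern_score` is an assumption made only for this sketch, and the length-difference scorer in the demo is a placeholder rather than the concept-dictionary distance.

```python
from typing import Callable, Dict, List, Optional


def register_confirmed(references: Dict[int, List[str]],
                       applied_pattern: Optional[int],
                       input_sentence: str) -> bool:
    """On a confirm instruction, store the input sentence as a reference sentence."""
    if applied_pattern is None:        # the sentence was translated without a pattern
        return False
    references.setdefault(applied_pattern, []).append(input_sentence)
    return True


def pattern_score(input_sentence: str, refs: List[str],
                  score: Callable[[str, str], int]) -> int:
    """Assumed aggregation: best (smallest) value over all reference sentences."""
    return min(score(input_sentence, ref) for ref in refs)


references = {2: ["中国は、正式には中華人民共和国という。"]}
register_confirmed(references, 2, "韓国は、正式には大韓民国という。")
print(len(references[2]))


def length_gap(a: str, b: str) -> int:
    """Placeholder scorer (assumption): absolute difference in length."""
    return abs(len(a) - len(b))


print(pattern_score("abc", ["a", "abcd"], length_gap))
```

Each confirmed sentence widens the set of inputs a pattern is known to suit, which is the accumulation effect this embodiment relies on.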
【０１０４】
なお、パターン登録部８は、処理制御部２や翻訳処理部４とは別に設けたが、処理制御部２、翻訳処理部４の何れかに設けてもよい。
【０１０５】
図１０は本発明に係る機械翻訳装置のさらに第３の実施の形態を示す構成図である。なお、同図において、図１と同一部分には同一符号を付し、その詳しい説明は図１の説明に譲る。
【０１０６】
この実施の形態は、前述する第２の実施の形態と同様に、翻訳パターンを適用された訳文生成文に対して自動的に入力原文を取得し、該当翻訳パターンの参照原文に格納するものであるが、第２の実施の形態では、ユーザが明示的に訳文の確定指示を行ったときに入力原文を取得するのに対し、この第３の実施の形態では、明示的な指示なしで、ユーザの操作から入力原文を取得すべきか否かを判断し、入力原文を参照原文として格納する例である。
【０１０７】
具体的には、ユーザが翻訳処理中にカーソルの移動命令を入力した時、指示判断手段２ａが図３に示すステップＳ１９の「その他の指示命令」にてカーソル移動指示であると判断し、確定検出手段２ｃに通知する。この確定検出手段２ｃは、カーソル移動指示の判断通知を受けると、カーソルの下方移動がなされる前の直上の訳文生成文が翻訳パターン適用文であるかどうかを判断し、適用文であれば、「カーソルの下方移動＝翻訳パターン適用文の確定」と判断し、処理制御手段２ｂを介して入力原文の取得命令をパターン登録部８に送出する。
【０１０８】
ここで、パターン登録部８は、前述するカーソルの下方移動がなされる前の直上の原文を当該文に適用された翻訳パターンの参照原文の領域に追加格納する。
【０１０９】
従って、このような実施の形態によれば、ユーザ明示的なアクションなしに、自動的に個々の翻訳パターンに対して適用してよい入力原文のバリエーションが蓄えられ、以降の翻訳処理に利用することにより、翻訳を繰り返すたびに翻訳パターンの適切な選択精度が高まり、ひいては精度の高い翻訳処理を実現できる。
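The implicit confirmation of this third embodiment can be sketched as a cursor-movement hook: when the cursor moves down, the line it leaves is treated as confirmed if a translation pattern was applied to it. The names `TranslatedLine` and `on_cursor_down` and the editor model (a list of lines with a row index) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TranslatedLine:
    source: str                      # input source sentence
    translation: str                 # generated translation
    applied_pattern: Optional[int]   # None when no translation pattern was used


def on_cursor_down(lines: List[TranslatedLine], row_before_move: int,
                   references: Dict[int, List[str]]) -> bool:
    """Treat a downward cursor move as confirmation of the line being left."""
    line = lines[row_before_move]
    if line.applied_pattern is None:   # not a pattern-applied sentence: nothing to store
        return False
    references.setdefault(line.applied_pattern, []).append(line.source)
    return True


lines = [
    TranslatedLine("韓国は、正式には大韓民国という。",
                   "The official name for South Korea is the Republic of Korea.", 2),
    TranslatedLine("こんにちは。", "Hello.", None),
]
references: Dict[int, List[str]] = {}
print(on_cursor_down(lines, 0, references), on_cursor_down(lines, 1, references))
print(references)
```

Only the first line is stored, because the second was produced by ordinary translation without a pattern.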
【０１１０】
なお、パターン登録部８は、処理制御部２や翻訳処理部４とは別に設けたが、処理制御部２、翻訳処理部４の何れかに設けてもよい。
【０１１１】
また、本願発明は、上記実施の形態に限定されるものでなく、その要旨を逸脱しない範囲で種々変形して実施できる。
【０１１２】
また、各実施の形態は可能な限り組み合わせて実施することが可能であり、その場合には組み合わせによる効果が得られる。さらに、上記各実施の形態には種々の上位，下位段階の発明が含まれており、開示された複数の構成要素の適宜な組み合わせにより種々の発明が抽出され得るものである。例えば問題点を解決するための手段に記載される全構成要件から幾つかの構成要件が省略されうることで発明が抽出された場合には、その抽出された発明を実施する場合には省略部分が周知慣用技術で適宜補われるものである。
【０１１３】
【発明の効果】
以上説明したように本発明によれば、参照原文を利用することにより翻訳パターンの利用効率を高めることができ、しかも原文により忠実な翻訳パターンを選択し精度の高い翻訳を実現できる機械翻訳装置を提供できる。
また、本発明は、翻訳パターンを利用して第１の言語文を第２の言語文に翻訳するに際し、ユーザの負担を大幅に軽減できる機械翻訳装置を提供できる。
【０１１４】
また、本発明は、翻訳パターンを利用して第１の言語文を第２の言語文に翻訳するに際し、ユーザの負担を大幅に軽減できる機械翻訳システム、プログラム及び機械翻訳方法を提供できる。
【図面の簡単な説明】
【図１】本発明に係る機械翻訳装置の一実施の形態を示す構成図。
【図２】図１に示す翻訳辞書部に格納されるパターン辞書部のデータ配列図。
【図３】図１に示す処理制御部の一連の処理の流れを説明する図。
【図４】図１に示す翻訳処理部の一連の処理の流れを説明する図。
【図５】入力原文の定数部分と変数部分との構造結合図。
【図６】定数部分の変換処理を説明する図。
【図７】変数部分の変換処理を説明する図。
【図８】図１に示す翻訳処理部の類似度算出手段における一連の処理の流れを説明する図。
【図９】翻訳辞書部に格納される概念辞書部の一例を示す図。
【図１０】本発明に係る機械翻訳装置の他の実施の形態を示す構成図。
【符号の説明】
１…入力部、２…処理制御部、２ａ…指示判断手段、２ｂ…処理制御手段、２ｃ…確定検出手段、３…翻訳辞書部、３ｇ…パターン辞書部、３ｈ…概念辞書部、４…翻訳処理部、４ａ…パターン照合抽出手段、４ｂ…変数条件チェック手段、４ｃ…類似度算出手段、４ｄ…訳文生成処理手段、５…出力部、６，７…記録媒体、８…パターン登録部。
[0001]
[Technical field to which the invention belongs]
The present invention relates to a machine translation device that translates a first-language sentence into a second-language sentence.
[0002]
[Prior art]
In recent years, with the globalization of society, interest in machine translation technology that automatically translates a first-language sentence (source text) into a second-language sentence has grown, and many translation services offered over the Internet and machine translation software products that run on a computer have appeared.
[0003]
The most important aspect of machine translation is translation accuracy. Accuracy has improved year by year thanks to better computer performance, advances in translation technology, and the accumulation of translation knowledge. However, because natural-language expressions are extremely diverse, improving translation technology and accumulating translation knowledge alone cannot at present produce translations that satisfy users.
[0004]
Therefore, in order to overcome the above-mentioned problems in translation accuracy, a pattern translation technique has been proposed in which translation processing is performed under a completely new framework that has a different viewpoint from the conventional framework.
[0005]
One such technique extracts a required sentence from a basic example database, which stores bilingual example collections and past translation results, creates an example sentence with holes in which part of the sentence is replaced by variables, and registers it in a variable example database. The input first-language sentence is then matched against the holed example sentences in the variable example database to perform translation (Patent Document 1).
[0006]
Another is a translation device that can select the more desirable pattern when several translation patterns share the same constant part (the part of a holed example sentence that is not designated as a variable) (Patent Document 2). Specifically, by attaching phrase-structure information to the variable parts, this device measures which pattern's variable-part structure information the first-language input sentence is closest to, and selects the desirable translation pattern from the measured value. For example, conditions are attached as to whether the part corresponding to a variable is a verb phrase, a noun phrase, a simple noun phrase, or a noun phrase with an adnominal modifying clause, and the translation pattern whose conditioned structure is equivalent to the structure of the input sentence is selected.
[0007]
However, in a machine translation device that uses pattern translation techniques like those above, as the user accumulates translation patterns it becomes unavoidable that, depending on how the user assigns variables, a single input sentence matches several translation patterns. The reason is that the information an ordinary user can specify as variable conditions is limited, as described next.
[0008]
<Example of information that can be specified by the user>
This includes grammatical category information of variable parts, phrase structure information, character strings of surface words, semantic information (selected from specific options), and the like.
[0009]
As for semantic information, what the user can specify is limited to broad, easily distinguished categories such as "person", "place", and "action". Having the user specify anything more detailed would impose a heavy burden, and the user cannot always be expected to select and specify appropriate information.
[0010]
Next, examples in which several translation patterns match a single input sentence are described. P1 and P2 are translation patterns for English-to-Japanese translation, and P3 and P4 are translation patterns for Japanese-to-English translation. E denotes the English-side pattern and J the Japanese-side pattern. $1, $2, and $3 in these patterns denote variable parts. The variable conditions are omitted here for simplicity, but in every example the variables are noun phrases.
[0011]
<Example in which the variable ranges differ but one input sentence matches multiple patterns>
(P1) E: $1 introduces you to $2.
J: $1は、$2の入門書です。 ("$1 is an introductory book on $2.")
[0012]
(Source sentence: This book introduces you to UNIX.)
(P2) E: $1 introduces $2 to $3.
J: $1では、$2に$3を紹介しています。 ("At $1, $3 is introduced to $2.")
[0013]
(Source sentence: Our center introduces foreign students to Japanese culture.)
<Example in which multiple patterns with the same constant part are registered>
(P3) J: $1は、正式には$2という。 ("$1 is formally called $2.")
[0014]
E: The official name for $1 is $2.
(Source sentence: 中国は、正式には中華人民共和国という。 "China is formally called the People's Republic of China.")
(P4) J: $1は、正式には$2という。 ("$1 is formally called $2.")
[0015]
E: $1 is technically called $2.
(Source sentence: この物質は、正式には脂肪親和性アルカロイドという。 "This substance is formally called a lipophilic alkaloid.")
Therefore, when several translation patterns come up as candidates as described above, one is selected according to a priority among the candidates. For example, the choice is made according to the following priority criteria.
[0016]
(1) The pattern whose constant-part character strings are longer is given priority.
[0017]
(2) If (1) does not decide, the pattern with fewer constant character strings is given priority.
[0018]
(3) If (2) does not decide, the longest constants are compared and the pattern with the longer one is given priority.
[0019]
(4) If (3) does not decide, the pattern registered later is given priority.
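The four prior-art criteria above amount to a lexicographic comparison, which can be sketched as a sort key. This is an illustration under stated assumptions: the `Candidate` class and function names are hypothetical, "longer constant-part character strings" in criterion (1) is read as the total length of all constant strings, and registration order is modelled as an integer that grows over time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    constants: List[str]   # constant strings of the source-side pattern, in order
    registered_at: int     # larger value = registered later (assumption)


def priority_key(c: Candidate) -> Tuple[int, int, int, int]:
    total_length = sum(len(s) for s in c.constants)           # (1) longer total first
    count = len(c.constants)                                   # (2) fewer strings first
    longest = max((len(s) for s in c.constants), default=0)    # (3) longer longest first
    return (-total_length, count, -longest, -c.registered_at)  # (4) later registration first


def pick_by_priority(candidates: List[Candidate]) -> Candidate:
    """Return the candidate that the prior-art criteria would prefer."""
    return min(candidates, key=priority_key)


p1 = Candidate("P1", [" introduces you to ", "."], registered_at=1)
p2 = Candidate("P2", [" introduces ", " to ", "."], registered_at=2)
p3 = Candidate("P3", ["は、正式には", "という。"], registered_at=3)
p4 = Candidate("P4", ["は、正式には", "という。"], registered_at=4)

print(pick_by_priority([p1, p2]).name)   # decided by criterion (1)
print(pick_by_priority([p3, p4]).name)   # identical constants: decided by criterion (4)
```

None of the four criteria looks at what the variables contain, so P4 would be chosen over P3 for every input simply because it was registered later.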
[0020]
[Patent Document 1]
Japanese Patent Laid-Open No. 06-68134
[0021]
[Patent Document 2]
Japanese Patent Laid-Open No. 06-290210
[0022]
[Problems to be solved by the invention]
Therefore, with translation methods and translation devices that use pattern translation techniques like those above, when several translation pattern candidates come up for a single input source sentence, all of them satisfy the applicability conditions, so the most desirable translation pattern cannot be selected.
[0023]
In addition, unless the user always keeps past registrations in mind when registering a translation pattern, many similar translation patterns end up coexisting, which further increases the burden on the user.
[0024]
The present invention has been made in view of the above circumstances, and its object is to provide a machine translation device that raises the utilization efficiency of translation patterns and selects the translation pattern best suited to the source text, thereby achieving highly accurate translation.
[0025]
Another object of the present invention is to provide a machine translation device that greatly reduces the burden on the user when a first-language sentence is translated into a second-language sentence using translation patterns.
[0026]
[Means for Solving the Problems]
(1) To solve the above problems, the machine translation device according to the present invention comprises: a translation dictionary unit that stores, in addition to the knowledge information needed to translate a first-language sentence into a second-language sentence, a plurality of translation patterns, each of which associates a character-string pattern composed of changeable variable parts and constant parts (fixed expressions) with conditions and a reference source sentence; control means that evaluates the input source sentence (the first-language sentence entered from the input unit) and a translation instruction, and outputs a translation start command; means that, in response to the translation command output from the control means, uses the knowledge information to acquire the various kinds of information needed to translate each word of the input source sentence; means that compares the input source sentence with the character string of each translation pattern and extracts translation pattern candidates; variable condition check means that checks, using the acquired information, whether each variable part of each extracted candidate satisfies the conditions of that translation pattern; and similarity calculation means that, when the variable condition check means finds several translation patterns satisfying their conditions, calculates the similarity between the variable-equivalent parts of the input source sentence and the variable-equivalent parts of each pattern's reference source sentence and selects the translation pattern with the highest similarity. The selected translation pattern is then applied to translate the sentence into the second language.
[0027]
With this configuration, the present invention searches for translation pattern candidates by comparing the character string of the input source sentence with the source-side constant-part character strings of the translation patterns in the pattern dictionary unit. If the search yields several translation pattern candidates, the device determines whether the variable parts of each candidate satisfy the variable conditions in the pattern dictionary unit. For the candidates that satisfy them, it uses the reference source sentence of each translation pattern in the pattern dictionary unit to calculate a similarity for each candidate against the input source sentence, and translates the input source sentence with the pattern of highest similarity. A more appropriate translation pattern can therefore be selected even when several candidates exist. In addition, the user no longer has to keep the whole set of previously registered patterns in mind to obtain the best pattern, which greatly reduces the user's burden. The candidate search and the variable-condition check described above may also be performed in parallel.
[0028]
The device may further include confirmation detecting means and pattern registration means. The confirmation detecting means detects an instruction, entered from the input unit, to confirm a translation result obtained by applying a translation pattern, or detects that the translation has been confirmed when the cursor moves from the translated sentence to a later sentence to be translated. When the confirmation detecting means judges the translation confirmed, the pattern registration means stores the input source sentence in association with the reference source sentences of the applied translation pattern. With these additions, reference source sentences accumulate for each translation pattern every time an input source sentence is translated, so the best translation pattern can be found for subsequent input source sentences and more accurate translation becomes possible.
[0031]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0032]
FIG. 1 is a block diagram showing a first embodiment of a machine translation apparatus according to the present invention.
[0033]
This machine translation apparatus is inputted from an input unit 1 as an input means for inputting a first language sentence to be translated (hereinafter referred to as an input original sentence and a translation target sentence) and various commands, and the input unit 1. A processing control unit 2 that judges an input original sentence and various commands and outputs a command necessary for each component, and a translation dictionary that stores various rules and a dictionary for translating the first language sentence into the second language sentence A translation processing unit 4 that translates a first language sentence into a second language sentence using various rules and dictionaries stored in the translation dictionary unit 3 when a translation command is received from the unit 3 and the processing control unit 2 , And the output unit 5.
[0034]
When the processing control unit 2 and the translation processing unit 4 are implemented by a CPU, the series of processes can also be executed using a recording medium that records a program by which the processing control unit 2 centrally controls each component, together with a recording medium 7 that records a translation processing program defining the procedure by which the translation processing unit 4 performs translation.
[0035]
The input unit 1 is typically a keyboard, a mouse, or the like. In addition to the various commands mentioned above, it accepts an input original text typed on the keyboard or selected by designating a specific area of a document with the mouse. Other forms of input include data read by an OCR (Optical Character Reader) from a printed or handwritten document, input original text stored on a floppy disk, magnetic tape, magnetic disk, or the like, input original text obtained over the Internet, and input original text obtained by converting speech captured by a microphone into a natural-language character string with speech dictation software. In other words, the input unit 1 covers not only ordinary text entry but also input original text supplied in these various other forms.
[0036]
The processing control unit 2 includes instruction determination means 2a, which determines the content of the various commands input from the input unit 1, and processing control means 2b, which outputs the required processing command according to the determination result of the instruction determination means 2a.
[0037]
The translation dictionary unit 3 stores a vocabulary unit 3a, a morphological analysis rule unit 3b, a syntax/semantic analysis rule unit 3c, a conversion rule unit 3d holding rules for language conversion, a syntax generation rule unit 3e, a morpheme generation rule unit 3f, a pattern dictionary unit 3g, a concept dictionary unit 3h, and any other rules and dictionaries necessary for translation processing. The units 3a to 3f, that is, everything other than the pattern dictionary unit 3g and the concept dictionary unit 3h, can be regarded as the knowledge information used to translate the input original sentence into the second language sentence.
[0038]
The translation processing unit 4 includes the following means. Pattern matching extraction means 4a uses the vocabulary unit 3a and the morphological analysis rule unit 3b of the translation dictionary unit 3 to list the part of speech, conjugation, semantic information, and other attributes of the words constituting the input original sentence, then collates the source-side pattern of each translation pattern in the pattern dictionary unit 3g against the input original sentence in turn and extracts the translation pattern candidates whose constant character strings match. Variable condition check means 4b checks whether the variable portions of the translation pattern candidates extracted by the pattern matching extraction means 4a satisfy the conditions defined for the translation pattern. Similarity calculation means 4c calculates a similarity by comparing the input original sentence with the reference original text of each translation pattern candidate. Translation generating means 4d selects the translation pattern candidate with the highest similarity calculated by the similarity calculation means 4c and performs translation processing using the knowledge information of the translation dictionary unit 3.
[0039]
The output unit 5 outputs the translation result produced by the translation processing unit 4 and displays responses from the processing control unit 2 to the various instructions input from the input unit 1. Display means such as various displays are normally used. Other output forms chosen by the user are also possible, for example printing means such as a printer, means for writing to a floppy disk, magnetic tape, or magnetic disk, and transmission means for sending to other media.
[0040]
FIG. 2 is a diagram showing an example of translation patterns stored in the pattern dictionary unit 3g used in the present embodiment.
[0041]
Each translation pattern in the pattern dictionary unit 3g stores the following five types of information as one unit:
1. Source-side pattern
2. Translation-side pattern
3. Source-side condition
4. Translation-side condition
5. Reference original text
[0042]
The $1, $2, ... inserted in the source-side pattern and the translation-side pattern indicate variable parts, and the remaining character strings indicate constant parts. The number following “$” distinguishes the variables and shows which variable part of the source-side pattern corresponds to which variable part of the translation-side pattern.
[0043]
The source-side condition, on the other hand, specifies the condition that the character string corresponding to a variable must satisfy when the pattern is collated with the input original text. In the example shown in FIG. 2, it specifies the phrase type of each variable, where NP denotes a noun phrase and VP a verb phrase. The surface character string of a word in the original sentence, its part of speech, its semantic information, and the like can also be specified. The translation-side condition specifies the form in which the phrase of a variable part is to be output when the translation is generated. In the example shown in FIG. 2, it specifies that the verb phrase of the second variable be generated as a clause in the translation. Number information, tense, number agreement with other phrases, and the like can also be specified. What may be specified in the conditions is not limited in the present invention and can be set freely without departing from its gist.
[0044]
Further, the reference original text is a portion that stores an input original text registered when the translation pattern is registered.
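The five items that make up one translation pattern can be pictured as a single record. The following is only an illustrative sketch in Python: the class name, field names, and the dictionary encoding of the conditions are assumptions made for this example and do not come from FIG. 2. The sample values reuse the English-Japanese pattern quoted in the prior-art discussion, numbered 4 here only to follow the later worked example.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationPattern:
    """One unit of the pattern dictionary (hypothetical field names)."""
    number: int                      # pattern number, e.g. No. 4
    source_pattern: str              # 1. source-side pattern
    target_pattern: str              # 2. translation-side pattern
    source_conditions: dict = field(default_factory=dict)  # 3. e.g. {"$1": "NP"}
    target_conditions: dict = field(default_factory=dict)  # 4. output-form conditions
    reference_source: str = ""       # 5. reference original text


# Sample record; the condition encoding is an assumption for illustration.
p4 = TranslationPattern(
    number=4,
    source_pattern="$1 introduces you to $2.",
    target_pattern="$1は、$2の入門書です。",
    source_conditions={"$1": "NP", "$2": "NP"},
    reference_source="This book introduces you to UNIX.",
)
```

Keeping the reference original text inside the record is what later allows an input sentence to be compared against the sentence the pattern was created from.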
[0045]
Next, in the machine translation apparatus as described above, an example of interactive processing with the user in the processing control unit 2 will be described with reference to FIG. Note that interactive processing with the user can be executed by a hardware logic circuit configuration, but here, an example in which a series of processing is executed according to a program stored in the recording medium 6 will be described.
[0046]
When the operation of the apparatus starts, the processing control unit 2 determines whether or not an input original is input from the input unit 1 according to a program stored in the recording medium 6 (S11), and it is determined that the input original is input. In such a case, after temporarily storing in an appropriate memory not shown, a display command is issued to the output unit 5 to display the input original (S12).
[0047]
After displaying the input original text, or when it is determined in step S11 that no input original text has been input, the processing control unit 2 determines whether a translation instruction command has been input from the input unit 1 (S13). When a translation instruction has been input, the input original text temporarily stored in the memory is sent to the translation processing unit 4 together with a translation processing start command. On receiving this command, the translation processing unit 4 translates the input original sentence into the second language sentence using the rules and dictionaries of the translation dictionary unit 3 (S14), and the translation result is displayed on the output unit 5 (S15). The translation processing performed by the translation processing unit 4 is described later.
[0048]
If it is determined in step S13 that a translation instruction command has not been input, it is subsequently determined whether or not a command for starting pattern registration has been input from the input unit 1 (S16), and a pattern registration start command has been input. If it is, the pattern registration unit 8 to be described later is instructed to start the pattern registration process. The pattern registration unit 8 executes pattern registration processing in accordance with a user instruction (S17).
[0049]
If an instruction to end the entire process is input from the input unit 1, all processing ends (S18). When an instruction command other than those described above is input from the input unit 1, the processing control unit 2 determines the content of the command (S19) and performs the corresponding processing (S20).
[0050]
Accordingly, among the above processes, steps S11, S13, S16, S18, S19, and the like, which determine the various instructions input from the input unit 1, functionally correspond to the instruction determination means 2a of the processing control unit 2, and the remaining steps correspond to the processing control means 2b.
[0051]
Here the end instruction command is determined before the other instruction commands, but it may equally be determined after them, and the order of the other processes is not particularly limited.
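The interactive flow of steps S11 to S20 amounts to a dispatch on the kind of input received. The sketch below is a simplified illustration only: the event names ("text", "translate", "register", "quit") and the `state` dictionary are hypothetical, and the function merely reports which action steps would run rather than performing them.

```python
def control_step(state, event):
    """Handle one input event and return the action steps that would run.

    `event` is a (kind, payload) pair; the kinds are hypothetical names
    standing in for the determinations made in S11, S13, S16, S18 and S19.
    """
    kind, payload = event
    if kind == "text":              # S11: an input original was entered
        state["original"] = payload  # hold it in temporary memory
        return ["S12"]              # display the input original
    if kind == "translate":         # S13: translation instruction
        return ["S14", "S15"]       # translate, then display the result
    if kind == "register":          # S16: pattern registration start
        return ["S17"]              # run pattern registration
    if kind == "quit":              # S18: end instruction
        state["running"] = False
        return []
    return ["S20"]                  # S19: any other command, handled in S20


state = {"running": True}
steps_for_text = control_step(state, ("text", "This book introduces you to UNIX."))
steps_for_translate = control_step(state, ("translate", None))
```

Because each branch is independent, the order in which the kinds are tested can be changed without altering the outcome, which is consistent with the remark that the processing order is not particularly limited.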
[0052]
Next, the translation process (S14) performed by the translation processing unit 4 is described, that is, the process that runs when the processing control unit 2 determines that a translation instruction has been input (S13) and sends a translation processing start command. FIG. 4 shows an example of the series of processing from the input of an input original sentence (first language sentence) at the input unit 1 to its translation into a second language sentence and output.
[0053]
When the translation processing unit 4 receives a translation processing start command together with the input original text from the processing control unit 2, it executes the pattern matching extraction means 4a according to the translation processing program. The pattern matching extraction means 4a performs morphological analysis and dictionary lookup on the input original sentence using the vocabulary unit 3a and the morphological analysis rule unit 3b of the translation dictionary unit 3, and acquires the part of speech, conjugation, semantic information, and other information needed for translation processing (S21). This step therefore lists all candidates for the part of speech, conjugation, semantic information, and so on of the words constituting the input original sentence.
[0054]
After the part-of-speech and conjugation candidates of the words constituting the input original text have been listed, the source-side pattern of each translation pattern in the pattern dictionary unit 3g is collated against the input original text in turn. All translation pattern candidates whose constant character strings match are extracted, and the numbers of the extracted translation patterns (for example, No. 1, No. 3, ...) are recorded in order in the pattern candidate storage register Pcand[n] (not shown) (S22). If the collation yields no translation pattern candidate, Pcand[0] is left empty. The suitability of the variable conditions is not evaluated at this stage. Steps S21 and S22 correspond to the pattern matching extraction function and the pattern matching step. The array counter n, which indicates the position in the sequence of extracted translation patterns, is then set to 0 (S23).
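The constant-string collation of step S22 can be approximated with regular expressions. This is a rough sketch only, not the matching procedure of the embodiment (which works on morphologically analyzed text): it assumes patterns are plain strings with `$n` variables, the function names are invented for illustration, and the third pattern is hypothetical. Patterns 4 and 5 reuse the two English source-side patterns quoted in the prior-art discussion; assigning them those numbers is an assumption made to follow the later worked example.

```python
import re


def pattern_to_regex(source_pattern):
    """Turn '$1 introduces you to $2.' into a regex with one group per variable."""
    regex = ""
    for piece in re.split(r"(\$\d+)", source_pattern):
        if re.fullmatch(r"\$\d+", piece):
            regex += f"(?P<v{piece[1:]}>.+?)"   # variable part: any non-empty string
        else:
            regex += re.escape(piece)           # constant part: must match literally
    return re.compile(regex)


def extract_candidates(input_text, patterns):
    """Return [(pattern number, variable bindings)] for every matching pattern."""
    pcand = []
    for number, source_pattern in patterns.items():
        match = pattern_to_regex(source_pattern).fullmatch(input_text)
        if match:
            pcand.append((number, match.groupdict()))
    return pcand


patterns = {
    4: "$1 introduces you to $2.",
    5: "$1 introduces $2 to $3.",
    99: "$1 is called $2.",          # hypothetical non-matching pattern
}
pcand = extract_candidates(
    "This historical museum introduces you to the history of the Renaissance.",
    patterns,
)
```

Both patterns 4 and 5 survive this step; in pattern 4 the word "you" is consumed by a constant, whereas in pattern 5 it is bound to a variable. No variable condition is examined yet, matching the statement that suitability is not evaluated at this stage.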
[0055]
The process then proceeds to step S24, where the contents of the pattern candidate storage register Pcand[n] are examined to determine whether an unanalyzed pattern candidate exists. If the counter n = 0 and the corresponding area of Pcand[n] is empty, that is, if there is no translation pattern candidate, ordinary syntax/semantic analysis that does not use a translation pattern is executed using the syntax/semantic analysis rule unit 3c (S25), followed by the general translation processing of steps S36 to S38 described later. Steps S25 and S36 to S38 correspond to the translation generation processing means, the translation generation processing function, and the translation generation processing step.
[0056]
If step S24 finds that the register Pcand[n] is not empty, the translation processing unit 4 executes the variable condition check means 4b. A non-empty Pcand[n] means that an unanalyzed translation pattern candidate exists, and the conditions shown in FIG. 2 must be checked for each of the variable parts in that candidate. The variable number m is therefore first set to 1 (S26), and the variables are examined in order from the first to determine whether each has already been analyzed (S27). That is, step S27 determines whether the compatibility check has been carried out for the m-th variable. If it has not, syntax/semantic analysis is executed on the character string of the input original text corresponding to that variable using the syntax/semantic analysis rule unit 3c (S28), yielding the structure, semantic information, and other grammatical information of the variable-equivalent portion of the input original sentence. This information is collated with the source-side condition of the variable to determine whether the condition is met (S29). If it is met, the variable number m is incremented by 1 (S30), and the process returns to step S27 to repeat the same processing for the next variable part.
[0057]
If step S27 determines that no unanalyzed variable remains for the translation pattern in question, that is, if the conditions have already been checked for all variable parts, the structures of the constant parts and the variable parts are connected to form the structure of one sentence (S31).
[0058]
FIG. 5 shows a connected structure in which the structures of the constant parts and the variable parts are joined. In this figure the structures of the variable parts and constant parts are connected in their order of occurrence in the input original text. “(pattern attribute V1)” marks the portion corresponding to the first variable and “(pattern attribute V2)” the portion corresponding to the second variable, while “(pattern attribute C1)” marks the first constant element and “(pattern attribute C2)” the second constant element.
[0059]
After the connected structure of the variable parts and constant parts has been created as described above, the counter n representing the position in the sequence of translation pattern candidates is incremented by 1 (S32), and the process returns to step S24 to repeat the same processing.
[0060]
If step S29 determines, on collating the obtained information with the source-side condition of the variable, that the condition is not met, the translation pattern is judged incompatible, and the process moves on to the next translation pattern candidate (from S24) via step S32 as described above.
[0061]
The series of processing steps S24 and S26 to S32 correspond to a variable condition check function or a variable condition check step.
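The nested loop over candidates (counter n) and variables (counter m) can be sketched as a filter. This is a minimal illustration under stated assumptions: `analyze` is a hypothetical stand-in for the syntax/semantic analysis of step S28 and returns only a phrase type, the condition table uses made-up pattern numbers and conditions, and the structure connection of step S31 is omitted.

```python
def check_candidates(candidates, conditions, analyze):
    """Keep the candidates whose every variable meets its source-side condition.

    candidates: list of (pattern number, {variable: input string})
    conditions: {pattern number: {variable: required phrase type}}
    analyze:    callable mapping an input string to a phrase type (assumed)
    """
    surviving = []
    for number, bindings in candidates:              # loop over n (S24, S32)
        compatible = True
        for variable, text in bindings.items():      # loop over m (S26, S27, S30)
            required = conditions[number].get(variable)
            if required is not None and analyze(text) != required:  # S28, S29
                compatible = False                   # pattern judged incompatible
                break
        if compatible:
            surviving.append((number, bindings))
    return surviving


def toy_analyze(text):
    """Hypothetical analyzer: treats 'to ...' as a verb phrase, else a noun phrase."""
    return "VP" if text.startswith("to ") else "NP"


candidates = [
    (1, {"$1": "this book", "$2": "UNIX"}),
    (2, {"$1": "this book", "$2": "UNIX"}),
]
conditions = {
    1: {"$1": "NP", "$2": "NP"},
    2: {"$1": "NP", "$2": "VP"},   # made-up condition that the input fails
}
surviving = check_candidates(candidates, conditions, toy_analyze)
```

A candidate is dropped as soon as one variable fails, mirroring the jump from step S29 to the next candidate via step S32.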
[0062]
When the translation processing unit 4 has completed the variable condition check, the structural connection of variable and constant parts, and so on for all translation pattern candidates, and step S24 determines that no unanalyzed pattern remains, it executes the similarity calculation means 4c.
[0063]
When step S24 determines that no unanalyzed pattern candidate remains, the similarity calculation means 4c proceeds to step S33 and determines how many translation pattern candidates that passed the variable condition check remain for the one input original. Either only one translation pattern remains or several candidates remain. In the former case, the process proceeds to step S36, where the first-language structure of the input original sentence is converted into a second-language structure based on that translation pattern. In the latter case, the similarity between the variable-equivalent parts of the input original text and those of the reference original text (see FIG. 2) of each translation pattern candidate is calculated (S34: similarity calculation function, similarity calculation step). The similarity calculation process is described later.
[0064]
When the similarities have been calculated, the translation pattern candidate with the highest similarity is selected (S35), and the conversion process is started (S36). As described above, the conversion process uses the pattern dictionary unit 3g and the conversion rule unit 3d to replace the structure of the input original sentence with a second-language structure. Specifically, if the structure passed to the conversion process results from applying a translation pattern, the translation-side pattern of that translation pattern in the pattern dictionary unit 3g is used: the constant parts are replaced according to the translation-side pattern, and the structure is rearranged according to the order of the constituents in the translation. FIG. 6 shows an example of the result of this replacement. At this stage the structure of each variable part is still the first-language analysis result. The partial structure of each variable part is then replaced with a second-language structure using the conversion rule unit 3d of the translation dictionary unit 3. FIG. 7 shows an example of the result.
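The two-stage conversion (constants first, variables afterwards) can be illustrated on plain strings. This is a deliberately simplified sketch: the embodiment operates on analyzed structures rather than strings, the function names are invented, and the one-entry `phrase_table` is a hypothetical stand-in for the conversion rule unit 3d. The pattern is the English-Japanese example from the prior-art discussion.

```python
import re


def apply_target_pattern(target_pattern, bindings):
    """Lay out the constants of the translation-side pattern around the variables."""
    return re.sub(r"\$(\d+)", lambda m: bindings[int(m.group(1))], target_pattern)


def translate_variables(bindings, phrase_table):
    """Replace each variable string with its translation when one is known."""
    return {key: phrase_table.get(text, text) for key, text in bindings.items()}


target_pattern = "$1は、$2の入門書です。"
bindings = {1: "This book", 2: "UNIX"}   # from "This book introduces you to UNIX."

# Stage 1: constants follow the translation-side pattern, variables are untouched.
stage1 = apply_target_pattern(target_pattern, bindings)

# Stage 2: variable parts are converted; this table is a made-up placeholder.
phrase_table = {"This book": "この本"}
stage2 = apply_target_pattern(target_pattern, translate_variables(bindings, phrase_table))
```

Stage 1 corresponds loosely to the intermediate result in which the variable parts still carry first-language content, and stage 2 to the result after the variable parts have been converted.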
[0065]
The syntax generation rule unit 3e is then used to generate a one-dimensional word string in the second language from the conversion result (S37). Finally, the morpheme generation rule unit 3f is used to inflect the words, and the second-language translation of the input original is generated and output (S38). Steps S35 to S38 correspond to the translation generation function or the translation generation step.
[0066]
Next, an example of similarity calculation processing between translation pattern candidates will be described with reference to FIG.
First, a counter n = 1 is set to set the first candidate among the translation pattern candidates as the target pattern (S41). This n is a variable representing the number of the target pattern in the candidate. After n = 1 is set in step S41, it is determined whether or not there is an unprocessed translation pattern candidate (S42). If it is determined that there is an unprocessed translation pattern candidate, the similarity of the translation pattern is determined. Is initialized to 0, and 1 is set to the counter m (not shown) of the variable of interest (S43). This means that the first variable is processed.
[0067]
Subsequently, it is determined whether an unprocessed variable remains (S44). If one does, the character string in the input original corresponding to the variable of interest is compared with the character string corresponding to that variable in the reference original text of the translation pattern, and the similarity is calculated (S45). The specific method of calculating the similarity is described later. In this calculation, a smaller numerical value means a higher similarity. The calculated value is added to the similarity storage register D[n] (S46). Once the similarity for one variable has been calculated, the variable counter m is incremented by 1 (S47), and the process returns to step S44 to calculate the similarity for the next variable. When all variables have been processed, it is determined that no unprocessed variable remains, that is, that processing of this translation pattern candidate is complete; the counter n is incremented by 1 (S48), and the process returns to step S42 to determine whether another unprocessed translation pattern remains. When all translation pattern candidates have been processed, it is determined that no unprocessed pattern remains, and the process proceeds to step S49. There the similarities of the translation pattern candidates stored in the registers D[n] are compared, the translation pattern with the smallest numerical value is taken to be the one with the highest similarity, and the result is passed to step S35, ending the similarity calculation.
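The accumulation in D[n] and the final selection can be sketched as follows. This is an illustrative sketch, assuming the per-variable distance is supplied as a function; the function name and data layout are invented here. The distance values in the lookup table are the ones stated in the first worked example later in this description.

```python
def select_most_similar(candidates, distance):
    """Sum per-variable distances for each candidate and pick the smallest total.

    candidates: list of (pattern number, [(input part, reference part), ...])
    distance:   callable giving the distance between two parts (assumed)
    Returns (selected pattern number, list of totals D).
    """
    D = []
    for number, pairs in candidates:                 # S42, S48: loop over n
        total = 0                                    # S43: initialise similarity
        for input_part, reference_part in pairs:     # S44, S47: loop over m
            total += distance(input_part, reference_part)   # S45, S46
        D.append(total)
    best = min(range(len(D)), key=D.__getitem__)     # S49: smallest value wins
    return candidates[best][0], D


# Distances quoted in the first worked example of this description.
table = {
    ("Korea", "China"): 0,
    ("Republic of Korea", "People's Republic of China"): 0,
    ("Korea", "substance"): 7,
    ("Republic of Korea", "alkaloid"): 8,
}
candidates = [
    (2, [("Korea", "China"), ("Republic of Korea", "People's Republic of China")]),
    (3, [("Korea", "substance"), ("Republic of Korea", "alkaloid")]),
]
selected, D = select_most_similar(candidates, lambda a, b: table[(a, b)])
```

If two candidates tie, this sketch keeps the earlier one; the description does not say how ties are resolved, so that choice is an assumption.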
[0068]
* Specific calculation method of similarity.
[0069]
Specifically, the similarity calculation uses the concept dictionary unit 3h of the translation dictionary unit 3. In the concept dictionary of the concept dictionary unit 3h, the relationships between individual concepts are organized, for example, as a tree, and concepts are connected by named relations called arcs. Each concept is composed of the following elements.
[0070]
* Concept name Example: country
* Concept ID Example: 001f229b
* Synonym set Example: {Japan, China, People's Republic of China, Korea, America, United States, ...}
For example, the concept named “country” is assigned the concept ID 001f229b, and specific words such as “Japan”, “China”, “People's Republic of China”, “Korea”, “America”, and “United States” are defined as its elements. Instead of defining the surface words themselves, a set of the headword IDs assigned to the registered words in the vocabulary unit 3a may be defined. A concept ID may also be attached to each headword in the vocabulary unit 3a so that the concept dictionary can easily be searched from individual words.
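A concept entry and the word-to-concept lookup it enables can be sketched as below. The class and function names are assumptions for illustration; only the concept name, the ID 001f229b, and the listed words come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One entry of the concept dictionary (hypothetical representation)."""
    name: str
    concept_id: str
    synonyms: frozenset


country = Concept(
    name="country",
    concept_id="001f229b",
    synonyms=frozenset({
        "Japan", "China", "People's Republic of China",
        "Korea", "America", "United States",
    }),
)


def build_word_index(concepts):
    """Map each word to the ID of the concept whose synonym set contains it."""
    return {word: c.concept_id for c in concepts for word in c.synonyms}


index = build_word_index([country])
same_concept = index["Korea"] == index["China"]
```

Two words that resolve to the same concept ID sit at the same position in the concept dictionary, which is the situation in which a distance of 0 is obtained.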
[0071]
Next, the main relation names that connect concepts include the following:
* Upper/lower relationship (IS-A)
* Antonym relationship (ANT)
[0072]
The above example uses noun items to keep the explanation of concepts simple, but it is also effective to include adjectival concepts, verbal concepts, and the like in the concept dictionary unit 3h.
[0073]
FIG. 9 is a diagram schematically showing a part of the concept dictionary used in the present embodiment. In the figure, each circle represents an individual concept, and a line connecting these concepts is called an arc. Here, for convenience, only the concept names are listed, and some of the concepts are omitted. Further, in this schematic diagram, for the sake of convenience, an arc representing “upper / lower relationship” is taken as an example.
[0074]
In this embodiment, the distance between two specific concepts in the tree structure is basically converted into a numerical similarity by assigning a coefficient to each arc name. Specifically, when tracing the tree structure from one concept to another, a simple similarity calculation is used in which “1” is added to the storage register D[n] each time an “IS-A” arc is traversed and “2” each time an “ANT” arc is traversed. Although not shown, it is also effective to vary the “ANT” coefficient according to part of speech, or to set coefficients in finer detail using other arcs not mentioned here. Furthermore, since “IS-A” is directional, the coefficient may also take the direction into account.
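The arc-weighted distance can be sketched as a shortest-path search in which each IS-A arc costs 1 and each ANT arc costs 2. This is only an illustration: the concept names and arcs below are a made-up miniature hierarchy, not the contents of FIG. 9, so the resulting distances do not reproduce the values of the worked examples. The use of Dijkstra's algorithm is a choice made for this sketch; the description only says that the coefficients are added along the traced path.

```python
import heapq

ARC_COST = {"IS-A": 1, "ANT": 2}   # coefficients stated in the description


def build_graph(arcs):
    """Build an undirected weighted graph from (concept, relation, concept) arcs."""
    graph = {}
    for a, relation, b in arcs:
        cost = ARC_COST[relation]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def concept_distance(graph, start, goal):
    """Smallest total arc cost between two concepts, or None if unconnected."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue                       # stale queue entry
        for neighbour, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None


# Hypothetical miniature hierarchy for illustration only.
graph = build_graph([
    ("country", "IS-A", "place"),
    ("place", "IS-A", "concrete"),
    ("substance", "IS-A", "concrete"),
    ("alkaloid", "IS-A", "substance"),
    ("increase", "ANT", "decrease"),
])
d_country_substance = concept_distance(graph, "country", "substance")
d_antonyms = concept_distance(graph, "increase", "decrease")
```

Treating IS-A arcs as undirected ignores the directionality mentioned above; a direction-sensitive variant would assign different costs to upward and downward traversal.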
[0075]
In short, the configuration of the concept dictionary and the quantification based thereon are not limited in the present invention, and can be freely determined without departing from the spirit of the present invention. For example, a network structure such as a neural network or WordNet may be used instead of the tree structure as shown in FIG.
[0076]
* Specific examples of similarity calculation.
[0077]
(1) A specific example is now described in which similarities are compared when “Korea is officially called the Republic of Korea” is input as the original text for Japanese-English translation.
[0078]
First, in the series of translation processes shown in FIG. 4, translation pattern candidates are searched for in step S22; assume that two candidates, translation patterns No. 2 and No. 3 in FIG. 2, are extracted. These two patterns have the same constant range, so the character strings of the input original text that correspond to the variables are the same for both patterns. The portion corresponding to the first variable is “Korea”, and the portion corresponding to the second variable is “Republic of Korea”.
[0079]
The variable-equivalent portions are then analyzed in step S28. Using the registered contents of the vocabulary unit 3a, both are analyzed as noun phrases whose semantic information is “organization, place”. Since both satisfy the source-side conditions of the corresponding patterns, the similarity comparison of step S34 is performed after the variable analysis. The similarity is calculated for each variable-equivalent portion for No. 2 and No. 3, and the values are summed for each pattern to determine which translation pattern has the higher similarity.
[0080]
First, the similarity is calculated between “Korea”, the portion corresponding to the first variable, and “China”, the portion corresponding to the first variable in the reference original text of translation pattern No. 2.
[0081]
Therefore, the respective positions in the concept dictionary unit 3h of “Korea” and “China” are detected using the concept ID added to each headword in the vocabulary unit 3a, and the distance is quantified. As a result of this detection, it can be seen that these two variable-corresponding parts are both defined in the synonym set having the same ID, so 0 is added to the similarity storage register D [1]. That is, it can be said that “Korea” and “China” have the same position defined in the concept dictionary unit 3h.
[0082]
(2) Next, the similarity is calculated between “Korea”, again the portion corresponding to the first variable, and “this substance”, the portion corresponding to the first variable in the reference original text of translation pattern No. 3. When “this substance” is analyzed using the vocabulary unit 3a, the morphological analysis rule unit 3b, and the syntax/semantic analysis rule unit 3c, the head of the phrase is found to be “substance”, so the similarity is calculated for “substance”. The concept ID assigned to “substance” is not shown in detail in FIG. 9, but it matches a concept ID one level below “specific object”. The distance to “country”, where “Korea” is located, is therefore determined to be 7, and this 7 is added to the similarity storage register D[2].
[0083]
Subsequently, the similarity calculation for the portion corresponding to the second variable is started. In the same way, the similarity between “Republic of Korea” and “People's Republic of China” is determined to be 0; this 0 is added to the similarity storage register D[1], whose final value is therefore 0.
On the other hand, the similarity is calculated between “alkaloid”, the head of “lipophilic alkaloid”, and “Republic of Korea”, the portion corresponding to the second variable. “Alkaloid” is likewise not shown in FIG. 9, but it belongs to a concept one level below, connected by an IS-A relationship to the concept to which “substance” belongs. The similarity is therefore determined to be 8, 8 is added to the similarity storage register D[2], and the final value of D[2] is 15.
[0084]
As a result of these similarity calculations, translation pattern No. 2, the first candidate, which has the lower similarity value, is selected.
[0085]
Next, an example is described in which “The loss of electrical energy at this time is formally referred to as resistance loss” is input and the similarities are likewise compared in Japanese-English translation processing.
[0086]
For this input original as well, “loss of electrical energy at this time”, the portion corresponding to the first variable, and “resistance loss”, the portion corresponding to the second variable, are analyzed; since both satisfy the source-side conditions of translation patterns No. 2 and No. 3, both patterns remain as candidates. The head word of the portion corresponding to the first variable is analyzed as “loss”. Based on the concept ID assigned to “loss”, its distances from “China” and from “substance” on the tree structure of the concept dictionary are quantified. “Loss” belongs to “action”; its distance to “China” is determined to be 7, and its distance to “substance” is determined to be 4.
[0087]
Next, the head word of the portion corresponding to the second variable is also analyzed as “loss”, and it too carries the concept ID of “action”; its distance from “China” is determined to be 8 and its distance from “alkaloid” to be 5. In the end, D[1] is 15 and D[2] is 9. Translation pattern No. 3, the second candidate, which has the lower similarity value, is therefore selected.
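The register totals in the two Japanese-English examples follow directly from the per-variable distances stated above, with the smaller total deciding the selection:

```latex
\begin{align*}
\text{Input ``Korea \dots'':}\quad
  D[1] &= 0 + 0 = 0, & D[2] &= 7 + 8 = 15
  &&\Rightarrow\ \text{pattern No. 2 selected}\\
\text{Input ``The loss \dots'':}\quad
  D[1] &= 7 + 8 = 15, & D[2] &= 4 + 5 = 9
  &&\Rightarrow\ \text{pattern No. 3 selected}
\end{align*}
```

The same pair of candidate patterns thus yields opposite selections depending on how close the variable parts of the input lie to each pattern's reference original text.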
[0088]
In the above examples the similarity comparison uses only the concept dictionary unit 3h. As preprocessing, however, candidates may first be pruned using the results of syntax/semantic analysis. In addition, the similarity of the phrase structures of the variable parts obtained by parsing may be quantified and incorporated into the numerical value.
[0089]
(3) Next, the similarity comparison when “This historical museum introduces you to the history of the Renaissance” is input as an English-Japanese translation original will be described.
[0090]
As before, translation patterns are searched for the input original text, and two candidates, translation patterns No. 4 and No. 5, are extracted. Unlike the patterns in the example above, these two patterns have different variable ranges. In this embodiment, however, even when the variable ranges differ, the similarity of each variable-equivalent portion is calculated and the values are summed to give the final similarity. The translation pattern with the smaller numerical value has the higher similarity. Consider, for example, “you” in the input original, which matches a constant in translation pattern No. 4 and a variable in translation pattern No. 5. Where it matches a constant, as in pattern No. 4, nothing is added, so the result is substantially the same as when it is treated as a variable and the similarity is determined to be 0.
[0091]
First, the similarity between “This historical museum”, which is the portion corresponding to the first variable of the input original, and “This book”, which is the portion corresponding to the first variable of pattern No. 4, is calculated. Based on the result of the syntax/semantic analysis, this is calculated as the similarity between the heads “museum” and “book”. “Museum” has the concept ID of “space organization” in FIG. 9; “book” is not shown in FIG. 9, but it has the concept ID of a concept located two steps below “concrete”. Therefore, the similarity is determined to be 8, and 8 is added to the register D[1].
[0092]
Next, the similarity between “the history of the Renaissance”, which corresponds to the second variable of the input original text, and “UNIX” in the reference original text of pattern No. 4 is calculated. As before, this is calculated as the similarity between “history” and “UNIX”. Although “history” is not shown in FIG. 9, it has the concept ID of a concept two steps below “abstract”, and “UNIX” has the concept ID of a concept three steps below “abstract”, connected through a different arc. Therefore, the similarity is determined to be 5, and 5 is added to the register D[1], so that the register D[1] finally becomes 13.
[0093]
Next, the similarity comparison processing with translation pattern No. 5 is started. The head “museum” of “This historical museum”, which is the portion corresponding to the first variable of the input original, and “Center” in the reference original text of translation pattern No. 5 are both located in “space organization” in FIG. 9. Therefore, at this stage, the similarity is determined to be 0, and 0 is set in the register D[2].
[0094]
Subsequently, “you”, which is the portion corresponding to the second variable of the input original text, and the head “students” of “foreign students” in the reference original text of translation pattern No. 5 are both located in “human” in FIG. 9. Therefore, at this stage, the similarity is determined to be 0, and the register D[2] remains 0.
[0095]
Finally, the similarity is calculated between the head “history” of “the history of the Renaissance”, which corresponds to the third and last variable of the input original text, and the head “culture” of “Japanese culture”, which is the portion corresponding to the third variable of translation pattern No. 5. Although “culture” is not shown in FIG. 9, it has the concept ID of a concept connected by another IS-A arc from the concept one level above “history”. Therefore, the distance is 2, and 2 is finally set in the register D[2]. Since D[2] = 2 is smaller than D[1] = 13, the second candidate, translation pattern No. 5, is selected.
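The distances used in this example are path lengths on the IS-A tree of the concept dictionary. A minimal sketch follows, assuming the distance is the number of arcs between two concepts through their nearest common ancestor. Only the relationships stated above are encoded: “history” two steps below “abstract”, “UNIX” three steps below it on another arc, and “culture” a sibling of “history”. The intermediate node names `a1`, `b1`, `b2` and the function names are hypothetical placeholders, not entries of FIG. 9.

```python
# Hypothetical fragment of the IS-A concept tree, stored as child -> parent.
PARENT = {
    "a1": "abstract",
    "history": "a1",
    "culture": "a1",
    "b1": "abstract",
    "b2": "b1",
    "UNIX": "b2",
}


def ancestors(concept):
    """Return the chain [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def concept_distance(a, b):
    """Count the IS-A arcs between a and b via their nearest common ancestor."""
    up_a = ancestors(a)
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(up_a):
        if c in steps_from_b:
            return steps_from_a + steps_from_b[c]
    raise ValueError("concepts are not in the same tree")


print(concept_distance("history", "UNIX"))     # 5
print(concept_distance("history", "culture"))  # 2
print(concept_distance("history", "history"))  # 0
```

Two words mapped to the same concept get distance 0, which is how the “museum”/“Center” and “you”/“students” comparisons above contribute nothing to the total.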
[0096]
Therefore, according to the embodiment described above, the character string of the constant part of the input original text is compared and collated with the character strings of the constant parts in the source-text-side patterns of the plurality of translation patterns in the pattern dictionary unit 3g, and translation pattern candidates are searched for. When a plurality of translation pattern candidates are found, it is determined whether the variable part of each candidate meets the variable condition in the pattern dictionary unit 3g. For the candidates determined to meet the condition, the similarity to the input original text is calculated using the reference original text of each translation pattern in the pattern dictionary unit 3g, and the input original text is translated using the translation pattern with the highest similarity. Consequently, a more appropriate translation pattern can be selected even when a plurality of translation pattern candidates exist.
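The candidate search summarized above can be illustrated with a small sketch that turns a source-side pattern into a regular expression, treating `$n` as a variable and everything else as a constant string. This is an illustration under stated assumptions, not the matching procedure of the apparatus: real matching would work on analysis results rather than raw strings, and `pattern_to_regex` and `match_pattern` are hypothetical names. The two pattern strings are the English sides of the examples given earlier in the description, written with ASCII characters.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern such as '$1 introduces you to $2.' into a regex.

    Variables ($1, $2, ...) become non-greedy capture groups; constant
    parts are escaped and must match literally.
    """
    pieces = re.split(r"(\$\d+)", pattern)
    regex = "".join(
        "(.+?)" if re.fullmatch(r"\$\d+", piece) else re.escape(piece)
        for piece in pieces
    )
    return re.compile(regex)


def match_pattern(pattern, sentence):
    """Return the variable-corresponding portions, or None if constants differ."""
    m = pattern_to_regex(pattern).fullmatch(sentence)
    return m.groups() if m else None


sentence = "This historical museum introduces you to the history of the Renaissance."
for pattern in ("$1 introduces you to $2.", "$1 introduces $2 to $3."):
    print(pattern, "->", match_pattern(pattern, sentence))
```

Both patterns match the sentence, with “you” falling in a constant part of the first and in a variable of the second, which is the situation the similarity calculation is meant to resolve.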
[0097]
In addition, since an optimal translation pattern is selected automatically, the user does not need to remain conscious of all previously registered patterns, and the burden on the user can be greatly reduced.
[0098]
FIG. 10 is a block diagram showing a second embodiment of the machine translation apparatus according to the present invention. In this figure, the same parts as those in FIG. 1 are denoted by the same reference numerals, and for their detailed description the description of FIG. 1 should be referred to.
[0099]
In this embodiment, after a specific translation pattern is selected, a translation is generated using that translation pattern and a cursor is set on the generated translation. If the user then performs a confirming operation from the input unit 1, the confirmation operation is detected and the input original sentence to which the translation pattern was applied is automatically stored in the reference original text of that translation pattern.
[0100]
Specifically, the process control unit 2 is provided with a confirmation detection means 2c (or a confirmation detection function or confirmation detection step) in addition to the instruction determination means 2a and the process control means 2b. In this machine translation apparatus, a translated sentence is generated using a translation pattern, and a cursor is set on the generated sentence. When a translated-sentence confirmation instruction is input from the input unit 1 at this time, the confirmation detection means 2c determines, for example in step S16 or S19 shown in FIG. 3, that a translation confirmation instruction has been given, and detects that it is a translation edit lock, which prevents the translation result from being rewritten by later editing. A translation edit lock processing command is then sent to the pattern registration unit 8, either directly or through the process control means 2b (S20).
[0101]
On receiving the translation edit lock processing command, the pattern registration unit 8 adds the corresponding input original text to the reference original text in the pattern dictionary unit 3g if the translated sentence on which the cursor is currently placed is a sentence to which a translation pattern was applied. In addition, when there are a plurality of translation pattern candidates as described above, the similarity is calculated against the reference original text of each translation pattern, and the input original text is added to and stored in the reference original text of the translation pattern having the higher similarity.
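The registration step can be sketched as follows, under the assumption that each translation pattern keeps a simple list of reference originals and that the summed concept distance is available per candidate. `TranslationPattern`, `register_confirmed_original` and `distance_of` are hypothetical names. The scores 13 and 2 are the values of D[1] and D[2] from example (3) above, and the reference original of pattern No. 5 is left empty because its full sentence is not quoted here.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationPattern:
    number: int
    source_pattern: str
    reference_originals: list = field(default_factory=list)


def register_confirmed_original(input_original, candidates, distance_of):
    """Append the confirmed input original to the most similar candidate.

    `distance_of(pattern)` returns the summed concept distance for that
    pattern; the smallest value is treated as the highest similarity.
    """
    best = min(candidates, key=distance_of)
    best.reference_originals.append(input_original)
    return best


sentence = "This historical museum introduces you to the history of the Renaissance."
p4 = TranslationPattern(4, "$1 introduces you to $2.", ["This book introduces you to UNIX."])
p5 = TranslationPattern(5, "$1 introduces $2 to $3.")  # existing reference original omitted
scores = {4: 13, 5: 2}

chosen = register_confirmed_original(sentence, [p4, p5], lambda p: scores[p.number])
print(chosen.number)           # 5
print(p5.reference_originals)  # the confirmed sentence is now stored here
```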
[0102]
In the above description, the input original sentence is stored as a reference original sentence when a translation edit lock processing instruction is determined in step S19 shown in FIG. 3. However, the corresponding input original sentence may also be stored in the reference original text of the pattern dictionary unit 3g at various other timings and in response to other operations from the input unit 1.
[0103]
Therefore, according to the embodiment described above, the reference original texts of each translation pattern accumulate every time an input original text is translated, so that the optimum translation pattern can be found when subsequent input original texts are translated, and more accurate translation processing can be realized.
[0104]
The pattern registration unit 8 is provided separately from the processing control unit 2 and the translation processing unit 4, but may be provided in either the processing control unit 2 or the translation processing unit 4.
[0105]
FIG. 10 is a block diagram showing a third embodiment of the machine translation apparatus according to the present invention. In this figure, the same parts as those in FIG. 1 are denoted by the same reference numerals, and for their detailed description the description of FIG. 1 should be referred to.
[0106]
In this embodiment, as in the second embodiment described above, the input original sentence of a translated sentence to which a translation pattern was applied is automatically acquired and stored in the reference original text of the corresponding translation pattern. However, whereas in the second embodiment the input original text is acquired when the user explicitly instructs that the translation be confirmed, in the third embodiment whether the input original should be acquired is determined from the user's operation, without an explicit instruction, and the input original is then stored as a reference original.
[0107]
Specifically, when the user inputs a command to move the cursor during the translation process, the instruction determination means 2a determines, under “other instruction command” in step S19 shown in FIG. 3, that the instruction is a cursor movement instruction, and notifies the confirmation detection means 2c. On receiving this notification, the confirmation detection means 2c determines whether the generated translated sentence on which the cursor was placed immediately before its downward movement is a sentence to which a translation pattern was applied. If so, the downward movement of the cursor is regarded as confirmation of that sentence, and an input original sentence acquisition command is sent to the pattern registration unit 8 via the process control means 2b.
[0108]
The pattern registration unit 8 then additionally stores the original text of the sentence on which the cursor was placed immediately before the downward movement in the reference original text area of the translation pattern applied to that sentence.
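The implicit confirmation of the third embodiment can be sketched as a small event handler. This is a sketch under assumptions: the data model (`Pattern`, `SentencePair`), the handler name `on_cursor_move` and the direction string `"down"` are all hypothetical, and the translation text is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pattern:
    number: int
    reference_originals: list = field(default_factory=list)


@dataclass
class SentencePair:
    original: str
    translation: str
    applied_pattern: Optional[Pattern] = None  # None if no translation pattern was applied


def on_cursor_move(direction, previous):
    """Treat a downward cursor move as implicit confirmation.

    Returns True when the original of the sentence the cursor just left
    was stored as a reference original of its applied pattern.
    """
    if direction != "down":
        return False
    if previous.applied_pattern is None:
        return False
    previous.applied_pattern.reference_originals.append(previous.original)
    return True


p5 = Pattern(5)
pair = SentencePair(
    "This historical museum introduces you to the history of the Renaissance.",
    "<translated sentence>",
    applied_pattern=p5,
)
plain = SentencePair("Some sentence translated without a pattern.", "<translated sentence>")

print(on_cursor_move("up", pair))     # False: only downward moves count
print(on_cursor_move("down", plain))  # False: no translation pattern was applied
print(on_cursor_move("down", pair))   # True: original stored for pattern No. 5
```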
[0109]
Therefore, according to this embodiment, variations of input original text to which individual translation patterns can be applied are stored automatically, without explicit user action, and are used for subsequent translation processing. Thus, each time translation is repeated, the accuracy with which an appropriate translation pattern is selected increases, and as a result highly accurate translation processing can be realized.
[0110]
The pattern registration unit 8 is provided separately from the processing control unit 2 and the translation processing unit 4, but may be provided in either the processing control unit 2 or the translation processing unit 4.
[0111]
The invention of the present application is not limited to the above-described embodiment, and various modifications can be made without departing from the scope of the invention.
[0112]
In addition, the embodiments can be implemented in combination as far as possible, in which case the combined effects are obtained. Further, each of the above embodiments includes inventions at various higher and lower levels, and various inventions can be extracted by appropriately combining a plurality of the disclosed constituent elements. For example, when an invention is extracted by omitting some of the constituent elements described in the means for solving the problem, the omitted portions are appropriately supplemented by well-known conventional techniques when the extracted invention is implemented.
[0113]
【Effect of the invention】
  As described above, according to the present invention, the use of reference original texts increases the efficiency with which translation patterns are used, and a machine translation apparatus can be provided that realizes highly accurate translation by selecting a translation pattern faithful to the original text.
  The present invention can also provide a machine translation apparatus that greatly reduces the burden on the user when translating a first language sentence into a second language sentence using a translation pattern.
[0114]
In addition, the present invention can provide a machine translation system, a program, and a machine translation method that can greatly reduce the burden on the user when translating a first language sentence into a second language sentence using a translation pattern.
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing an embodiment of a machine translation apparatus according to the present invention.
FIG. 2 is a data array diagram of a pattern dictionary unit stored in the translation dictionary unit shown in FIG. 1.
FIG. 3 is a diagram for explaining a flow of a series of processes of a process control unit shown in FIG. 1.
FIG. 4 is a diagram for explaining a flow of a series of processes of a translation processing unit shown in FIG. 1.
FIG. 5 is a structural connection diagram of a constant part and a variable part of an input source text.
FIG. 6 is a diagram illustrating conversion processing for a constant part.
FIG. 7 is a diagram illustrating conversion processing for a variable part.
FIG. 8 is a diagram for explaining a flow of a series of processes in the similarity calculation unit of the translation processing unit shown in FIG. 1.
FIG. 9 is a diagram illustrating an example of a concept dictionary unit stored in a translation dictionary unit.
FIG. 10 is a block diagram showing another embodiment of the machine translation apparatus according to the present invention.
[Explanation of symbols]
1 ... input unit, 2 ... process control unit, 2a ... instruction determination means, 2b ... process control means, 2c ... confirmation detection means, 3 ... translation dictionary unit, 3g ... pattern dictionary unit, 3h ... concept dictionary unit, 4 ... translation processing unit, 4a ... pattern matching extraction unit, 4b ... variable condition checking means, 4c ... similarity calculation means, 4d ... translation generation processing means, 5 ... output unit, 6, 7 ... recording medium, 8 ... pattern registration unit.

Claims

A machine translation device comprising:
a translation dictionary unit that stores, in addition to the knowledge information necessary for translating a first language sentence into a second language sentence, a plurality of translation patterns in each of which a character string pattern composed of changeable variable parts and constant parts that are fixed expressions, a condition, and a reference original text are associated with one another;
control means for judging a translation processing instruction input from an input unit and outputting a translation processing start instruction;
means for acquiring, in response to the translation processing command output from the control means and based on the knowledge information, various information necessary for translation processing of each word constituting an input original text in the first language;
means for comparing and collating the input original text with the character strings of the respective translation patterns and extracting translation pattern candidates;
variable condition checking means for checking, based on the various information, whether each variable part of each extracted translation pattern candidate meets the conditions of the corresponding translation pattern; and
similarity calculation means for, when a plurality of translation patterns are found by the variable condition checking means to meet the conditions, calculating the similarity between the variable equivalent part of the input original text and the variable equivalent part of the reference original text of each translation pattern, and selecting the translation pattern with the highest similarity,
wherein the selected translation pattern is applied to translate the input original text into the second language sentence.

The machine translation device according to claim 1,
wherein a concept dictionary that defines connection relationships between concepts is provided in the translation dictionary unit, and the similarity calculation means specifies the concept in the concept dictionary that corresponds to a unit word of each variable equivalent part of the input original text, quantifies the connection distance between the corresponding concepts, and calculates from the resulting numerical value the similarity to the variable equivalent part of the reference original text of each translation pattern.

The machine translation device according to claim 1 or 2,
further comprising: confirmation detection means for detecting a translation confirmation instruction input from the input unit for a translation result obtained by applying the translation pattern; and a pattern registration unit that, when confirmation of the translation is detected by the confirmation detection means, stores the input original text in association with the reference original text of the applied translation pattern.

The machine translation device according to claim 3,
wherein the confirmation detection means detects a translation edit lock processing instruction for preventing a translation result from being rewritten by later editing.

The machine translation device according to claim 1 or 2,
further comprising: confirmation detection means for detecting that the translation has been confirmed when the cursor moves from a sentence of the translation result to a subsequent sentence to be translated; and a pattern registration unit that, when confirmation of the translation is detected by the confirmation detection means, stores the input original text corresponding to the translation result sentence on which the cursor was placed before the movement in association with the reference original text of the applied translation pattern.