JP2004199519A

JP2004199519A - Mechanical translation method, mechanical translation device, and mechanical translation program

Info

Publication number: JP2004199519A
Application number: JP2002368952A
Authority: JP
Inventors: Akio Koyama; 明雄小山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-12-19
Filing date: 2002-12-19
Publication date: 2004-07-15

Abstract

PROBLEM TO BE SOLVED: To output an appropriate translation result by determining translated words and syntax with the whole original document, or the parts where ambiguity has been resolved, in view.

SOLUTION: In this machine translation program, the input original document is analyzed based on predetermined rules, and the analysis result, including ambiguity information on meaning and dependency relations, is added to the original document as tags to form an intermediate document (1), which is stored in a storage device. With reference to the original document and the intermediate document (1), the document is then converted into the target language based on predetermined rules, with the tags retained, and stored in the storage device as an intermediate document (2). The translated words or sentences that carry ambiguity information in the intermediate document (2) are selected and edited with reference to the original document and the intermediate document (1), and the resulting target-language document is output, so that a more appropriate translation result is obtained. Because the analysis and conversion/generation results are intermediate documents in standard XML format, program development man-hours can be reduced.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to machine translation, and in particular to a technique that takes as input an original text written in the source language to be translated, analyzes the input text using a source-language dictionary, grammar rules, and the like, converts the source language recognized by the analysis in stages, and outputs a target-language sentence in which ambiguity has been resolved.
[0002]
[Prior art]
As the internationalization of information progresses, the use of machine translation devices is increasing, and improved translation quality is strongly demanded. In the past it was often enough to grasp the gist of the original text, or the device served as an auxiliary tool for a translator. Today a variety of usage patterns are required, such as using the translation result as-is without further editing.
[0003]
Research on machine translation began around 1945, when electronic computers appeared, and a variety of basic natural language processing techniques have been proposed: morphological analysis, which decomposes a sentence into words registered as dictionary headwords; syntactic analysis, which analyzes the structure of a sentence; logical analysis, which expresses grammar rules as logical formulas and uses them to analyze a sentence; and lexical analysis, which stores grammatical information in vocabulary entries and analyzes a sentence without using phrase-structure rules. These techniques are described in publications (see, for example, Non-Patent Document 1).
[0004]
A conventional machine translation device implementing these basic techniques has, as shown for example in FIG. 15, an input unit 1 for inputting an original text in which translated words or translated sentences are specified, and a CPU and memory 2 that perform the machine translation processing. The CPU and memory 2 comprise a morphological analysis unit 21, which identifies words by analyzing morphemes (conjugated endings, prefixes, suffixes, and the like), units smaller than the words and idioms that form the basic units of a sentence; a syntactic/semantic analysis unit 22, which performs syntactic analysis on the morphological analysis result; a conversion generation unit 23, which converts the source language recognized in the analysis stage and generates a sentence written in the target language; a dictionary 25; and an output unit 24, which outputs the target-language sentence, post-edited as necessary.
[0005]
For the input text, the morphological analysis unit 21 decomposes the original text written in the source language into morphemes and analyzes them using the dictionary 25. The syntactic/semantic analysis unit 22 then performs syntactic and semantic analysis on the morphologically analyzed text. "Syntactic analysis" extracts the syntactic structure of a sentence from the relationships among its words and idioms. "Semantic analysis" mainly checks the semantic categories of nouns and the case government of verbs, selects a unique meaning for each polysemous word, and extracts the semantic structure of the source language. The results of the syntactic and semantic analysis are input to the conversion generation unit 23, which converts the source language recognized in the analysis stage and generates a sentence written in the target language (see, for example, Patent Document 1).
[0006]
[Patent Document 1]
JP-A-6-12448 (pages 2-3, FIG. 5)
[0007]
[Non-patent document 1]
Nomura Hirosato, "Basic Technology of Natural Language Processing," 2nd Edition, The Institute of Electronics, Information and Communication Engineers, May 25, 1989
[0008]
[Problems to be solved by the invention]
However, because conventional machine translation devices such as the one described above are based on sentence-by-sentence translation, they cannot fully cope with the ambiguity of a sentence even when they analyze context to some degree, and inappropriate translated words and sentences have been common. Take, for example, the original sentence 「私は鰻だ」 (literally, "I am an eel"). Candidate translations include "give me eel" and "I am an eel". When sentences are translated one at a time, as has been done conventionally, the ambiguity cannot be resolved and one of "give me eel" and "I am an eel" must simply be chosen, so the translation is often inappropriate.
[0009]
However, if somewhere in the document there is a sentence containing key phrases such as "finished eating and left the restaurant", and the whole original document is taken into view at translation time, it becomes clear that "give me eel" is the suitable translation. The present invention has been proposed in view of these circumstances. Its object is to determine translated words and syntax with the whole original document, or the parts where ambiguity has been resolved, in view, and to output an appropriate translation result.
[0010]
[Means for Solving the Problems]
FIG. 1 shows the overall configuration of Embodiment 1 of the present invention. The machine translation program 20 of the present invention runs on the machine translation device 2, which is a computer. When an original document 31 to be translated is designated through the input means 1, the analysis means 21 analyzes the designated original document 31 based on predetermined rules, adds the analysis result, including ambiguity information on meaning and dependency relations, to the original document as tags, and stores the result in the storage device 3 as the intermediate document (1) 32. The conversion generation means 22 refers to the original document 31 and the intermediate document (1) 32, converts the document into the target language based on predetermined rules while retaining the tags, and stores the result in the storage device as the intermediate document (2) 33. The translation completion means 23 selects and edits the translated words or translated sentences that carry ambiguity information in the intermediate document (2) 33, with reference to the original document 31 and the intermediate document (1) 32, removes the tags, and outputs the target-language document. Resolving ambiguity in this way makes it possible to raise the accuracy of the translation.
[0011]
Like a conventional machine translation program, the analysis means 21 consists of a morphological analysis unit, which identifies words by analyzing morphemes (conjugated endings, prefixes, suffixes, and the like), units smaller than the words and idioms that form the basic units of a sentence, and a syntactic/semantic analysis unit, which performs syntactic analysis on the morphological analysis result. Conventionally, the analysis result was internal text information intended for processing by the machine translation program itself. The present invention is characterized in that the analysis result is added as tags to each morpheme in standard XML (eXtensible Markup Language) format, so that user programs can also handle it easily.
[0012]
Like a conventional machine translation program, the conversion generation means 22 converts the source language recognized in the analysis stage and generates a sentence written in the target language. The present invention is characterized in that it converts the XML-format intermediate document (1) 32 generated by the analysis means 21 together with its tags, and in that the generated intermediate document (2) 33 is also tagged XML. This makes the analysis means 21, the conversion generation means 22, and the translation completion means 23 easier to develop. It also makes it easy, even for ordinary users, to improve translation accuracy, to modify the programs to suit the characteristics of their own source texts, and to add new analysis means 21 or conversion generation means 22.
[0013]
Furthermore, by providing a plurality of analysis means 21 and conversion generation means 22, the parts that can be processed in parallel without inconsistency can be run in parallel as separate programs, and the user can select the minimum set of conversion generation means 22 suited to the characteristics of the source texts. Both measures improve the throughput of the machine translation processing as a whole.
[0014]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 shows the overall configuration of Embodiment 1 of the present invention. The machine translation program 20 of the present invention runs on the machine translation device 2, which is a computer. When an original document 31 to be translated is designated through the input means 1, the analysis means 21 analyzes the designated original document 31 based on predetermined rules, adds the analysis result, including ambiguity information on meaning and dependency relations, to the original document as tags, and stores the result in the storage device 3 as the intermediate document (1) 32. The conversion generation means 22 refers to the original document 31 and the intermediate document (1) 32, converts the document into the target language based on predetermined rules while retaining the tags, and stores the result in the storage device as the intermediate document (2) 33. The translation completion means 23 selects and edits the translated words or translated sentences that carry ambiguity information in the intermediate document (2) 33, with reference to the original document 31 and the intermediate document (1) 32, removes the tags, and outputs the target-language document.
[0015]
The input means 1 is a device such as a keyboard that is used to instruct the machine translation device 2 to translate. The storage device 3 stores the original document 31 to be translated, the intermediate document (1) 32, which is the analysis result of the analysis means 21, the intermediate document (2) 33 converted and generated by the conversion generation means 22, and the translated document 34 output by the translation completion means 23. The storage device 4 stores the dictionary 41, which the analysis means 21 and the conversion generation means 22 use for morphological analysis, semantic analysis, syntactic analysis, translated-word selection, and the like. The storage devices 3 and 4 need not be different external storage devices; they may be the same external storage device, or a storage device such as memory.
[0016]
FIG. 2 is a flowchart showing the processing flow of the machine translation program 20 in Embodiment 1 of the present invention. This example describes the processing from analysis to generation of the translation when translating from Japanese to English, based on original document example 1 shown in FIG. 5. The invention is not limited to Japanese-to-English translation, however, and can translate from any language to any language. First, the machine translation program 20 retrieves the original document designated through the input means 1 and stores it in the storage device 3 (S201). In this embodiment, the storage device 3 stores 「私は動物園で象を見た。象は鼻が長かった。」 ("I saw an elephant at the zoo. The elephant had a long nose.") as the original document to be translated.
[0017]
Next, the machine translation program 20 performs the prescribed morphological analysis: it segments the original document 31 stored in the storage device 3 into words, determines the part of speech of each word with reference to the dictionary 41 (in this step, a Japanese grammar analysis dictionary), adds the analysis result, together with grammatical information, to the original document as tags, and stores the result in the storage device 3 as the intermediate document (1) 32 (S202).
[0018]
In this embodiment, conventional machine translation morphological analysis segments the sentence 「私は動物園で象を見た。」 as 「私 (noun) は (particle) 動物園 (noun) で (particle) 象 (noun) を (particle) 見た (verb)」, and information indicating the sentence and its subject part and predicate part is added as tags carrying grammatical information. FIG. 6 shows an example of the intermediate document (1) generated by this processing. A tag is enclosed in "<" and ">" and, as in the pair <document> and </document>, is a symbol that describes the word, sentence, or document between the pair. Each word carries tags indicating its part of speech, whether it belongs to the subject part or the predicate part, and so on. The tagging process itself is the same as in ordinary markup languages and is not described here.
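As an illustration of this tagging step, the following Python sketch wraps each segmented word in an XML element named after its part of speech. The token list, the element names (`document`, `sentence`, `noun`, `particle`, `verb`), and the function name are assumptions made for this example. FIG. 6 is not reproduced here, and a real system would obtain the segmentation from morphological analysis against a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical pre-segmented tokens with parts of speech; a real system
# would obtain these from morphological analysis against a dictionary.
TOKENS = [
    ("私", "noun"), ("は", "particle"),
    ("動物園", "noun"), ("で", "particle"),
    ("象", "noun"), ("を", "particle"),
    ("見た", "verb"),
]


def build_intermediate_document(tokens):
    """Wrap each token in an element named after its part of speech."""
    document = ET.Element("document")
    sentence = ET.SubElement(document, "sentence")
    for surface, part_of_speech in tokens:
        ET.SubElement(sentence, part_of_speech).text = surface
    return document


doc = build_intermediate_document(TOKENS)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the surface text inside the elements means the original sentence can still be recovered from the tagged document by reading the text nodes in order.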
[0019]
Next, the machine translation program 20 performs syntactic/semantic analysis and stores the result of analyzing special constructions in the storage device 3 as the intermediate document (1) 32 (S203). FIG. 7 shows an example of the analysis result of this step. Here the double-subject problem of 「象は鼻が」 is resolved: the semantic analysis dictionary is used to determine that the "elephant" (象) possesses the "nose" (鼻).
[0020]
Next, the machine translation program 20 refers to the dictionary 41, performs word-by-word translation, and stores the result in the storage device 3 as the intermediate document (2) 33 (S204). The word translation is also applied to the Japanese-language tags added to the intermediate document (1) 32. FIG. 8 shows an example of the intermediate document (2) 33 generated by this processing. At this stage, conversion to the target language is basically done by dictionary lookup of words and idioms.
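The word translation step, including the translation of the tag names themselves, can be sketched as a recursive dictionary lookup over the XML tree. The two small dictionaries, the Japanese tag names, and the function name below are hypothetical and stand in for the dictionary 41; alternatives for a polysemous word are kept joined by "/" so that a later stage can choose among them.

```python
import xml.etree.ElementTree as ET

# Hypothetical bilingual dictionaries for words and for tag names.
WORD_DICTIONARY = {"象": "elephant", "鼻": "nose/trunk", "長い": "long"}
TAG_DICTIONARY = {"文": "sentence", "名詞": "noun", "形容詞": "adjective"}


def translate_element(element):
    """Return a copy of `element` with tag names and words looked up."""
    translated = ET.Element(TAG_DICTIONARY.get(element.tag, element.tag))
    if element.text:
        # Unknown words are kept as-is so that a later stage can handle them.
        translated.text = WORD_DICTIONARY.get(element.text, element.text)
    for child in element:
        translated.append(translate_element(child))
    return translated


source = ET.fromstring("<文><名詞>象</名詞><名詞>鼻</名詞><形容詞>長い</形容詞></文>")
print(ET.tostring(translate_element(source), encoding="unicode"))
```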
[0021]
Next, the machine translation program 20 performs word order conversion processing with reference to the dictionary 41, and stores the processing result in the storage device 3 as the intermediate document (2) 33 (S205). FIG. 9 shows an example of the intermediate document (2) 33 generated by this processing. Next, the machine translation program 20 performs special syntax processing with reference to the dictionary 41, and stores the processing result in the storage device 3 as the intermediate document (2) 33 (S206). FIG. 10 shows an example of the intermediate document (2) 33 generated by this processing.
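The special-construction processing of S206 can be pictured as a rewrite of the XML tree. The sketch below assumes a sentence in which an `owner` element is followed by a `subject`, the verb "be", and an adjective, and rewrites it into a construction with "have" as the verb. The wrapping `sentence` element and the function name are assumptions for this example, and the sketch carries the article alternatives over unchanged rather than adjusting them.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<sentence>"
    "<owner><article>an/the</article><noun>elephant</noun></owner>"
    "<subject><article>an/the</article><noun>nose/trunk(elephant)</noun></subject>"
    "<verb>be</verb><adjective>long</adjective>"
    "</sentence>"
)


def rewrite_owner(sentence):
    """Rewrite 'owner + subject + be + adjective' as 'owner have adjective subject'."""
    owner = sentence.find("owner")
    subject = sentence.find("subject")
    verb = sentence.find("verb")
    adjective = sentence.find("adjective")
    if None in (owner, subject, verb, adjective) or verb.text != "be":
        return sentence  # the pattern does not apply; leave the sentence unchanged

    result = ET.Element(sentence.tag)
    subject_part = ET.SubElement(result, "subject-part")
    new_subject = ET.SubElement(subject_part, "subject")
    new_subject.extend(list(owner))  # the owner becomes the grammatical subject
    ET.SubElement(result, "verb").text = "have/has/had/had"
    result.append(adjective)
    for child in subject:  # the possessed noun phrase follows the adjective
        result.append(child)
    return result


rewritten = rewrite_owner(ET.fromstring(SOURCE))
print(ET.tostring(rewritten, encoding="unicode"))
```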
[0022]
Here, special processing is performed for the "owner" tag.
Focusing on the grammar of "<owner><article>an/the</article><noun>elephant</noun></owner><subject><article>an/the</article><noun>nose/trunk(elephant)</noun></subject><verb>be</verb><adjective>long</adjective>", this part is rewritten, by consulting the semantic analysis dictionary, into a construction that takes "have" as its verb: "<subject-part><subject><article>an/the</article><noun>elephant</noun></subject></subject-part><verb>have/has/had/had</verb><adjective>long</adjective><article>a/the</article><noun>nose/trunk(elephant)</noun>".
[0023]
Next, the machine translation program 20 performs choice determination and stores the result in the storage device 3 as the intermediate document (2) 33 (S207). FIG. 11 shows an example of the intermediate document (2) 33 generated by this processing. This step resolves articles and polysemous words. Words for which several choices are possible are marked with "/", as in "nose/trunk", so the appropriate word is selected using the tags that enclose it and information obtained from analysis of the surrounding sentences.
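A minimal sketch of such a choice step is given below. It assumes a hypothetical preference table in which a context word found elsewhere in the document selects one of the "/"-separated alternatives, and it falls back to the first alternative when no context word matches. The table contents and the function name are illustrative only; the source does not specify how the selection rules are stored.

```python
# Hypothetical preferences: for a given set of alternatives, a context word
# that appears elsewhere in the document selects one of them.
CONTEXT_PREFERENCES = {
    "nose/trunk": {"elephant": "trunk"},
}


def resolve_choice(alternatives, context_words):
    """Pick one word from a '/'-separated list of alternatives."""
    preferences = CONTEXT_PREFERENCES.get(alternatives, {})
    for word in context_words:
        if word in preferences:
            return preferences[word]
    # No contextual evidence: fall back to the first alternative.
    return alternatives.split("/")[0]


print(resolve_choice("nose/trunk", ["zoo", "elephant"]))
print(resolve_choice("nose/trunk", ["person"]))
```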
[0024]
Next, the machine translation program 20 removes the tag from the intermediate document (2) 33 whose ambiguity has been resolved, and stores it in the storage device 3 as the translated document 34 (S208). FIG. 12 shows an example of the translated document 34 generated in this processing.
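The tag removal of S208 amounts to reading the text nodes of the resolved document in order. The following sketch assumes, as a simplification, that every word sits in its own element and that words are joined with single spaces; punctuation and capitalization are not handled, and the sample input is made up for the example.

```python
import xml.etree.ElementTree as ET

RESOLVED = (
    "<sentence>"
    "<subject-part><subject><article>the</article><noun>elephant</noun></subject></subject-part>"
    "<verb>had</verb><article>a</article><adjective>long</adjective><noun>trunk</noun>"
    "</sentence>"
)


def strip_tags(xml_text):
    """Return the words of a tagged document, in order, without the tags."""
    root = ET.fromstring(xml_text)
    words = [text.strip() for text in root.itertext() if text.strip()]
    return " ".join(words)


print(strip_tags(RESOLVED))
```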
As the analysis and conversion/generation processing proceeds, a stage may need not only the immediately preceding result but all results output so far. It is advisable either to stack the results of all stages in a single document or to keep them in several documents in a way that lets the outputs be matched to one another. The analysis and conversion/generation processing can handle multiple sentences in parallel. Processing of special constructions and removal of ambiguity such as polysemy can likewise be run in parallel for each problem, which keeps the processing time within an acceptable range.
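One way to let each stage see every earlier result is to pass the accumulated outputs, keyed by stage name, to the next stage. The sketch below shows only this bookkeeping. The stage names and the trivial `analyze` and `convert` functions are placeholders invented for the example and do not perform real analysis or translation.

```python
def run_stages(original_document, stages):
    """Run stages in order; each stage receives all results produced so far."""
    results = {"original": original_document}
    for name, stage in stages:
        # Pass a copy so a stage cannot overwrite earlier results.
        results[name] = stage(dict(results))
    return results


def analyze(results):
    # Placeholder analysis: split the original text into words.
    return results["original"].split()


def convert(results):
    # A later stage may consult both the original text and the analysis.
    return [word.upper() for word in results["analysis"]]


results = run_stages("the elephant", [("analysis", analyze), ("conversion", convert)])
print(results)
```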
[0025]
In many cases the translated word or sentence must also be chosen according to context, as with the article problem of choosing between "a" and "the". For this reason, not only grammatical information but also contextual information obtained by semantic analysis must be embedded in the document as tags. As an example of such processing, FIG. 14 shows an intermediate document (1) 32 obtained by time-series analysis of the original document example in FIG. 13.
[0026]
<Scene> is a tag indicating that the beginning of a sentence, a change of time, a change of place, or a change of subject has been detected; the analysis is carried out when a related keyword is found. <Focus> is set, when the same word appears in several sentences, on the sentence in which it first appears. <Action> is attached to a sentence whose subject is a person or a personified noun. <Description> is attached to a sentence whose subject is a noun that refers to something other than a person. These tags are used to detect omitted words such as subjects and as clues for context analysis.
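The rule for <Focus> can be sketched as follows: collect, for each word, the sentences in which it occurs, and report the first sentence for every word that occurs in more than one. The input representation (one list of nouns per sentence) and the function name are assumptions made for this example.

```python
from collections import defaultdict


def find_focus_sentences(sentences):
    """Map each word found in two or more sentences to its first sentence index.

    `sentences` is a list in which each item is the list of nouns in one sentence.
    """
    occurrences = defaultdict(list)
    for index, nouns in enumerate(sentences):
        for noun in set(nouns):  # count a word once per sentence
            occurrences[noun].append(index)
    return {
        word: indexes[0]
        for word, indexes in occurrences.items()
        if len(indexes) > 1
    }


nouns_per_sentence = [["私", "動物園", "象"], ["象", "鼻"]]
print(find_focus_sentences(nouns_per_sentence))
```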
[0027]
FIG. 3 is a table that maps each tag produced by the analysis means 21 to the conversion generation means 22 responsible for processing it. When the conversion generation means 22 consists of several programs, the table stores the correspondence between each tag output by the analysis means 21 and the name of the conversion generation program that uses it. For example, the table shows that the <document> and </document> tags, which denote a document, can be processed by conversion generation program 1, and that part-of-speech tags such as <noun> and <verb> can be processed by conversion generation program 2. The unresolved flag is ON in the initial state and is set to OFF when a conversion generation program completes processing of the tag; it indicates whether processing of that tag has been resolved. The table is supplied in advance with the machine translation program provided as a product, and entries are added or updated when a user adds a new program as analysis means 21 or conversion generation means 22 or updates an existing program.
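A table of this kind could be represented as a mapping from tag name to the responsible program and an unresolved flag. The sketch below is one possible in-memory form; the class name, the helper functions, and the program names are hypothetical, and FIG. 3 itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    program: str             # name of the responsible conversion generation program
    unresolved: bool = True  # ON in the initial state


def make_table():
    """Build a fresh table in its initial state (all flags ON)."""
    return {
        "document": TagEntry("conversion_program_1"),
        "noun": TagEntry("conversion_program_2"),
        "verb": TagEntry("conversion_program_2"),
    }


def register_tag(table, tag, program):
    """Add or update an entry, e.g. when a user adds a new program."""
    table[tag] = TagEntry(program)


def mark_resolved(table, tag):
    """Turn the unresolved flag OFF once the tag has been processed."""
    table[tag].unresolved = False


def unresolved_programs(table):
    """Names of programs that still have at least one unresolved tag."""
    return sorted({entry.program for entry in table.values() if entry.unresolved})


table = make_table()
mark_resolved(table, "document")
print(unresolved_programs(table))
```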
[0028]
FIG. 4 is a flowchart showing how a conversion generation program operating as the conversion generation means 22 calls another conversion generation program. Each conversion generation program performs conversion and generation by processing the tags produced by the analysis means 21. Since the conversion and generation itself is no different from that of a conventional machine translation program, its details are omitted. The description here focuses on the part in which one conversion generation program calls another.
[0029]
Each conversion generation program performs conversion and generation based on the intermediate document (1) generated by the analysis means 21 and writes the result as the intermediate document (2). The program scans the intermediate document (1) from the beginning. When it detects a tag (S401), it refers to the tag/conversion-generation correspondence table to determine whether the tag is one that it can process itself (S402). If the tag cannot be handled by this program, processing for the tag is skipped and control returns to the tag extraction of S401.
[0030]
If the extracted tag can be processed by this program, the tag is processed (S403). For the tag processing itself, see the description of the processing flow of the machine translation program 20 in FIG. 2. When processing of the tag is complete, the unresolved flag of that tag in the tag/conversion-generation correspondence table shown in FIG. 3 is set to OFF (S404). These steps are repeated until all tags have been processed (S405). The program then refers to the table of FIG. 3, determines whether any unresolved flag is still ON (S406), and calls the conversion generation program responsible for each tag whose unresolved flag is ON (S407).
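Steps S401 to S407 can be sketched as the loop below. The table layout (plain dictionaries), the handler functions, and the program names `p1` and `p2` are assumptions for this example. The `called` list is an addition that is not in the flowchart: it prevents a program from being called repeatedly when a tag in the table never occurs in the document.

```python
import xml.etree.ElementTree as ET


def run_program(name, document, table, handlers, called=None):
    """Process the tags owned by `name`, then call programs with unresolved tags."""
    called = [] if called is None else called
    called.append(name)
    for element in document.iter():                      # S401: take the next tag
        entry = table.get(element.tag)
        if entry is None or entry["program"] != name:    # S402: not ours, skip
            continue
        handlers[name](element)                          # S403: process the tag
        entry["unresolved"] = False                      # S404: flag OFF
    # S405: all tags scanned. S406: look for flags that are still ON.
    for entry in table.values():
        if entry["unresolved"] and entry["program"] not in called:
            run_program(entry["program"], document, table, handlers, called)  # S407
    return called


def uppercase_text(element):
    # Stand-in for real conversion work on a part-of-speech tag.
    element.text = element.text.upper()


table = {
    "document": {"program": "p1", "unresolved": True},
    "noun": {"program": "p2", "unresolved": True},
    "verb": {"program": "p2", "unresolved": True},
}
handlers = {"p1": lambda element: None, "p2": uppercase_text}
document = ET.fromstring("<document><noun>elephant</noun><verb>see</verb></document>")
print(run_program("p1", document, table, handlers))
```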
[0031]
As described above, the analysis and conversion/generation processing is performed in multiple stages, and later stages can refer to the results of all earlier stages, so a more appropriate translation result can be obtained. In addition, because the processing is divided into stages and the results are passed on as XML-format intermediate documents (1) and (2) rather than through an internal interface, each stage is simpler and easier to develop than a single complex process would be. Furthermore, even users at the level of being able to write application programs can easily customize the machine translation program provided as a product to suit the characteristics of their source texts, and can add new analysis or conversion/generation processing.
[0032]
[Effect of the invention]
A more appropriate translation result can be output, and the man-hours required for program development can be reduced by making the analysis / conversion generation results an intermediate document in a standard XML format.
[Brief description of the drawings]
FIG. 1 is an overall configuration diagram of Embodiment 1.
FIG. 2 shows the processing of the machine translation program in Embodiment 1.
FIG. 3 shows the tag / conversion generation correspondence table.
FIG. 4 shows the call processing of the conversion generation program.
FIG. 5 shows example 1 of the original document to be translated.
FIG. 6 shows example 1 of the intermediate document (1).
FIG. 7 shows example 2 of the intermediate document (1).
FIG. 8 shows example 1 of the intermediate document (2).
FIG. 9 shows example 2 of the intermediate document (2).
FIG. 10 shows example 3 of the intermediate document (2).
FIG. 11 shows example 4 of the intermediate document (2).
FIG. 12 shows translation result example 1.
FIG. 13 shows example 2 of the original document to be translated.
FIG. 14 shows example 3 of the intermediate document (1).
FIG. 15 shows an example of a conventional machine translation apparatus.
[Explanation of symbols]
1 Input means
2 Machine translation device
3 Storage device
4 Storage device
20 Machine translation program
21 Analysis means
22 Conversion generation means
23 Translation completion means
31 Original document
32 Intermediate document (1)
33 Intermediate document (2)
41 Dictionaries

Claims

A machine translation method that inputs an original document written in a source language, analyzes the input original document, converts the analyzed source language to generate a target language document, and outputs the target language document after post-editing it as necessary, the method comprising:
one or more analysis steps of analyzing the input original document based on a predetermined rule, adding the analysis result, including ambiguity information on meaning and dependency relations, to the original document as tags, and storing the result in a storage device as an intermediate document (1);
one or more conversion generation steps of converting, with reference to the original document and the intermediate document (1) and based on a predetermined rule, into a document in the target language that still includes the tags, and storing that document in the storage device as an intermediate document (2); and
a translation completion step of selecting and editing translated words or translated sentences having ambiguity information included in the intermediate document (2), with reference to the original document and the intermediate document (1), and outputting a target language document.

A machine translation device comprising:
input means for inputting an original document written in a source language;
one or more analysis means for analyzing the input original document based on a predetermined rule, adding the analysis result, including ambiguity information on meaning and dependency relations, to the original document as tags, and storing the result in a storage device as an intermediate document (1);
one or more conversion generation means for converting, with reference to the original document and the intermediate document (1) and based on a predetermined rule, into a document in the target language that still includes the tags, and storing that document in the storage device as an intermediate document (2); and
translation completion means for selecting and editing translated words or translated sentences having ambiguity information included in the intermediate document (2), with reference to the original document and the intermediate document (1), and outputting a target language document.

A machine translation program that causes a computer to function as:
one or more analysis means for analyzing the input original document based on a predetermined rule, adding the analysis result, including ambiguity information on meaning and dependency relations, to the original document as tags, and storing the result in a storage device as an intermediate document (1);
one or more conversion generation means for converting, with reference to the original document and the intermediate document (1) and based on a predetermined rule, into a document in the target language that still includes the tags, and storing that document in the storage device as an intermediate document (2); and
translation completion means for selecting and editing translated words or translated sentences having ambiguity information included in the intermediate document (2), with reference to the original document and the intermediate document (1), and outputting a target language document.