JP2005025474A

JP2005025474A - Machine translation device, computer program, and computer

Info

Publication number: JP2005025474A
Application number: JP2003189787A
Authority: JP
Inventors: Taro Watanabe (渡辺 太郎); Eiichiro Sumita (隅田 英一郎)
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2003-07-01
Filing date: 2003-07-01
Publication date: 2005-01-27

Abstract

PROBLEM TO BE SOLVED: To provide a statistical machine translation device capable of high-accuracy machine translation between languages that differ greatly from each other.

SOLUTION: This machine translation device 80 includes a chunk translation part 90 and a chunk rearrangement processing part 92. The chunk translation part 90 receives an input sentence 78 and, using a dictionary stored in a dictionary storage part 84 and a translation model stored in a translation model storage part 74, outputs a plurality of high-likelihood chunk sequences from among the translated chunk sequences obtained for arbitrary chunk divisions of the input sentence. The chunk rearrangement processing part 92 rearranges the chunks of each chunk sequence output from the chunk translation part 90 and outputs, as a translated sentence 82, the arrangement of chunks with the highest likelihood, computed according to the translation model stored in the translation model storage part 74 and the chunk pair appearance frequencies stored in a chunk pair storage part 76.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は統計的機械翻訳装置に関し、特に、対訳コーパスを用いた学習により、構造が大きく異なった言語間でも精度よく翻訳する事が可能な統計的機械翻訳装置に関する。
【０００２】
【従来の技術】
［従来の技術の概論］
機械翻訳の一方法として最近盛んに研究されている方式に、統計的機械翻訳がある。統計的機械翻訳では、第１の言語の文と第２の言語の文との対訳文を多数含む対訳コーパスを用いた学習により予め翻訳モデルを作成しておき、この翻訳モデルを用いて翻訳を行なう。翻訳時には、具体的には次の様な作業を行なう。なお以下の説明では、第１の言語を日本語とし、「Ｊ」と表記する。また第２の言語を英語とし、「Ｅ」と表記する。
【０００３】
統計的機械翻訳では、第１の言語Ｊの入力文から第２の言語Ｅへの翻訳を、条件付確率Ｐ（Ｅ｜Ｊ）の最大化問題（＾Ｅ＝ａｒｇｍａｘ_ＥＰ（Ｅ｜Ｊ））として定式化する。この式にベイズの定理を適用する事により、＾Ｅ＝ａｒｇｍａｘ_ＥＰ（Ｅ）Ｐ（Ｊ｜Ｅ）が得られる。この式のうちＰ（Ｅ）は言語モデルと呼ばれるものであり、ターゲット言語Ｅの文における単語の出現尤度を示すものである。後者のＰ（Ｊ｜Ｅ）が翻訳モデルと呼ばれるものであり、第２の言語の文Ｅから第１の言語の文Ｊが生成される確率を表す。この言語モデルと翻訳モデルとを用いて、入力文Ｊに対し前述した条件付確率Ｐ（Ｅ）Ｐ（Ｊ｜Ｅ）が最大となる翻訳文＾Ｅを生成する。
【０００４】
なお、翻訳モデル作成の際の翻訳の原言語をソース言語、翻訳の目的言語をターゲット言語と呼ぶ。したがってこの翻訳モデルではソース言語はＥ，ターゲット言語はＪとなる。これはこの翻訳モデルを用いて実際に翻訳をする際の入力言語と出力言語との関係を逆にしたものとなっている。以下の説明では、ソース言語の文及び単語をそれぞれソース文及びソース単語、ターゲット言語の文及び単語をそれぞれターゲット文及びターゲット単語と呼ぶ。
【０００５】
翻訳モデルＰ（Ｊ｜Ｅ）を実現するにあたって、フランス語と英語、及びドイツ語と英語など、互いに近い関係にある言語間の翻訳では、語アライメント方式と呼ばれる統計的翻訳がよい成績を収めてきた。
【０００６】
［語アライメントによる統計的翻訳］
語アライメント方式の統計的翻訳は、語アライメントという概念によって二つの言語の対応関係を表し翻訳モデルを生成する。なお、語アライメントでは、ソース文の単語の各々に対して１対多の関係でのターゲット単語の生成を許すものとする。
【０００７】
図１は、翻訳モデルＰ（Ｊ｜Ｅ）におけるソース言語（英語）Ｅとターゲット言語（日本語）Ｊとについての、語アライメントの例を示すものである。図１において、英語の文Ｅ（”ｓｈｏｗ ｍｅ ｔｈｅ ｏｎｅ ｉｎ ｔｈｅ ｗｉｎｄｏｗ”）の各単語と、それに対応する日本語の文Ｊ（「ウィンド ノ シナモノ オ ミセ テクダサイ」（ｕｉｎｄｏ ｎｏ ｓｈｉｎａｍｏｎｏ ｏ ｍｉｓｅ ｔｅｋｕｄａｓａｉ））の各単語との対応関係を示す。対応している単語の対を線で結んである。なお、文ＪとＥとの各単語の右下に示した数字は、その単語の文頭からの位置を示す番号である。
【０００８】
以下の説明では、語アライメントについて「Ａ」という符号を用いて説明する。図１において、語アライメントＡは「７ ０ ４ ０ １ １」となっている。これは、日本語の文Ｊを構成する各単語に対応する英語の文Ｅ中の単語の位置を、日本語の文Ｊ中の単語の順に並べたものである。すなわち、日本語の文Ｊ中の番号１，２，３，４，５，６の単語はそれぞれ、英語の文Ｅ中の番号７，０，４，０，１，１に対応付けられている（アラインされている）。なおここで「０」は、対応する単語が存在しないこと（ＮＵＬＬに対応すること）を示す。逆に複数の単語に一つの単語が対応する場合もある。この例では、枠２０で示すソース単語「ｓｈｏｗ_１」からは、枠２２で示す「ｍｉｓｅ_５」及び「ｔｅｋｕｄａｓａｉ_６」という２つのターゲット単語が生成されている。
【０００９】
こうした語アライメントを想定すると、翻訳モデルＰ（Ｊ｜Ｅ）はさらに、厳密に以下の通り分解できる。
【００１０】
【数１】
$$P(J \mid E) = \sum_{A} P(J, A \mid E)$$
この式は、ソース文Ｅとターゲット文Ｊとの間の語アライメントＡを全て考え、それらの尤度を全て加算したものが、ソース文Ｅに対するターゲット文Ｊの尤度となる事を意味する。
【００１１】
［ＩＢＭモデル］
ソース文Ｅからターゲット文Ｊへの生成プロセスにおいて、Ｐ（Ｊ，Ａ｜Ｅ）は、挿入、削除、及び並べ替えの様ないくつかのプロセスが組合わされて構成されている。後掲の非特許文献１により定義されている語アライメント方式の翻訳モデル（例えばＩＢＭモデル４）は、以下の様なシナリオに従っている。
【００１２】
（１）ソース単語の各々について、いくつのターゲット単語を生成するかをファーティリティモデルにより選択する。例を図２に示す。図２は、左端に示したソース文（単語を縦に配列してある。）から、右端に示したターゲット文への変換過程において、単語の対応関係がどのように変化するかを示す。矢印は、その左側の単語がその右側の単語（群）に対応付けられていることを示す。図２において、枠３０に示す通り「ｓｈｏｗ」というソース単語は２語に増やされ、枠３２に示すソース単語「ｍｅ」は削除されている。
【００１３】
（２）ＮＵＬＬ生成モデルに従って、適切な位置にＮＵＬＬを挿入する。図２に示す例では、枠３４に示す通り、二つの「ｓｈｏｗ」の各々の後にＮＵＬＬが挿入されている。
【００１４】
（３）生成された単語の各々について、語彙モデルを用いたルックアップによって１語ごとに翻訳を行なう。図２に示す例では、二つのソース単語「ｓｈｏｗ」のうち、枠３６によって示すものがターゲット単語「ｍｉｓｅ」に翻訳されている。
【００１５】
（４）ディストーションモデルを参照する事により翻訳後の単語を並べ替える。図２の例では、枠３８によって示す通り、「ｍｉｓｅ」は５番目の位置に配置され、「ｕｉｎｄｏ」は先頭に配置される。句の制約を保存するために、単語の位置は直前の語のアライメントによって決定される。
【００１６】
この従来例で用いられている各モデルでのシンボルの意味については、非特許文献１を参照されたい。
【００１７】
【非特許文献１】
ピーターＦ．ブラウン、スティーブンＡ．デラ・ピエトラ、ビンセントＪ．デラ・ピエトラ、及びロバートＬ．メルサー、１９９３、「統計的機械翻訳の数学：パラメータ推定」、コンピューテーショナルリングイスティックス、１９（２）：２６３−３１１（Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311.）
【００１８】
【発明が解決しようとする課題】
語アライメント方式の翻訳モデルの生成では、ソース文に含まれる単語の集合の各々について個別に翻訳語を生成してターゲット単語の集合を生成し、さらにそれらターゲット単語の、翻訳文内での位置を決定する事により翻訳を行なう、という戦略を採っている。こうした手続を用いて翻訳モデルを生成する事により、同種の言語間の翻訳モデルではその対応関係を比較的精度よく捕らえる事ができる。しかし、日本語と英語の様に構造が互いに大きく異なる言語間ではさらに解決すべき問題が残っている。
【００１９】
ファーティリティモデルでは、削除についてのモデル化がされているが、残念ながら文脈に関わらず削除された語にゼロを割当てているだけである。同様に、挿入される語は語彙モデルパラメータを用いて選択され、二項分布により決定される位置に挿入されるに過ぎない。
【００２０】
この様な挿入／削除の方式は翻訳プロセスの表現を単純化する上では有用であり、膨大な対訳文の集合に基づいて処理を行なう事が可能になるという効果を持つ。しかし、語の削除及び挿入の様な現象についてこの様に弱いモデル化しか行なわない場合、日本語と英語の様に互いに大きく異なる言語の組合せについては、十分な翻訳性能を期待する事はできない。
【００２１】
ＩＢＭモデル４（及び５）は、ディストーションモデルのパラメータとして、暗黙のうちに句の制約をシミュレートしている。さらに、全体の並べ替えは局所的な並べ替えを寄せ集めたものにより決定されており、長い距離にわたる句の制約を十分に捕らえる事はできない。
【００２２】
それゆえに本発明の目的は、大きく異なる言語間でも精度の高い機械翻訳を行なう事ができる統計的機械翻訳装置を提供する事である。
【００２３】
この発明の他の目的は、大きく異なる言語間でも、長い距離にわたる句の制約を反映した精度の高い機械翻訳を行なう事ができる統計的機械翻訳装置を提供する事である。
【００２４】
【課題を解決するための手段】
本発明の第１の局面にかかる機械翻訳装置は、第１の言語の入力文を第２の言語に翻訳するための機械翻訳装置であって、入力文を１又は複数個のチャンクに分割することにより得られたチャンクの各々について個別に翻訳を行なうためのチャンク翻訳手段と、チャンク翻訳手段から出力される翻訳後のチャンクを並べ替える事により、入力文に対する翻訳文を生成するためのチャンク並べ替え手段とを含む。
【００２５】
好ましくは、機械翻訳装置は、チャンク方式の翻訳モデルを記憶するための手段をさらに含み、チャンク翻訳手段、及びチャンク並べ替え手段は、第１の言語の入力文に対して、チャンク方式の翻訳モデルを参照して処理を行なう。
【００２６】
さらに好ましくは、チャンク翻訳手段は、入力文に対して複数通りのチャンク分割を行ない、各々が１又は複数個のチャンクを含む１又は複数個のチャンク列を出力するためのチャンク分割手段と、チャンク分割手段により出力される１又は複数個のチャンク列の各々に対し、当該チャンク列が含むチャンクの各々に対する翻訳を行なうことにより１又は複数個の出力チャンク列を作成するための出力チャンク作成手段とを含む。出力チャンク作成手段は、１又は複数個の出力チャンク列に対し、翻訳モデルに基づいて尤度を算出する。
【００２７】
機械翻訳装置はさらに、出力チャンク作成手段により作成される出力チャンク列のうち、尤度が所定の条件を充足するものを選択してチャンク並べ替え手段に与えるための出力チャンク列選択手段を含んでもよい。
【００２８】
出力チャンク列選択手段は、出力チャンク作成手段により作成される出力チャンク列のうち、尤度が所定の値以上のものを選択してチャンク並べ替え手段に与えるための手段を含んでもよい。
【００２９】
好ましくは、機械翻訳装置は、第１の言語のチャンクと第２の言語のチャンクとの対を記憶しておくためのチャンク対記憶手段をさらに含む。チャンク翻訳手段はさらに、出力チャンク作成手段により作成された出力チャンクと、当該出力チャンクに対応する入力文のチャンクとの対が、チャンク対記憶手段に記憶されたものと一致していることを検出して、当該出力チャンクの尤度を所定の計算方法にしたがって変更するための尤度変更手段を含む。
【００３０】
より好ましくは、機械翻訳装置は、予め準備された、第１の言語と第２の言語との対訳コーパスに出現するチャンク対を検出し、チャンク対記憶手段に記憶させるための手段をさらに含む。
【００３１】
チャンク対記憶手段は、第１の言語のチャンクと第２の言語のチャンクとの対ごとに予め割当てられた重みを記憶していてもよく、尤度変更手段は、出力チャンク作成手段により作成された出力チャンクと、当該出力チャンクに対応する入力文のチャンクとの対が、チャンク対記憶手段に記憶された第１のチャンク対と一致していることを検出して、当該出力チャンクの尤度を、第１のチャンク対に割当てられた重みの関数により変更するための手段を含んでもよい。
【００３２】
好ましくは、機械翻訳装置は、予め準備された、第１の言語と第２の言語との対訳コーパスに出現するチャンク対と、各チャンク対の対訳コーパス中での出現頻度とを検出し、当該チャンク対と、当該チャンク対の頻度からなる重みとをチャンク対記憶手段に記憶させるための手段をさらに含む。
【００３３】
チャンク分割手段は、入力文に対して可能な全てのチャンク分割を行ない、各々が１又は複数個のチャンクを含む１又は複数個のチャンク列を出力するための手段を含んでもよい。
【００３４】
好ましくは、チャンク方式の翻訳モデルは、第２の言語をソース言語、第１の言語をターゲット言語とする、チャンクの並べ替えモデルを含む。そしてチャンク並べ替え手段は、チャンク翻訳手段から出力される出力チャンク列の各々について１又は複数通りのチャンクの並べ替えを行ない、各チャンクの並べ替えモデルから算出されるチャンク並べ替えの尤度と、当該出力チャンク列に含まれる出力チャンクの各々に対して算出されている尤度とから、各並べ替えの尤度を算出し、最も高い尤度を持つチャンクの配列を翻訳文として出力するための手段をさらに含んでもよい。
【００３５】
第１の言語は日本語、第２の言語は英語でもよい。
【００３６】
本発明の第２の局面にかかるコンピュータプログラムは、コンピュータにより実行されると、当該コンピュータを上記したいずれかの機械翻訳装置として動作させる。
【００３７】
本発明の第３の局面にかかるコンピュータは、上記したコンピュータプログラムによりプログラムされたものである。
【００３８】
【発明の実施の形態】
［概略］
本実施の形態に係る機械翻訳装置は、上記した様な従来の語アライメント方式に替えて、チャンク方式の統計的機械翻訳を行なうものである。チャンクとは、文中で連続している単語の集まりをいう。チャンク方式の翻訳モデル作成の際の翻訳プロセスは以下の通りである。すなわち、まずソース文を複数通りのチャンク分割方法にしたがってチャンク分けする。局所単語アライメントを用いて各チャンクをターゲット言語に翻訳する。最後に、ターゲット言語の制約に従う様に翻訳後のチャンクを並べ替える。このシナリオに従うと、チャンク方式の統計的翻訳モデルは複数の要素で構成され、ＥＭアルゴリズムの亜種によって訓練される。
【００３９】
翻訳結果の探索時には、この翻訳モデルを用い、第１の言語（翻訳モデル生成の際のターゲット言語）の入力文が与えられたとき、入力文を複数通りのチャンク分割の方法により複数個のチャンク列を生成する。各チャンク列は１又は複数個のチャンクを含む。一つのチャンク列の各チャンクに含まれる単語を個々に翻訳し、それらを並べ替えて１又は複数の出力チャンクを得る。この出力チャンクの各々に対しては、翻訳モデルにより尤度が算出できる。さらにこれら出力チャンクを並べ替える。このチャンクの並べ替えの各々に対し、チャンク並べ替えモデルを用いて尤度が算出できる。
【００４０】
このようにして、入力文から得られる複数個のチャンク列の各々について、出力チャンク列がその尤度と共に求められる。これらチャンク列のうち、最も尤度が高くなるチャンク列を探索する。最も尤度が高くなるチャンク列を第２の言語の翻訳文として出力する。
【００４１】
この探索時には、生成される仮説の数が膨大となる。そのため、翻訳のための各段階で展開された仮説のうち、スコアが高い仮説数を一定数だけ残すｌｅｆｔ−ｔｏ−ｒｉｇｈｔ型のビーム探索を行なう事により、計算量を削減する。
【００４２】
実験の結果、翻訳の品質を表すＢＬＥＵスコアでは従来の技術の４６．５％から５２．１％への向上が見られ、主観的な評価でも５９．２％から６５．１％への評価の向上が見られた。
【００４３】
［チャンク方式の統計的翻訳］
以下の説明では、従来の技術の説明と同様、第１の言語（ターゲット言語）を日本語、第２の言語（ソース言語）を英語として、日本語から英語へのチャンク方式の統計的機械翻訳について説明する。チャンク方式の統計的翻訳モデルは、ソース及びターゲット言語の文、ＥとＪとのチャンク分けプロセスを次の式によってモデル化する。
【００４４】
【数２】
$$P(J \mid E) = \sum_{J^{*}} \sum_{E^{*}} P(J, J^{*}, E^{*} \mid E)$$
ただしＪ^＊及びＥ^＊はそれぞれターゲット文Ｊ及びソース文Ｅに対するチャンク分けを表し、２次元の配列として定義される。例えば、Ｊ^＊ _ｉ，ｊはｉ番目のチャンクのｊ番目の語を表す。ソース及びターゲットのチャンク数は等しいものとする。すなわち、｜Ｅ^＊｜＝｜Ｊ^＊｜である。こうする事により、各チャンクは情報の付加も削除もなしにひとまとまりの意味を表すと考える事ができる。
【００４５】
Ｐ（Ｊ，Ｊ^＊，Ｅ^＊｜Ｅ）をさらにチャンクアライメントＡ及び各チャンクの翻訳に対する語アライメントＡ^＊により分解する。すなわち、
【００４６】
【数３】
$$P(J, J^{*}, E^{*} \mid E) = \sum_{A} \sum_{A^{*}} P(J, J^{*}, A, A^{*}, E^{*} \mid E)$$
この式の意味するところは、文Ｊ，チャンク分けした文Ｊ^＊及びＥ^＊という組合せに割当てられる尤度は、文Ｊ，チャンク分けした文Ｊ^＊及びＥ^＊に対するチャンクアライメントＡ及び語アライメントＡ^＊の組合せの尤度を全て合計したものに等しい、という事である。
【００４７】
アライメントＡという概念自体は、語アライメント方式の翻訳モデルで用いられているものと同一である。チャンクアライメントＡでは、各ターゲットチャンクに対してソースチャンクインデックスを割当てる。Ａ^＊は二次元配列であり、チャンク毎に、ターゲットの単語に対してソースの単語のインデックスを割当てる。
【００４８】
例えば、図３は図１に示す例により、２段階のアライメントを示したものである。図３において、最も上に示されているのは、ソース文Ｅに対するチャンク分けＥ^＊である。チャンク分けＥ^＊はソース文Ｅを３つのチャンクに分けている。図３においても、各単語の右下には単語番号が、各チャンクの右下にはチャンク番号が、それぞれ示されている。
【００４９】
チャンク分けＥ^＊の各チャンクに含まれる単語の集合をそれぞれターゲット言語に訳したものが図３において２段目に示されるものである。この例ではチャンクごとに尤度が高くなるように単語を配列してある。このチャンクを尤度が高くなるようにさらにアラインすることにより、３段目に示されるターゲット語の文Ｊのチャンク分けＪ^＊が生成される。図３において、ソース言語のチャンク分けＥ^＊の番号３，２，１のチャンクが、ターゲット言語のチャンク分けＪ^＊の番号１，２，３のチャンクに対応している。したがってこの例でのチャンクアライメントＡは「３，２，１」となる。ターゲット言語のチャンク分けＪ^＊における各チャンク内での単語のアライメントを、ソース言語の単語の番号でチャンクごとに角カッコで分離して示したものがアライメントＡ^＊である。
【００５０】
３番目の位置のターゲットチャンクＪ^＊、「ｍｉｓｅｔｅｋｕｄａｓａｉ」は先頭位置（Ａ_３＝１）にアラインされ、「ｍｉｓｅ」と「ｔｅｋｕｄａｓａｉ」との双方ともソース文の先頭位置にアラインされている（Ａ^＊ _３，１＝１、Ａ^＊ _３，２＝１）。
【００５１】
［翻訳モデルの構造］
Ｐ（Ｊ，Ｊ^＊，Ａ，Ａ^＊，Ｅ^＊｜Ｅ）という項はさらに、以下のシナリオによって近似的に以下の様に分解される。なお図４は、ソース文Ｅを最左端に、ターゲット文Ｊを最右端に示し、ソース文Ｅからターゲット文Ｊへの翻訳の過程を、チャンク分け、削除及びファーティリティモデルの適用、挿入モデルの適用、語彙モデルの適用、チャンク内での語の並べ替えモデルの適用、及びチャンク並べ替えモデルの適用の各段階ごとに示したものである。
【００５２】
図４において、角を丸めた四角はチャンクを示し、矢印は隣合う二つの段階における単語またはチャンクの対応関係を示す。
【００５３】
（１）ソース文Ｅに対してＰ（Ｅ^＊｜Ｅ）を用いてチャンク分けを行なう。例えば、「ｓｈｏｗｍｅ」及び「ｔｈｅｏｎｅ」というチャンクが得られる。このプロセスはさらに以下の二つのステップ（ａ）及び（ｂ）によりモデル化される。
【００５４】
（ａ）ヘッドモデルによりチャンクサイズを選択する。各語Ｅ_ｉについて、ヘッドモデルε（Φ_ｉ｜Ｅ_ｉ）を用いてチャンクサイズΦ_ｉを割当てる。０より大きなチャンクサイズの語はヘッド語として取り扱われ、それ以外の語は非ヘッド語として取り扱われる。すなわちヘッドモデルは、単語をチャンクのヘッド語とするか、非ヘッド語とするかを指定するための機能を持つ。図４において、ヘッド語はボールド体で示してある。
【００５５】
（ｂ）非ヘッド語をそれぞれ、ヘッド語に関連付ける（チャンクモデル）。各非ヘッド語Ｅ_ｉを尤度η（ｃ（Ｅ_ｈ）｜ｈ−ｉ、ｃ（Ｅ_ｉ））によってヘッド語Ｅ_ｈに関連付ける。ただしｈはヘッド語の位置であり、ｃ（Ｅ）は単語Ｅをその単語クラス（例えば品詞）にマップする関数である。例えば、「ｔｈｅ_３」は４−３＝＋１の位置に存在するヘッド語「ｏｎｅ_４」に関連付けられる。こうした操作により、入力文が１又は複数個のチャンクに分割される。かつ各チャンクには一つのヘッド語が含まれる事になる。すなわちチャンクモデルは、ヘッド語以外の単語をヘッド語と関連付ける機能を持つ。
【００５６】
（２）削除及びファーティリティモデルにより翻訳対象単語を選択する。
【００５７】
（ａ）ヘッド語の数を選択する。各ヘッド語Ｅ_ｈ（Φ_ｈ＞０）に対し、ファーティリティモデルν（φ_ｈ｜Ｅ_ｈ）に従ってファーティリティφ_ｈを選ぶ。ファーティリティモデルは、ソース言語Ｅの単語をターゲット言語Ｊに翻訳した後の語数を決める機能を持つ。ここでは、ヘッド語は必ず翻訳する事を想定する。従ってヘッド語についてはφ_ｈ＞０である。加えて、ファーティリティモデルにより生成された語のうちの一つは、一様分布１／φ_ｈを用いて、ターゲット文でのヘッド語として選択される。
【００５８】
（ｂ）非ヘッド語の幾つかを削除する。非ヘッド語Ｅ_ｉ（Φ_ｉ＝０）の各々に対し、削除モデルδ（ｄ_ｉ｜ｃ（Ｅ_ｉ），ｃ（Ｅ_ｈ））に従って削除する。ここで、Ｅ_ｈは同じチャンク内のヘッド語であり、ｄ_ｉはＥ_ｉが削除される場合に１、それ以外の場合には０となる。
【００５９】
（３）単語を挿入する。図４において、二つのチャンクでＮＵＬＬが挿入されている。各チャンクＥ^＊ _ｉに対し、挿入モデルι（φ^’ _ｉ｜ｃ（Ｅ_ｈ））によりスプリアス語φ^’ _ｉの数を選択する。ここでＥ_ｈはチャンクＥ^＊ _ｈのヘッド語である。
【００６０】
（４）単語ごとに翻訳する。スプリアス語を含め、ソース単語Ｅ_ｉの各々を語彙モデルτ（Ｊ_ｊ｜Ｅ_ｉ）に従ってＪ_ｊに翻訳する。
【００６１】
（５）単語を並べ替える。チャンク内の各単語を並べ替えモデルＰ（Ａ^＊ _ｉ｜ε_Ａｊ、Ｊ^＊ _ｊ）に従って並べ替える。この並べ替えモデルは翻訳後のチャンクに含まれる単語の順序を尤度によりモデル化したものであり、チャンクの並べ替えはＩＢＭモデル４に従って行なわれる。すなわち、単語の位置はヘッド語に対する相対的位置によって決定される。
【００６２】
【数４】

ただしｈはチャンクＪ^＊ _ｊに対するヘッド語の位置を示す。例えば、「ｎｏ」は「ｕｉｎｄｏ」に対し−１の位置にある。
【００６３】
（６）チャンクを並べ替える。全てのチャンクを、チャンク並べ替えモデルＰ（Ａ｜Ｅ^＊、Ｊ^＊）に従って並べ替える。チャンク並べ替えはディストーションモデルにも類似している。ディストーションモデルでは、直前のアライメントでの相対的位置によって位置が決定される。チャンク並べ替えモデルは以下の式で表される。
【００６４】
【数５】

ただしｊ’は直前のチャンクａＥ^＊ _Ａｊ−１のチャンクアライメントである。ｈ及びｈ’はそれぞれ、Ｊ^＊ _ｊ及びＥ^＊ _Ａｊ−１のヘッド語のインデックスである。なお、この並べ替えはヘッド語とは独立である。
【００６５】
以上を要約すると、チャンク方式の翻訳モデルは以下の様に定式化できる。
【００６６】
【数６】

［チャンク方式翻訳モデルの特徴］
チャンク方式翻訳モデルが語アライメント方式の翻訳モデルと異なるのは、単語の集合の翻訳の取り扱いである。語アライメント翻訳モデルでは、各ソース単語について単語の集合を生成する。それに対してチャンク方式翻訳モデルでは、ソース単語の集合に対してターゲット単語の集合を構築する。この振舞いは、最初に単語をその属するチャンクのヘッド語に関連付け、さらにチャンク毎の翻訳／挿入／削除を行なうチャンク処理プロセスとしてモデル化される。
【００６７】
語アライメントは複雑であるが、二段階にわたる単語位置の決定により処理される。すなわちチャンクの翻訳とチャンクの並べ替えとである。前者は局部的な順序を決定し、後者は全体的な順序を決定する。さらに、各チャンクにヘッド語を設定する事により、ヘッド語からの位置によって、２段階の並べ替えに対する制約を設ける事が可能になっている。
【００６８】
［パラメータ推定］
チャンク方式の翻訳モデルのパラメータ推定にはＥＭアルゴリズムを用いる。大きな対訳コーパス（これを「学習コーパス」と呼ぶ。）を準備し、以下の条件確率をターゲット文Ｊ及びソース文Ｅの対の各々に対して推定する（Ｅステップ）。
【００６９】
【数７】

さらにこの推定された条件確率に基づいて、各モデルのパラメータを計算する（Ｍステップ）。これらステップを、パラメータの集合が収束するまで繰返す。
【００７０】
しかし、この様な単純なアルゴリズムだと、非常に困難な計算問題に遭遇する。チャンク分けＪ^＊及びＥ^＊、さらに語アライメントＡ^＊及びチャンクアライメントＡに関して可能なものを全て列挙しようとすれば、非常に大きな計算量が必要である。従って、本実施の形態ではＥステップの計算においてインサイド‐アウトサイド・アルゴリズムの一種を導入した。インサイド‐アウトサイド・アルゴリズムについては後掲の参考文献１を参照されたい。本実施の形態で使用したインサイド‐アウトサイド・アルゴリズムについては別に説明する。
【００７１】
計算量の問題に加え、局所最大値問題も存在する。すなわちＥＭアルゴリズムである最大値解に収束したとしても、それがグローバルな最大値であるという保証はない。この問題に対処し、かつパラメータの収束を早めるため、本実施の形態では学習の初期値としてＩＢＭモデル４のパラメータを用いる。語彙モデルとファーティリティモデルとはチャンク方式の翻訳モデルに直接に適用するが、他のパラメータは一様なものとする。
【００７２】
［インサイド‐アウトサイド・アルゴリズム］
インサイド・アウトサイド計算の基本的考え方は、プロセス全体を二つの部分（すなわちチャンク個々の翻訳と、チャンクの並べ替えと）に切り分けるという事である。チャンク翻訳では各チャンクの翻訳を行なう。チャンクの並べ替えでは、チャンク分けと翻訳後のチャンクの入替えとを行なう。
【００７３】
インサイド（バックワード又はベータ）確率はチャンクと文とのソース／ターゲットの組合せの確率を表し、これは算出する事ができる。アウトサイド（フォワード又はアルファ）確率は、特定のチャンク分け及びチャンクの並べ替えにおいて、特定のソース及びターゲットの対が出現する確率として定義する事ができる。
【００７４】
＜インサイド確率＞
まず、ソース文Ｅとターゲット文Ｊとが与えられると、ソース及びターゲットのチャンク対Ｅ^ｉ’ _ｉ及びＪ^ｊ’ _ｊについて考えられる全ての組合せについて、チャンク翻訳のインサイド確率を下の式に従い計算する。ここでＥ^ｉ’ _ｉはインデックスｉからｉ’までのチャンクを表す。
【００７５】
【数８】

ただしＰθは、ε（Φ_ｉ｜Ｅ_ｉ）又はτ（Ｊ_ｊ｜Ｅ_ｉ）など、対応するランダム変数のための値に対応付けられたモデルの確率である。ただし、チャンク並べ替えモデルについては除く。Ａ’はチャンクＥ^ｉ’ _ｉ及びＪ^ｊ’ _ｊのための語アライメントである。
【００７６】
次に、ソース文Ｅとターゲット文Ｊとの対について、考えられるすべてのチャンク分けとチャンクアライメントを考慮してインサイド確率を計算する。
【００７７】
＜アウトサイド確率＞
文の対作成のためのアウトサイド確率は常に１である。すなわち、
α（Ｅ，Ｊ）＝１．０
各チャンク対のアウトサイド確率は以下で与えられる。
【００７８】
【数９】

＜インサイド‐アウトサイド計算＞
上述のインサイド‐アウトサイド確率の組合せにより、対の累積発生数を求める以下の式が得られる。
【００７９】
まず、関連のランダム変数ｃｏｕｎｔθ（Θ）を持つ各モデルパラメータθのカウントは以下で与えられる。
【００８０】
【数１０】

次に、関連のランダム変数
【００８１】
【数１１】

を持つチャンク並替えのカウントは以下で与えられる。
【００８２】
【数１２】

＜近似＞
インサイド‐アウトサイドによるパラメータ推定のパラダイムを導入しても、考えられるすべてのチャンク対作成と語のアライメントとを列挙するにはＯ（ｌｍｋ^４（ｋ＋１）^ｋ）の計算が必要である。ここでｌとｍとは、それぞれソース文Ｅとターゲット文Ｊの文の長さであり、ｋはチャンク当たり許される最大の語数である。さらに、考えられる全てのチャンク分けされた文について、考えられるすべてのアライメントを数え挙げると、Ｏ（２^ｌ２^ｍｎ！）となる。ただしここでｎ＝｜Ｊ^＊｜＝｜Ｅ^＊｜である。
【００８３】
要求される膨大な計算量に対処するために、インサイド‐アウトサイド推定手順に近似を適用する。まず、チャンクの翻訳のための語アライメントを列挙するのに、特定の語アライメントの移動／交換動作を通し、ビタビアライメントと隣接アライメントというアライメントの組によって近似する。
【００８４】
次に、チャンクアライメントを列挙する際にも、以下の様に、チャンク分けとチャンクアライメントの組合せによって近似する。
１．文当たりのチャンク数を決定する。
２．最初のチャンク分けとアライメントとを決定する。
３．以下のオペレータを用いた山登り法によって、ビタビチャンク分け‐アライメントを計算する。
【００８５】
・チャンクの境界を移動
・チャンクアライメントの交換
・ヘッド位置を移動
４．上述のオペレータを用いて、隣接チャンク分け‐アライメントを計算する。
【００８６】
［デコーディング（翻訳）］
本実施の形態に係るデコードアルゴリズムは、後掲の参考文献２において提示された、語アライメント方式の統計的翻訳のためのビーム探索アルゴリズムに基づくものである。このアルゴリズムでは、入力を任意の順序で消費し、文頭から文末への順で出力を生成する。
【００８７】
デコーダは２段階からなる。
（１）考えられる入力チャンクの全てに対し、考えられる出力チャンクを全て生成する。
（２）入力チャンクを任意の順序で消費しながら、考えられる出力を全て左から右への順序で結合する事により、仮の出力を生成する。
【００８８】
考えられる出力チャンクの生成は、参考文献２に記載の逆語彙モデル及び挿入文字列シーケンスによって評価される。さらに、実例によるスコアの加算方式を導入する。この方式では、ビタビアルゴリズムにより得られるチャンク分けとアライメントとを学習コーパスからルックアップする事により候補のチャンクを作成する。
【００８９】
考えられる全てのチャンクの組合せについて計算を行なうと計算量が膨大となるので、以下の様な枝刈りとスコアリング戦略とを導入する。
【００９０】
＜ビーム枝刈り＞
探索空間が非常に大きいので、上記した二段階の双方において、出力の一部のみを残すためのサイズしきい値を設定する。またスコアリングのためのしきい値も導入し、あるスコアより大きな出力のみを処理する様にする。すなわち、あるしきい値以下のスコアしか持たない出力候補は翻訳の候補から除外される。
【００９１】
＜実例によるスコア加算＞
学習コーパスに実際に現れる入力／出力チャンクの組合せに対して、それらが探索のビームに残る確率を高くするために以下の様なスコア方法を取り入れる。すなわち、デコードのプロセスの第１段階で、学習コーパスに現れたチャンクの組合せが現れると、そのスコアを以下の式（対数形式で示す。）に従い加算する。
【００９２】
【数１３】

ただしＰ_ｔｍ（Ｊ｜Ｅ）及びＰ_ｌｍ（Ｅ）は、それぞれ、翻訳モデル及び言語モデルであり、ｆｒｅｑ（Ｅ^＊ _Ａｊ，Ｊ^＊ _ｊ）は学習コーパスにＥ^＊ _ＡｊとＪ^＊ _ｊとの対が現れる頻度を表し、ｗｅｉｇｈｔは調整のためのパラメータ（重み）を表す。この加算により、学習コーパスに実際に現れたチャンク対に対しては、他のものよりも高い尤度が割当てられる。また、学習コーパスに現れる頻度が高ければ高いほど、そのチャンク対に割当てられる尤度は高くなる。
【００９３】
［装置の構成］
図５に、本実施の形態に係る、日本語から英語への統計的機械翻訳装置のブロック図を示す。図５を参照して、この統計的機械翻訳装置６０は、日本語と英語との多数の対訳文からなる、学習コーパスとしての対訳コーパス７０と、対訳コーパス７０を用い、前述したＥＭアルゴリズムによりチャンク方式の翻訳モデルを生成するための学習部７２と、学習部７２により生成された翻訳モデルを記憶するための翻訳モデル記憶部７４とを含む。対訳コーパス７０に含まれる対訳文の各々は、日本語の文と、それに対応する英語の文とからなっている。本実施の形態の装置では、これら対訳文について、前もってチャンク分けなどの処理はされていない。
【００９４】
図６を参照して、翻訳モデル記憶部７４が記憶する翻訳モデルは、ヘッドモデル１００と、チャンクモデル１０２と、ファーティリティモデル１０４と、削除モデル１０６と、挿入モデル１０８と、語彙モデル１１０と、翻訳後のチャンク内の語の並べ替えに用いられる並べ替えモデル１１２と、翻訳後のチャンクの並べ替えに用いられるチャンク並べ替えモデル１１４とを含む。この翻訳モデルにおいて、ソース言語は英語Ｅ、ターゲット言語は日本語Ｊである。
【００９５】
再び図５を参照して、統計的機械翻訳装置６０はさらに、学習部７２による翻訳モデルの学習の過程で対訳コーパス７０内に出現した日本語と英語のチャンク対を記憶するためのチャンク対記憶部７６と、翻訳モデル記憶部７４に記憶された翻訳モデルとチャンク対記憶部７６に記憶されたチャンク対とを用いて、入力文７８に対して上記した様なチャンク方式の統計的翻訳を行ない、翻訳文８２を出力するためのデコーダ８０と、デコーダ８０が入力チャンクのチャンク分けの際および個々の単語の翻訳の際に参照する辞書を記憶した辞書記憶部８４とを含む。
【００９６】
デコーダ８０は、入力文７８を受け、辞書記憶部８４に記憶された辞書と、翻訳モデル記憶部７４に記憶された翻訳モデルとを用い、入力文の任意のチャンク分けについて得られる翻訳後のチャンク列のうち、尤度の高い複数個のチャンク列を出力するためのチャンク翻訳部９０と、チャンク翻訳部９０から出力される出力チャンク列の各々についてチャンクの並べ替えを行ない、翻訳モデル記憶部７４に記憶された翻訳モデルとチャンク対記憶部７６に記憶されたチャンク対の出現頻度とにしたがって計算される尤度の最も高くなるチャンクの配列を翻訳文８２として出力するためのチャンク並べ替え処理部９２とを含む。
【００９７】
チャンク翻訳部９０は、辞書記憶部８４に記憶されたチャンク分け用の辞書を用いて、入力文７８に対して考えられる入力チャンク列を全て作成するための入力チャンク作成部９４と、入力チャンク作成部９４により作成されたチャンク列の各々について、辞書記憶部８４に記憶された翻訳用の辞書及び翻訳モデル記憶部７４に記憶された翻訳モデルとを用いて、可能な出力チャンク列を全て生成するための出力チャンク作成部９６とを含む。出力チャンク作成部９６はこのとき、出力されるチャンクの各々に対し、翻訳モデル記憶部７４に記憶された翻訳モデルのうち、ヘッドモデル１００、チャンクモデル１０２、ファーティリティモデル１０４、削除モデル１０６、挿入モデル１０８、語彙モデル１１０、及び並べ替えモデル１１２を用いて尤度を算出する。この尤度の算出の際には、前述のとおり、チャンク対記憶部７６に記憶されたチャンク対と同一のチャンク対については尤度の加算が行なわれる。
【００９８】
なお、実際には出力チャンク作成部９６は、翻訳モデルによる尤度計算を行なってチャンク列を出力する際、ビーム探索を行なう事によりチャンクの中でも尤度の高いもののみを残すようにすることにより計算量を削減している。これはこの後のプロセスでも同様である。しかし以下の説明では、説明を簡略化するためにそうしたビーム探索による計算量の削減には言及しない。
【００９９】
チャンク並べ替え処理部９２は、出力チャンク選択部９８から出力されるチャンク列の各々についてチャンクの並べ替えを行ない、翻訳モデル中のチャンク並べ替えモデル１１４に従って尤度を算出し、その結果に従って最も高い翻訳結果を選択し出力する機能を持っている。
【０１００】
［動作］
図５に示した統計的機械翻訳装置６０は以下の様に動作する。統計的機械翻訳装置６０の動作には二つの局面がある。第１の局面は翻訳モデルの学習という局面である。第２の局面は、学習した翻訳モデルを用いて、入力文に対する統計的翻訳を行なうという局面である。これらを順に説明する。
【０１０１】
＜翻訳モデルの学習の局面＞
この翻訳モデル作成において、前述の通りソース言語は英語Ｅ、ターゲット言語は日本語Ｊとする。翻訳モデルの学習に先立って、対訳コーパス７０が準備される。学習部７２は、前述した通りのＥＭアルゴリズムに従って対訳コーパス７０から翻訳モデルを生成する。すなわち、学習部７２は、翻訳モデルの初期値から出発してＥＭアルゴリズムを用いて翻訳モデルのパラメータを繰返し計算する。翻訳モデルは、ヘッドモデル１００、チャンクモデル１０２、ファーティリティモデル１０４、削除モデル１０６、挿入モデル１０８、語彙モデル１１０、並べ替えモデル１１２及びチャンク並べ替えモデル１１４を含む。学習部７２はＥＭアルゴリズムに従って計算したこれら翻訳モデルのパラメータが収束すると、その結果を翻訳モデル記憶部７４に格納する。
【０１０２】
学習部７２はまた、上記した学習の過程で、対訳コーパス７０に出現する日本語と英語とのチャンク対を調べ、出現する全てのチャンク対と、その頻度とをチャンク対記憶部７６に格納する。
【０１０３】
翻訳モデル記憶部７４及びチャンク対記憶部７６が準備できた時点で第１の局面は終了する。なお、このとき言語モデルＰ（Ｅ）も利用可能となっているものとする。
【０１０４】
＜統計的翻訳の局面＞
日本語の入力文７８が与えられると、入力チャンク作成部９４が辞書記憶部８４に記憶されたチャンク分け用の辞書を参照して、入力文７８の考えられるチャンク分けを全て作成する。出力チャンク作成部９６は、入力チャンク作成部９４により作成された入力文７８のチャンク分けごとに、考えられる英語の出力チャンク列を全て作成する。出力チャンク作成部９６はこのとき、語彙モデル１１０及び並べ替えモデル１１２を用いて出力チャンクの尤度を算出する。また、チャンク対記憶部７６に記憶されたチャンク対と同一のチャンク対が現れた場合、そのチャンク対の尤度については前述したとおり加算が行なわれる。
【０１０５】
なお、出力チャンク作成部９６から出力される出力チャンク列のうち、全体としての尤度が所定の値以上のものをチャンク並べ替え処理部９２に与えるようにする。この処理は、前述したとおりビーム探索アルゴリズムにより実現される。
【０１０６】
チャンク並べ替え処理部９２は、出力チャンク作成部９６が作成した出力チャンク列の各々について、チャンクの並べ替えを行なう。そうした並べ替えにより得られる全ての出力チャンク列に対しスコア（尤度）をチャンク並べ替えモデル１１４及び各チャンクの尤度を用いて計算し、最も高いスコアを示した出力文を翻訳文８２として出力する。
【０１０７】
入力チャンク作成部９４、出力チャンク作成部９６及びチャンク並べ替え処理部９２は、最もスコアの高い翻訳文の探索にあたって、前記したインサイド‐アウトサイドアルゴリズムを用いる。
【０１０８】
こうして得られた翻訳文８２が、入力文７８に対してチャンク方式による統計的機械翻訳で得られた結果である。
【０１０９】
なお、上記した処理は全て、コンピュータ、当該コンピュータの記憶装置に格納される電子的に読取可能な対訳コーパス、及び当該コンピュータにより実行されるソフトウェアにより実現できる。特に、学習部７２の処理及びデコーダ８０の処理は、いずれもコンピュータプログラムによって実現する事が可能である。
【０１１０】
［実験］
上記した処理が可能な様にプログラムされたコンピュータを用いて以下の実験を行なった。本実験で使用した対訳コーパスは、出願人において用意した日本語と英語とからなる旅行会話の対話集である（参考文献３を参照されたい。）。この対訳コーパスの概略を以下の表１に示す。
【０１１１】
【表１】

このコーパスの全体を３部に分割した。すなわち、トレーニング用の１５２，１６９文、テスト用の４，８４６文、及びパラメータ調整のための１０，１４８文である。パラメータ調整は、学習時の繰返し終了のための判断基準及びデコーダ８０の調整のために行なった。
【０１１２】
比較のための３つの翻訳システムをテストした。それらはモデル４、ｃｈｕｎｋ３、及びｃｈｕｎｋ３＋である。以下、それらについて簡単に説明する。
【０１１３】
モデル４は、語アライメント方式の翻訳モデルであり、ビーム探索デコーダを備えたＩＢＭモデル４である。
【０１１４】
ｃｈｕｎｋ３は本実施の形態と同様のチャンク方式の翻訳モデルであり、最大チャンクサイズを３に制限したものである。
【０１１５】
ｃｈｕｎｋ３＋は、ｃｈｕｎｋ３に加え、本実施の形態で説明した通り、実際の文例を用いたチャンク候補生成を行なうチャンク方式の翻訳モデルである。
【０１１６】
図７に、英語から日本語への翻訳におけるｃｈｕｎｋ３でのビタビチャンク化及びチャンクアライメントの例を示す。図７において、チャンクは角カッコで区分してある。また各チャンクのヘッド語の左側には「＊」マークを付してある。また、チャンク間のアライメントは、チャンクの対の間にひいた棒線で示してある。たとえば、１番目の例でいえば、「ｉｈａｖｅ」というチャンクは「はあります」というチャンクと、「ｔｈｅｎｕｍｂｅｒ」というチャンクは「番号の控え」というチャンクと、「ｏｆｍｙｐａｓｓｐｏｒｔ」というチャンクは「パスポートの」というチャンクと、それぞれアラインされている。
【０１１７】
実験では、テスト文の集合からランダムに選んだ５１０文を翻訳し、１６個の基準文の集合を参照して、以下の基準により翻訳結果を評価した。
【０１１８】
ＷＥＲ：Ｗｏｒｄ−ｅｒｒｏｒ−ｒａｔｅ。これは基準となる翻訳に到達するまでの編集距離に対応する。この値が低いほど評価は高い。
【０１１９】
ＰＥＲ：位置と独立なＷＥＲ。これは語順上の問題を考慮しない。値が低いほど評価は高い。
【０１２０】
ＢＬＥＵ：ＢＬＥＵスコア。これは、翻訳結果のうち、基準となる訳文の中に見出されたＮ−グラムの率を計算するものである。値が高いほど評価も高い。
【０１２１】
ＳＥ：主観的評価。ネイティブ・スピーカにより、ランクＡからランクＤ（Ａ：完璧Ｂ：良好Ｃ：可Ｄ：ナンセンス）までの範囲で評価したものである。Ａ，Ａ＋Ｂ，Ａ＋Ｂ＋Ｃの範囲に含まれる訳文の率を表す。一般的に、値が高いほど評価も高い。
【０１２２】
表２は、日本語から英語への翻訳結果の評価を要約したものである。また図８はモデル４及びｃｈｕｎｋ３＋の結果をいくつか示したものである。
【０１２３】
【表２】

表２を参照して、ｃｈｕｎｋ３は非主観的な評価においてはモデル４よりもよい結果を示している。もっとも、主観的評価では両者はほぼ同じである。ｃｈｕｎｋ３＋では、実際に学習コーパスに現れた例を候補としているので、これら３つの中では最もよいスコアを示している事が分かる。
【０１２４】
以上の通り、本実施の形態に係る統計的機械翻訳装置は、入力文をチャンク分けし、各チャンク内で翻訳を行なう事により出力チャンクを作成し、さらに出力チャンクを並べ替える事により翻訳を行なう。個々のチャンク内という局所的な部分で第１段階の翻訳と語の並べ替えが行なわれるので、局所的に正しい翻訳が得られる可能性が高くなる。また、文全体の翻訳結果を得るために、語ではなくチャンク単位での並べ替えが行なわれるので、最終的に得られる翻訳文の構造が、入力文の構造を正しく反映したものとなる可能性も高くなる。その結果、日本語と英語など、大きく構造が異なる言語間でも良好な翻訳結果を得る事ができる。さらに、２段階の並べ替えを行なうので、文中の比較的長い距離にわたる制約でも翻訳結果に反映する事ができる。
【０１２５】
今回開示された実施の形態は単に例示であって、本発明が上記した実施の形態のみに制限されるわけではない。本発明の範囲は、発明の詳細な説明の記載を参酌した上で、特許請求の範囲の各請求項によって示され、そこに記載された文言と均等の意味及び範囲内でのすべての変更を含む。
【０１２６】
［参考文献１］
ケンジヤマダ及びケビンナイト、２００１、シンタックス・ベースの統計的翻訳モデル、ＡＣＬ２００１予稿集、ツールーズ、フランス（Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proc. of ACL 2001, Toulouse, France.）
【０１２７】
［参考文献２］
クリストフティルマン及びハーマンネイ、２０００、統計的翻訳における語の並べ替え及びｄｐ方式の探索、ＣＯＬＩＮＧ２０００予稿集、７月‐８月（Christoph Tillmann and Hermann Ney. 2000. Word re-ordering and dp-based search in statistical machine translation. In Proc. of the COLING 2000, July-August.）
【０１２８】
［参考文献３］
トシユキタケザワ、エイイチロウスミタ、フミアキスガヤ、ヒロフミヤマモト、及びセイイチヤマモト．２００２．実世界における旅行会話の音声翻訳のための大規模バイリンガルコーパスの構築に向けて．ＬＲＥＣ２００２予稿集、１４７頁‐１５２頁、ラス・パルマス、カナリア諸島、スペイン、５月（Toshiyuki Takezawa, Eiichirou Sumita, Fumiaki Sugaya, Hirofumi Yamamoto, and Seiichi Yamamoto. 2002. Toward a broad-coverage bilingual corpus for speech translation of travel conversation in the real world. In Proc. of LREC 2002, pages 147-152. Las Palmas, Canary Islands, Spain, May.）
【図面の簡単な説明】
【図１】従来の技術による語アライメント方式の機械翻訳における語の対応関係を説明するための図である。
【図２】従来の技術にかかる、語アライメント方式の機械翻訳の翻訳プロセスを説明するための図である。
【図３】本発明の一実施の形態に係る機械翻訳装置での２段階の並べ替えを説明するための図である。
【図４】本発明の一実施の形態に係る機械翻訳装置での翻訳プロセスを説明するための図である。
【図５】本発明の一実施の形態に係る機械翻訳装置の機能ブロック図である。
【図６】本発明の一実施の形態に係る機械翻訳装置で用いられる翻訳モデルを模式的に示す図である。
【図７】本発明の一実施の形態に係る機械翻訳装置による翻訳結果の例を示す図である。
【図８】本発明の一実施の形態に係る機械翻訳装置の翻訳結果の評価を、従来の機械翻訳装置の翻訳結果の評価と対比して示す図である。
【符号の説明】
６０統計的機械翻訳装置、７０対訳コーパス、７２学習部、７４翻訳モデル記憶部、７６チャンク対記憶部、８０デコーダ、９０チャンク翻訳部、９２チャンク並べ替え処理部、９４入力チャンク作成部、９６出力チャンク作成部
[0001]
[Technical field to which the invention belongs]
The present invention relates to a statistical machine translation apparatus, and more particularly to a statistical machine translation apparatus that, by training on a bilingual corpus, can translate accurately even between languages whose structures differ greatly.
[0002]
[Prior art]
[Overview of conventional technology]
Statistical machine translation is an approach to machine translation that has been actively studied in recent years. In statistical machine translation, a translation model is created in advance by training on a bilingual corpus that contains a large number of sentence pairs, each consisting of a sentence in a first language and its translation in a second language, and translation is then performed using this translation model. Specifically, translation proceeds as follows. In the following description, the first language is Japanese, denoted “J”, and the second language is English, denoted “E”.
[0003]
In statistical machine translation, translation of an input sentence in the first language J into the second language E is formulated as the problem of maximizing the conditional probability P(E | J), that is, $\hat{E} = \operatorname{argmax}_E P(E \mid J)$. Applying Bayes' theorem to this expression gives $\hat{E} = \operatorname{argmax}_E P(E)\,P(J \mid E)$. Here P(E) is called the language model and indicates the likelihood of the words appearing in a sentence of the target language E. P(J | E) is called the translation model and represents the probability that the sentence J of the first language is generated from the sentence E of the second language. Using the language model and the translation model, the translated sentence $\hat{E}$ that maximizes the product P(E) P(J | E) is generated for the input sentence J.
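The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate list, the probability tables `LM` and `TM`, and the function name `decode` are hypothetical and hand-picked, and a real decoder searches a far larger space than a fixed list of candidates.

```python
import math

def decode(candidates, lm_prob, tm_prob, source):
    """Return the candidate E that maximizes P(E) * P(J | E).

    Scores are compared in log space to avoid underflow on long sentences.
    """
    def score(e):
        return math.log(lm_prob(e)) + math.log(tm_prob(source, e))
    return max(candidates, key=score)

# Toy, hand-picked probabilities (illustrative assumptions, not trained values).
LM = {"show me the one": 0.02, "me show one the": 0.0001}      # P(E)
TM = {("misete kudasai", "show me the one"): 0.3,               # P(J | E)
      ("misete kudasai", "me show one the"): 0.4}

best = decode(list(LM), LM.get, lambda j, e: TM[(j, e)], "misete kudasai")
print(best)
```

Note that the second candidate has the higher translation-model probability in this toy table, but the language model penalizes its word order, so the product favors the fluent candidate.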
[0004]
When a translation model is created, the language translated from is called the source language, and the language translated into is called the target language. In this translation model, therefore, the source language is E and the target language is J. This is the reverse of the relationship between the input language and the output language when the model is actually used for translation. In the following description, sentences and words of the source language are called source sentences and source words, and sentences and words of the target language are called target sentences and target words.
[0005]
In implementing the translation model P(J | E), a form of statistical translation called the word alignment method has achieved good results for translation between closely related languages, such as French and English or German and English.
[0006]
[Statistical translation by word alignment]
Statistical translation using the word alignment method expresses the correspondence between two languages by the concept of word alignment and generates a translation model. In the word alignment, it is assumed that target words are generated in a one-to-many relationship with respect to each word of the source sentence.
[0007]
FIG. 1 shows an example of word alignment between the source language (English) E and the target language (Japanese) J in the translation model P(J | E). FIG. 1 shows the correspondence between each word of the English sentence E (“show me the one in the window”) and each word of the corresponding Japanese sentence J (“uindo no shinamono o mise tekudasai”). Corresponding word pairs are connected by lines. The number shown at the lower right of each word in sentences J and E is the position of that word counted from the beginning of the sentence.
[0008]
In the following description, a word alignment is denoted by the symbol “A”. In FIG. 1, the word alignment A is “7 0 4 0 1 1”. This lists, in the order of the words in the Japanese sentence J, the position of the word in the English sentence E that corresponds to each word of J. That is, the words numbered 1, 2, 3, 4, 5, 6 in the Japanese sentence J are associated (aligned) with the words numbered 7, 0, 4, 0, 1, 1 in the English sentence E, respectively. Here, “0” indicates that no corresponding word exists (the word corresponds to NULL). Conversely, one word may correspond to a plurality of words. In this example, the source word “show₁” indicated by a frame 20 generates the two target words “mise₅” and “tekudasai₆” indicated by a frame 22.
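As a concrete illustration of the alignment in FIG. 1, the following Python sketch stores A as a list indexed by target position and derives the aligned word pairs and the number of target words generated by each source word. The function names `aligned_pairs` and `fertilities` are hypothetical helpers introduced here for illustration only.

```python
source = ["show", "me", "the", "one", "in", "the", "window"]     # E, positions 1..7
target = ["uindo", "no", "shinamono", "o", "mise", "tekudasai"]  # J, positions 1..6
alignment = [7, 0, 4, 0, 1, 1]                                    # A; 0 means NULL

def aligned_pairs(source, target, alignment):
    """Pair each target word with the source word it is aligned to."""
    pairs = []
    for j, a in enumerate(alignment, start=1):
        src = "NULL" if a == 0 else source[a - 1]
        pairs.append((target[j - 1], src))
    return pairs

def fertilities(source, alignment):
    """Count how many target words each source position generates.

    Index 0 of the result is the count for NULL; index i is source word i.
    """
    counts = [0] * (len(source) + 1)
    for a in alignment:
        counts[a] += 1
    return counts

pairs = aligned_pairs(source, target, alignment)
fert = fertilities(source, alignment)
print(pairs)
print(fert)
```

The count for position 1 comes out as 2, which matches the description of “show” generating both “mise” and “tekudasai”.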
[0009]
Assuming such word alignments, the translation model P(J | E) can be decomposed exactly as follows.
[0010]
[Expression 1]
$$P(J \mid E) = \sum_{A} P(J, A \mid E)$$
This expression means that every word alignment A between the source sentence E and the target sentence J is considered, and the sum of their likelihoods is the likelihood of the target sentence J given the source sentence E.
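To make the sum over alignments concrete, the sketch below enumerates every alignment for a tiny sentence pair and adds up P(J, A | E). The form of P(J, A | E) used here is an assumption borrowed from the much simpler IBM Model 1 (uniform alignment prior times word translation probabilities), not the model described in this document, and the table `t` holds made-up values. Under that assumption the brute-force sum can be checked against the well-known factored form.

```python
import itertools
import math

def joint(target, alignment, source, t):
    """P(J, A | E) under an IBM Model 1 style assumption (uniform alignment prior)."""
    words = ["NULL"] + source                      # position 0 is NULL
    p = 1.0 / (len(source) + 1) ** len(target)
    for word, a in zip(target, alignment):
        p *= t.get((word, words[a]), 0.0)
    return p

def marginal(target, source, t):
    """P(J | E): sum of P(J, A | E) over every possible alignment A."""
    positions = range(len(source) + 1)
    return sum(joint(target, a, source, t)
               for a in itertools.product(positions, repeat=len(target)))

def marginal_factored(target, source, t):
    """Same quantity via the Model 1 factorization (product of per-word sums)."""
    words = ["NULL"] + source
    p = 1.0 / (len(source) + 1) ** len(target)
    for word in target:
        p *= sum(t.get((word, e), 0.0) for e in words)
    return p

source = ["show", "one"]
target = ["shinamono", "mise"]
t = {("mise", "show"): 0.7, ("mise", "one"): 0.05, ("mise", "NULL"): 0.1,
     ("shinamono", "one"): 0.8, ("shinamono", "show"): 0.05, ("shinamono", "NULL"): 0.1}

brute = marginal(target, source, t)
fast = marginal_factored(target, source, t)
print(brute, fast)
```

The enumeration visits (l + 1)^m alignments for a source of length l and a target of length m, which is why practical models rely on factorizations or approximations rather than explicit enumeration.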
[0011]
[IBM model]
In the generation process from the source sentence E to the target sentence J, P (J, A | E) is configured by combining several processes such as insertion, deletion, and rearrangement. The translation model (for example, IBM model 4) of the word alignment method defined by the non-patent document 1 described later follows the following scenario.
[0012]
(1) For each source word, the number of target words to be generated is selected by the fertility model. An example is shown in FIG. 2. FIG. 2 shows how the correspondence between words changes during conversion from the source sentence shown at the left end (with its words arranged vertically) to the target sentence shown at the right end. Each arrow indicates that the word on its left is associated with the word or words on its right. In FIG. 2, the source word “show” is expanded into two words as indicated by a frame 30, and the source word “me” indicated by a frame 32 is deleted.
[0013]
(2) NULL is inserted at appropriate positions according to the NULL generation model. In the example of FIG. 2, NULL is inserted after each of the two copies of “show”, as indicated by a frame 34.
[0014]
(3) Each generated word is translated word by word by lookup in a vocabulary model. In the example of FIG. 2, of the two source words “show”, the one indicated by a frame 36 is translated into the target word “mise”.
[0015]
(4) The translated words are rearranged by referring to the distortion model. In the example of FIG. 2, as indicated by a frame 38, “mise” is placed at the fifth position and “uindo” at the beginning. To preserve phrase constraints, the position of each word is determined relative to the alignment of the preceding word.
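The four steps can be traced with a small deterministic sketch. In the actual model each step is a probabilistic choice; here the fertilities, the number of NULL insertions, the word translations, and the final positions are all fixed by hand to reproduce the running example, so the function `generate` and its argument names are illustrative assumptions only.

```python
def generate(source, fertility, null_count, translations, positions):
    """Walk through fertility, NULL insertion, translation, and reordering."""
    assert len(fertility) == len(source)
    # (1) Fertility: copy source position i fertility[i-1] times (0 deletes it).
    tokens = []
    for i, phi in enumerate(fertility, start=1):
        tokens.extend([i] * phi)
    # (2) NULL insertion: position 0 stands for NULL.
    tokens.extend([0] * null_count)
    # (3) Word-by-word translation; the k-th copy takes the k-th listed word.
    used = {}
    words = []
    for i in tokens:
        k = used.get(i, 0)
        words.append(translations[i][k])
        used[i] = k + 1
    # (4) Distortion: place every translated word at its target position.
    target = [None] * len(words)
    for w in words:
        target[positions[w] - 1] = w
    return target

source = ["show", "me", "the", "one", "in", "the", "window"]
fertility = [2, 0, 0, 1, 0, 0, 1]
translations = {0: ["no", "o"], 1: ["mise", "tekudasai"], 4: ["shinamono"], 7: ["uindo"]}
positions = {"uindo": 1, "no": 2, "shinamono": 3, "o": 4, "mise": 5, "tekudasai": 6}

result = generate(source, fertility, 2, translations, positions)
print(" ".join(result))
```

Keeping the four steps as separate stages mirrors the way the model factors its probability into fertility, NULL generation, lexical, and distortion components.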
[0016]
Refer to Non-Patent Document 1 for the meaning of symbols in each model used in this conventional example.
[0017]
[Non-Patent Document 1]
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311.
[0018]
[Problems to be solved by the invention]
In generating a word-alignment translation model, the strategy is to generate a translation individually for each word in the set of words contained in the source sentence, thereby producing a set of target words, and then to determine the position of each target word in the translated sentence. A translation model generated by this procedure can capture the correspondence between similar languages relatively accurately. However, problems remain to be solved for languages whose structures differ greatly from each other, such as Japanese and English.
[0019]
The fertility model does model deletion, but unfortunately it merely assigns a fertility of zero to deleted words regardless of context. Similarly, inserted words are selected using the vocabulary model parameters and are simply inserted at positions determined by a binomial distribution.
[0020]
Such an insertion / deletion method is useful for simplifying the expression of the translation process, and has an effect that it is possible to perform processing based on a huge collection of parallel translation sentences. However, if only such weak modeling of phenomena such as deletion and insertion of words is performed, sufficient translation performance cannot be expected for combinations of languages that are greatly different from each other, such as Japanese and English.
[0021]
IBM Models 4 and 5 implicitly simulate phrase constraints through the parameters of the distortion model. Moreover, the overall reordering is determined by a collection of local reorderings, so phrase constraints spanning long distances cannot be fully captured.
[0022]
Therefore, an object of the present invention is to provide a statistical machine translation apparatus capable of performing machine translation with high accuracy even between greatly different languages.
[0023]
Another object of the present invention is to provide a statistical machine translation apparatus capable of high-accuracy machine translation that reflects phrase constraints spanning long distances, even between languages that differ greatly from each other.
[0024]
[Means for Solving the Problems]
A machine translation device according to a first aspect of the present invention is a machine translation device for translating an input sentence in a first language into a second language, and includes chunk translation means for individually translating each of the chunks obtained by dividing the input sentence into one or a plurality of chunks, and chunk rearranging means for generating a translated sentence for the input sentence by rearranging the translated chunks output from the chunk translation means.
[0025]
Preferably, the machine translation device further includes means for storing a chunk-based translation model, and the chunk translation means and the chunk rearrangement means process the input sentence in the first language with reference to the chunk-based translation model.
[0026]
More preferably, the chunk translation means includes: chunk division means for performing a plurality of chunk divisions on the input sentence and outputting one or a plurality of chunk strings, each including one or a plurality of chunks; and output chunk creation means for creating one or a plurality of output chunk strings by translating, for each of the chunk strings output by the chunk division means, each of the chunks included in that chunk string. The output chunk creation means calculates the likelihood of each of the one or a plurality of output chunk strings based on the translation model.
[0027]
The machine translation device may further include output chunk sequence selection means for selecting, from the output chunk sequences created by the output chunk creation means, those whose likelihood satisfies a predetermined condition, and giving them to the chunk rearrangement means.
[0028]
The output chunk sequence selection means may include means for selecting, from the output chunk sequences created by the output chunk creation means, those having a likelihood equal to or greater than a predetermined value and giving them to the chunk rearrangement means.
[0029]
Preferably, the machine translation device further includes chunk pair storage means for storing pairs of a first-language chunk and a second-language chunk. The chunk translation means further includes likelihood changing means for detecting that a pair consisting of an output chunk created by the output chunk creation means and the input sentence chunk corresponding to that output chunk matches a pair stored in the chunk pair storage means, and changing the likelihood of that output chunk according to a predetermined calculation method.
[0030]
More preferably, the machine translation device further includes means for detecting a chunk pair appearing in a bilingual corpus of the first language and the second language prepared in advance and storing the chunk pair in the chunk pair storage unit.
[0031]
The chunk pair storage means may store a weight assigned in advance to each pair of a first-language chunk and a second-language chunk. In that case, the likelihood changing means may include means for detecting that a pair consisting of an output chunk created by the output chunk creation means and the input sentence chunk corresponding to that output chunk matches a first chunk pair stored in the chunk pair storage means, and changing the likelihood of that output chunk according to a function of the weight assigned to the first chunk pair.
[0032]
Preferably, the machine translation device further includes means for detecting chunk pairs appearing in a bilingual corpus of the first language and the second language prepared in advance, together with the appearance frequency of each chunk pair in the bilingual corpus, and storing each chunk pair in the chunk pair storage means together with a weight that includes the frequency of that chunk pair.
[0033]
The chunk division means may include means for performing all possible chunk divisions on the input sentence and outputting one or more chunk strings each including one or more chunks.
[0034]
Preferably, the chunk-based translation model includes a chunk rearrangement model in which the second language is the source language and the first language is the target language. The chunk rearrangement means may further include means for performing one or a plurality of chunk rearrangements on each of the output chunk strings output from the chunk translation means, calculating the likelihood of each rearrangement from the likelihood of the chunk rearrangement calculated from the chunk rearrangement model and the likelihoods calculated for the output chunks included in the output chunk string, and outputting the arrangement of chunks having the highest likelihood as the translated sentence.
[0035]
The first language may be Japanese and the second language may be English.
[0036]
When the computer program according to the second aspect of the present invention is executed by a computer, it causes the computer to operate as one of the machine translation devices described above.
[0037]
A computer according to the third aspect of the present invention is programmed by the computer program described above.
[0038]
DETAILED DESCRIPTION OF THE INVENTION
[Outline]
The machine translation apparatus according to the present embodiment performs chunk-based statistical machine translation instead of the conventional word alignment method described above. A chunk is a sequence of consecutive words in a sentence. The translation process assumed in building the chunk-based translation model is as follows. First, the source sentence is divided into chunks according to a plurality of chunk division methods. Each chunk is then translated into the target language using local word alignment. Finally, the translated chunks are rearranged to comply with the constraints of the target language. Following this scenario, the chunk-based statistical translation model is composed of multiple component models and is trained by a variant of the EM algorithm.
[0039]
This translation model is used when searching for a translation result. When an input sentence in the first language (the target language when the translation model was generated) is given, the input sentence is divided into chunks by a plurality of chunk division methods to generate a plurality of chunk sequences. Each chunk sequence includes one or more chunks. The words contained in each chunk of a chunk sequence are individually translated and rearranged to obtain one or more output chunks. The likelihood of each of these output chunks can be calculated with the translation model. These output chunks are then rearranged, and the likelihood of each chunk rearrangement can be calculated using the chunk rearrangement model.
[0040]
In this way, an output chunk sequence is obtained, together with its likelihood, for each of the plurality of chunk sequences obtained from the input sentence. Among these output chunk sequences, the one with the highest likelihood is searched for and output as the translated sentence in the second language.
[0041]
During this search, the number of hypotheses generated is enormous. Therefore, the amount of calculation is reduced by performing a left-to-right type beam search that leaves a certain number of hypotheses having a high score among the hypotheses developed at each stage for translation.
[0042]
In experiments, the BLEU score, which represents translation quality, improved from 46.5% with the conventional technique to 52.1%, and the subjective evaluation also improved, from 59.2% to 65.1%.
[0043]
[Chunk-based statistical translation]
In the following description, as in the description of the conventional technology, chunk-based statistical machine translation from Japanese to English is described, where the first language (target language) is Japanese and the second language (source language) is English. The chunk-based statistical translation model models the process of chunking the source and target sentences E and J by the following formula.
[0044]
[Expression 2]

Here, J^* and E^* represent the chunkings of the target sentence J and the source sentence E, respectively, and each is defined as a two-dimensional array. For example, J^*_{i,j} represents the j-th word of the i-th chunk. The numbers of source and target chunks are assumed to be equal, that is, |E^*| = |J^*|. In this way, each chunk can be regarded as representing a unit of meaning, with no information added or deleted.
[0045]
P(J, J^*, E^* | E) is further decomposed using the chunk alignment A and the word alignment A^* used for translating each chunk. That is,
[0046]
[Equation 3]

This expression means that the likelihood assigned to the combination of sentence J and the chunked sentences J^* and E^* is equal to the sum, over all chunk alignments A and word alignments A^*, of the likelihoods of the combinations of sentence J, the chunked sentences J^* and E^*, chunk alignment A, and word alignment A^*.
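The equation images for [Expression 2] and [Equation 3] are not reproduced in this text. Read from the surrounding description, the two decompositions can be written roughly as follows. This is a reconstruction from the prose, and in particular the left-hand side of the first line is an assumption.

```latex
% Reconstructed from the prose; the original equation images are not available.
P(J \mid E) = \sum_{J^*} \sum_{E^*} P(J, J^*, E^* \mid E)

P(J, J^*, E^* \mid E) = \sum_{A} \sum_{A^*} P(J, J^*, A, A^*, E^* \mid E)
```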
[0047]
The concept of the alignment A itself is the same as that used in word-alignment translation models. The chunk alignment A assigns a source chunk index to each target chunk. A^* is a two-dimensional array that, for each chunk, assigns a source word index to each target word.
[0048]
For example, FIG. 3 shows the two-stage alignment for the example sentence pair described above. In FIG. 3, the top row shows the chunk division E^* of the source sentence E. The chunk division E^* divides the source sentence E into three chunks. In FIG. 3, the word number is shown at the lower right of each word, and the chunk number is shown at the lower right of each chunk.
[0049]
The set of words contained in each chunk of the chunk division E^* is translated into the target language, as shown in the second row of FIG. 3. In this example, the words within each chunk are arranged so that the likelihood is high. By further rearranging these chunks so as to increase the likelihood, the chunk division J^* of the target-language sentence J shown in the third row is generated. In FIG. 3, the chunks numbered 3, 2, and 1 in the source-language chunking E^* correspond to the chunks numbered 1, 2, and 3 in the target-language chunking J^*, respectively. Therefore, the chunk alignment A in this example is “3, 2, 1”. The word alignment A^* is the alignment of the words in each chunk of the target-language chunking J^*, written as source-language word numbers enclosed in square brackets for each chunk.
[0050]
The target chunk J^*_3 in the third position, “mise tekudasai”, is aligned with the first source chunk (A_3 = 1), and both “mise” and “tekudasai” are aligned with the first word of the source sentence (A^*_{3,1} = 1, A^*_{3,2} = 1).
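The two-level alignment can be pictured as plain lists. The sketch below is only an illustration: the text names the source chunks “show me” and “the one” and the target chunk “mise tekudasai”, while the third source chunk used here ("in the window") and all function and variable names are assumptions made for the example.

```python
# Source chunking E*: a list of chunks, each a list of words.
# The third chunk is an assumption; only the first two are named in the text.
source_chunks = [["show", "me"], ["the", "one"], ["in", "the", "window"]]

# Chunk alignment A (1-based, as in the text): target chunk j -> source chunk A_j.
chunk_alignment = [3, 2, 1]

def source_chunk_for(target_chunk_no):
    """Return the source chunk aligned with the given 1-based target chunk number."""
    return source_chunks[chunk_alignment[target_chunk_no - 1] - 1]

# Word alignment A* for target chunk 3: each target word -> 1-based source word number.
source_words = [w for chunk in source_chunks for w in chunk]
target_chunk_3 = ["mise", "tekudasai"]
word_alignment_3 = [1, 1]
aligned_words = [source_words[a - 1] for a in word_alignment_3]

print(source_chunk_for(3), aligned_words)
```

Here the third target chunk maps back to the first source chunk, and both of its words point at source word 1, matching A_3 = 1 and A^*_{3,1} = A^*_{3,2} = 1.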
[0051]
[Structure of translation model]
The term P(J, J^*, A, A^*, E^* | E) is further decomposed approximately according to the following scenario. FIG. 4 shows the source sentence E at the left end and the target sentence J at the right end, and shows the process of translation from the source sentence E to the target sentence J divided into the following stages: chunking, deletion and application of the fertility model, application of the insertion model, application of the vocabulary model, application of the word rearrangement model within chunks, and application of the chunk rearrangement model.
[0052]
In FIG. 4, squares with rounded corners indicate chunks, and arrows indicate the correspondence between words or chunks in two adjacent stages.
[0053]
(1) The source sentence E is chunked according to P(E^* | E). For example, the chunks “show me” and “the one” are obtained. This process is further modeled by the following two steps (a) and (b).
[0054]
(a) Chunk sizes are selected according to the head model. Each word E_i is assigned a chunk size Φ_i according to the head model ε(Φ_i | E_i). Words with a chunk size greater than 0 are treated as head words, and all other words are treated as non-head words. In other words, the head model specifies whether each word is the head word of a chunk or a non-head word. In FIG. 4, head words are shown in bold.
[0055]
(b) Each non-head word is associated with a head word (chunk model). Each non-head word E_i is associated with a head word E_h with likelihood η(c(E_h) | h − i, c(E_i)). Here, h is the position of the head word, and c(E) is a function that maps the word E to its word class (for example, its part of speech). For example, “the_3” is associated with the head word “one_4”, which is located at the relative position 4 − 3 = +1. Through this operation, the sentence is divided into one or a plurality of chunks, each containing exactly one head word. In other words, the chunk model associates the words other than head words with head words.
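Steps (a) and (b) can be illustrated with a small sketch that computes the relative offset h − i for each non-head word and groups words into chunks by their head. The choice of "show" as the head of the first chunk is an assumption made for the example (the text only states that "the" attaches to "one"), and the function names are hypothetical.

```python
def head_offsets(head_of):
    """Map each non-head position i to the relative offset h - i of its head."""
    return {i: h - i for i, h in head_of.items() if h != i}

def group_into_chunks(words, head_of):
    """Group words (1-based positions) by head position; heads map to themselves."""
    chunks = {}
    for i in range(1, len(words) + 1):
        chunks.setdefault(head_of[i], []).append(words[i - 1])
    # Sorting by head position yields chunks in sentence order for contiguous chunks.
    return [chunks[h] for h in sorted(chunks)]

words = ["show", "me", "the", "one"]
# Position -> head position. "the" (3) -> "one" (4) follows the text;
# "me" (2) -> "show" (1) is an illustrative assumption.
head_of = {1: 1, 2: 1, 3: 4, 4: 4}

print(head_offsets(head_of))
print(group_into_chunks(words, head_of))
```

The offset for "the" comes out as +1, the same value the text derives as 4 − 3.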
[0056]
(2) Select the words to be translated using the deletion model and the fertility model.
[0057]
(a) Select the number of words generated from each head word. For each head word E_h (Φ_h > 0), a fertility φ_h is selected according to the fertility model ν(φ_h | E_h). The fertility model determines the number of target-language words that a source-language word is translated into. Here, it is assumed that head words are always translated; therefore φ_h > 0 for head words. In addition, one of the words generated by the fertility model is chosen, with uniform probability 1/φ_h, as the head word in the target sentence.
[0058]
(b) Delete some non-head words. Each non-head word E_i (Φ_i = 0) is deleted according to the deletion model δ(d_i | c(E_i), c(E_h)), where E_h is the head word in the same chunk, and d_i is 1 if E_i is deleted and 0 otherwise.
[0059]
(3) Insert words. In FIG. 4, NULL is inserted in two chunks. For each chunk E^*_i, the number of inserted (spurious) words φ'_i is selected according to the insertion model ι(φ'_i | c(E_h)), where E_h is the head word of the chunk E^*_i.
[0060]
(4) Translate word by word. Each source word E_i, including spurious words, is translated into J_j according to the vocabulary model τ(J_j | E_i).
[0061]
(5) Rearrange words. The words in each chunk are rearranged according to the word rearrangement model P(A^*_j | E^*_{A_j}, J^*_j). This rearrangement model expresses, as a likelihood, the order of the words contained in a translated chunk, and the rearrangement within a chunk follows IBM model 4. That is, the position of each word is determined by its position relative to the head word.
[0062]
[Expression 4]

Here, h denotes the position of the head word of chunk J^*_j. For example, “no” is at position −1 relative to “uindo”.
[0063]
(6) Rearrange the chunks. All chunks are rearranged according to the chunk rearrangement model P(A | E^*, J^*). Chunk rearrangement is similar to the distortion model, in which a position is determined relative to the position in the preceding alignment. The chunk rearrangement model is expressed by the following formula.
[0064]
[Equation 5]

Here, j' is the chunk alignment of the preceding chunk E^*_{A_j−1}, and h and h' are the indices of the head words of J^*_j and E^*_{A_j−1}, respectively. This rearrangement is independent of the head word.
[0065]
In summary, the chunk-based translation model can be formulated as follows:
[0066]
[Formula 6]

[Characteristics of chunk translation model]
The chunk translation model differs from the word alignment translation model in the handling of translation of a set of words. The word alignment translation model generates a set of words for each source word. In contrast, the chunk translation model builds a set of target words for a set of source words. This behavior is modeled as a chunk processing process that first associates a word with the head word of the chunk to which it belongs and further translates / inserts / deletes each chunk.
[0067]
Word alignment is complex, but is handled by word position determination in two stages. That is, chunk translation and chunk rearrangement. The former determines the local order and the latter determines the overall order. Furthermore, by setting a head word for each chunk, it is possible to set a restriction on rearrangement in two stages depending on the position from the head word.
[0068]
[Parameter estimation]
The EM algorithm is used for parameter estimation of the chunk translation model. A large parallel corpus (referred to as a “learning corpus”) is prepared, and the following conditional probabilities are estimated for each of the target sentence J and source sentence E pairs (E step).
[0069]
[Expression 7]

Further, parameters of each model are calculated based on the estimated conditional probability (M step). These steps are repeated until the parameter set converges.
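The alternation of E step and M step can be illustrated on a much simpler model than the chunk translation model. The sketch below trains only a word-level table τ(j | e) in the style of IBM model 1; it is not the training procedure of this embodiment, it runs a fixed number of iterations instead of testing convergence, and the toy corpus and function name are assumptions made for the example.

```python
from collections import defaultdict

def train_lexicon_em(pairs, iterations=10):
    """EM for a word translation table tau[(j, e)] = P(j | e), IBM model 1 style."""
    j_vocab = {j for _, js in pairs for j in js}
    tau = defaultdict(lambda: 1.0 / len(j_vocab))  # uniform initial values
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (j, e) links
        total = defaultdict(float)  # expected count of links from e
        # E step: distribute each target word over the source words of its sentence.
        for es, js in pairs:
            for j in js:
                norm = sum(tau[(j, e)] for e in es)
                for e in es:
                    p = tau[(j, e)] / norm
                    count[(j, e)] += p
                    total[e] += p
        # M step: re-estimate the table from the expected counts.
        tau = defaultdict(float, {(j, e): c / total[e] for (j, e), c in count.items()})
    return tau

# Toy corpus of (source E words, target J words).
pairs = [
    (["red", "car"], ["akai", "kuruma"]),
    (["blue", "car"], ["aoi", "kuruma"]),
    (["blue", "sky"], ["aoi", "sora"]),
]
tau = train_lexicon_em(pairs, iterations=20)
print(round(tau[("akai", "red")], 3), round(tau[("kuruma", "car")], 3))
```

Because "car" and "blue" each occur in two sentence pairs, their translations are disambiguated first, and the remaining words follow in later iterations.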
[0070]
However, such a naive algorithm runs into a severe computational problem. Enumerating every possible chunk division J^* and E^*, word alignment A^*, and chunk alignment A requires an enormous amount of computation. Therefore, in this embodiment, a kind of inside-outside algorithm is introduced for the calculation of the E step. For the inside-outside algorithm, see Reference 1 below. The inside-outside algorithm used in this embodiment is described separately below.
[0071]
In addition to the computational complexity problem, there is the problem of local maxima. That is, even when the EM algorithm converges to a maximum, there is no guarantee that it is the global maximum. To cope with this problem and to speed up the convergence of the parameters, the parameters of IBM model 4 are used as initial values for learning in the present embodiment. The vocabulary model and the fertility model are carried over directly to the chunk translation model, while the other parameters are initialized uniformly.
[0072]
[Inside-outside algorithm]
The basic idea of the inside-outside computation is to divide the entire process into two parts: translation of individual chunks and chunk rearrangement. In chunk translation, each chunk is translated. In chunk rearrangement, the sentences are divided into chunks and the translated chunks are rearranged.
[0073]
The inside (backward, or beta) probability is the probability that a given source/target pair of chunks or sentences is generated. The outside (forward, or alpha) probability is defined as the probability that a particular source and target chunk pair appears in a particular chunking and chunk rearrangement.
[0074]
<Inside probability>
First, given a source sentence E and a target sentence J, the inside probability of chunk translation is calculated according to the following formula for all possible combinations of source and target chunk pairs E^{i'}_i and J^{j'}_j. Here, E^{i'}_i represents the chunk spanning indices i to i'.
[0075]
[Equation 8]

Here, P_θ is the probability of a model such as ε(Φ_i | E_i) or τ(J_j | E_i), evaluated at the value of the corresponding random variable. The chunk rearrangement model, however, is excluded. A' is the word alignment for the chunks E^{i'}_i and J^{j'}_j.
[0076]
Next, for the pair of the source sentence E and the target sentence J, the inside probability is calculated in consideration of all possible chunk divisions and chunk alignments.
[0077]
<Outside probability>
The outside probability for sentence pair creation is always 1. That is,
α (E, J) = 1.0
The outside probability for each chunk pair is given by
[0078]
[Equation 9]

<Inside-outside calculation>
Combining the inside and outside probabilities described above gives the following formulas for accumulating counts.
[0079]
First, the count count_θ(Θ) for each model parameter θ with its associated random variable Θ is given by:
[0080]
[Expression 10]

Next, the count for chunk rearrangement with the associated random variable
[0081]
[Equation 11]

is given below.
[0082]
[Expression 12]

<Approximation>
Even with the inside-outside parameter estimation paradigm, enumerating all possible chunk pairs and word alignments requires O(l m k^4 (k+1)^k) computation. Here, l and m are the lengths of the source sentence E and the target sentence J, respectively, and k is the maximum number of words allowed per chunk. Furthermore, enumerating all possible alignments for all possible chunkings of the sentences requires O(2^l 2^m n!) computation, where n = |J^*| = |E^*|.
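To get a feel for how quickly these bounds grow, the two expressions can be evaluated for small inputs. The sketch reads the bounds as l·m·k^4·(k+1)^k and 2^l·2^m·n!, ignoring constant factors; the function names are hypothetical.

```python
from math import factorial

def chunk_pair_cost(l, m, k):
    """Order of work for enumerating chunk pairs and their word alignments."""
    return l * m * k**4 * (k + 1)**k

def chunking_alignment_cost(l, m, n):
    """Order of work for enumerating all chunkings and chunk alignments."""
    return 2**l * 2**m * factorial(n)

# Ten-word sentences, chunks of at most three words, four chunks per sentence.
print(chunk_pair_cost(10, 10, 3))
print(chunking_alignment_cost(10, 10, 4))
```

Even for ten-word sentences the second term is already in the tens of millions, which is why the approximations described next are needed.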
[0083]
An approximation is applied to the inside-outside estimation procedure to deal with the large amount of computation required. First, the word alignments for chunk translation are enumerated through move and exchange operations on particular word alignments, and are approximated by the set consisting of the Viterbi alignment and its neighboring alignments.
[0084]
Next, when enumerating chunk alignment, approximation is performed by a combination of chunk division and chunk alignment as follows.
1. Determine the number of chunks per sentence.
2. Determine the initial chunking and alignment.
3. The Viterbi chunk division-alignment is calculated by the hill-climbing method using the following operators.
[0085]
- Move a chunk boundary
- Change the chunk alignment
- Move the head position
4. The neighboring chunk divisions and alignments are calculated using the operators described above.
[0086]
[Decoding (translation)]
The decoding algorithm according to the present embodiment is based on the beam search algorithm for statistical translation of the word alignment method presented in Reference Document 2 described later. In this algorithm, input is consumed in an arbitrary order, and output is generated in order from the sentence head to the sentence end.
[0087]
The decoder consists of two stages.
(1) Generate all possible output chunks for all possible input chunks.
(2) Generate temporary output by combining all possible outputs in order from left to right while consuming input chunks in any order.
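Stage (1) presupposes that every way of dividing the input into consecutive chunks is available. A minimal sketch of that enumeration, assuming a cap on the number of words per chunk (the function name and the cap of 3 are illustrative choices):

```python
def all_chunkings(words, max_chunk_len=3):
    """Yield every division of `words` into consecutive chunks of bounded length."""
    if not words:
        yield []
        return
    for size in range(1, min(max_chunk_len, len(words)) + 1):
        head, rest = words[:size], words[size:]
        for tail in all_chunkings(rest, max_chunk_len):
            yield [head] + tail

for chunking in all_chunkings(["uindo", "no", "shinamono"]):
    print(chunking)
```

The number of divisions grows exponentially with sentence length, which is the reason for the pruning strategies that follow.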
[0088]
Possible output chunks are generated using the inverse vocabulary model and sequences of inserted strings, as described in Reference 2. In addition, an example-based scoring method is introduced, in which candidate chunks are created by looking up the Viterbi chunk divisions and alignments obtained from the learning corpus.
[0089]
If the calculation is performed for all possible combinations of chunks, the calculation amount becomes enormous. Therefore, the following pruning and scoring strategies are introduced.
[0090]
<Beam pruning>
Since the search space is very large, a size threshold for leaving only a part of the output is set in both of the above two stages. A threshold for scoring is also introduced, so that only output larger than a certain score is processed. That is, an output candidate having only a score below a certain threshold is excluded from translation candidates.
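The two thresholds can be sketched as a single pruning step applied to a list of scored hypotheses. The representation of a hypothesis as a (log score, text) tuple and the function name are assumptions made for the example.

```python
import heapq

def prune(hypotheses, beam_size, min_score):
    """Keep at most `beam_size` hypotheses whose score exceeds `min_score`."""
    survivors = [h for h in hypotheses if h[0] > min_score]
    # nlargest returns the best-scoring survivors in descending order of score.
    return heapq.nlargest(beam_size, survivors, key=lambda h: h[0])

hyps = [(-1.0, "a"), (-5.0, "b"), (-2.0, "c"), (-9.0, "d")]
print(prune(hyps, beam_size=2, min_score=-6.0))
```

The score threshold removes clearly poor candidates first, and the size threshold then bounds how many of the remainder are carried to the next step.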
[0091]
<Score addition by example>
For input / output chunk combinations that actually appear in the learning corpus, the following scoring method is introduced to increase the probability that they will remain in the search beam. That is, when a combination of chunks appearing in the learning corpus appears in the first stage of the decoding process, the scores are added according to the following equation (shown in logarithmic form).
[0092]
[Formula 13]

Here, P_tm(J | E) and P_lm(E) are the translation model and the language model, respectively, freq(E^*_{A_j}, J^*_j) represents the frequency with which the pair E^*_{A_j} and J^*_j appears in the learning corpus, and weight is an adjustment parameter. Through this addition, chunk pairs that actually appear in the learning corpus are assigned a higher likelihood than other pairs. Furthermore, the more frequently a chunk pair appears in the learning corpus, the higher the likelihood assigned to it.
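The equation image for [Formula 13] is not reproduced in this text. One form consistent with the description is log P_tm(J | E) + log P_lm(E) + weight × Σ_j freq(E^*_{A_j}, J^*_j); the sketch below uses that form as an assumption, and the function and variable names are hypothetical.

```python
import math

def boosted_score(p_tm, p_lm, chunk_pairs, pair_freq, weight):
    """Log-domain score with a bonus for chunk pairs seen in the learning corpus.

    Assumed form: log P_tm + log P_lm + weight * sum of pair frequencies.
    """
    bonus = sum(pair_freq.get(pair, 0) for pair in chunk_pairs)
    return math.log(p_tm) + math.log(p_lm) + weight * bonus

# Frequencies of (source chunk, target chunk) pairs; the count is a toy value.
pair_freq = {("show me", "mise tekudasai"): 3}

seen = boosted_score(0.01, 0.1, [("show me", "mise tekudasai")], pair_freq, 0.5)
unseen = boosted_score(0.01, 0.1, [("show me", "unseen chunk")], pair_freq, 0.5)
print(seen - unseen)
```

With identical model probabilities, the hypothesis containing the observed pair scores higher by weight times its frequency, so it is more likely to survive beam pruning.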
[0093]
[Device configuration]
FIG. 5 shows a block diagram of a statistical machine translation apparatus from Japanese to English according to the present embodiment. Referring to FIG. 5, this statistical machine translation apparatus 60 includes: a learning unit 72 for generating a chunk-based translation model by the above-described EM algorithm, using as a learning corpus a bilingual corpus 70 composed of a large number of Japanese-English translation pairs; and a translation model storage unit 74 for storing the translation model generated by the learning unit 72. Each translation pair in the bilingual corpus 70 consists of a Japanese sentence and the corresponding English sentence. In the apparatus according to the present embodiment, these translation pairs are not subjected to any preprocessing such as chunking.
[0094]
Referring to FIG. 6, the translation model stored in the translation model storage unit 74 includes a head model 100, a chunk model 102, a fertility model 104, a deletion model 106, an insertion model 108, a vocabulary model 110, a rearrangement model 112 used for rearranging the words within a translated chunk, and a chunk rearrangement model 114 used for rearranging the chunks after translation. In this translation model, the source language is English E and the target language is Japanese J.
[0095]
Referring again to FIG. 5, the statistical machine translation apparatus 60 further includes: a chunk pair storage unit 76 for storing the Japanese and English chunk pairs that appear in the bilingual corpus 70 in the course of translation model learning by the learning unit 72; a decoder 80 for performing the chunk-based statistical translation described above on an input sentence 78, using the translation model stored in the translation model storage unit 74 and the chunk pairs stored in the chunk pair storage unit 76, and outputting a translated sentence 82; and a dictionary storage unit 84 for storing dictionaries that the decoder 80 refers to when dividing the input into chunks and translating individual words.
[0096]
The decoder 80 includes: a chunk translation unit 90 for receiving the input sentence 78 and, using the dictionary stored in the dictionary storage unit 84 and the translation model stored in the translation model storage unit 74, outputting a plurality of high-likelihood chunk sequences from among the translated chunk sequences obtained for arbitrary chunk divisions of the input sentence; and a chunk rearrangement processing unit 92 for rearranging the chunks of each of the output chunk sequences output from the chunk translation unit 90 and outputting, as the translated sentence 82, the arrangement of chunks having the highest likelihood calculated according to the translation model stored in the translation model storage unit 74 and the appearance frequencies of the chunk pairs stored in the chunk pair storage unit 76.
[0097]
The chunk translation unit 90 includes: an input chunk creation unit 94 for creating all possible input chunk sequences for the input sentence 78 using the chunk division dictionary stored in the dictionary storage unit 84; and an output chunk creation unit 96 for generating, for each of the chunk sequences created by the input chunk creation unit 94, all possible output chunk sequences using the translation dictionary stored in the dictionary storage unit 84 and the translation model stored in the translation model storage unit 74. In doing so, the output chunk creation unit 96 calculates the likelihood of each output chunk using, from among the translation models stored in the translation model storage unit 74, the head model 100, the chunk model 102, the fertility model 104, the deletion model 106, the insertion model 108, the vocabulary model 110, and the rearrangement model 112. In calculating the likelihood, as described above, a score is added for chunk pairs that match a chunk pair stored in the chunk pair storage unit 76.
[0098]
In practice, when calculating likelihoods with the translation model and outputting chunk sequences, the output chunk creation unit 96 performs a beam search so that only chunk sequences with high likelihood are output, which reduces the amount of calculation. The same applies to the subsequent processes. In the following description, however, this reduction of the calculation amount by beam search is not mentioned, in order to simplify the description.
[0099]
The chunk rearrangement processing unit 92 has a function of rearranging the chunks of each of the chunk sequences output from the output chunk selection unit 98, calculating the likelihood according to the chunk rearrangement model 114 in the translation model, and selecting and outputting the translation result with the highest likelihood.
[0100]
[Operation]
The statistical machine translation apparatus 60 shown in FIG. 5 operates as follows. There are two aspects to the operation of the statistical machine translation apparatus 60. The first aspect is the aspect of translation model learning. In the second aspect, statistical translation is performed on an input sentence using the learned translation model. These will be described in order.
[0101]
<Learning aspects of translation model>
In creating this translation model, the source language is English E and the target language is Japanese J, as described above. Prior to learning the translation model, the bilingual corpus 70 is prepared. The learning unit 72 generates a translation model from the bilingual corpus 70 according to the EM algorithm as described above. That is, the learning unit 72 repeatedly calculates the parameters of the translation model using the EM algorithm, starting from the initial values of the translation model. The translation model includes the head model 100, the chunk model 102, the fertility model 104, the deletion model 106, the insertion model 108, the vocabulary model 110, the rearrangement model 112, and the chunk rearrangement model 114. When the parameters of the translation model calculated according to the EM algorithm converge, the learning unit 72 stores the result in the translation model storage unit 74.
[0102]
The learning unit 72 also examines the Japanese and English chunk pairs appearing in the bilingual corpus 70 during the above-described learning process, and stores all the chunk pairs that appear, together with their frequencies, in the chunk pair storage unit 76.
[0103]
The first phase ends when the translation model storage unit 74 and the chunk pair storage unit 76 are ready. Note that the language model P (E) is also available at this time.
[0104]
<Statistical translation>
When the Japanese input sentence 78 is given, the input chunk creation unit 94 creates all possible chunk divisions of the input sentence 78 with reference to the chunk division dictionary stored in the dictionary storage unit 84. The output chunk creation unit 96 creates all possible English output chunk sequences for each chunk sequence of the input sentence 78 created by the input chunk creation unit 94. At this time, the output chunk creation unit 96 calculates the likelihood of each output chunk using the vocabulary model 110 and the rearrangement model 112. When a chunk pair identical to one stored in the chunk pair storage unit 76 appears, a score is added to the likelihood of that chunk pair as described above.
[0105]
Note that, among the output chunk sequences output from the output chunk creation unit 96, only those whose overall likelihood is greater than or equal to a predetermined value are given to the chunk rearrangement processing unit 92. This processing is realized by the beam search algorithm described above.
[0106]
The chunk rearrangement processing unit 92 rearranges the chunks of each output chunk sequence created by the output chunk creation unit 96. For every output chunk sequence obtained by such rearrangement, a score (likelihood) is calculated using the chunk rearrangement model 114 and the likelihood of each chunk, and the output sentence with the highest score is output as the translated sentence 82.
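The rearrangement step can be sketched as an exhaustive search over chunk orders, scoring each order with the per-chunk likelihoods plus a rearrangement score. The scoring function below is a toy stand-in for the chunk rearrangement model 114, not the model itself, and the example chunks and all names are assumptions made for the example.

```python
from itertools import permutations

def best_order(chunks, chunk_logp, order_logp):
    """Return (score, ordered chunks) for the highest-scoring chunk arrangement."""
    best = None
    for perm in permutations(range(len(chunks))):
        # Per-chunk log-likelihoods are order independent; order_logp ranks the orders.
        score = sum(chunk_logp) + order_logp(perm)
        if best is None or score > best[0]:
            best = (score, [chunks[i] for i in perm])
    return best

def toy_order_logp(perm):
    """Toy stand-in for a rearrangement model: prefers the fully reversed order."""
    n = len(perm)
    return -float(sum(abs(p - (n - 1 - i)) for i, p in enumerate(perm)))

chunks = ["in the window", "the one", "show me"]
score, ordered = best_order(chunks, [-1.2, -0.7, -0.9], toy_order_logp)
print(ordered)
```

Enumerating every permutation costs n! evaluations, so this form is only practical for a handful of chunks; a beam search would prune the orders as they are built.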
[0107]
The input chunk creation unit 94, the output chunk creation unit 96, and the chunk rearrangement processing unit 92 use the inside-outside algorithm described above in searching for a translated sentence with the highest score.
[0108]
The translation sentence 82 thus obtained is a result obtained by statistical machine translation using the chunk method for the input sentence 78.
[0109]
All of the above processing can be realized by a computer, an electronically readable parallel corpus stored in a storage device of the computer, and software executed by the computer. In particular, both the processing of the learning unit 72 and the processing of the decoder 80 can be realized by a computer program.
[0110]
[Experiment]
The following experiment was performed using a computer programmed to allow the above processing. The bilingual corpus used in this experiment is a travel conversation dialogue collection in Japanese and English prepared by the applicant (see Reference 3). The outline of this bilingual corpus is shown in Table 1 below.
[0111]
[Table 1]

The entire corpus was divided into three parts: 152,169 sentences for training, 4,846 sentences for testing, and 10,148 sentences for parameter adjustment. The parameter adjustment set was used to determine the criterion for ending the training iterations and to tune the decoder 80.
[0112]
Three translation systems were tested for comparison: model 4, chunk3, and chunk3+. These are briefly described below.
[0113]
Model 4 is a translation model of the word alignment method, namely IBM Model 4 with a beam search decoder.
[0114]
Chunk3 is a chunk-based translation model similar to that of the present embodiment, with the maximum chunk size limited to 3.
[0115]
Chunk3+ is a chunk-based translation model that, in addition to what chunk3 does, generates chunk candidates using actual sentence examples, as described in the present embodiment.
[0116]
FIG. 7 shows an example of chunking and chunk alignment by chunk3 in translation from English to Japanese. In FIG. 7, the chunks are delimited by square brackets. A "*" mark is attached to the left of the head word of each chunk. The alignment between chunks is indicated by a bar drawn between the paired chunks. For example, in the first example, the chunk “i have” is aligned with the chunk “has there”, the chunk “the number” with the chunk “number record”, and the chunk “of my passport” with the chunk “passport”.
[0117]
In the experiment, 510 sentences randomly selected from the set of test sentences were translated, and the translation results were evaluated according to the following criteria against a set of 16 reference translations.
[0118]
WER: Word error rate. This corresponds to the edit distance between the translation result and the reference translation. The lower the value, the higher the evaluation.
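A minimal sketch of this measure is given below. It assumes word-level Levenshtein distance, normalization by the reference length, and, when several reference translations are available, use of the closest one; these details and the function names are assumptions for illustration and are not taken from the experiment.

```python
from typing import List, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(
                prev[j] + 1,             # delete a hypothesis word
                cur[j - 1] + 1,          # insert a reference word
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def wer(hyp: Sequence[str], refs: List[Sequence[str]]) -> float:
    """Lowest length-normalized edit distance over non-empty references."""
    return min(edit_distance(hyp, ref) / len(ref) for ref in refs)
```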
[0119]
PER: Position-independent WER. This measure does not take word order into account. The lower the value, the higher the evaluation.
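One common way to compute a position-independent error rate treats both sentences as bags of words. The sketch below follows that formulation; the exact definition used in the experiment is not given here, so the formula and the function name `per` are assumptions.

```python
from collections import Counter
from typing import Sequence


def per(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Position-independent error rate against one non-empty reference."""
    # Number of words shared by the two bags, ignoring where they occur.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)
```

With this definition, a translation that contains exactly the reference words in a different order receives a PER of 0, whereas its WER is greater than 0.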
[0120]
BLEU: BLEU score. This is based on the proportion of N-grams in the translation result that are also found in the reference translations. The higher the value, the higher the evaluation.
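The N-gram matching at the core of BLEU can be sketched as a clipped (modified) N-gram precision. The sketch is deliberately partial: the full BLEU score also combines several N-gram orders and applies a brevity penalty, both omitted here, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(words: Sequence[str], n: int) -> Counter:
    """Count the N-grams of length `n` in `words`."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def modified_precision(hyp: Sequence[str], refs: List[Sequence[str]], n: int) -> float:
    """Share of hypothesis N-grams found in the references, with clipping."""
    hyp_counts = ngrams(hyp, n)
    if not hyp_counts:
        return 0.0
    max_ref: Counter = Counter()
    for ref in refs:
        max_ref |= ngrams(ref, n)  # keep the largest count seen in any reference
    # An N-gram is credited at most as often as it occurs in one reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())
```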
[0121]
SE: Subjective evaluation. Each translation was rated by a native speaker on a scale from rank A to rank D (A: perfect, B: good, C: acceptable, D: nonsense). The results are reported as the proportion of translated sentences falling in the ranges A, A+B, and A+B+C. The higher the value, the higher the evaluation.
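The cumulative proportions can be computed as in the following small sketch, in which the function name `cumulative_rank_rates` and the input format (one rank letter per translated sentence) are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def cumulative_rank_rates(ranks: Sequence[str]) -> Dict[str, float]:
    """Return the proportion of sentences ranked A, A+B, and A+B+C."""
    counts = Counter(ranks)
    total = len(ranks)
    running = 0
    rates = {}
    for label, rank in (("A", "A"), ("A+B", "B"), ("A+B+C", "C")):
        running += counts[rank]  # add this rank to the cumulative count
        rates[label] = running / total
    return rates
```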
[0122]
Table 2 summarizes the evaluation of the translation results from Japanese to English. FIG. 8 shows some results of model 4 and chunk3+.
[0123]
[Table 2]

Referring to Table 2, chunk3 shows better results than model 4 in the non-subjective evaluations. In the subjective evaluation, however, the two are almost the same. Chunk3+, which uses examples that actually appear in the training corpus as candidates, shows the best scores among the three.
[0124]
As described above, the statistical machine translation apparatus according to the present embodiment divides an input sentence into chunks, creates output chunks by performing translation within each chunk, and completes the translation by rearranging the output chunks. Since the first-stage translation and word rearrangement are performed locally within each chunk, the possibility of obtaining a correct translation locally increases. Also, since the translation of the entire sentence is rearranged in units of chunks rather than words, the possibility that the structure of the final translation correctly reflects the structure of the input sentence also increases. As a result, a good translation result can be obtained even between languages with different structures, such as Japanese and English. Furthermore, since the rearrangement is performed in two stages, even constraints that span a relatively long distance within the sentence can be reflected in the translation result.
[0125]
The embodiment disclosed herein is merely an example, and the present invention is not limited to the above-described embodiment. The scope of the present invention is indicated by each of the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording recited therein.
[0126]
[Reference 1]
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proc. of ACL 2001, Toulouse, France.
[0127]
[Reference 2]
Christoph Tillmann and Hermann Ney. 2000. Word re-ordering and DP-based search in statistical machine translation. In Proc. of COLING 2000, July-August.
[0128]
[Reference 3]
Toshiyuki Takezawa, Eiichiro Sumita, Fumiaki Sugaya, Hirofumi Yamamoto, and Seiichi Yamamoto. 2002. Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world. In Proc. of LREC 2002, pages 147-152, Las Palmas, Canary Islands, Spain, May.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining the correspondence between words in machine translation using a word alignment method according to a conventional technique.
FIG. 2 is a diagram for explaining the translation process of word-alignment-based machine translation according to a conventional technique.
FIG. 3 is a diagram for explaining two-stage sorting in the machine translation device according to one embodiment of the present invention.
FIG. 4 is a diagram for explaining a translation process in a machine translation apparatus according to an embodiment of the present invention.
FIG. 5 is a functional block diagram of a machine translation apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram schematically showing a translation model used in the machine translation apparatus according to one embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of a translation result obtained by the machine translation device according to the embodiment of the present invention.
FIG. 8 is a diagram showing the evaluation of the translation result of the machine translation device according to the embodiment of the present invention in comparison with the evaluation of the translation result of the conventional machine translation device.
[Explanation of symbols]
60 statistical machine translation device, 70 parallel corpus, 72 learning unit, 74 translation model storage unit, 76 chunk pair storage unit, 80 decoder, 90 chunk translation unit, 92 chunk rearrangement processing unit, 94 input chunk creation unit, 96 output chunk creation unit

Claims

A machine translation device for translating an input sentence in a first language into a second language, comprising:
chunk translation means for individually translating each of the chunks obtained by dividing the input sentence into one or a plurality of chunks; and
chunk rearranging means for rearranging the translated chunks output from the chunk translation means to generate a translated sentence for the input sentence.

The machine translation device according to claim 1, further comprising means for storing a chunk-based translation model, wherein the chunk translation means and the chunk rearranging means process the input sentence in the first language with reference to the chunk-based translation model.

The machine translation device according to claim 2, wherein the chunk translation means includes:
chunk dividing means for performing a plurality of chunk divisions on the input sentence and outputting one or a plurality of chunk strings each including one or a plurality of chunks; and
output chunk creation means for creating one or a plurality of output chunk sequences by translating, for each of the one or a plurality of chunk strings output by the chunk dividing means, each of the chunks included in that chunk string,
wherein the output chunk creation means calculates a likelihood based on the translation model for the one or a plurality of output chunk sequences.

The machine translation device according to claim 3, further comprising output chunk sequence selecting means for selecting, from the output chunk sequences created by the output chunk creation means, those satisfying a predetermined condition and providing them to the chunk rearranging means.

The machine translation device according to claim 4, wherein the output chunk sequence selecting means includes means for selecting, from the output chunk sequences created by the output chunk creation means, those having a likelihood equal to or greater than a predetermined value and providing them to the chunk rearranging means.

The machine translation device according to claim 3, further comprising chunk pair storage means for storing pairs of a chunk of the first language and a chunk of the second language,
wherein the chunk translation means further includes likelihood changing means for detecting that a pair consisting of an output chunk created by the output chunk creation means and the chunk of the input sentence corresponding to that output chunk matches a pair stored in the chunk pair storage means, and for changing the likelihood of the output chunk according to a predetermined calculation method.

The machine translation device according to claim 6, further comprising means for detecting chunk pairs appearing in a bilingual corpus of the first language and the second language prepared in advance and storing the chunk pairs in the chunk pair storage means.

The machine translation device according to claim 6, wherein the chunk pair storage means stores a weight assigned in advance to each pair of a chunk of the first language and a chunk of the second language, and
the likelihood changing means includes means for detecting that a pair consisting of an output chunk created by the output chunk creation means and the chunk of the input sentence corresponding to that output chunk matches a first chunk pair stored in the chunk pair storage means, and for changing the likelihood of the output chunk according to a function of the weight assigned to the first chunk pair.

The machine translation device according to claim 8, further comprising means for detecting chunk pairs appearing in a bilingual corpus of the first language and the second language prepared in advance, together with the appearance frequency of each chunk pair in the bilingual corpus, and for causing the chunk pair storage means to store each chunk pair with a weight consisting of the appearance frequency of that chunk pair.

The machine translation device according to claim 9, wherein the chunk dividing means includes means for performing all possible chunk divisions on the input sentence and outputting one or a plurality of chunk strings each including one or a plurality of chunks.

The machine translation device according to claim 3, wherein the chunk-based translation model includes a chunk rearrangement model in which the second language is a source language and the first language is a target language, and
the chunk rearranging means includes means for rearranging the one or a plurality of chunks of each output chunk sequence output from the chunk translation means, calculating the likelihood of each rearrangement from the likelihood of chunk rearrangement calculated from the chunk rearrangement model and the likelihood calculated for each of the output chunks included in the output chunk sequence, and outputting the arrangement of chunks having the highest likelihood as the translated sentence.

The machine translation device according to claim 1, wherein the first language is Japanese.

The machine translation device according to any one of claims 1 to 12, wherein the second language is English.

A computer program that, when executed by a computer, causes the computer to operate as the machine translation device according to any one of claims 1 to 13.

A computer programmed by the computer program according to claim 14.