JPH07200604A

JPH07200604A - Apparatus and method for language translation using context-based translation model

Info

Publication number: JPH07200604A
Application number: JP6227645A
Authority: JP
Inventors: Adam L. Berger; Peter F. Brown; Steve Andrew Della Pietra; Vincent Joseph Della Pietra; Andrew Scott Kehler; Robert Leroy Mercer
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1993-10-28
Filing date: 1994-09-22
Publication date: 1995-08-04

Abstract

PURPOSE: To provide an apparatus for translating a series of source words in a first language into a series of target words in a second language. CONSTITUTION: For an input series of source words, at least two target hypotheses are generated, each comprising a series of target words. Each target word has a context comprising at least one other word in the target hypothesis. For each target hypothesis, a language model match score comprises an estimate of the probability of occurrence of the series of words in the hypothesis. At least one alignment connecting each source word with at least one target word in the target hypothesis is identified. For each source word and each target hypothesis, a word match score comprises an estimate of the conditional probability of occurrence of the source word, given the target word in the target hypothesis connected to the source word and given the context of that target word. For each target hypothesis, a translation match score comprises a combination of the word match scores for the target hypothesis and the source words in the input series of source words. A target hypothesis match score comprises a combination of the language model match score and the translation match score for the target hypothesis. The target hypothesis with the best target hypothesis match score is output.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to computerized language translation, such as computerized translation of French text into English.

[0002]

[Prior Art] U.S. patent application Ser. No. 07/736,278, filed July 25, 1991 by Peter F. Brown et al. and entitled "Method and System For Natural Language Translation" (the entire content of which is incorporated herein by reference), describes a computerized translation system for translating a text F in a source language into a text E in a target language. For each of a number of hypothesized target language texts E, the system described there evaluates the conditional probability P(E|F) of the target language text E given the source language text F. The hypothesized target language text E hat having the highest conditional probability P(E|F) is selected as the translation of the source language text F.

[Equation 1] Here "E hat" denotes the symbol $\hat{E}$.

[0003] Using Bayes' theorem, the conditional probability P(E|F) of the target language text E given the source language text F can be written as follows.

[Equation 2]

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)} \qquad (1)$$

[0004] Since the probability P(F) of the source language text F, the denominator of Equation 1, is independent of the target language text E, the hypothesized target language text E hat having the highest conditional probability P(E|F) also has the highest product P(F|E) P(E). The following equation is therefore obtained.

[Equation 3]

$$\hat{E} = \arg\max_{E}\; P(F \mid E)\,P(E) \qquad (2)$$
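The selection rule described above (pick the hypothesis with the highest product P(F|E) P(E), ignoring the constant P(F)) can be sketched as follows. This is only an illustrative sketch: the function name `best_hypothesis`, the two candidate sentences, and all probability values are assumptions invented for the example, not taken from the patent.

```python
import math

def best_hypothesis(hypotheses, lm_score, tm_score):
    """Return the hypothesis E that maximizes P(F|E) * P(E).

    P(F) is identical for every E, so it is dropped from the comparison.
    Log-probabilities are summed to avoid numerical underflow.
    """
    return max(hypotheses,
               key=lambda e: math.log(tm_score(e)) + math.log(lm_score(e)))

# Hypothetical toy scores, invented for illustration only.
P_E = {"the key is in the door": 0.020, "the key is on the door": 0.015}
P_F_given_E = {"the key is in the door": 0.30, "the key is on the door": 0.10}

best = best_hypothesis(P_E, P_E.get, P_F_given_E.get)
print(best)
```

In a real system the two score functions would come from a trained language model and translation model rather than lookup tables.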

[0005] In Equation 2, the probability P(E) of the target language text E is a language model match score, estimated from a target language model. Any known language model may be used to estimate P(E); the above-referenced related U.S. patent application (Ser. No. 07/736,278, cited above) describes an n-gram language model comprising a 1-gram model, a 2-gram model, and a 3-gram model combined by parameters whose values are obtained by interpolated estimation.
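As a rough illustration of an n-gram model that mixes 1-gram, 2-gram, and 3-gram estimates through interpolation weights, consider the sketch below. The function name `interpolated_prob`, the weights, and the probability tables are all hypothetical; the related application's actual parameterization is not reproduced here.

```python
def interpolated_prob(w, u, v, p1, p2, p3, lambdas):
    """Estimate P(w | u, v) as a weighted mix of 1-, 2- and 3-gram estimates.

    u and v are the two words preceding w. The weights must sum to 1.
    """
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return (l1 * p1.get(w, 0.0)
            + l2 * p2.get((v, w), 0.0)
            + l3 * p3.get((u, v, w), 0.0))

# Hypothetical toy estimates, invented for illustration only.
p1 = {"door": 0.01}
p2 = {("the", "door"): 0.1}
p3 = {("in", "the", "door"): 0.5}

p = interpolated_prob("door", "in", "the", p1, p2, p3, (0.2, 0.3, 0.5))
print(p)
```

The probability of a whole hypothesis would then be the product of such conditional estimates over its words.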

[0006] The conditional probability P(F|E) in Equation 2 is a translation match score. As described in the above related patent application, the translation match score P(F|E) for a source text F comprising a series of source words, given a target hypothesis E comprising a series of target words, is estimated by finding all possible alignments connecting the source words in the source text F to the target words in the target text E. These include alignments in which one or more source words are not connected to any target word, but exclude alignments in which a source word is connected to more than one target word. For each alignment and each target word e in the target text E connected to φ source words in the source text F, a fertility probability n(φ|e) is estimated, namely the probability that the target word e is connected to φ source words in the alignment. In addition, for each source word f in the source text F and each target word e in the target text E connected to f by the alignment, a lexical probability t(f|e) is estimated, namely the probability that the source word f occurs given the occurrence of the connected target word e.

[0007] For each alignment and each source word f, the related patent application further estimates a distortion probability a(j|a_j, m), namely the probability that the source word f is at position j of the source text F, given that the target word e connected to f is at position a_j of the target text E and that the source text F contains m words.

[0008] The fertility probabilities for the alignment and for all target words e in the target text E are combined, and the result is multiplied by the probability

[Equation 4]

of the number φ_0 of target words not connected to any source word in the alignment, given the sum φ of the fertilities of all target words in the target text E in the alignment. This yields a fertility score for the target text E and the alignment.

[0009] The lexical probabilities for the alignment and for all source words in the source text F are combined to obtain a lexical score for the alignment.

[0010] The distortion probabilities for the alignment and for all source words in the source text F that are connected to a target word in the alignment are combined, and the result is multiplied by 1/φ_0! (where φ_0 is the number of target words in the target text E that are not connected to any source word) to obtain a distortion score for the alignment.

[0011] Finally, the fertility score, the lexical score, and the distortion score for the alignment are combined, and the result is multiplied by the combinatorial factor

[Equation 5]

to obtain the translation match score for the alignment. (See Section 8.2 of the above related patent application.)

[0012] The translation match score P(F|E) for the source text F and the target hypothesis E may be the sum of the translation match scores of all permitted alignments between F and E. Preferably, P(F|E) is the translation match score of the alignment estimated to be most probable.

[0013] Equation 2 may be used to estimate directly the target hypothesis match score P(F|E) P(E) for a hypothesized target language text E and a source language text F. However, to simplify the language model P(E) and the translation model P(F|E), and to allow the parameters of these models to be estimated from a manageable amount of training data, the above related patent application estimates the target hypothesis match score P(F|E) P(E) on simplified intermediate forms E' and F' of the target language text E and the source language text F, respectively. Each intermediate target language word e' represents a class of related target language words. Each intermediate source language word f' represents a class of related source language words. A source language conversion mechanism converts the source language text F into the intermediate form F'. The hypothesized intermediate-form target language text E hat' having the highest hypothesis match score P(F'|E') P(E') is estimated from Equation 2. A target language conversion mechanism converts the best-matching intermediate target language text E hat' into the target language text E hat.

[0014] In the language translation system of the above related patent application, the lexical probability of each source word f is estimated as the conditional probability t(f|e) of the source word f given only the target word e connected to it in the alignment. The lexical probability therefore provides only a coarse estimate of the probability of the source word f.

[0015]

[Problems to Be Solved by the Invention] It is an object of the present invention to provide an apparatus and method for translating a series of source words in a first language into a series of target words in a second language different from the first language, which yield improved estimates of the lexical probabilities of the source words.

[0016] It is another object of the invention to provide an apparatus and method for translating a series of source words in a first language into a series of target words in a second language different from the first language, in which the lexical probability of a source word is estimated as a conditional probability given the target word connected to the source word in an alignment and given the context of that target word.

[0017]

[Means for Solving the Problems] According to the present invention, an apparatus for translating a series of source words in a first language into a series of target words in a second language different from the first language comprises means for inputting the series of source words. Means are also provided for generating at least two target hypotheses. Each target hypothesis comprises a series of target words selected from a vocabulary of words in the second language. Each target word has a context comprising at least one other word in the target hypothesis.

[0018] A language model match score generator generates, for each target hypothesis, a language model match score comprising an estimate of the probability of occurrence of the series of words in the target hypothesis. An alignment identifier identifies at least one alignment between the input series of source words and each target hypothesis. The alignment connects each source word with at least one target word in the target hypothesis.

[0019] A word match score generator is provided for generating, for each source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word in the target hypothesis connected to the source word and given the context of that target word. A translation match score generator generates, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the source words in the input series of source words.

[0020] A hypothesis match score generator is provided for generating a target hypothesis match score for each target hypothesis. Each target hypothesis match score comprises a combination of the language model match score and the translation match score for the target hypothesis. The target hypothesis having the best target hypothesis match score is provided at an output.

[0021] Each target hypothesis preferably comprises a series of target words selected from a vocabulary comprising words in the second language and a null word representing the absence of a word.

[0022] The alignment identifier may comprise means for identifying two or more alignments between the input series of source words and each target hypothesis. Each alignment connects each source word with at least one target word in the target hypothesis. The word match score generator generates, for each source word, each alignment, and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of the target word. The translation match score generator generates, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the source words in the input series of source words.

[0023] The source text input device comprises means for converting the input series of source words into a series of converted source words. The alignment means identifies at least one alignment between the series of converted source words and each target hypothesis. Each alignment connects each converted source word with at least one target word in the target hypothesis. The word match score generator generates, for each converted source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the converted source word, given the target word in the target hypothesis connected to the converted source word and given the context of that target word.

[0024] The translation match score generator generates, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the converted source words. Output means are provided for synthesizing a series of output words from the target hypothesis having the best target hypothesis match score, and for outputting the output words.

[0025] The translation match score for a target hypothesis may comprise the product of the word match scores for the target hypothesis and the source words in the input series of source words. The target hypothesis match score for a target hypothesis may comprise the product of the language model match score for the target hypothesis and the translation match score for the target hypothesis.
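The product form described in this paragraph can be sketched in a few lines. The function names and the numeric scores below are hypothetical, chosen only to show how word match scores roll up into a translation match score and then into a target hypothesis match score.

```python
import math

def translation_match_score(word_match_scores):
    """Product of the per-source-word match scores P(f | e, X)."""
    return math.prod(word_match_scores)

def hypothesis_match_score(language_model_score, word_match_scores):
    """Language model match score times translation match score."""
    return language_model_score * translation_match_score(word_match_scores)

# Hypothetical scores for a three-word source sentence.
score = hypothesis_match_score(0.02, [0.5, 0.4, 0.9])
print(score)
```

In practice such products are usually accumulated as sums of logarithms, since multiplying many small probabilities underflows quickly.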

[0026] The context of the target word in the target hypothesis connected to a source word may be contained in at least one of two or more context classes. The estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of the target word, comprises at least one function having a value dependent on the class containing the context of the target word.

[0027] Alternatively, the estimated conditional probability of occurrence of the source word may comprise a function having a value dependent on the part of speech in the target hypothesis of at least one word in the context of the target word connected to the source word, or dependent on the identity of at least one word in the context of the target word connected to the source word.

[0028] The means for outputting the target hypothesis having the best match score may comprise a display. The input means may comprise a keyboard, a computer disk drive, or a computer tape drive.

[0029] By estimating the lexical probability of a source word as the conditional probability of the source word given the target word connected to it in an alignment and given the context of that target word, the present invention obtains translation match scores with improved accuracy.

[0030]

[Embodiments] FIG. 1 is a block diagram of an example of an apparatus according to the present invention for translating words from a first language into a second language. The apparatus comprises a source text input device 10 for inputting a series of source words. The source text input device 10 may comprise, for example, a keyboard, a computer disk drive, or a computer tape drive.

[0031] The source text input device 10 may further comprise means for converting the input series of source words into a series of converted source words. Each converted source word may represent a set of related input source words. For example, each of the input source words in the set {etre, etant, ete, suis, es, est, sommes, etes, sont, fus, fumes, serai, serons, sois, soit, soyons, soyez, soient} is a form of the French infinitive "etre", and may be converted to "etre" with a tag representing the tense of the original input source word.
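A minimal sketch of this kind of conversion is a lookup table that maps each listed form to the base form "etre" plus a tag. The tag names below are assumptions made for illustration (the text only says the tag represents the tense of the original word), and `convert` is a hypothetical helper, not a function from the patent.

```python
# Hypothetical tag set; the source does not specify the tags themselves.
ETRE_FORMS = {
    "etre": "infinitive", "etant": "present participle", "ete": "past participle",
    "suis": "present", "es": "present", "est": "present",
    "sommes": "present", "etes": "present", "sont": "present",
    "fus": "simple past", "fumes": "simple past",
    "serai": "future", "serons": "future",
    "sois": "subjunctive", "soit": "subjunctive", "soyons": "subjunctive",
    "soyez": "subjunctive", "soient": "subjunctive",
}

def convert(word):
    """Map a form of "etre" to ("etre", tag); leave other words unchanged."""
    tag = ETRE_FORMS.get(word.lower())
    return ("etre", tag) if tag else (word, None)

converted = [convert(w) for w in ["La", "clef", "est", "dans", "la", "porte"]]
print(converted)
```

Collapsing inflected forms this way shrinks the source vocabulary, which is the stated motivation for estimating model parameters on intermediate forms.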

[0032] The means for converting the input series of source words into a series of converted source words may comprise the source conversion mechanisms described in Sections 3, 4, and 11 of the above related patent application. In essence, these conversion mechanisms check and correct the spelling of the input source words, check and correct the case of the input source words, detect titles of documents in the input series of source words, and detect names in the input series of source words. The conversion mechanisms also tag each input source word with its most likely part of speech and flag unknown source words (those not contained in a stored vocabulary of source words). The converting means also collapses multi-word units of input source words into single converted source words, and splits compound input source words into two or more converted source words. It further performs linguistic or morphological transformations of the various forms of a word into a single base form. Finally, it estimates the sense of each input source word and assigns that sense to the converted source word.

[0033] Table 1 shows a hypothetical example of an input series of source words according to the present invention. In this example, the source words are French words.

[Table 1] Input series of source words, F

f1    f2    f3    f4    f5    f6
La    clef  est   dans  la    porte

[0034] The translation apparatus according to the present invention further comprises a target hypothesis generator 12, which generates at least two target hypotheses. Each target hypothesis comprises a series of target words selected from a vocabulary of words in the second language. The vocabulary of words in the second language may be stored in a target language vocabulary store 14. Each target word in a target hypothesis has a context comprising at least one other word in the target hypothesis.

[0035] An example of a target hypothesis generator is described in Section 14 of the above patent application.

[0036] Table 2 shows a hypothetical example of target hypotheses E1, E2, and E3. In this example, the target words are English words.

[Table 2]

[0037] Each target hypothesis comprises a series of target words selected from a vocabulary comprising words in the second language and a null word representing the absence of a word. In Table 2, every target hypothesis E_h is considered to include the null word.

[0038] Returning to FIG. 1, the translation apparatus comprises a language model match score generator 16 for generating, for each target hypothesis, a language model match score comprising an estimate of the probability of occurrence of the series of words in the target hypothesis. An example of a language model match score generator is described in Sections 6 and 7 of the above related patent application. Any known model may be used to estimate the probability of occurrence of the series of words in a target hypothesis; the related patent application describes an n-gram language model comprising a 1-gram model, a 2-gram model, and a 3-gram model combined by parameters whose values are obtained by interpolated estimation.

[0039] The translation apparatus further comprises an alignment identifier 18 for identifying at least one alignment between the input series of source words and each target hypothesis. The alignment connects each source word with at least one target word in the target hypothesis.

[0040] FIGS. 2, 3, and 4 schematically show examples of possible alignments between the hypothetical input series of source words in Table 1 and the hypothetical series of target words in target hypothesis E1 of Table 2. In each alignment, each source word in the input series of source words F is connected by a solid line to at least one target word in the target hypothesis E1. In the alignment of FIG. 4, no solid line extends from the second occurrence of the source word "La", which is therefore considered to be connected to the null word.

[0041] Tables 3, 4, and 5 provide alternative representations of the alignments of FIGS. 2, 3, and 4, respectively.

[Table 3]

[Table 4]

[Table 5]

[0042] In each table, for each parameter j ranging from 1 to m (where m is the number of words in the input series of source words), there is another parameter a_j having a single value in the range from 0 to l (where l is the number of words in the target hypothesis).

[0043] For a given alignment, each word f_j in the input series of source words is connected to the word $e_{a_j}$ in the target hypothesis.
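One way to picture this representation is a list holding one value a_j per source position j, with index 0 reserved for the null word. The sketch below uses the source words of Table 1; the target hypothesis and the alignment values are hypothetical stand-ins, since the contents of Tables 2 to 5 are not reproduced in this text.

```python
# Source words f_1..f_m from Table 1.
source = ["La", "clef", "est", "dans", "la", "porte"]

# Hypothetical target hypothesis; index 0 holds the null word, so that
# e_1..e_l occupy indices 1..l.
target = ["<null>", "The", "key", "is", "in", "the", "door"]

# Hypothetical alignment: alignment[j - 1] is a_j, a value in 0..l.
alignment = [1, 2, 3, 4, 5, 6]

# Each source word f_j is connected to the target word e_{a_j}.
pairs = [(f, target[a_j]) for f, a_j in zip(source, alignment)]
print(pairs)
```

Setting an entry of `alignment` to 0 would connect that source word to the null word, as in the alignment of FIG. 4.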

[0044] In general, there are $2^{lm}$ possible alignments between a series of m source words and a series of l non-null target words, where each source word may be connected either to the null target word or to one or more non-null target words. If each source word is constrained to be connected to exactly one null or non-null target word, there are $(l+1)^m$ possible alignments.
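The two counts can be checked numerically, reading them as 2^(lm) for unconstrained alignments (each of the l*m possible connections is either present or absent) and (l+1)^m when every source word picks exactly one of the l target words or the null word. The function names below are hypothetical, and the brute-force enumeration is only practical for very small l and m.

```python
from itertools import product

def count_unconstrained(l, m):
    """Each of the l*m possible source-target connections is on or off."""
    return 2 ** (l * m)

def count_single_link(l, m):
    """Each of the m source words picks one of l target words or the null word."""
    return (l + 1) ** m

def enumerate_single_link(l, m):
    """All alignments (a_1, ..., a_m) with each a_j in 0..l."""
    return list(product(range(l + 1), repeat=m))

small = enumerate_single_link(2, 3)
print(len(small), count_single_link(2, 3))
print(count_single_link(6, 6), count_unconstrained(6, 6))
```

Even for a six-word sentence pair the constrained space holds 117,649 alignments, which helps explain why a single best alignment is preferred over exhaustive summation.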

[0045] Preferably, only one alignment between the input series of source words and each target hypothesis is identified for obtaining a word match score for each source word, as described below. The one identified alignment for the series of source words and each target hypothesis is preferably produced by the target hypothesis generator, as described in Section 14 of the above related patent application.

[0046] If the source text input device 10 comprises means for converting the input series of source words into a series of converted source words, the alignment identifier 18 identifies at least one alignment between the series of converted source words and each target hypothesis. The alignment connects each converted source word with at least one target word in the target hypothesis.

[0047] Returning to FIG. 1, the translation apparatus further comprises a word match score generator 20. For each source word and each target hypothesis, the word match score generator 20 generates a word match score comprising an estimate of the conditional probability P(f|e, X) of occurrence of the source word f, given the target word e in the target hypothesis connected to f and given the context X of that target word e.

[0048] Table 6 shows a hypothetical example of the context X of each target word $e_{a_j}$ in the target hypothesis E1 of Table 2, for the alignment A_{1,1} of Table 3 with the input series of source words F of Table 1.

[Table 6]

[0049] As shown in Table 6, in this hypothetical example the context X of a selected target word consists of the three target words preceding the selected target word and the three target words following it in the target hypothesis. The context also includes punctuation and the absence of words.

[0050] In general, the context of the target word $e_{a_j}$ in the target hypothesis E connected to a source word f_j is contained in at least one of two or more context classes. The estimated conditional probability of occurrence of a source word, given the target word in the target hypothesis connected to the source word and given the context of that target word, may comprise at least one function having a value dependent on the class containing the context of the target word.

【００５１】 Alternatively, the context may comprise at least one word having a part of speech in the target hypothesis. The estimated conditional probability of occurrence of a source word, given the target word of the target hypothesis connected to the source word and given the context of that target word, may comprise at least one function whose value depends on the part of speech in the target hypothesis of at least one word in the context of the target word connected to the source word.

【００５２】 In another example, the context of the target word of the target hypothesis connected to a source word comprises at least one word having an identity. The estimated conditional probability of occurrence of the source word, given the target word of the target hypothesis connected to the source word and given the context of that target word, comprises at least one function whose value depends on the identity of at least one word in the context of the target word connected to the source word.

【００５３】 Equations 3, 4, 5, and 6 are hypothetical examples of functions whose values depend on the context of the target word connected to a source word.

$$
g_1(f, e = \text{key}, X) =
\begin{cases}
1, & \text{if } f = \text{"clef"} \text{ and the word } x_3 \text{ in } X \text{ immediately preceding "key" is "the"} \\
0, & \text{otherwise}
\end{cases}
\tag{3}
$$

$$
g_2(f, e = \text{key}, X) =
\begin{cases}
1, & \text{if } f = \text{"clef"} \text{ and the word } x_3 \text{ in } X \text{ immediately preceding "key" is "car"} \\
0, & \text{otherwise}
\end{cases}
\tag{4}
$$

$$
g_3(f, e = \text{key}, X) =
\begin{cases}
1, & \text{if } f = \text{"ton"} \text{ and the word } x_3 \text{ in } X \text{ immediately preceding "key" is "the"} \\
0, & \text{otherwise}
\end{cases}
\tag{5}
$$

$$
g_4(f, e = \text{key}, X) =
\begin{cases}
1, & \text{if } f = \text{"ton"} \text{ and the word } x_4 \text{ in } X \text{ immediately following "key", or the word } x_5 \text{ in } X \text{ next following "key", is an element of } \{A, B, C, D, E, F, G\} \\
0, & \text{otherwise}
\end{cases}
\tag{6}
$$

【００５４】 In Equation 3, the function g_1 has a value of 1 if the source word f is "clef", the target word e is "key", and the word in the context X immediately preceding "key" is "the". If these conditions are not met, the context function g_1 has a value of 0.

【００５５】 The hypothetical context function g_2 of Equation 4 has a value of 1 if the source word f is "clef", the target word e is "key", and the word in the context X immediately preceding "key" is "car". If these conditions are not met, the function g_2 has a value of 0.

【００５６】 In Equation 5, the context function g_3 has a value of 1 if the source word f is "ton", the target word e is "key", and the word in the context X immediately preceding "key" is "the". If these conditions are not met, the context function g_3 has a value of 0.

【００５７】 Finally, the hypothetical context function g_4 of Equation 6 has a value of 1 if the source word f is "ton", the target word e is "key", and either the word in the context X immediately following "key" or the word in the context X next following "key" is an element of the set {A, B, C, D, E, F, G}. If these conditions are not met, the context function g_4 has a value of 0.

【００５８】 Table 7 shows the evaluation of the context functions g(f, e, X) for the source word f = "clef" and the target word e = "key", for the context X of the target word "key" in the target hypothesis E_1 of Table 2.

【表７】 [Table 7]

【００５９】 As shown in Table 7, the context function g_1 has a value of 1, and the context functions g_2, g_3, and g_4 have a value of 0.
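The four hypothetical context functions of Equations 3 to 6 can be written as small indicator functions. The sketch below assumes the context X is a 6-tuple (x1, ..., x6) with x3 immediately before the target word and x4, x5 immediately after it; the lowercase example context and the function names are illustrative assumptions, not values taken from Table 6.

```python
NOTES = {"A", "B", "C", "D", "E", "F", "G"}  # the set used in Equation 6


def g1(f, e, X):
    return int(f == "clef" and e == "key" and X[2] == "the")


def g2(f, e, X):
    return int(f == "clef" and e == "key" and X[2] == "car")


def g3(f, e, X):
    return int(f == "ton" and e == "key" and X[2] == "the")


def g4(f, e, X):
    return int(f == "ton" and e == "key" and (X[3] in NOTES or X[4] in NOTES))


# Assumed context of "key": three words before, three words after.
X = ("<null>", "<null>", "the", "is", "in", "the")
values = [g(f="clef", e="key", X=X) for g in (g1, g2, g3, g4)]
print(values)  # [1, 0, 0, 0]
```

For f = "clef" only g_1 fires, which matches the pattern of values reported for Table 7.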

【００６０】 As discussed above, the word match score for each source word and each target hypothesis comprises an estimate of the conditional probability P(f | e, X) of occurrence of the source word f, given the target word e of the target hypothesis connected to the source word f and given the context X of the target word e connected to the source word f. The word match score may be obtained, for example, using a model defined by the following equation.

【数１０】 [Equation 10]

【００６１】 In Equation 7, the functions g_i(f, e, X) are functions whose values depend on the context X of the target word e connected to the source word f in an alignment between the input sequence of source words and a target hypothesis. The parameters λ_(e,i) represent the relative strength of each context function g_i in predicting the source word f from the target word e in the context X. The quantity N(e, X) is a normalization factor that depends on the target word e and on the context X of the target word e, as shown in Equation 8.

【数１１】 [Equation 11]
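Equations 7 and 8 appear in this text only as image placeholders. A form that is consistent with the surrounding description (context functions g_i weighted by parameters λ_(e,i), a normalization factor N(e, X) depending on e and X, and training by iterative scaling) is the exponential model below. It is offered as an assumption for the reader's orientation, not as a transcription of the original equations; the summation variable f' ranging over candidate source words is notation introduced here.

```latex
P(f \mid e, X) = \frac{1}{N(e, X)} \exp\Bigl[\sum_i \lambda_{(e,i)}\, g_i(f, e, X)\Bigr]

N(e, X) = \sum_{f'} \exp\Bigl[\sum_i \lambda_{(e,i)}\, g_i(f', e, X)\Bigr]
```

Under this assumed form, N(e, X) is exactly the factor that makes the scores for a fixed target word e and context X sum to 1 over the candidate source words.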

【００６２】 For the target word e = "key" and the hypothetical context functions g_1 to g_4 of Equations 3 to 6 above, Equation 9 is a hypothetical model for generating word match scores for source words f.

【数１２】 [Equation 12]

【００６３】 With respect to Equation 9, the normalization factor N(e, X) is given by Equation 10.

【数１３】（１０）[Equation 13] (10)

【００６４】 To illustrate the use of this model, hypothetical values of the model parameters are taken to be λ_(e=key,1) = 0.12, λ_(e=key,2) = 0.34, λ_(e=key,3) = 0.09, and λ_(e=key,4) = 0.40.

【００６５】 Table 8 shows the computation of the word match score from Equations 7 and 9 for the source word "clef" and the target word "key", for the context X of "key" in the target hypothesis E_1 of Table 2.

【表８】 [Table 8]

【数１４】 [Equation 14]

【００６６】 In this hypothetical example, the conditional probability P("clef" | "key", X) of occurrence of the source word "clef", given the target word "key" of the target hypothesis connected to the source word "clef" and given the context X of the target word "key" in the target hypothesis, is 0.507 (from Equations 9 and 10).
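The 0.507 figure can be reproduced with a short calculation, under assumptions that are not stated in this text and are labeled in the code: the model has the exponential form exp(Σ λ g) / N, the only candidate source words for "key" are "clef" and "ton", and the context has "the" immediately before "key" with no musical note name in the two following positions.

```python
import math

# Hypothetical parameter values from the description: lambda_(e=key, i).
LAMBDA = {1: 0.12, 2: 0.34, 3: 0.09, 4: 0.40}
NOTES = set("ABCDEFG")


def features(f, X):
    """Evaluate g1..g4 for target word 'key'; X = (x1, ..., x6)."""
    before, after, after_next = X[2], X[3], X[4]
    return {
        1: f == "clef" and before == "the",
        2: f == "clef" and before == "car",
        3: f == "ton" and before == "the",
        4: f == "ton" and (after in NOTES or after_next in NOTES),
    }


def word_match_score(f, X, candidates=("clef", "ton")):
    """Assumed exponential model, normalized over the candidate source words."""
    def weight(src):
        active = features(src, X)
        return math.exp(sum(LAMBDA[i] for i, on in active.items() if on))

    return weight(f) / sum(weight(src) for src in candidates)


X = ("<null>", "<null>", "the", "is", "in", "the")  # assumed context of "key"
print(round(word_match_score("clef", X), 3))  # 0.507
```

Here only g_1 is active for "clef" and only g_3 for "ton", so the score is exp(0.12) / (exp(0.12) + exp(0.09)).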

【００６７】 When the alignment identification mechanism 18 identifies two or more alignments between the input sequence of source words and each target hypothesis, the word match score generation mechanism generates a word match score for each source word, each alignment, and each target hypothesis. Each word match score can be estimated using the model of Equation 7.

【００６８】 If the source text input device 10 includes means for converting the input sequence of source words into a sequence of converted source words, the word match score generation mechanism generates a word match score for each converted source word and each target hypothesis. The word match score comprises an estimate of the conditional probability of occurrence of the converted source word, given the target word of the target hypothesis connected to the converted source word and given the context of that target word in the target hypothesis. The word match scores for converted words can also be estimated using the model of Equation 7.

【００６９】 The translation apparatus according to the invention further includes a translation match score generation mechanism 22. For each target hypothesis, the translation match score generation mechanism 22 generates a translation match score comprising a combination of the word match scores for the target hypothesis and the source words in the input sequence of source words. The translation match score for a target hypothesis may comprise, for example, the product of the word match scores for the target hypothesis and the source words in the input sequence of source words.

【００７０】 Table 9 shows the computation of a translation match score for the input sequence of source words f_j of Table 1 and the target words e_aj of the target hypothesis E_1 of Table 2.

【表９】 [Table 9]

【数１５】 [Equation 15] (To illustrate the invention, this example assumes that the fertility scores, the distortion scores, and the combinatorial factor are all equal to 1.)

【００７１】 Each word match score P(f_j | e_aj, X) is obtained from Equation 7. The numbers in Table 9 are hypothetical. To illustrate the invention, the fertility scores and distortion scores of the above-mentioned related patent application are assumed to be 1.
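With fertility and distortion scores fixed at 1, the translation match score described above reduces to the product of the per-word match scores. A minimal sketch follows; the score values are hypothetical placeholders and are not the numbers of Table 9.

```python
import math


def translation_match_score(word_match_scores):
    """Product of the word match scores P(f_j | e_aj, X) over all source words.

    Fertility and distortion scores are assumed to be 1, as in the example,
    so they do not appear in the product.
    """
    return math.prod(word_match_scores)


# Hypothetical per-word scores for one target hypothesis and one alignment.
scores = [0.9, 0.507, 0.8, 0.7]
print(translation_match_score(scores))
```

For long sentences a sum of logarithms is often used instead of a raw product to avoid floating-point underflow; that is a practical design choice rather than something the text specifies.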

【００７２】 Referring again to FIG. 1, the translation apparatus includes a hypothesis match score generation mechanism 24, which generates a target hypothesis match score for each target hypothesis. Each target hypothesis match score comprises a combination (for example, the product) of the language model match score for the target hypothesis and the translation match score for the target hypothesis. As discussed above, the language model match score may be obtained from known language models, such as the n-gram language models described in the above-mentioned related patent application.

【００７３】 When the alignment identification mechanism 18 identifies two or more alignments between the input sequence of source words and each target hypothesis, the translation match score for each target hypothesis comprises a combination of the word match scores for the target hypothesis, the alignments, and the source words in the input sequence of source words.

【００７４】 If the source text input device 10 includes means for converting the input sequence of source words into a sequence of converted source words, the translation match score comprises a combination of the word match scores for the target hypothesis and the converted source words.

【００７５】 The translation apparatus according to the invention further includes an output unit 26. The output unit 26 outputs the target hypothesis having the best target hypothesis match score. The output unit 26 may comprise, for example, a display device or a printer.
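The final selection step can be sketched as a product of the two scores followed by an argmax. The tuple layout, the function names, and every number below are hypothetical, and "best" is taken here to mean the highest product.

```python
def hypothesis_match_score(language_model_score, translation_score):
    """Combine the two scores by taking their product (one possible combination)."""
    return language_model_score * translation_score


def best_hypothesis(hypotheses):
    """hypotheses: list of (text, language_model_score, translation_score)."""
    return max(hypotheses, key=lambda h: hypothesis_match_score(h[1], h[2]))[0]


candidates = [
    ("The key is in the door.", 0.020, 0.250),    # product 0.0050
    ("The key is in the gate.", 0.012, 0.200),    # product 0.0024
    ("The wrench is in the door.", 0.004, 0.300), # product 0.0012
]
print(best_hypothesis(candidates))  # The key is in the door.
```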

【００７６】 If the source text input device 10 includes means for converting the input sequence of source words into a sequence of converted source words, the output unit 26 includes means for synthesizing a sequence of output words from the target hypothesis having the best target hypothesis match score. The means for synthesizing a sequence of output words from the target hypothesis may comprise a target conversion mechanism such as that described in Section 5 of the above-mentioned related patent application. For example, the target word "be", tagged to indicate the tense of the original input source word, is converted into one of the synthesized output words {be, was, were, been, am, are, is, being}, which are forms of the infinitive "be".

【００７７】 As described above, Equation 7 is one example of a model that can be used to obtain word match scores according to the invention. The context functions g_i(f, e, X) of the word match score model and the parameters λ_(e,i) of the word match score model can be obtained as follows.

【００７８】 Candidate context functions g_i(f, e, X) can be obtained, for example, by limiting the context X of a target word in the target hypothesis to the three words to the left of the target word e and the three words to the right of the target word e, as shown in the example of Table 6 above.

【００７９】 Next, a training text of source language sentences and target language sentences that are translations of each other is obtained. Corresponding source language sentences and target language sentences that are translations of each other may be identified, for example, by a skilled translator. Preferably, corresponding source language sentences and target language sentences that are translations of each other are identified automatically, for example by the methods described in Sections 12 and 13 of the above-mentioned related patent application.

【００８０】 For each pair of corresponding source and target sentences in the training text, the estimated most probable alignment between the source words and the target words is found using the method described above for the alignment identification mechanism 18. Each occurrence of a target word e_aj in the training text is then identified and tagged with the source word f_j connected to the target word e_aj in each alignment. The target word e_aj is also tagged with its context X in each target sentence.
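The tagging step above turns each aligned sentence pair into a list of training events (f, e, X). The sketch below assumes the alignment is given as a list in which `alignment[j]` is the index of the target word connected to source word j; that representation, the `<null>` padding, and the sentence pair are illustrative assumptions.

```python
NULL = "<null>"  # hypothetical marker for "no word at this position"


def context(target_words, position, width=3):
    """Three words before and three words after the target word at `position`."""
    padded = [NULL] * width + list(target_words) + [NULL] * width
    center = position + width
    return tuple(padded[center - width:center] + padded[center + 1:center + 1 + width])


def training_events(source_words, target_words, alignment, width=3):
    """Tag each aligned target word with its source word and its context.

    alignment[j] is the index of the target word connected to source_words[j].
    Returns a list of (f, e, X) events.
    """
    events = []
    for j, f in enumerate(source_words):
        i = alignment[j]
        events.append((f, target_words[i], context(target_words, i, width)))
    return events


# Hypothetical aligned sentence pair with a one-to-one alignment.
source = ["La", "clef", "est", "dans", "la", "porte", "."]
target = ["The", "key", "is", "in", "the", "door", "."]
events = training_events(source, target, alignment=list(range(7)))
print(events[1])
# ('clef', 'key', ('<null>', '<null>', 'The', 'is', 'in', 'the'))
```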

【００８１】 Table 10 shows hypothetical examples of training events for the target word e_aj = "key" in a training text of aligned source language sentences and target language sentences.

【表１０】 [Table 10]

【００８２】 Candidate context functions g_i(f, e, X) are obtained by first identifying types of context functions, using the training events for the target word e_aj from the training text. For example, one type of context function tests for the presence of a particular word at one or more positions in the context. Another type of context function tests for the presence of a particular word class (for example, a part of speech) at one or more positions in the context. The particular words or word classes to be tested for in a context function for a target word e can be obtained from the words and word classes that occur in the contexts of the target word e in the training text.

【００８３】 Alternatively, candidate context functions g_i(f, e, X) can be obtained by clustering the training events according to their context X, using the method described in Section 7 of the above-mentioned related patent application.

【００８４】 Initially, all of the parameters λ_(e,i) of the word match score model of Equation 7 are set to zero.

【００８５】 For each candidate context function g_i(f, e, X), a "measure of merit" G(i) is calculated according to Equation 11.

【数１６】 [Equation 16]

【数１７】 [Equation 17]

【数１８】 [Equation 18]

【数１９】 [Formula 19]

【００８６】 In Equations 11 to 14, the conditional probability P(f | X) of a source word f, given the context X of the connected target word e, is obtained from Equation 7 using the most recent values of the model parameters. The probability P(X) of a context X is obtained by counting the occurrences of the target word e and the context X in training text events of the type shown in Table 10, and dividing the count by the total number of events in the training text in which the target word is e. The probability P(f, X) of a source word f and a context X is obtained by counting the occurrences of the target word e, the context X, and the source word f in training text events of the type shown in Table 10, and dividing each count by the total number of events in the training text in which the target word is e.
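The two counting estimates described above can be sketched directly. The event format (f, e, X) with a hashable context tuple, the function name, and the sample events are assumptions made for illustration.

```python
from collections import Counter


def empirical_probabilities(events, e):
    """Estimate P(X) and P(f, X) for target word `e` by relative frequency.

    events: iterable of (f, e, X) training events, with X a hashable tuple.
    Both estimates divide by the number of events whose target word is `e`.
    """
    selected = [(f, X) for f, target, X in events if target == e]
    total = len(selected)
    p_x = {X: n / total for X, n in Counter(X for _, X in selected).items()}
    p_fx = {fx: n / total for fx, n in Counter(selected).items()}
    return p_x, p_fx


# Hypothetical training events; contexts are shortened to 2-tuples for brevity.
ctx_the = ("the", "is")
ctx_car = ("car", "is")
events = [
    ("clef", "key", ctx_the),
    ("clef", "key", ctx_the),
    ("ton", "key", ctx_the),
    ("clef", "key", ctx_car),
    ("porte", "door", ctx_the),  # different target word, ignored for e = "key"
]
p_x, p_fx = empirical_probabilities(events, "key")
print(p_x[ctx_the], p_fx[("clef", ctx_the)])  # 0.75 0.5
```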

【００８７】 The context function g_i(f, e, X) having the highest "measure of merit" G(i) from Equation 11 is selected as a context function to be used in Equation 7. The parameter λ_(e,i) is obtained by first setting λ_(e,i) = 0 and then solving the following Equation 15 for the quantity Δλ_(e,i).

【数２０】 [Equation 20]

【００８８】 A new value of λ_(e,i) is obtained by adding Δλ_(e,i) to the previous value of λ_(e,i). Equation 15 is then solved again for a new value of Δλ_(e,i), using the new value of λ_(e,i). This process is repeated until the value of Δλ_(e,i) falls to or below a selected threshold. This method is known as iterative scaling.
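Equation 15 is not reproduced in this text, so the sketch below substitutes a standard technique: the improved iterative scaling update for a conditional exponential model, solved for each Δλ by bisection. The update equation, the binary-feature assumption, the (f, X) event format for a fixed target word e, and all names are assumptions made here, not the patent's own formula.

```python
import math


def model_probs(lams, feats, sources, X):
    """P(f | X) under the assumed exponential model, for one fixed target word."""
    w = {f: math.exp(sum(l * g(f, X) for l, g in zip(lams, feats))) for f in sources}
    n = sum(w.values())
    return {f: v / n for f, v in w.items()}


def delta_lambda(i, lams, feats, sources, events):
    """Find the delta at which the rescaled model expectation of binary feature i
    equals its empirical expectation (assumed update equation)."""
    total = len(events)
    empirical = sum(feats[i](f, X) for f, X in events) / total

    def expected(delta):
        acc = 0.0
        for _, X in events:  # each event carries empirical weight 1 / total
            p = model_probs(lams, feats, sources, X)
            for f in sources:
                if feats[i](f, X):
                    active = sum(g(f, X) for g in feats)  # features firing on (f, X)
                    acc += p[f] * math.exp(delta * active)
        return acc / total

    lo, hi = -20.0, 20.0  # expected() increases with delta, so bisection applies
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected(mid) < empirical:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def iterative_scaling(feats, sources, events, limit=1e-6, max_iter=100):
    """Start from zero, update every parameter once per iteration, and stop
    when every |delta| is at or below `limit`."""
    lams = [0.0] * len(feats)
    for _ in range(max_iter):
        deltas = [delta_lambda(i, lams, feats, sources, events) for i in range(len(feats))]
        lams = [l + d for l, d in zip(lams, deltas)]
        if max(abs(d) for d in deltas) <= limit:
            break
    return lams


# Hypothetical events for target word "key": (source word, context) pairs.
X_the = ("<null>", "<null>", "the", "is", "in", "the")
events = [("clef", X_the)] * 3 + [("ton", X_the)]
feats = [
    lambda f, X: int(f == "clef" and X[2] == "the"),
    lambda f, X: int(f == "ton" and X[2] == "the"),
]
lams = iterative_scaling(feats, ["clef", "ton"], events)
print(round(model_probs(lams, feats, ["clef", "ton"], X_the)["clef"], 3))  # 0.75
```

The sketch assumes every feature fires at least once in the training events; a feature that never fires has no finite solution and would simply be driven toward the lower bisection bound on each pass.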

【００８９】 Using the new model for the word match score (Equation 7), the "measure of merit" G(i) of Equation 11 is recomputed for the remaining candidate context functions g_i(f, e, X), and the remaining context function having the highest "measure of merit" G(i) is identified. The best remaining context function is added to the word match score model of Equation 7, and new values of all of the parameters λ_(e,i) are calculated using the iterative scaling method and Equation 15. When the word match score model of Equation 7 contains two or more parameters λ_(e,i), every parameter λ_(e,i) is updated exactly once per iteration, so that all of the parameters λ_(e,i) converge in the same iteration. The process is repeated on the remaining candidate context functions g_i(f, e, X) until the "measure of merit" of the best context function falls to or below a selected threshold.

【００９０】 In summary, the following matters are disclosed regarding the configuration of the present invention.

【００９１】 (1) An apparatus for translating a sequence of source words in a first language into a sequence of target words in a second language different from the first language, the apparatus comprising: means for inputting a sequence of source words; means for generating at least two target hypotheses, each target hypothesis comprising a sequence of target words selected from a vocabulary of words in the second language, each target word having a context comprising at least one other word in the target hypothesis; means for generating, for each target hypothesis, a language model match score comprising an estimate of the probability of occurrence of the sequence of words in the target hypothesis; means for identifying at least one alignment between the input sequence of source words and each target hypothesis, the alignment connecting each source word to at least one target word in the target hypothesis; means for generating, for each source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word in the target hypothesis that is connected to the source word and given the context of the target word in the target hypothesis that is connected to the source word; means for generating, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the source words in the input sequence of source words; means for generating a target hypothesis match score for each target hypothesis, each target hypothesis match score comprising a combination of the language model match score for the target hypothesis and the translation match score for the target hypothesis; and means for outputting the target hypothesis having the best target hypothesis match score.

(2) The apparatus according to (1) above, wherein each target hypothesis comprises a sequence of target words selected from a vocabulary comprising words in the second language and a null word representing the absence of a word.

(3) The apparatus according to (2) above, wherein: the means for identifying at least one alignment identifies two or more alignments between the input sequence of source words and each target word, each alignment connecting each source word to at least one target word in the target hypothesis; the word match score generation mechanism generates, for each source word, each alignment, and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of the target word; and the translation match score generation mechanism generates, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the source words in the input sequence of source words.

(4) The apparatus according to (2) above, wherein: the input means comprises means for converting the input sequence of source words into a sequence of converted source words; the alignment means identifies at least one alignment between the sequence of converted source words and each target hypothesis, the alignment connecting each converted source word to at least one target word in the target hypothesis; the word match score generation mechanism generates, for each converted source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the converted source word, given the target word in the target hypothesis that is connected to the converted word and given the context of the target word in the target hypothesis that is connected to the converted source word; the translation match score generation mechanism generates, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the converted source words; and the output means comprises means for synthesizing a sequence of output words from the target hypothesis having the best target hypothesis match score, and output means for outputting the synthesized output words.

(5) The apparatus according to (2) above, wherein the translation match score for a target hypothesis comprises the product of the word match scores for the target hypothesis and the source words in the input sequence of source words, and the target hypothesis match score for a target hypothesis comprises the product of the language model match score for the target hypothesis multiplied by the translation match score for the target hypothesis.

(6) The apparatus according to (2) above, wherein the context of the target word in the target hypothesis that is connected to a source word is contained in at least one of two or more context classes, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of the target word, comprises at least one function having a value dependent on the class containing the context of the target word.

(7) The apparatus according to (2) above, wherein the context of the target word in the target hypothesis that is connected to a source word comprises at least one word having a part of speech in the target hypothesis, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of the target word, comprises at least one function having a value dependent on the part of speech in the target hypothesis of at least one word in the context of the target word connected to the source word.

(8) The apparatus according to (2) above, wherein the context of the target word in the target hypothesis that is connected to a source word comprises at least one word having an identity, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of the target word, comprises at least one function having a value dependent on the identity of at least one word in the context of the target word connected to the source word.

(9) The apparatus according to (2) above, wherein the means for outputting the target hypothesis having the best match score comprises a display.

(10) The apparatus according to (2) above, wherein the input means comprises a keyboard.

(11) The apparatus according to (2) above, wherein the input means comprises a computer disk drive.

(12) The apparatus according to (2) above, wherein the input means comprises a computer tape drive.

(13) A method of translating a sequence of source words in a first language into a sequence of target words in a second language different from the first language, the method comprising the steps of: inputting a sequence of source words; generating at least two target hypotheses, each target hypothesis comprising a sequence of target words selected from a vocabulary of words in the second language, each target word having a context comprising at least one other word in the target hypothesis; generating, for each target hypothesis, a language model match score comprising an estimate of the probability of occurrence of the sequence of words in the target hypothesis; identifying at least one alignment between the input sequence of source words and each target hypothesis, the alignment connecting each source word to at least one target word in the target hypothesis; generating, for each source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word in the target hypothesis that is connected to the source word and given the context of the target word in the target hypothesis that is connected to the source word; generating, for each target hypothesis, a translation match score comprising a combination of the word match scores for the target hypothesis and the source words in the input sequence of source words; generating a target hypothesis match score for each target hypothesis, comprising a combination of the language model match score for the target hypothesis and the translation match score for the target hypothesis; and outputting the target hypothesis having the best target hypothesis match score.

(14) The method according to (13) above, wherein each target hypothesis comprises a sequence of target words selected from a vocabulary comprising words in the second language and a null word representing the absence of a word.

(15) The step of identifying at least one alignment comprises two or more alignments between the input series of source words and each target word, each source word being combined with at least one target word of the target hypothesis. 
The step of generating a word match score includes the step of identifying, for each source word and each alignment and each target hypothesis, the source word of the source word given the target word associated with the source word and given the context of the target word. A translation match score generation step that includes generating a word match score, including an estimate of the conditional probability of occurrence, and for each target hypothesis, a combination of the word match score of the target hypothesis and a source word of the input series of source words. Generating a translation match score that includes the method according to (14) above. (16) The input step includes a step of converting the input series of source words into a series of converted source words, and the alignment step combines each converted source word with at least one target word of a target hypothesis. Of at least one alignment between each transformed source word and each target hypothesis, wherein a word match score generation step includes, for each transformed source word and each target hypothesis, a goal of the target hypothesis associated with the transformed word. Generating a word match score, including an estimate of the conditional occurrence probability of the transformed source word given the word and the context of the target word of the target hypothesis coupled to the transformed source word,
The translation match score generation step, for each target hypothesis,
Generating a translation match score that includes a combination of a target hypothesis word match score and a transformed source word, the output step synthesizing a series of output words from the target hypothesis having the best target hypothesis match score; The method according to (14) above, including an output step of outputting a combined output word. (17) The translation match score of the target hypothesis includes the product of the word match score of the target hypothesis and the source word in the series of input source words, and the target hypothesis match score of the target hypothesis matches the language model of the target hypothesis. The method according to (14) above, comprising a product of the score multiplied by the translation match score of the target hypothesis. (18) A target word context in a target hypothesis coupled to a source word is included in at least one of two or more context classes, a target word coupled to the source word is given, and a target word context is given. Method according to (14) above, characterized in that the estimated conditional occurrence probability of the source word when given comprises at least one function having a value that depends on the class containing the context of the target word. (19) The context of the target word of the target hypothesis coupled to the source word includes at least one word having the part of speech of the target hypothesis, the target word coupled to the source word is given, and the context of the target word is given. The estimated conditional occurrence probability of the source word in the case of including at least one function having a value that depends on the part-of-speech of the target hypothesis of at least one word in the context of the target word bound to the source word. The method according to (14) above. 
(20) if the context of the target word of the target hypothesis coupled to the source word includes at least one word having an identification, the target word coupled to the source word is given and the context of the target word is given. (14) above, wherein the estimated conditional occurrence probability of the source word comprises at least one function having a value that depends on the identification of at least one word in the context of the target word bound to the source word. The method described. (21) The method according to (14) above, wherein the step of outputting the target hypothesis having the best match score includes the step of displaying the target hypothesis having the best match score.
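As a rough illustration of the method of item (13), using the product combinations of item (17), the following Python sketch scores two target hypotheses for one input series of source words and selects the best one. The bigram language model, the lookup table of word match probabilities, the choice of the preceding target word as the context, the use of a single alignment per hypothesis, the `FLOOR` value, and all names and numbers are assumptions made for this example; none of them is taken from this specification.

```python
from math import prod

FLOOR = 1e-6  # assumed probability for events missing from the toy tables


def language_model_score(hypothesis, bigram_prob):
    """Estimate the probability of the target word series (toy bigram model)."""
    score, previous = 1.0, "<s>"
    for word in hypothesis:
        score *= bigram_prob.get((previous, word), FLOOR)
        previous = word
    return score


def word_match_score(source_word, target_word, context, table):
    """Look up an estimate of P(source_word | target_word, context)."""
    return table.get((source_word, target_word, context), FLOOR)


def translation_match_score(source, hypothesis, alignment, table):
    """Multiply the word match scores of all source words under one alignment."""
    scores = []
    for i, source_word in enumerate(source):
        j = alignment[i]  # index of the target word connected to source word i
        # Assumed context: the preceding target word (None at the start).
        context = hypothesis[j - 1] if j > 0 else None
        scores.append(word_match_score(source_word, hypothesis[j], context, table))
    return prod(scores)


def best_hypothesis(source, hypotheses, alignments, bigram_prob, table):
    """Return the (hypothesis, score) pair with the highest combined score."""
    scored = []
    for hypothesis, alignment in zip(hypotheses, alignments):
        lm = language_model_score(hypothesis, bigram_prob)
        tm = translation_match_score(source, hypothesis, alignment, table)
        scored.append((hypothesis, lm * tm))  # product of the two match scores
    return max(scored, key=lambda pair: pair[1])


# Toy data, invented for this example.
source = ["la", "clef"]
hypotheses = [["the", "key"], ["the", "wrench"]]
alignments = [[0, 1], [0, 1]]
bigram_prob = {("<s>", "the"): 0.5, ("the", "key"): 0.1, ("the", "wrench"): 0.02}
table = {
    ("la", "the", None): 0.4,
    ("clef", "key", "the"): 0.6,
    ("clef", "wrench", "the"): 0.3,
}
best, score = best_hypothesis(source, hypotheses, alignments, bigram_prob, table)
```

With these invented numbers the hypothesis "the key" receives the higher target hypothesis match score, because both its language model match score and its translation match score are larger than those of "the wrench".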

[0092]

[Effects of the invention] In the translation apparatus according to the present invention, the target hypothesis generating mechanism 12, the language model match score generating mechanism 16, the alignment identifying mechanism 18, the word match score generating mechanism 20, the translation match score generating mechanism 22, and the hypothesis match score generating mechanism 24 may be suitably programmed general-purpose or special-purpose digital signal processors. The target language vocabulary store 14 may be a computer storage device such as a random access memory. The means for converting the input series of source words into a series of converted source words in the source text input device 10, and the means for synthesizing a series of output words from the target hypothesis having the best target hypothesis match score in the output unit 26, may likewise be suitably programmed general-purpose or special-purpose digital signal processors.

[Brief description of drawings]

FIG. 1 is a block diagram of an example of an apparatus for translating words from a first language into a second language according to the present invention.

FIG. 2 is a schematic diagram showing an example of an alignment between a series of hypothetical source words and a series of hypothetical target words.

FIG. 3 is a schematic diagram showing an example of a second alignment between the series of hypothetical source words and the series of hypothetical target words of FIG. 2.

FIG. 4 is a schematic diagram showing an example of a third alignment between the series of hypothetical source words and the series of hypothetical target words of FIG. 2.

[Explanation of symbols]

10 Source text input device
12 Target hypothesis generating mechanism
14 Target language vocabulary store
16 Language model match score generating mechanism
18 Alignment identifying mechanism
20 Word match score generating mechanism
22 Translation match score generating mechanism
24 Hypothesis match score generating mechanism
26 Output unit

Continuation of front page
(72) Inventor: Peter Fitzhugh Brown, 390 Riverside Drive, Apartment 12A-F, New York, New York 10025, United States
(72) Inventor: Steve Andrew Della Pietra, 113 Meyer Oval, Pearl River, New York 10965, United States
(72) Inventor: Vincent Joseph Della Pietra, 129 Sunset Road, Blauvelt, New York 10913, United States
(72) Inventor: Andrew Scott Kehler, 326 Beacon Street, No. 2, Somerville, Massachusetts 02143, United States
(72) Inventor: Robert Leroy Mercer, 669 Viewland Drive, Yorktown Heights, New York 10598, United States

Claims

1. An apparatus for translating a series of source words in a first language into a series of target words in a second language different from the first language, the apparatus comprising: means for inputting a series of source words; means for generating at least two target hypotheses, each target hypothesis comprising a series of target words selected from a vocabulary of words in the second language, each target word having a context comprising at least one other word of the target hypothesis; means for generating, for each target hypothesis, a language model match score comprising an estimate of the probability of occurrence of the series of words in the target hypothesis; means for identifying at least one alignment between the input series of source words and each target hypothesis, the alignment connecting each source word with at least one target word of the target hypothesis; means for generating, for each source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word of the target hypothesis connected to the source word and given the context of that target word; means for generating, for each target hypothesis, a translation match score comprising a combination of the word match scores of the target hypothesis and the source words in the input series of source words; means for generating a target hypothesis match score for each target hypothesis, each target hypothesis match score comprising a combination of the language model match score of the target hypothesis and the translation match score of the target hypothesis; and means for outputting the target hypothesis having the best target hypothesis match score.

2. The apparatus of claim 1, wherein each target hypothesis comprises a series of target words selected from a vocabulary containing words in the second language and a null word representing the absence of a word.

3. The apparatus of claim 2, wherein the means for identifying at least one alignment identifies two or more alignments between the input series of source words and each target hypothesis, each alignment connecting each source word with at least one target word of the target hypothesis; the word match score generating mechanism generates, for each source word, each alignment, and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word; and the translation match score generating mechanism generates, for each target hypothesis, a translation match score comprising a combination of the word match scores of the target hypothesis and the source words in the input series of source words.

4. The apparatus of claim 2, wherein the input means includes means for converting the input series of source words into a series of converted source words; the alignment means identifies at least one alignment between the series of converted source words and each target hypothesis, the alignment connecting each converted source word with at least one target word of the target hypothesis; the word match score generating mechanism generates, for each converted source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the converted source word, given the target word of the target hypothesis connected to the converted source word and given the context of that target word; the translation match score generating mechanism generates, for each target hypothesis, a translation match score comprising a combination of the word match scores of the target hypothesis and the converted source words; and the output means includes means for synthesizing a series of output words from the target hypothesis having the best target hypothesis match score, and means for outputting the synthesized output words.

5. The apparatus of claim 2, wherein the translation match score of a target hypothesis comprises the product of the word match scores of the target hypothesis and the source words in the input series of source words, and the target hypothesis match score of the target hypothesis comprises the product of the language model match score of the target hypothesis and the translation match score of the target hypothesis.

6. The apparatus of claim 2, wherein the context of the target word of the target hypothesis connected to a source word is contained in at least one of two or more context classes, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word, comprises at least one function having a value that depends on the class containing the context of the target word.

7. The apparatus of claim 2, wherein the context of the target word of the target hypothesis connected to a source word comprises at least one word having a part of speech in the target hypothesis, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word, comprises at least one function having a value that depends on the part of speech, in the target hypothesis, of at least one word in the context of the target word connected to the source word.

8. The apparatus of claim 2, wherein the context of the target word of the target hypothesis connected to a source word comprises at least one word having an identity, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word, comprises at least one function having a value that depends on the identity of at least one word in the context of the target word connected to the source word.

9. The apparatus of claim 2, wherein the means for outputting the target hypothesis with the best match score comprises a display.

10. The apparatus of claim 2, wherein the input means comprises a keyboard.

11. The apparatus of claim 2 wherein the input means comprises a computer disk drive.

12. The apparatus of claim 2 wherein the input means comprises a computer tape drive.

13. A method of translating a series of source words in a first language into a series of target words in a second language different from the first language, the method comprising the steps of: inputting a series of source words; generating at least two target hypotheses, each target hypothesis comprising a series of target words selected from a vocabulary of words in the second language, each target word having a context comprising at least one other word of the target hypothesis; generating, for each target hypothesis, a language model match score comprising an estimate of the probability of occurrence of the series of words in the target hypothesis; identifying at least one alignment between the input series of source words and each target hypothesis, the alignment connecting each source word with at least one target word of the target hypothesis; generating, for each source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word of the target hypothesis connected to the source word and given the context of that target word; generating, for each target hypothesis, a translation match score comprising a combination of the word match scores of the target hypothesis and the source words in the input series of source words; generating a target hypothesis match score for each target hypothesis, each target hypothesis match score comprising a combination of the language model match score of the target hypothesis and the translation match score of the target hypothesis; and outputting the target hypothesis having the best target hypothesis match score.

14. The method of claim 13, wherein each target hypothesis comprises a series of target words selected from a vocabulary containing words in the second language and a null word representing the absence of a word.

15. The method of claim 14, wherein the step of identifying at least one alignment comprises identifying two or more alignments between the input series of source words and each target hypothesis, each alignment connecting each source word with at least one target word of the target hypothesis; the word match score generating step comprises generating, for each source word, each alignment, and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word; and the translation match score generating step comprises generating, for each target hypothesis, a translation match score comprising a combination of the word match scores of the target hypothesis and the source words in the input series of source words.

16. The method of claim 14, wherein the inputting step comprises converting the input series of source words into a series of converted source words; the alignment step comprises identifying at least one alignment between the series of converted source words and each target hypothesis, the alignment connecting each converted source word with at least one target word of the target hypothesis; the word match score generating step comprises generating, for each converted source word and each target hypothesis, a word match score comprising an estimate of the conditional probability of occurrence of the converted source word, given the target word of the target hypothesis connected to the converted source word and given the context of that target word; the translation match score generating step comprises generating, for each target hypothesis, a translation match score comprising a combination of the word match scores of the target hypothesis and the converted source words; and the outputting step comprises synthesizing a series of output words from the target hypothesis having the best target hypothesis match score, and outputting the synthesized output words.

17. The method of claim 14, wherein the translation match score of a target hypothesis comprises the product of the word match scores of the target hypothesis and the source words in the input series of source words, and the target hypothesis match score of the target hypothesis comprises the product of the language model match score of the target hypothesis and the translation match score of the target hypothesis.

18. The method of claim 14, wherein the context of the target word of the target hypothesis connected to a source word is contained in at least one of two or more context classes, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word, comprises at least one function having a value that depends on the class containing the context of the target word.

19. The method of claim 14, wherein the context of the target word of the target hypothesis connected to a source word comprises at least one word having a part of speech in the target hypothesis, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word, comprises at least one function having a value that depends on the part of speech, in the target hypothesis, of at least one word in the context of the target word connected to the source word.

20. The method of claim 14, wherein the context of the target word of the target hypothesis connected to a source word comprises at least one word having an identity, and the estimated conditional probability of occurrence of the source word, given the target word connected to the source word and given the context of that target word, comprises at least one function having a value that depends on the identity of at least one word in the context of the target word connected to the source word.

21. The method of claim 14, wherein the step of outputting the target hypothesis having the best match score comprises displaying the target hypothesis having the best match score.
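
Claims 18 to 20 require only that the estimated conditional probability comprise at least one function whose value depends on the class, part of speech, or identity of a context word. The Python sketch below shows one way such functions could enter an estimate: weighted indicator features are summed, exponentiated, and normalized over a set of candidate source words. The exponential form, the feature names, the weights, the part-of-speech table, and the example words are all assumptions made for illustration and are not stated in these claims.

```python
import math


def context_class(context_word, pos_tags):
    """Map a context word to a class; here, its assumed part of speech."""
    return pos_tags.get(context_word, "OTHER")


def conditional_prob(source_word, target_word, context_word,
                     candidates, weights, pos_tags):
    """Estimate P(source_word | target_word, context_word).

    Each weight belongs to an indicator function that is active only for a
    given context class ("class") or a given context word ("ident").
    """
    def unnormalized(candidate):
        total = weights.get(("base", candidate, target_word), 0.0)
        cls = context_class(context_word, pos_tags)
        total += weights.get(("class", candidate, target_word, cls), 0.0)
        total += weights.get(("ident", candidate, target_word, context_word), 0.0)
        return math.exp(total)

    normalizer = sum(unnormalized(c) for c in candidates)
    return unnormalized(source_word) / normalizer


# Invented example: candidate French source words for the English target word
# "take", with the following target word used as the context.
candidates = ["prendre", "faire"]
pos_tags = {"walk": "NOUN", "bus": "NOUN"}
weights = {
    ("base", "prendre", "take"): 1.0,
    ("class", "prendre", "take", "NOUN"): 0.5,
    ("ident", "faire", "take", "walk"): 2.0,
}
p_faire_walk = conditional_prob("faire", "take", "walk", candidates, weights, pos_tags)
p_prendre_walk = conditional_prob("prendre", "take", "walk", candidates, weights, pos_tags)
p_prendre_bus = conditional_prob("prendre", "take", "bus", candidates, weights, pos_tags)
p_faire_bus = conditional_prob("faire", "take", "bus", candidates, weights, pos_tags)
```

In this toy setting the identity of the context word "walk" raises the estimate for "faire", while for the context word "bus" only the base and class functions are active and "prendre" remains the more probable source word.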