JP5288371B2

JP5288371B2 - Statistical machine translation system

Info

Publication number: JP5288371B2
Application number: JP2008145533A
Authority: JP
Inventors: Andrew Finch; Eiichiro Sumita
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2008-06-03
Filing date: 2008-06-03
Publication date: 2013-09-11
Anticipated expiration: 2028-06-03
Also published as: JP2009294747A

Description

The present invention relates to statistical machine translation (SMT), and more particularly to an improvement of class-dependent SMT.

Topic-dependent modeling is known to be effective for improving model quality in speech recognition. Recent experiments in the field of machine translation (prior-art Non-Patent Documents 1, 2 and 3) have shown that class-specific models are also useful for translation. In Non-Patent Document 1, topic dependency is realized by dividing the data into sets before the decoding process begins. The class of each source sentence is then predicted by a classifier that has been trained on all of the source sentences in a preprocessing pass, and the sets are decoded independently using separate models specific to the predicted class.

Non-Patent Document 1: Hirofumi Yamamoto et al. 2007. Bilingual cluster based models for statistical machine translation. EMNLP-CoNLL-2007 (Conference on Empirical Methods in Natural Language Processing and Conference on Computational Natural Language Learning, joint meeting following ACL 2007), Prague, Czech Republic, pp. 514-523.

Non-Patent Document 2: Andrew Finch et al. 2007. The NICT/ATR speech translation system for IWSLT 2007. IWSLT 2007, Trento, Italy.

Non-Patent Document 3: George Foster and Roland Kuhn. 2007. Mixture-model adaptation for SMT. In Proceedings of the Second Workshop on Statistical Machine Translation, ACL, pp. 128-135, Prague, Czech Republic.

Topic-dependent or class-dependent modeling improves the accuracy of machine translation. That accuracy, however, depends heavily on the accuracy of the classifier: if an input sentence is assigned to the wrong topic or class, translation accuracy degrades severely.

Accordingly, one object of the present invention is to provide an SMT apparatus that can translate input sentences of a specific class more stably and robustly.

Another object of the present invention is to provide an SMT apparatus that can translate input sentences of a specific class more stably, more accurately, and robustly.

A statistical machine translation apparatus according to a first aspect of the present invention includes means for determining a vector of probabilities representing the class membership of a source sentence. Each element of the vector represents the probability that the source sentence belongs to one of a predetermined set of classes. The apparatus further includes a plurality of class-specific statistical sub-decoders, one provided for each class in the predetermined set of classes. Each sub-decoder is statistically trained on the set of training data for its class, and outputs, for each word or word sequence in the source sentence, the probability of a translated word or word sequence in the target language. The apparatus further includes means for estimating the most likely translation hypothesis of the source sentence in the target language according to the probabilities of possible word sequences in the target language. These probabilities are calculated by interpolating, for each word or word sequence in the target language, the probabilities output by the plurality of sub-decoders according to the probability vector.

The means for determining class membership determines a probability vector whose elements represent the probability that the source sentence belongs to each class. The plurality of statistical sub-decoders output, for each word or word sequence in the source sentence, the probability of a translated word or word sequence in the target language. The estimating means estimates the most likely translation hypothesis according to the word or word-sequence probabilities, which are calculated by interpolating the probabilities output by the sub-decoders.

Preferably, the plurality of classes includes a general class and a plurality of specific classes, the specific classes being partitions of the general class.

More preferably, the element of the vector corresponding to the general class is a constant in the range of 0 to 1.

Still more preferably, the apparatus further includes normalizing means for normalizing the elements of the vector so that they sum to 1.

The means for determining the vector of probabilities may be statistically trained based on a maximum entropy model and may assign a membership probability to each of the classes.

Preferably, each of the plurality of class-specific statistical sub-decoders calculates probabilities according to a class-specific language model, a class-specific translation model, a class-specific length model, a class-specific distortion model, or any combination of these models.

The approach of the present invention generalizes prior-art Non-Patent Document 1 in many respects. The technique makes it possible to use multiple sets of models within the decoding process itself. The contribution of each set of class-specific models is controlled dynamically during decoding by a set of interpolation weights, as described later. These weights can change from sentence to sentence. In the earlier approach, each interpolation weight was in essence either 1 (indicating that the source sentence has the same topic as the model) or 0 (indicating that it has a different topic).

One advantage of the present invention is that it is a flexible approach: a source sentence can belong to multiple classes to varying degrees. Here, a probabilistic classifier is used to determine the vector of probabilities representing class membership. These probabilities are used directly as mixture weights for the respective class-dependent models in the interpolated set of models.

Another feature of the system of the present invention is that it includes a general model built from all of the data, alongside the set of class-specific models. As a result, accurate and stable translations are obtained.

The approach of this embodiment differs from all preceding approaches with respect to the class-dependent models. Before prior-art Non-Patent Document 1, only class-dependent language models had been used. Non-Patent Documents 1 and 3 both extended this to include translation models. In the approach of the present invention, all of the models, which may include distortion and target length models, are combined into the SMT system within a single framework.

A bilingual corpus is a collection of sentence pairs. Each pair consists of a sentence in a first language and a sentence in a second language, each being a translation of the other. The sentences in the bilingual corpus are segmented into words or phonemes and annotated with part-of-speech labels.

A language model (LM) gives the probability of a word occurring given that N-1 other words precede it. An N-gram LM is built (trained) from statistics obtained from the target side of the training set of the bilingual corpus.

A translation model (TM) gives the probability that a word in the first language is translated into a particular word in the second language. In this embodiment, the TM is obtained statistically from the training set.

A length model (LeM) imposes a penalty, relative to the average, for each word added to the translation (target). The length model is obtained from the target side of the sentence pairs in the training set.

A distortion model (DM) imposes a penalty on the relative distance between two source-language phrases that are mapped to two adjacent phrases in the target language. The DM is obtained statistically from the training set.

1. Introduction
This embodiment combines multiple SMT systems by weighting them, allowing probabilistically flexible weighting between topic-dependent models for every model in the system. The embodiment applies this technique to improve the quality of a dialogue system by building and combining class-based models for interrogative and declarative sentences.

The SMT system of this embodiment differs from earlier class-dependent translation methods in that class-dependent forms of all models are integrated directly into the decoding process. The system uses probabilistic mixture weights between models, and these weights can change dynamically from segment to segment depending on the characteristics of the source segment.

The system of this embodiment concerns the translation of questions and statements using class-dependent models. To achieve this, the system integrates two sets of models, each built specifically to handle sentences that fall into one of two classes of dialogue sentence (questions and statements), with a third set of models built to handle the general class.

For the purposes of this embodiment, we wish to distinguish interrogative sentences from all others. For brevity, in the following description interrogative sentences are called "questions" and all other sentences are called "statements". Each sentence in the bilingual corpus used for training is assumed to be labeled as either "question" or "statement".

2. System Overview
2.1 System Architecture
FIG. 1, described later, shows the overall structure of the system. The data is divided into classes, and the data for each class is further subdivided into a training set and a development set. Three complete SMT systems are built: one for each class, and one for the data from both classes. A probabilistic classifier (described in the next section) is also trained on the complete set of training data.

The machine translation decoder used can linearly interpolate all of the models from all of the subsystems according to a vector of interpolation weights supplied for each source word sequence to be decoded. To do so, the decoder must first merge the phrase tables from each subsystem before the search. Every phrase from every phrase table is used during decoding. A phrase that occurs in the table of one subsystem but not in the tables of the others is still used, but receives no support (zero probability) from the subsystems that did not acquire it during training. The search process then proceeds as in a typical multistage phrase-based decoder.
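The merging and interpolation of phrase tables described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoder of the embodiment: the names `merge_phrase_tables` and `interpolate`, and the representation of a phrase table as a dictionary keyed by (source phrase, target phrase), are hypothetical.

```python
from typing import Dict, List, Tuple

PhrasePair = Tuple[str, str]            # (source phrase, target phrase)
PhraseTable = Dict[PhrasePair, float]   # phrase pair -> translation probability


def merge_phrase_tables(tables: List[PhraseTable]) -> Dict[PhrasePair, List[float]]:
    """Take the union of all phrase pairs across the subsystem tables.

    A subsystem that did not acquire a pair during training contributes
    a probability of 0.0 for that pair (no support).
    """
    merged: Dict[PhrasePair, List[float]] = {}
    for pair in set().union(*tables):
        merged[pair] = [table.get(pair, 0.0) for table in tables]
    return merged


def interpolate(probs: List[float], weights: List[float]) -> float:
    """Linearly interpolate per-subsystem probabilities with the given weights."""
    if len(probs) != len(weights):
        raise ValueError("one weight is required per subsystem")
    return sum(w * p for w, p in zip(weights, probs))
```

With three tables (general, question, statement), each merged entry holds three probabilities, and `interpolate` combines them with the weight vector supplied for the sentence being decoded.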

The weight for the general model is set by tuning this parameter to maximize the BLEU score on the general development set. This weight determines the amount of probability mass allocated to the general model and remains fixed while all sentences are decoded. The remaining probability mass is divided among the class-specific models dynamically at run time, sentence by sentence. The proportion allocated to each class is simply the class membership probability of the source sentence as assigned by the classifier.
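The allocation of probability mass described above can be illustrated with a short sketch. The function name `mixture_weights` and its argument layout are assumptions made for illustration; the only behavior taken from the text is that the general weight is fixed and the remainder is split in proportion to the class membership probabilities.

```python
from typing import List


def mixture_weights(w_general: float, class_probs: List[float]) -> List[float]:
    """Return [general weight, class weights...] summing to 1.

    w_general   : fixed weight of the general model, tuned once in advance.
    class_probs : class membership probabilities of the current source
                  sentence, as assigned by the classifier.
    """
    if not 0.0 <= w_general <= 1.0:
        raise ValueError("w_general must lie in [0, 1]")
    total = sum(class_probs)
    if total <= 0.0:
        raise ValueError("class probabilities must have a positive sum")
    remaining = 1.0 - w_general
    # Split the remaining mass in proportion to class membership.
    return [w_general] + [remaining * p / total for p in class_probs]
```

For example, with a general weight of 0.4 and a sentence judged 90% likely to be a question, the question-specific models receive 0.6 × 0.9 of the mass and the statement-specific models receive 0.6 × 0.1.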

3. Question Prediction
3.1 Problem Outline
Given a source sentence of a particular class (in this embodiment, question or statement), we want to ensure that the generated target sentence is of the appropriate class. This does not necessarily mean that a question in the source must produce a question in the target. At least intuitively, however, it is reasonable to assume that a target question can be generated from a source question, and a target statement from a source statement. This is reasonable because the role of a machine translation engine is not to generate every possible translation of the source, but to be able to generate one acceptable translation. This assumption leads to the two most plausible strategies.

1. Predict the class of the source sentence and use it to constrain the decoding process used to generate the target.

2. Predict the class of the target.

In the experiments described later, the second method was chosen because it appeared to be the most accurate, although both strategies appear to have their merits.

3.2 Maximum Entropy Classifier
In this embodiment, a maximum entropy (ME) classifier determines the class to which an input source sentence belongs, using a set of lexical features. That is, the classifier is used to set the mixture weights of the class-specific models. In recent years such classifiers have produced powerful models that exploit large numbers of lexical features in a variety of natural language processing tasks; see, for example, Ronald Rosenfeld (Ronald Rosenfeld. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language. 10:187-228). The ME model is an exponential model of the following form.

$$p(t, c) = \gamma \prod_{k=1}^{K} \alpha_k^{f_k(c, t)} \, p_0$$

where
t is the class being predicted,
c is the context of t,
γ is a normalization coefficient,
K is the number of features in the model,
α_k is the weight of feature f_k,
f_k is a binary feature function, and
p_0 is the default model.

The features are those in the source sentence that serve to predict the class of the sentence.
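As a worked illustration of an exponential model of this kind, the sketch below multiplies the default model value by the weight of every feature that fires for a given class and context, then normalizes over the classes. It assumes that the feature functions return 0 or 1 and that normalization is performed over the candidate classes for a fixed context; the function name and the example features in the usage note are hypothetical.

```python
from typing import Callable, Dict, Sequence

# A feature function maps (context, class) to 0 or 1 (assumed binary).
FeatureFn = Callable[[Sequence[str], str], int]


def me_class_probabilities(
    context: Sequence[str],
    classes: Sequence[str],
    features: Sequence[FeatureFn],
    alphas: Sequence[float],
    p0: float = 1.0,
) -> Dict[str, float]:
    """Score each class as p0 * prod_k alpha_k ** f_k(context, class), then normalize."""
    scores: Dict[str, float] = {}
    for t in classes:
        score = p0
        for f_k, alpha_k in zip(features, alphas):
            score *= alpha_k ** f_k(context, t)
        scores[t] = score
    z = sum(scores.values())  # plays the role of 1 / gamma for this context
    return {t: s / z for t, s in scores.items()}
```

For instance, with one feature of weight 4.0 that fires when the context contains "where" and the class is "question", the sentence "where is it" gets an unnormalized score of 4 for "question" and 1 for "statement", that is, probabilities 0.8 and 0.2.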

Furthermore, a beginning-of-sentence token (<s>) and an end-of-sentence token were introduced into the word sequence in order to distinguish n-grams occurring at the start and end of a sentence from those occurring within it. This is based on the observation that "question words", or words indicating that a sentence is a question, are often found either at the start of the sentence (for example, the wh- words <what, where, when> in English, or the -kah words <apakah, dimanakah, kapankah> in Malay) or at the end of the sentence (for example, <ka> in Japanese or <ma> in Chinese).

This n-gram extraction was adopted because error analysis showed that n-grams from inside the sentence are needed to handle sentences such as "excuse me please where is ...". A simple example sentence and the set of features generated from it are shown in FIG. 11 and described in detail later.
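The n-gram extraction with sentence boundary tokens can be sketched as below. This is only an illustration: the end-of-sentence symbol `</s>`, whitespace tokenization, the maximum n-gram order, and the function name are assumptions, and the exact feature set of FIG. 11 is not reproduced here.

```python
from typing import List


def ngram_features(sentence: str, max_n: int = 3) -> List[str]:
    """Extract all n-grams up to order max_n from a boundary-marked sentence.

    The tokens "<s>" and "</s>" mark the start and end of the sentence, so an
    n-gram such as "<s> where" is distinct from "where" inside the sentence.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    features: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features
```

Because n-grams are taken from every position, a sentence such as "excuse me please where is ..." still yields the sentence-internal feature "where is" even though the question word is not adjacent to the start token.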

To implement the ME model of this invention, Le Zhang's ME modeling toolkit was used (Le Zhang. 2004. Maximum Entropy Modeling Toolkit for Python and C++, [http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html]). The models were trained with L-BFGS parameter estimation, and a Gaussian prior was used for smoothing during training. "L-BFGS" is a well-known software package for solving nonlinear optimization problems.

The n-best output of the decoder is taken, and from that list the highest-ranked translation hypothesis whose class according to the target classifier matches the class of the source sentence according to the source classifier is selected.
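The selection from the n-best list can be sketched as follows. The function name, the (hypothesis, score) tuple layout, and the fallback to the top hypothesis when no class-matching candidate exists are assumptions for illustration; the text does not state what happens when no hypothesis matches.

```python
from typing import Callable, List, Tuple


def select_hypothesis(
    nbest: List[Tuple[str, float]],
    source_class: str,
    classify_target: Callable[[str], str],
) -> str:
    """Pick the best-scoring hypothesis whose predicted class matches the source class.

    nbest           : (hypothesis, score) pairs; a higher score is better.
    source_class    : class assigned to the source sentence by the source classifier.
    classify_target : target-side classifier returning a class label.
    """
    if not nbest:
        raise ValueError("n-best list is empty")
    ranked = sorted(nbest, key=lambda item: item[1], reverse=True)
    for hypothesis, _score in ranked:
        if classify_target(hypothesis) == source_class:
            return hypothesis
    # Assumed fallback: no candidate matches, so keep the decoder's top choice.
    return ranked[0][0]
```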

4. System Configuration
FIG. 1 shows the overall structure of the SMT system 30 of this embodiment. Referring to FIG. 1, SMT system 30 includes a training module 44 for training the class-dependent SMT models, the classifier model used to classify source sentences, and the phrase table used in the SMT decoder. Training set 42 is used as the training data. Training module 44 also estimates the weight W1 assigned to the general SMT models; the weight is estimated on the basis of development set 40. The bilingual corpus is divided into classes, and each class is further subdivided into a training set and a development set.

SMT system 30 further includes a statistical machine translation (SMT) device 46 for translating a source-language input sentence 48 into a target-language translation 50. SMT device 46 performs the translation statistically, based on the models trained by training module 44 and the weight W1 estimated by training module 44.

Training module 44 includes: a classifier training module 72 for training classifier model 110 so that, given the set of features of an input sentence, the probability that the sentence is a question can be calculated on the basis of classifier model 110; an SMT training module 74 for training the three sets of class-dependent SMT models 112, namely the general, question-specific and statement-specific models; a phrase table generation module 76 for generating a phrase table 114 extracted from training set 42 of the bilingual corpus; and a weight estimation module 70 for estimating, on the basis of development set 40, the weight W1 assigned to the general set of SMT models.

SMT device 46 includes a storage unit 90 for storing classifier model 110, the three sets of class-dependent SMT models 112, phrase table 114, and the weight 116 (W1) estimated by weight estimation module 70.

SMT device 46 further includes: a classifier 92 that estimates the probability P_Q that input sentence 48 is a question; a normalization module 94 that normalizes the weights W1, W2 and W3, which are assigned during translation to the probabilities computed from the general SMT models, the question-specific SMT models and the statement-specific SMT models, so that W1, W2 and W3 sum to 1; and an SMT module 96 for translating source-language input sentence 48 into target-language translation 50 using a statistical machine translation method. SMT module 96 is an ordinary SMT module except that it computes the probability of a hypothesis as a weighted sum of the probabilities from the three sets of SMT models 112, rather than from the probabilities of the general set alone.

FIG. 2 is a detailed block diagram showing SMT training module 74 of FIG. 1 and the three sets of class-dependent SMT models 112.

Referring to FIG. 2, the three sets of class-dependent SMT models 112 comprise a set of general SMT models 160, a set of question-specific SMT models 162, and a set of statement-specific SMT models 164.

The set of general SMT models 160 includes a language model 180, a translation model 182, a length model 184, and a distortion model 186.

A language model (LM) gives the probability of a word occurring given that N-1 other words appear immediately before it. An N-gram LM is built (trained) from statistics obtained from the target side of training set 42 of the bilingual corpus.

A translation model (TM) gives the probability that a word in the first language is translated into a word in the second language. In this embodiment, TM 182 is obtained from training set 42 of the bilingual corpus.

A length model (LeM) imposes a penalty, relative to the average, for each additional word in the translation (target). Length model 184 is obtained from the target side of the sentence pairs in training set 42 of the bilingual corpus.

A distortion model (DM) imposes a penalty on the relative distance between two source-language phrases that are mapped to two adjacent target-language phrases. DM 186 is obtained statistically from training set 42 of the bilingual corpus.

Similarly, the set of question-specific SMT models 162 includes LM 200, TM 202, LeM 204 and DM 206, and the set of statement-specific SMT models 164 includes LM 220, TM 222, LeM 224 and DM 226.

SMT training module 74 includes: a general SMT training module 130 for training the set of general SMT models 160 on the whole of training set 42; a question extraction module 132 that extracts from training set 42 the sentence pairs containing a question on the target side; a question-specific SMT training module 134 for training the question-specific SMT models 162 on the sentence pairs extracted by question extraction module 132; a statement extraction module 136 that extracts from training set 42 the sentence pairs containing a statement on the target side; and a statement-specific SMT training module 138 for training the statement-specific SMT models 164 on the sentence pairs extracted by statement extraction module 136.

FIG. 3 is a block diagram of phrase table generation module 76 shown in FIG. 1. Referring to FIG. 3, phrase table generation module 76 includes an automatic alignment module 240 that aligns the source sentence and the target sentence of each pair in training set 42 of the bilingual corpus, and a phrase extraction module 242 that identifies the source and target sentences aligned by automatic alignment module 240 and extracts phrases from them.

Automatic alignment module 240 aligns each word in the source sentence with the corresponding word in the target sentence. Phrase extraction module 242 extracts, as phrase pairs, word sequences in the source sentence that are aligned with contiguous words in the target sentence, and stores them in general phrase table 244.

Similarly, phrase table generation module 76 further includes an automatic alignment module 250 and a phrase extraction module 252 for generating question-specific phrase table 254, and an automatic alignment module 260 and a phrase extraction module 262 for generating statement-specific phrase table 264.

Phrase table generation module 76 further includes a table merge module 270 for merging general phrase table 244, question-specific phrase table 254 and statement-specific phrase table 264. In generating phrase table 114, a phrase that appears in the table of one subsystem but not in the table of another is still used, but receives no support (zero probability) from the subsystems that did not acquire it during training.

FIG. 4 is a detailed block diagram of the classifier training module 72 shown in FIG. 1. This module trains the ME (maximum entropy) model for the question-specific classifier 92, which receives a predetermined set of features of an input sentence and outputs, based on the ME model, the probability that the sentence is a question.

Referring to FIG. 4, the classifier training module 72 includes a feature extraction module 290 that extracts a predetermined set of features from each source sentence in the training set 42 of the bilingual corpus, a storage unit 292 that stores the feature sets together with the labels (question/declarative) of the source sentences, and a maximum entropy modeling module 294 for computing the probabilistic classification model 110. The maximum entropy modeling module 294 is implemented with a maximum entropy toolkit; several such toolkits are available on the Internet.

FIG. 5 is a block diagram of the weight estimation module 70 shown in FIG. 1. Referring to FIG. 5, the weight estimation module 70 uses the development set 40 of the bilingual corpus and the SMT device 46 to optimize the general SMT weight W1 so that the average BLEU score computed for the translation set 310 is maximized.

The weight estimation module 70 includes a BLEU evaluator 320 that evaluates the BLEU score of every translation in the translation set 310. The translation set 310 contains the translations into the target language, produced by the SMT device 46, of all source sentences in the development set 40. The BLEU evaluator 320 uses the target side of each sentence pair in the development set 40 as the reference translation.

The weight estimation module 70 further includes a storage unit 322 for storing the BLEU scores of the translations evaluated by the BLEU evaluator 320, and a weight optimization module 324 for optimizing the weight 326 (W1) applied to the general SMT probabilities by repeated translation and evaluation. As described later, the classifier model 110 and three sets of class-specific SMT models 112 and phrase table 114 are generated before the weight W1 is optimized. The weight W1 can therefore be optimized by repeatedly translating all the source sentences for each candidate weight in the range 0 to 1 and finding the value that yields the highest BLEU score.

FIG. 6 is a block diagram of the SMT module 96 shown in FIG. 1. Referring to FIG. 6, the SMT module 96 includes: a general SMT subsystem 340 that receives the input sentence 48 and, based on the set of general SMT models 160, outputs its SMT probabilities (LM and TM) together with the LeM and DM penalties; a weighting module 350 that multiplies each of the probabilities and penalties from the general SMT subsystem 340 by the weight W1 supplied by the normalization module 94 of FIG. 1; a question-specific SMT subsystem 342 that receives the input sentence 48 and, based on the question-specific SMT model 162, outputs its SMT probabilities together with the LeM and DM penalties; a weighting module 352 that multiplies each of the probabilities and penalties from the question-specific SMT subsystem 342 by the weight W2; a declarative-specific SMT subsystem 344 that receives the input sentence 48 and, based on the declarative-specific SMT model 164, outputs its SMT probabilities together with the LeM and DM penalties; and a weighting module 354 that multiplies each of the probabilities and penalties from the declarative-specific SMT subsystem 344 by the weight W3.

The SMT module 96 further includes a summing module 360 that sums the weighted LM and TM probabilities and the weighted LeM and DM penalties, and a multi-stage phrase-based decoder 362 that receives these sums and, using the phrase table 114, searches for the n-best hypotheses for the translation of the input sentence 48.

FIG. 7 is a simplified block diagram of the weighting module 352. Referring to FIG. 7, the weighting module 352 includes a multiplier 400 that multiplies the LM probability from the question-specific SMT subsystem 342 by the weight W2, a multiplier 402 that multiplies the TM probability from the question-specific SMT subsystem 342 by the weight W2, a multiplier 404 that multiplies the LeM penalty from the question-specific SMT subsystem 342 by the weight W2, and a multiplier 406 that multiplies the DM penalty from the question-specific SMT subsystem 342 by the weight W2.

Although not shown, the weighting modules 350 and 354 have the same structure as the weighting module 352, except that their weights are W1 and W3, respectively. The outputs of the weighting modules 350, 352, and 354 are supplied to the summing module 360.

FIG. 8 is a block diagram of the summing module 360 shown in FIG. 6. Referring to FIG. 6, the summing module 360 includes four summing circuits 420, 422, 424, and 426, which compute the sums of the LM probabilities, the TM probabilities, the LeM penalties, and the DM penalties, respectively, output from the weighting modules 350, 352, and 354. The outputs of the summing circuits 420, 422, 424, and 426 are supplied to the inputs of the decoder 362, which searches for the most probable translation hypotheses based on these values.

FIG. 9 is a block diagram of the normalization module 94, which normalizes the weights W2 and W3 based on the probability P_Q estimated by the classifier 92 so that the weights W1, W2, and W3, the elements of the weight vector representing class membership, sum to 1. Once optimized by the weight estimation module 70, the weight W1 remains fixed. The normalization module 94 therefore scales P_Q and 1 - P_Q to obtain W2 and W3 such that the sum of W2 and W3 is 1 - W1.

Specifically, the normalization module 94 includes: a storage unit 440 for storing the numerical constant 1; a subtractor 442, with one input coupled to receive the probability P_Q from the classifier 92 and the other input coupled to the storage unit 440, that outputs the difference between the constant 1 and the probability P_Q, namely 1 - P_Q; a subtractor 444, with one input coupled to receive the weight W1 and the other input coupled to the storage unit 440, that outputs the difference between the constant 1 and the weight W1; a multiplier 446 with one input coupled to receive the output of the subtractor 444 and the other input coupled to receive the probability P_Q from the classifier 92; and a multiplier 448 with one input coupled to receive the output of the subtractor 444 and the other input coupled to receive the output of the subtractor 442.

The outputs of the subtractors 442 and 444 are 1 - P_Q and 1 - W1, respectively. The outputs W2 and W3 of the multipliers 446 and 448 are therefore P_Q * (1 - W1) and (1 - P_Q) * (1 - W1), respectively. The sum of W1, W2, and W3, that is, W1 + P_Q * (1 - W1) + (1 - P_Q) * (1 - W1), equals 1.
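The arithmetic carried out by the subtractors and multipliers of the normalization module 94 can be checked with a short sketch. The function name `class_weights` and the sample values are illustrative assumptions; only the formulas W2 = P_Q * (1 - W1) and W3 = (1 - P_Q) * (1 - W1) come from the description.

```python
from typing import Tuple


def class_weights(p_q: float, w1: float) -> Tuple[float, float, float]:
    """Return (W1, W2, W3) given the question probability P_Q and the fixed weight W1."""
    if not (0.0 <= p_q <= 1.0 and 0.0 <= w1 <= 1.0):
        raise ValueError("p_q and w1 must both lie in [0, 1]")
    remaining = 1.0 - w1            # output of subtractor 444
    w2 = p_q * remaining            # output of multiplier 446
    w3 = (1.0 - p_q) * remaining    # subtractor 442 feeding multiplier 448
    return w1, w2, w3


# Sample values chosen for illustration only.
w1, w2, w3 = class_weights(p_q=0.9, w1=0.2)
total = w1 + w2 + w3  # equals 1 up to floating-point rounding
```

Because W1 is fixed after tuning, only W2 and W3 change from sentence to sentence as the classifier output P_Q changes.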

FIG. 10 is a block diagram of the classifier 92 shown in FIG. 1. Referring to FIG. 10, the classifier 92 includes a feature extraction module 460 for extracting from the input sentence 48 the same set of features as that extracted by the feature extraction module 290 shown in FIG. 4, and a probability calculation module 462 for calculating the probability P_Q of the input sentence 48 based on the classifier model 110 (see FIG. 1) and the feature set extracted from the input sentence 48 by the feature extraction module 460.
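How the probability calculation module 462 might turn a feature set into P_Q can be sketched as follows. Equation (1) is not reproduced in this part of the description, so the sketch assumes the standard conditional maximum entropy form, in which each class score is the sum of the weights of the active features and the scores are normalized with a softmax. The function name and the weights are hypothetical.

```python
import math
from typing import Dict, Iterable


def question_probability(
    features: Iterable[str],
    weights: Dict[str, Dict[str, float]],
) -> float:
    """P(question | features) under an assumed two-class maximum entropy model.

    `weights` maps a class label ("question" or "declarative") to a mapping
    from feature string to its learned weight (hypothetical values here).
    """
    active = list(features)
    scores = {
        label: sum(w.get(f, 0.0) for f in active) for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    return exp_scores["question"] / sum(exp_scores.values())


# Made-up weights for illustration only.
toy_weights = {
    "question": {"<s> where": 2.0},
    "declarative": {"<s> this": 2.0},
}
p_q = question_probability(["<s> where", "is"], toy_weights)
```

With no informative features active, both class scores are zero and the sketch returns 0.5, so W2 and W3 would share 1 - W1 equally.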

FIG. 11 shows the set of n-grams (n ≤ 3) extracted from the sentence "<s> where is the station </s>" that are used as predicates in the ME model to predict the class of the target sentence. The set contains four unigrams (<s> where, is, the, station </s>), three bigrams (<s> where is, is the, the station </s>), and two trigrams (<s> where is the, is the station </s>). To keep the description of the n-gram features simple, n is set to 3 in FIG. 11. However, n is not limited to 3; as described later, the inventors used 5-gram features (n = 5) in their experiments.
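The feature set of FIG. 11 can be reproduced with a short sketch. The counts given in the description (four unigrams, three bigrams, two trigrams) suggest that the boundary markers are attached to the first and last words rather than counted as separate tokens; the sketch makes that assumption explicit. The function name `ngram_features` is hypothetical.

```python
from typing import List


def ngram_features(sentence: str, max_n: int = 3) -> List[str]:
    """Extract all n-grams (1 <= n <= max_n) with sentence-boundary markers.

    Assumption: "<s>" is fused with the first word and "</s>" with the last
    word, which matches the 4/3/2 counts quoted for the example sentence.
    """
    tokens = sentence.split()
    if not tokens:
        return []
    tokens[0] = "<s> " + tokens[0]
    tokens[-1] = tokens[-1] + " </s>"
    features = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[start:start + n]))
    return features


features = ngram_features("where is the station")
```

Calling `ngram_features(sentence, max_n=5)` would give the 5-gram feature set used in the experiments, under the same boundary-marker assumption.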

5. Operation
<Overall procedure>
The SMT system operates as follows. Broadly speaking, the SMT system 30 has two operating phases: a training phase and a translation phase.

Referring to FIG. 12, the training phase includes four sub-phases: training of the class-dependent SMT models 112 (step 500), training of the classifier model 110 (step 502), generation of the phrase table 114 (steps 504 and 506), and optimization of the weight W1 for the general model on the development set 40 (step 508). Once steps 500 to 508 are complete, the SMT system 30 is ready to translate any input sentence.

[SMT model training (step 500)]
Referring to FIG. 2, the general SMT training module 130 trains the general SMT model 160 on all the data in the training set 42. The SMT model is trained in the usual manner.

The question extraction module 132 extracts from the training set 42 the sentence pairs whose target side is a question. The question-specific SMT training module 134 trains the question-specific SMT model 162 on the sentence pairs extracted by the question extraction module 132. The training method is the same as that of the general SMT training module 130.

The declarative extraction module 136 extracts from the training set 42 the sentence pairs whose target side is a declarative sentence. The declarative-specific SMT training module 138 trains the declarative-specific SMT model 164 on the sentence pairs extracted by the declarative extraction module 136. The training method is the same as that of the general SMT training module 130 and the question-specific SMT training module 134.

[Training of the classifier model 110 (step 502)]
Referring to FIG. 4, the feature extraction module 290 extracts, from the source sentence of each sentence pair in the training set 42, the same set of features as that extracted by the feature extraction module 460 shown in FIG. 10. The storage unit 292 stores each extracted feature set together with the sentence label (question/declarative) of the corresponding target-side sentence. The maximum entropy modeling module 294 then computes the parameters of the classification model 110 according to equation (1), based on the feature sets and sentence labels stored in the storage unit 292.

[Generation of the phrase table (steps 504 and 506)]
Referring to FIG. 3, the automatic alignment module 240 aligns the words of the source sentence with the words of the target sentence for each sentence pair in the training set 42. The phrase extraction module 242 extracts phrase pairs from the aligned sentence pairs: it finds sequences of consecutive words in the source sentence that are aligned with consecutive words in the target sentence, and extracts each such pair of word sequences as a phrase translation pair. The extracted phrase pairs are stored in the general phrase table 244.
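The extraction of contiguous phrase pairs from a word alignment can be illustrated with the following simplified sketch. It uses the usual consistency criterion from phrase-based SMT (no word inside the pair may be aligned to a word outside it), which is an assumption about how the phrase extraction module 242 decides which sequences qualify; the description itself only states that contiguous source sequences aligned with contiguous target sequences are extracted. The function name, the `max_len` limit, and the toy sentence pair are hypothetical.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],   # (source index, target index) links
    max_len: int = 4,
) -> List[Tuple[str, str]]:
    """Extract contiguous phrase pairs that are consistent with the alignment."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start >= max_len:
                continue
            # Consistency: target words in the span must not link outside the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.append(
                    (" ".join(src[s_start:s_end + 1]), " ".join(tgt[t_start:t_end + 1]))
                )
    return pairs


# Toy romanized example (illustrative alignment, not from the corpus).
src = ["eki", "wa", "doko", "desu", "ka"]
tgt = ["where", "is", "the", "station"]
links = {(0, 3), (2, 0), (3, 1)}
pairs = extract_phrase_pairs(src, tgt, links)
```

The sketch omits the common extension that also grows phrase pairs over unaligned target words, so it yields a subset of what a full extractor would produce.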

The automatic alignment module 250 aligns the words of the source sentence with the words of the target sentence in each sentence pair labeled "question" in the training set 42. The phrase extraction module 252 extracts phrase pairs from the aligned sentence pairs in the same way as for the general phrase table 244. The extracted phrase pairs are stored in the question-specific phrase table 254.

The automatic alignment module 260 aligns the words of the source sentence with the words of the target sentence in each sentence pair labeled "declarative" in the training set 42. The phrase extraction module 262 extracts phrase pairs from the aligned sentence pairs in the same way as the phrase extraction module 242 does for the general phrase table 244. The extracted phrase pairs are stored in the declarative-specific phrase table 264.

The table merge module 270 merges the general phrase table 244, the question-specific phrase table 254, and the declarative-specific phrase table 264. Phrase pairs that appear in only one or two of the tables 244, 254, and 264 are still stored in the phrase table 114, but they receive no support (zero probability) from the subsystems that did not acquire them during training.

[Optimization of the weight W1 (step 508)]
The development set 40 is used to optimize the weight W1. Referring to FIG. 5, each source sentence in the development set 40 is translated by the SMT device 46, producing the translation set 310. The BLEU evaluator 320 evaluates the BLEU score of each translation, using the target-side sentences of the development set 40 as reference translations. The average of the BLEU scores is computed and stored.

In the next cycle, the value of the weight W1 is changed slightly and the same BLEU evaluation is performed. In this way, the weight W1 of the general model is optimized by minimum error rate training (Franz J. Och, 2003. Minimum error rate training for statistical machine translation, Proceedings ACL).

Once optimized, the weight W1 remains fixed while sentences are decoded (translated).

[Translation by the SMT module 96]
When an input sentence 48 without a label (question/declarative) is supplied to the classifier 92 (see FIGS. 1 and 10), the feature extraction module 460 extracts a set of features from the input sentence 48 and supplies it to the probability calculation module 462. The probability calculation module 462 calculates the probability that the input sentence 48 is a question by applying the feature set to the classifier model 110. The calculated probability P_Q is supplied to the inputs of the subtractor 442 and the multiplier 446 of the normalization module 94. Based on the probability P_Q supplied by the classifier 92, the normalization module 94 normalizes the weights W2 and W3 so that the weights W1, W2, and W3 sum to 1, and supplies the weights W1, W2, and W3 to the SMT module 96.

Referring to FIG. 6, when given the set of features, the general SMT subsystem 340, the question-specific SMT subsystem 342, and the declarative-specific SMT subsystem 344 independently compute hypothesis probabilities based on the general SMT model 160, the question-specific SMT model 162, and the declarative-specific SMT model 164, respectively. The LM and TM probabilities and the LeM and DM penalties are supplied from the general SMT subsystem 340, the question-specific SMT subsystem 342, and the declarative-specific SMT subsystem 344 to the weighting modules 350, 352, and 354, respectively, where they are weighted by W1, W2, and W3, respectively.

The weighted LM and TM probabilities and the weighted LeM and DM penalties are supplied to the summing module 360 (see FIG. 8), where the LM probabilities from the weighting modules 350, 352, and 354 are added together. The TM probabilities from the weighting modules 350, 352, and 354 are added likewise, as are the LeM penalties and the DM penalties. The resulting LM probability, TM probability, LeM penalty, and DM penalty are supplied to the decoder 362.
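The combined effect of the weighting modules 350, 352, and 354 and the summing module 360 is a per-feature weighted sum over the three subsystems. The following sketch shows that computation under the assumption that each subsystem reports one value per feature for a given hypothesis; the function name `interpolate` and all numeric values are made up for illustration.

```python
from typing import Dict

FEATURES = ("LM", "TM", "LeM", "DM")


def interpolate(
    scores: Dict[str, Dict[str, float]],   # subsystem -> feature -> value
    weights: Dict[str, float],             # subsystem -> W1 / W2 / W3
) -> Dict[str, float]:
    """Weight each subsystem's values and sum them feature by feature."""
    return {
        feature: sum(weights[name] * scores[name][feature] for name in scores)
        for feature in FEATURES
    }


# Made-up subsystem outputs for a single hypothesis.
scores = {
    "general":     {"LM": 0.10, "TM": 0.20, "LeM": 1.0, "DM": 2.0},
    "question":    {"LM": 0.30, "TM": 0.40, "LeM": 1.0, "DM": 1.0},
    "declarative": {"LM": 0.05, "TM": 0.10, "LeM": 1.0, "DM": 3.0},
}
weights = {"general": 0.2, "question": 0.72, "declarative": 0.08}  # W1, W2, W3
combined = interpolate(scores, weights)
```

Since the three weights sum to 1, a feature on which all subsystems agree (LeM in this toy input) passes through unchanged.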

Based on these values, the decoder searches for the most likely hypotheses for the translation of the input sentence 48 and outputs the n-best hypotheses.

6. Experiments
6.1 Experimental data
To evaluate the proposed technique, experiments were conducted on a travel conversation corpus. The experimental corpus was the travel arrangement task of the BTEC corpus (Kikui et al., 2003. Creating Corpora for Speech-to-Speech Translation. In Proceedings of EUROSPEECH, pages 381-384), with English as the target language and each of the other languages as a source language. The training, development, and evaluation corpus statistics are shown in Table 1. The evaluation corpus has 16 reference translations per sentence.
(Table 1)

The data was divided into classes (questions and declaratives), and each class was further subdivided into a training set and a development set. 1,000 sentences were set aside as development data, and the rest were used for training.

Experiments were conducted on a variety of languages, denoted by the following keys: Arabic (ar), Danish (da), German (de), English (en), Spanish (es), French (fr), Indonesian (Malay) (id), Italian (it), Japanese (ja), Korean (ko), Malaysian (Malay) (ms), Dutch (nl), Portuguese (pt), Russian (ru), Thai (th), Vietnamese (vi), and Chinese (zh).

[Decoder]
The decoder used in the experiments, CleopATRa, is an in-house phrase-based statistical decoder that operates on the same principles as the PHARAOH (Philipp Koehn. 2004. Pharaoh: a beam search decoder for phrase-based statistical machine translation models. Machine translation: from real users to research: 6th conference of AMTA, Washington, DC, Springer Verlag, pp. 115-124.) and MOSES (Philipp Koehn et al., 2007. Moses: open source toolkit for statistical machine translation, ACL 2007: proceedings of demo and poster sessions, Prague, Czech Republic, pp. 177-180.) decoders. For these experiments the decoder was configured to produce output nearly identical to that of MOSES. It was modified to handle multiple sets of models, accept weighted input, and incorporate the dynamic interpolation process during decoding.

[Practical issues]
The greatest concern about the proposed approach is the excessive demand on resources that can arise when handling multiple models. However, an important feature of the decoder used in these experiments is its ability to keep its models on disk and load only the parts of the models needed to decode the sentence at hand. This reduces the memory overhead of loading multiple models without noticeably degrading decoding time. Moreover, for most of the models, the interpolated probabilities for each sentence can be precomputed before the search begins, which reduces both search memory and processing time.

[Decoding conditions]
To tune the decoder parameters, minimum error rate training with respect to the BLEU score was performed on each development corpus. A 5-gram language model was built using the SRI language modeling toolkit (Andreas Stolcke. 1999. SRILM - An Extensible Language Model Toolkit. http://www.speech.sri.com/projects/srilm/) with Witten-Bell smoothing. The model includes a length model and also the simple distance-based distortion model used by the PHARAOH decoder.

[Tuning the interpolation weight]
The interpolation weight was tuned by maximizing the BLEU score on the development set over a set of weights ranging from 0 to 1 in increments of 0.1. FIG. 13 shows the behavior of two of the models of the invention with respect to the weight parameter.
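The tuning procedure amounts to a one-dimensional grid search. The sketch below assumes a callable `dev_bleu` that translates the development set with a given W1 and returns the resulting BLEU score; that callable, the function name `tune_w1`, and the toy scoring curve are hypothetical stand-ins for the decoder and the BLEU evaluator.

```python
from typing import Callable, Tuple


def tune_w1(dev_bleu: Callable[[float], float], steps: int = 10) -> Tuple[float, float]:
    """Try W1 = 0, 1/steps, ..., 1 and keep the value with the highest score."""
    best_w1, best_score = 0.0, float("-inf")
    for k in range(steps + 1):
        w1 = k / steps
        score = dev_bleu(w1)  # one full decode-and-evaluate pass per candidate
        if score > best_score:
            best_w1, best_score = w1, score
    return best_w1, best_score


def toy_bleu(w1: float) -> float:
    """Stand-in scoring curve with a single peak at W1 = 0.3 (illustrative only)."""
    return 0.5 - (w1 - 0.3) ** 2


best_w1, best_score = tune_w1(toy_bleu)
```

With steps=10 the search costs eleven passes over the development set, which is why a coarse 0.1 grid is a practical choice for a single weight.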

Referring to FIG. 13, the BLEU score for translation from Chinese (zh) to English, shown by the broken line 522, did not improve as the weight W1 was increased from zero. In contrast, for translation from Indonesian (Malay) (id) to English, shown by the solid line 520, the BLEU score peaked when W1 was about 2. This illustrates the system's dependence on the combination of source and target languages.

[Evaluation schemes]
To obtain a balanced view of the merits of the proposed approach, six evaluation techniques were used in the experiments: BLEU (Kishore Papineni et al., 2001. Bleu: a method for automatic evaluation of machine translation. IBM Research Report, RC22176, September 17.), NIST (George Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. Proceedings of Human Language Technology Conference, San Diego, California, pp. 138-145.), WER (word error rate), PER (position-independent WER), GTM (General Text Matcher), and METEOR (Satanjeev Banerjee and Alon Lavie. 2005. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. ACL-2005: Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65-72.).

6.2 Classification accuracy
Table 2 shows the performance of the classifier (by 10-fold cross-validation on the training set). Classification accuracy figures are given for predicting the punctuation of both the source (same language) and the target (English). Unsurprisingly, every system predicted its own punctuation better. The poorer scores in the table probably reflect linguistic characteristics (perhaps questions in the source are often expressed as statements in the target) or characteristics of the corpus itself. For all languages the classifier accuracy appears satisfactory, especially considering that the corpus itself (and hence the test data in this experiment) may be inconsistent.
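The 10-fold cross-validation used to measure classifier accuracy can be outlined as below. The description does not say how the folds were formed, so the round-robin split is an assumption, and the toy rule-based "classifier" merely stands in for the maximum entropy model; all names are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]                 # (sentence, label)
Predictor = Callable[[str], str]


def k_fold_accuracy(
    examples: List[Example],
    train_fn: Callable[[List[Example]], Predictor],
    k: int = 10,
) -> float:
    """Average held-out accuracy over k folds (round-robin split, an assumption)."""
    if len(examples) < k:
        raise ValueError("need at least k examples")
    folds = [examples[i::k] for i in range(k)]
    accuracies = []
    for i, held_out in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = train_fn(train)
        correct = sum(predict(sentence) == label for sentence, label in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def toy_train(train: List[Example]) -> Predictor:
    """Toy stand-in: call a sentence a question if its first word began a training question."""
    starters = {s.split()[0] for s, label in train if label == "question"}
    return lambda s: "question" if s.split()[0] in starters else "declarative"


data = [(f"where is item {i}", "question") for i in range(10)]
data += [(f"this is item {i}", "declarative") for i in range(10)]
accuracy = k_fold_accuracy(data, toy_train)
```

Each fold is held out exactly once, so every labeled sentence contributes to the accuracy estimate without ever being scored by a model trained on it.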

6.3 Translation quality
Table 3 shows the performance of the SMT system.

The table makes clear that, for most of the experimental conditions evaluated, this system outperforms the baseline, which consists of an SMT system trained on all of the data. Where a score degraded, the difference was not statistically significant in all cases but one, and in every such case the other MT evaluation metrics showed an improvement. Some language pairs improved remarkably; in particular, with this technique the two Malay languages, id and ms, both improved BLEU by 3.5 points.

Interestingly, Dutch, a relative of Malay, also improved substantially. This lends support to a linguistic explanation for the gains. Malay has a very simple and regular question structure: the question words appear at the beginning of the question (as in the target language) and serve no other function in the language (unlike, for example, English “do”). It is probably because of this simplicity of expression that the class-specific models of the present invention were able to model the data well, even though partitioning reduced the amount of data available to each model.

Another factor appears to be classifier performance, which was high for all languages (about 98%). Unfortunately, it is difficult to identify the reasons behind the variation in the scores in the table. One major factor is likely to be differences in corpus quality and in the relationship between the source and target corpora. Some corpora are direct translations of each other, while others are retranslations made by way of another language. Chinese is one such language, which may explain why no improvement over the baseline was obtained for it even though the method worked very well for Japanese and Thai, which are closely related to Chinese.

[Comparison with the previous method]
An experiment was conducted to compare the proposed method with an implementation of this system that uses hard weights. The aim was to approximate, as closely as possible within this framework, the system proposed in prior-art Non-Patent Document 1. Instead of weighting the class-specific models by the classification probabilities, weights of 1 and 0 were used. To achieve this, the probabilities from the classifier were binarized: a weight of 1 was assigned if the probability was greater than 0.5, and a weight of 0 otherwise. The performance of this system is shown in the column headed “Hard” in Table 4. Under all conditions but one, the approach proposed in the invention performed better than or equal to this system.
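The difference between probabilistic (soft) weighting and the hard weighting used in this comparison can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `binarize` and `interpolate` and all example probabilities are hypothetical, and the weighted sum assumes the linear combination of sub-model probabilities that the claims describe.

```python
def binarize(class_probs, threshold=0.5):
    """Hard weights: 1 if the classifier probability exceeds the threshold, else 0."""
    return [1.0 if p > threshold else 0.0 for p in class_probs]


def interpolate(weights, model_probs):
    """Weighted sum of the probabilities assigned by the class-specific models."""
    return sum(w * p for w, p in zip(weights, model_probs))


# Hypothetical classifier output: P(question) = 0.8, P(declarative) = 0.2.
class_probs = [0.8, 0.2]
# Hypothetical probabilities that the question model and the declarative model
# assign to the same candidate phrase translation.
model_probs = [0.1, 0.3]

soft_score = interpolate(class_probs, model_probs)            # both models contribute
hard_score = interpolate(binarize(class_probs), model_probs)  # question model only
```

With hard weights only the model of the predicted class contributes to the score, whereas with soft weights the other model still contributes in proportion to the classifier's uncertainty.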

The column labeled “No classifier” in Table 4 shows the effectiveness of the classifier in the inventive system. These results show the effect of using equal weights (0.5) for the interpolation between the question model and the declarative model. This system performed reasonably well, though not as well as the system that uses the classifier.

7. Conclusion
In the embodiment described above, two models, one from a question-specific SMT engine and one from a declarative-specific SMT engine, were combined in a single decoding process. However, the present invention is not limited to two-class systems. As is apparent from Equation 1, the invention is applicable to systems with three or more classes.
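Extending the combination to three or more classes only changes the length of the weight vector. The sketch below is a hedged Python illustration rather than the embodiment itself: `GENERAL_WEIGHT`, `class_weights`, and `hypothesis_score` are hypothetical names, the third class label is invented for the example, and the way the constant general-model weight is combined with the classifier probabilities before normalization is an assumption based on the dependent claims (a constant in the range 0 to 1, and elements normalized to sum to 1).

```python
GENERAL_WEIGHT = 0.5  # assumed constant in [0, 1] for the general model


def class_weights(class_probs, general_weight=GENERAL_WEIGHT):
    """Prepend the general-model weight and normalize so the weights sum to 1."""
    raw = [general_weight] + list(class_probs)
    total = sum(raw)
    return [w / total for w in raw]


def hypothesis_score(weights, model_probs):
    """Weighted sum over the general model and every class-specific model."""
    return sum(w * p for w, p in zip(weights, model_probs))


# Invented three-class example: question, declarative, exclamation.
probs_from_classifier = [0.7, 0.2, 0.1]
weights = class_weights(probs_from_classifier)  # four weights: general + 3 classes
# Hypothetical probabilities from the general model and the three class models.
score = hypothesis_score(weights, [0.2, 0.4, 0.1, 0.05])
```

Nothing in the two functions depends on the number of classes, so adding a class amounts to adding one classifier output and one sub-model probability.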

This technique enables topic-dependent decoding with flexible, probabilistic weighting among the component models. The experiments demonstrated the effectiveness of the embodiment on conversational data by building class-specific models for the classes of interrogative and declarative sentences. An extensive evaluation across many language pairs and MT evaluation metrics demonstrates the effectiveness of the invention. In most cases a significant improvement over a system without model interpolation was obtained, and for some language pairs the approach performed particularly well. The largest improvement among all language pairs was for the Malaysian (Malay) and English pair, where BLEU rose by 4.7 points (from 0.463 to 0.510) over the baseline system.

The embodiment disclosed herein is merely illustrative, and the present invention is not limited to the embodiment described above. The scope of the present invention is indicated by each of the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

An overall block diagram of the SMT system 30 according to one embodiment of the present invention.
A detailed block diagram of the three sets of class-dependent SMT models 112 and the SMT training module 74 shown in FIG. 1.
A detailed block diagram of the phrase table generation module 76.
A detailed block diagram of the classifier training module 72.
A block diagram of the weight estimation module 70.
A detailed block diagram of the SMT module 96.
A simplified block diagram of the weighting module 352.
A simplified block diagram of the summation module 360.
A block diagram of the normalization module 94.
A simplified block diagram of the classifier 92.
A diagram showing an example of the set of n-gram features extracted from the sentence "<s> where is the station </s>".
A flowchart showing the operation of the SMT system 30.
A diagram showing the behavior, with respect to their weight parameters, of two of the models used in the experiments, namely Chinese (zh) and Indonesian (id).

Explanation of symbols

30 SMT system
40 Development set
42 Training set
44 Training module
46 SMT device
48 Input sentence
50 Translation
70 Weight estimation module
72 Classifier training module
74 SMT training module
76 Phrase table generation module
92 Classifier
96 SMT module
110 Classifier model
112 Class-specific SMT models
114 Phrase table
130 SMT training module
134 Question-specific SMT training module
138 Declarative-specific SMT training module
160 General SMT model
162 Question-specific SMT model
164 Declarative-specific SMT model
290 and 460 Feature extraction modules
294 Maximum entropy modeling module
324 Weight optimization module
340 General SMT subsystem
342 Question-specific SMT subsystem
344 Declarative-specific SMT subsystem
362 Decoder

Claims

1. A statistical machine translation apparatus comprising:
means for determining a vector of probabilities representing the class membership of a source sentence, each element of the vector representing the probability that the source sentence belongs to a respective one of a plurality of predetermined classes;
a plurality of sub-decoders, one provided for each of the plurality of classes, the sub-decoders being statistically trained on sets of training data for the respective classes, and each sub-decoder outputting, for each of the words and phrases of the source sentence, the probabilities of translation words and translation word sequences in the target language; and
means for outputting, as the translation of the source sentence, the hypothesis with the highest likelihood in the target language, according to the probabilities of hypotheses obtained by combining the target-language translation words and translation word sequences,
wherein the probability of a target-language hypothesis is calculated using, for each of the target-language translation words and translation word sequences forming the hypothesis, a value obtained by summing the probabilities output by the plurality of sub-decoders with the elements of the vector as weights.

2. The statistical machine translation apparatus according to claim 1, wherein the plurality of classes includes a general class and a plurality of specific classes, and the plurality of specific classes are obtained by dividing the general class.

3. The statistical machine translation apparatus according to claim 1 or 2, wherein the element of the vector corresponding to the general class is a constant in the range of 0 to 1.

4. The statistical machine translation apparatus according to claim 1, further comprising normalization means for normalizing the elements of the vector so that the sum of the elements is 1.

5. The statistical machine translation apparatus according to any one of claims 1 to 4, wherein the means for determining the vector uses a statistically trained maximum entropy model to assign a membership probability to each of the classes.

6. The statistical machine translation apparatus according to any one of claims 1 to 5, wherein each of the plurality of sub-decoders calculates the probabilities according to a class-specific language model, a class-specific translation model, a class-specific length model, a class-specific distortion model, or any combination of these models.