JP2009059123A

JP2009059123A - Unit and method for predicting human assessment of translation quality

Info

Publication number: JP2009059123A
Application number: JP2007225037A
Authority: JP
Inventors: Michael Paul; Andrew Finch; Eiichiro Sumita
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2007-08-31
Filing date: 2007-08-31
Publication date: 2009-03-19

Abstract

PROBLEM TO BE SOLVED: To provide a unit for stably predicting human assessment of machine translation quality.
SOLUTION: To stably predict human assessments of machine translation quality, application module 32 includes the following components:
- a feature extraction module 76 that calculates a feature set 78 of translated text 74;
- binary classifiers 54, each of which classifies translated text 74 into one of two predefined classes according to selected features;
- a coding matrix 56 in which each grade is associated with a row of classification results of binary classifiers 54;
- a comparing module 84 that determines the grade of translated text 74 from the distance between the rows of coding matrix 56 and the binary vector 82 formed by the binary decisions 80 of binary classifiers 54.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to machine translation evaluation and, more particularly, to a method and apparatus for predicting human ratings of machine translation (MT) quality.

Human evaluation of MT quality is costly and time-consuming, so various automatic evaluation metrics have been proposed to assess MT output more cheaply and quickly. Recent evaluation campaigns have investigated how well these metrics correlate with human judgments. These include newswire MT evaluation by NIST (National Institute of Standards and Technology, http://www.nist.gov/speech/tests/mt) and travel-domain evaluation at IWSLT (the International Workshop on Spoken Language Translation, http://www.slc.art.jp/IWSLT2006). The campaigns showed that some metrics correlate highly with human judgments when the outputs of MT systems are ranked at the document level. However, each automatic metric focuses on a different aspect of the translation output. Its correlation with human judgments also depends on the type of human rating, for example fluency or adequacy. Furthermore, none of the automatic metrics proved satisfactory for predicting the translation quality of a single translated sentence.

Various schemes have been proposed for rating translation quality. Most of them focus on human ratings of the fluency, adequacy, and acceptability of a translation. Fluency indicates how natural the evaluated segment sounds to a native speaker of English. For adequacy, the evaluator is given a "gold standard" translation in addition to the source-language input. The evaluator must then judge how much of the original information is conveyed in the translation. Acceptability indicates how easy the translation is to understand. Each fluency, adequacy, or acceptability judgment consists of one of the grades listed in Table 1.

Because such human evaluation is expensive, much attention has been paid to the development of automatic evaluation metrics for machine translation. Table 2 introduces several metrics widely used in MT research.

Most previously proposed approaches to predicting human ratings of translation quality use supervised learning methods to learn a decision model that approximates human quality judgments. Examples include decision trees (DT), support vector machines (SVM), and perceptrons. Such classifiers can be trained on sets of features extracted from MT system output that has been evaluated by humans.

The study described in Non-Patent Document 3 uses statistical measures that estimate confidence at the word/phrase level. It collects system-specific features of the translation process itself and uses them to train a binary classifier. It distinguishes good translations from bad ones by applying an empirical threshold to automatic evaluation scores. Non-Patent Document 3 also examines how well various learning approaches apply to multi-class classification on a very small data set in the technical-document domain.

Non-Patent Document 1 uses a DT classifier trained on multiple edit-distance features to directly approximate human acceptability ratings. These features compare the MT system output with reference translations using combinations of lexical matches (stem, word, part of speech) and semantic matches (thesaurus-based semantic class).

Instead of predicting human judgments directly, Non-Patent Document 5 trains a binary SVM classifier on automatic scoring features. The classifier distinguishes "human-produced" newswire translations from "machine-generated" ones.
Non-Patent Document 1: Yasuhiro Akiba, Kenji Imamura, and Eiichiro Sumita. 2001. Using multiple edit distances to automatically rank machine translation output. In Proc. of MT Summit VIII, pages 15-20.
Non-Patent Document 2: Erin Allwein, Robert Schapire, and Yoram Singer. 2000. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113-141.
Non-Patent Document 3: Christopher B. Quirk. 2004. Training a sentence-level machine translation confidence measure. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), pages 825-828, Portugal.
Non-Patent Document 4: Rulequest. 2004. Data mining tool c5.0. http://rulequest.com/see5-info.html.
Non-Patent Document 5: Alex Kulesza and Stuart M. Shieber. 2004. A learning approach to improving sentence-level MT evaluation. In Proc. of the TMI04, USA.

Previously proposed approaches use supervised learning to predict human ratings of translation quality. However, such multi-class classifiers are trained mainly on a large number of features that are internal to the MT system and language-dependent. These features must be readjusted whenever the MT engine or the language changes. Moreover, the previous approaches attempt to predict human judgments, a multi-class task, directly. Such direct classification tasks tend to require large amounts of training data. Even when trained, they predict translation quality unstably or with limited precision.

Accordingly, one object of the present invention is to provide a method and apparatus for stably predicting human ratings of machine translation quality.

Another object of the present invention is to provide a method and apparatus for stably predicting human ratings of machine translation quality with high accuracy.

A further object of the present invention is to provide a system- and language-independent method and apparatus for stably predicting human ratings of machine translation quality.

The present invention relates to an apparatus and method for predicting, or estimating, human ratings of machine translation quality. The prediction is based on a combination of binary classifiers using a coding matrix. The multi-class categorization problem is reduced to a set of binary problems, which are solved with standard classification learning algorithms. The binary classifiers are trained on features derived from multiple automatic evaluation metrics such as BLEU and METEOR. The learned decision models are applied to the MT output sentence by sentence, producing binary indicators of translation quality at the sentence level. The multi-class classification problem is then solved by combining the results of the binary classifiers using the coding matrix.

In particular, a first aspect of the present invention relates to an apparatus for estimating human ratings of machine translation quality, where the human ratings are given as predefined grades. The apparatus includes the following elements:
- means for calculating a predetermined set of features of a given translation;
- a set of binary classifiers, each of which classifies the given translation into one of two predefined classes according to features selected from the feature set;
- means for storing a coding matrix in which each grade is associated with a row of classification results of the set of binary classifiers;
- means for determining the grade of the given translation according to the results of the binary classification by the binary classifiers and the coding matrix.

Preferably, the outputs of the set of binary classifiers define a binary vector. Each element of this vector is a first value or a second value different from the first value. Each row of the coding matrix defines a ternary vector. Each element of a ternary vector is the first value, the second value, or a third value different from both. The first value indicates that the given translation should be classified into the first class by the corresponding binary classifier, and the second value indicates the second class. The third value indicates that the given translation is not classified by the corresponding binary classifier. The means for determining includes the following:
- means for calculating the distance between the binary vector and each of the ternary vectors;
- means for finding the row of the coding matrix closest to the binary vector in that distance;
- means for selecting the grade corresponding to the closest row as the estimated human rating of the quality of the given translation.

More preferably, the means for calculating the distance includes means for calculating the Hamming distance between the binary vector and each ternary vector.

The means for calculating the predetermined feature set may include a plurality of preselected automatic multi-class evaluation means. Each of them automatically estimates a human rating of machine translation quality on a set of grades.

Preferably, the means for calculating the predetermined feature set further includes automatic evaluation means for calculating predetermined internal metric feature values.

A second aspect of the invention relates to a computerized method for estimating human ratings of machine translation quality, where the human ratings are given as predefined grades. The method includes the following steps:
- calculating a predetermined set of features of a given translation;
- classifying the given translation into one of two predefined classes by each of a set of binary classifiers, according to features selected from the feature set;
- storing, in a storage unit, a coding matrix in which each grade is associated with a row of classification results produced in the classifying step;
- determining the grade of the given translation according to the set of binary classification results of the classifying step and the coding matrix.

A third aspect of the invention relates to a computer program. When executed on a computer, the program causes the computer to function as the following elements:
- means for calculating a predetermined set of features of a given translation;
- a set of binary classifiers, each of which classifies the given translation into one of two predefined classes according to features selected from the feature set;
- means for storing a coding matrix in which each grade is associated with a row of classification results of the set of binary classifiers;
- means for determining the grade of the given translation according to the results of the binary classification by the binary classifiers and the coding matrix.

Experiments were run on a large evaluation corpus annotated by humans. The results showed that decomposition into binary classifiers achieves higher classification accuracy than solving the multi-class categorization problem directly. In addition, the proposed method achieves a higher correlation with human judgments at the sentence level than standard evaluation metrics.

[First Embodiment]
Overview
An apparatus for predicting human ratings of translation quality according to the first embodiment of the present invention is described below. In the following description, identical parts are denoted by identical reference numerals. Their names and functions are also identical, so their detailed description will not be repeated.

The system of this embodiment uses supervised learning to predict human ratings of translation quality, but it differs from prior-art systems in the following two respects.

(1) Reduction of classification perplexity
The multi-class classification task is decomposed into a set of binary classification tasks. This reduces the complexity of the learning task and thereby yields higher classification accuracy.

(2) Feature set
The binary classifiers are trained on the results of multiple automatic evaluation metrics (see Table 2). They therefore take into account the various aspects of translation quality addressed by each metric. The method does not depend on any particular MT system or target language. As long as reference translations are available, it can be applied without modification to the output of any translation system and to any target language.

The prediction method of this embodiment is divided into three stages:
(1) a learning stage, in which binary classifiers are trained on feature sets extracted from a database of MT system outputs evaluated by humans and by machines;
(2) a decomposition stage, in which the set of binary classifiers is chosen to maximize the accuracy of the recombination step on a development set;
(3) an application stage, in which the binary classifiers are applied to unseen sentences and their results are combined using the optimized coding matrix to predict the human score.

-Learning stage
Decision models for the multi-class and binary classification problems are obtained using standard learning algorithms. The proposed method is not limited to any particular classification learning method. For the experiments described below, a standard implementation of decision trees (Non-Patent Document 4) was used.
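Non-Patent Document 4 refers to the C5.0 decision-tree tool, which is not reproduced here. As a rough stand-in for a single binary classifier, the sketch below fits a decision stump, a one-level decision tree, on labels in {-1, +1}. The names `train_stump` and `predict_stump` are hypothetical. A stump is far simpler than the trees used in the experiments and only illustrates the learning interface.

```python
from typing import Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def train_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Fit a decision stump to labels in {-1, +1} by exhaustive search.

    Assumes X is non-empty. The stump predicts `polarity` when
    x[feature] > threshold and `-polarity` otherwise.
    """
    best: Stump = (0, 0.0, 1)
    best_err = len(y) + 1
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: below every value, then midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum((polarity if row[j] > t else -polarity) != label
                          for row, label in zip(X, y))
                if err < best_err:  # keep the first stump with the fewest errors
                    best_err, best = err, (j, t, polarity)
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> int:
    """Return +1 or -1 for one feature vector."""
    j, t, polarity = stump
    return polarity if x[j] > t else -polarity
```

One such classifier would be trained per binary problem, for example on the 5-vs-all labels. A real decision-tree learner would split recursively instead of stopping after one test.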

The feature set consists of the scores of the seven automatic evaluation metrics listed in Table 2. All automatic evaluation metrics are applied to an input data set of English MT output. The translation quality of this data set has been rated manually by humans using the grades introduced in Table 1. In addition to the metric scores, internal features of the metrics are also used, such as n-gram precision scores and the length ratio between the reference and the MT output. This results in 54 training features.
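Two of the internal features mentioned above can be illustrated with a minimal sketch: clipped, BLEU-style n-gram precisions for n = 1 to 4, and the reference-to-output length ratio. The sketch assumes whitespace tokenization and a single reference. The feature names are hypothetical, and the exact composition of the 54 features is not reproduced here.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def internal_features(hypothesis: str, reference: str, max_n: int = 4) -> Dict[str, float]:
    """Clipped n-gram precisions and reference/output length ratio for one sentence."""
    hyp, ref = hypothesis.split(), reference.split()
    feats: Dict[str, float] = {}
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Each output n-gram is credited at most as often as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        feats[f"prec_{n}"] = matched / total if total else 0.0
    feats["len_ratio"] = len(ref) / len(hyp) if hyp else 0.0
    return feats
```

For example, `internal_features("the cat sat on the mat", "the cat is on the mat")` gives a unigram precision of 5/6, a bigram precision of 0.6, and a length ratio of 1.0. In the embodiment, features of this kind would sit alongside the seven metric scores in one feature vector per sentence.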

-Decomposition stage
There are many ways to decompose a multi-class problem into several binary classification problems. The best-known approaches are the one-against-all and all-pairs methods. In the one-against-all approach, the classifier for a given class treats all training examples belonging to that class as positive examples and all other examples as negative examples. In the all-pairs approach, a classifier is trained for each pair of classes. All training examples that belong to neither of the two classes in question are ignored.

Such a decomposition of the multi-class problem is represented by a coding matrix M, in which each class c of the multi-class problem is associated with a row over the binary classifiers b. If k is the number of classes and l is the number of binary classification problems, the coding matrix is defined as follows:

$$M = (m_{c,b}) \in \{-1, 0, +1\}^{k \times l}$$

If the training examples belonging to class c are treated as positive examples for binary classifier b, then $m_{c,b} = +1$. Similarly, if $m_{c,b} = -1$, the training examples of class c are used as negative examples when training b. $m_{c,b} = 0$ indicates that the training examples of class c are not used to train classifier b.
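Read column by column, the coding matrix specifies how to relabel the multi-class training data for each binary classifier. The helper below is hypothetical. It assumes that each example is a (feature vector, grade) pair and that one column is given as a mapping from grade c to $m_{c,b}$.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label or +1/-1)


def column_training_set(examples: Sequence[Example],
                        column: Dict[int, int]) -> List[Example]:
    """Build the training set of binary classifier b from one coding-matrix column.

    column[c] is m_{c,b}: +1 -> positive example, -1 -> negative example,
    0 -> examples of class c are left out of b's training data.
    """
    out: List[Example] = []
    for features, cls in examples:
        m = column[cls]
        if m != 0:
            out.append((features, m))
    return out
```

For an all-pairs column such as 5_4, the mapping would be {5: +1, 4: -1, 3: 0, 2: 0, 1: 0}, so only grade-5 and grade-4 examples remain.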

This embodiment uses one-against-all and all-pairs binary classifiers. In addition, boundary classifiers are trained on the entire training set. For a boundary classifier, all training examples rated in a class on the better side of the boundary in question are used as positive examples, and all other training examples are used as negative examples. Table 3 lists the 17 binary classification problems used to decompose the human-rating problems introduced in the background section.
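Assuming the five grades 5 to 1, the 17 columns can be generated mechanically: 5 one-against-all columns, 10 all-pairs columns, and the 2 boundary columns 54_321 and 543_21 named in this embodiment. The sketch below is an illustrative construction only. The function name and column order are assumptions and do not reproduce Table 3.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def build_coding_matrix(grades: Sequence[int] = (5, 4, 3, 2, 1)
                        ) -> Tuple[List[str], Dict[int, List[int]]]:
    """Return (column names, rows), where rows[c][b] is m_{c,b} in {-1, 0, +1}."""
    columns: List[Tuple[str, Dict[int, int]]] = []
    # One-against-all: class g positive, every other class negative.
    for g in grades:
        columns.append((f"{g}_vs_all", {c: 1 if c == g else -1 for c in grades}))
    # All-pairs: class a positive, class b negative, all other classes unused (0).
    for a, b in combinations(grades, 2):
        columns.append((f"{a}_{b}", {c: 1 if c == a else (-1 if c == b else 0) for c in grades}))
    # Boundary: grades at or above the cut positive, the rest negative.
    # Only the two cuts used in this embodiment (54_321 and 543_21) are built.
    for cut in (4, 3):
        above = "".join(str(c) for c in grades if c >= cut)
        below = "".join(str(c) for c in grades if c < cut)
        columns.append((f"{above}_{below}", {c: 1 if c >= cut else -1 for c in grades}))
    names = [name for name, _ in columns]
    rows = {c: [col[c] for _, col in columns] for c in grades}
    return names, rows
```

Each row is the ternary vector for one grade. Removing a column corresponds to dropping one binary classifier from the combination.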

The optimal coding matrix for each task is identified in three steps. First, the binary classifiers are ranked according to their classification accuracy on the development set. Next, the multi-class performance is evaluated iteratively, and at each iteration the worst-performing binary classifier is removed from the coding matrix. Finally, the test set is evaluated using the coding matrix that achieved the best classification accuracy on the multi-class task. The optimized coding matrix reflects the standard bias-variance trade-off. It balances the discriminative power of the binary classifier combination against its reliability.
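The pruning procedure can be sketched as greedy backward elimination. The sketch assumes that the +1/-1 outputs of every binary classifier on the development set are precomputed. Decoding uses the generalized Hamming distance of Non-Patent Document 2, in which a zero entry counts 1/2. All function names and the data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Dict[int, List[int]]  # grade -> row of m_{c,b} values


def hamming_decode(v: Sequence[int], matrix: Matrix, cols: Sequence[int]) -> int:
    """Grade whose row is nearest to v, using only the selected columns."""
    def dist(row: List[int]) -> float:
        # 0 if the entries agree, 1 if they disagree, 1/2 if the row entry is 0
        return sum((1 - row[j] * v[j]) / 2 for j in cols)
    return min(matrix, key=lambda c: dist(matrix[c]))


def multiclass_accuracy(preds: Sequence[Sequence[int]], gold: Sequence[int],
                        matrix: Matrix, cols: Sequence[int]) -> float:
    hits = sum(hamming_decode(v, matrix, cols) == g for v, g in zip(preds, gold))
    return hits / len(gold)


def prune_coding_matrix(preds: Sequence[Sequence[int]], gold: Sequence[int],
                        matrix: Matrix, binary_acc: Sequence[float]
                        ) -> Tuple[List[int], float]:
    """Greedy backward elimination of columns, keeping the best dev-set subset.

    preds[i][b] is the +1/-1 output of classifier b on dev example i,
    gold[i] is its human grade, binary_acc[b] is the dev accuracy of classifier b.
    """
    # Rank classifiers from best to worst binary accuracy.
    cols = sorted(range(len(binary_acc)), key=lambda b: binary_acc[b], reverse=True)
    best_cols, best_acc = list(cols), multiclass_accuracy(preds, gold, matrix, cols)
    while len(cols) > 1:
        cols = cols[:-1]  # drop the worst-ranked remaining classifier
        acc = multiclass_accuracy(preds, gold, matrix, cols)
        if acc > best_acc:  # strict: on ties keep the larger matrix
            best_cols, best_acc = list(cols), acc
    return sorted(best_cols), best_acc
```

The returned column indices define the optimized coding matrix. Only that matrix would then be applied to the test set.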

-Application stage
Given an input example, each binary classifier is applied once. There is one classifier per column of the coding matrix, so this yields a vector v of l binary classification results. The predicted multi-class label is the class c whose corresponding row r of the coding matrix M is "closest" to v.

Non-Patent Document 2 computes the distance between r and v in one of two ways. The first is (a) a generalized Hamming distance, which counts the positions at which the two vectors differ. The second is (b) loss-based decoding, which takes the magnitude of the binary classifier scores into account. Both are effective, and this embodiment uses the Hamming distance.
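Both distances can be sketched following the definitions in Non-Patent Document 2 (Allwein et al.). The generalized Hamming distance adds 0 for each agreeing position, 1 for each disagreeing position, and 1/2 where the row entry is 0. Loss-based decoding instead sums a loss L(m·f) over the real-valued classifier scores f. The exponential loss used as the default here is one common choice and an assumption, not something specified by the embodiment. Ties are broken by the order of the rows.

```python
import math
from typing import Callable, Dict, List, Sequence


def hamming_distance(row: Sequence[int], v: Sequence[float]) -> float:
    """Generalized Hamming distance between a coding-matrix row and +1/-1 decisions."""
    # per position: 0 if signs agree, 1 if they disagree, 1/2 if the row entry is 0
    return sum((1 - r * x) / 2 for r, x in zip(row, v))


def exp_loss(z: float) -> float:
    return math.exp(-z)


def loss_distance(row: Sequence[int], scores: Sequence[float],
                  loss: Callable[[float], float] = exp_loss) -> float:
    """Loss-based distance: sums L(m * f) over the real-valued classifier scores f."""
    return sum(loss(r * f) for r, f in zip(row, scores))


def decode(matrix: Dict[int, List[int]], v: Sequence[float],
           distance: Callable[[Sequence[int], Sequence[float]], float] = hamming_distance) -> int:
    """Return the grade whose row is closest to v; ties go to the first row."""
    return min(matrix, key=lambda c: distance(matrix[c], v))
```

The Hamming variant, as used in this embodiment, needs only the +1/-1 decisions. The loss-based variant requires classifiers that output confidence scores.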

Structure for Predicting Acceptability
FIG. 1 shows the overall structure of translation quality prediction system 30 of this embodiment. System 30 uses corpus 40 and implements the method for predicting translation quality described above. Referring to FIG. 1, system 30 includes the following units:
- an application unit 32 that predicts the human rating of translated text 74, produced from source text 70 by machine translation 72, and outputs the rating 86;
- a classifier preparation unit 34 that uses corpus 40 to generate the set of binary classifiers 54 and the coding matrix 56 used in application unit 32.
Corpus 40 includes a large number of learning translation sets. Each set contains a source text, its machine-translated text, and a set of feature parameters. Corpus 40, binary classifiers 54, and coding matrix 56 are described in detail later.

Application unit 32 of this embodiment predicts the human acceptability score of translated text 74 on the five grades (5, 4, 3, 2, 1) shown in Table 1. As will become clear from the following description, a system that predicts fluency or adequacy can be implemented with a structure similar to that shown in FIG. 1.

Classifier preparation unit 34 includes a learning module 44 for training binary classifiers 46. It also includes an optimization module 50, which turns binary classifiers 46 into the optimized binary classifiers 54 so that prediction accuracy on development set 52 is maximized.

A coding matrix 48 is prepared as a result of the decomposition stage 42. As described later, each row of coding matrix 48 forms a ternary vector. When optimizing binary classifiers 46, optimization module 50 removes some of them and correspondingly reduces coding matrix 48 to the optimized coding matrix 56.

Application unit 32 includes a feature extraction module 76, which extracts a predetermined feature set 78 from translated text 74. It also includes binary classifiers 54, which are connected to receive predetermined parts of feature set 78 and output binary decision values 80. The binary decision values 80 form a binary vector 82.

Application unit 32 further includes a comparison module 84. It compares binary vector 82 with each row of coding matrix 56 to determine which row is closest to binary vector 82 in Hamming distance. The class corresponding to the closest row is selected as the multi-class rating 86 of translated text 74.

FIG. 2 shows details of corpus 40. Referring to FIG. 2, corpus 40 includes a large number of learning translation sets 100. Each set includes a source text 110, a machine translation 112 of source text 110, and a feature set 114.

Feature set 114 includes a human score 120 (one of the scores 1 to 5), a set of automatic evaluation scores 122, and internal metric features 124.

The automatic evaluation scores 122 include a plurality of automatic evaluation scores 130, 132, ..., 134. In this embodiment, they are the scores of the seven metrics shown in Table 2.

Internal metric features 124 include n-gram precision scores 150, the length ratio between the reference and the MT output, and similar values. Any feature known to indicate translation quality can be used as an element of internal metric features 124.

FIG. 3 shows how a learning translation set 100 is prepared. Referring to FIG. 3, source texts 110 are collected from some source. Machine translations 112 are then obtained by feeding source texts 110 to some kind of machine translation system.

Human score 120 is prepared through human rating 170 of machine translation 112. If multiple people rate the same translation, the average of their scores is used as human score 120.

Automatic evaluation scores 122 and internal metric features 124 are prepared by the respective automatic evaluation systems 172, 174, ..., 180.

FIG. 4 shows the decomposition scheme for the binary classifiers used in this embodiment. Referring to FIG. 4, binary classifiers 46 are divided into the three types described above: one-against-all, all-pairs, and boundary.

The one-against-all classifiers are 5-vs-all, 4-vs-all, 3-vs-all, 2-vs-all, and 1-vs-all.

In the one-against-all approach, the classifier for each class treats all training examples belonging to that class as positive examples and all other examples as negative examples. For example, the 5-vs-all classifier is trained on data in which all class "5" examples are positive and all others are negative. Consequently, the 5-vs-all classifier outputs +1 when feature set 78 indicates that translated text 74 belongs to class "5", and −1 otherwise.

In the all-pairs strategy, one classifier is trained for each pair of classes, and all training examples that belong to neither class of the pair are ignored. The all-pairs classifiers of this embodiment are the 5_4, 5_3, 5_2, 5_1, 4_3, 4_2, 4_1, 3_2, 3_1 and 2_1 classifiers.

For example, to train the 5_4 classifier, all training examples of class "5" are used as positive examples and all examples of class "4" as negative examples. All other training examples are ignored.
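Below is a minimal sketch of the all-pairs filtering and labeling for a classifier such as 5_4. It assumes the same hypothetical (grade, features) representation as above. The function name `all_pairs_examples` is illustrative only.

```python
from typing import List, Sequence, Tuple

Example = Tuple[int, Sequence[float]]  # hypothetical: (human grade, feature vector)


def all_pairs_examples(corpus: List[Example], pos: int, neg: int) -> List[Tuple[int, Sequence[float]]]:
    """Training data for the `pos`_`neg` classifier (e.g. 5_4).

    Examples graded `pos` become +1 and examples graded `neg` become -1.
    Examples of any other grade are dropped entirely.
    """
    return [(1 if grade == pos else -1, feats)
            for grade, feats in corpus if grade in (pos, neg)]


# Toy corpus for illustration only.
corpus = [(5, [0.71]), (4, [0.52]), (3, [0.33]), (5, [0.80])]
print([label for label, _ in all_pairs_examples(corpus, 5, 4)])  # [1, -1, 1]
```

Because examples outside the pair are dropped, each all-pairs classifier is trained on a smaller set than the one-vs-rest and boundary classifiers.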

In the boundary approach, each classifier is trained on the entire training set. All training examples annotated with a grade at or above the boundary class in question are used as positive examples, and all remaining training examples are used as negative examples. The boundary classifiers of this embodiment are the 54_321 and 543_21 classifiers.

For example, to train the 54_321 classifier, all examples of classes "5" and "4" are used as positive examples, and all other examples (classes "3", "2" and "1") are used as negative examples.
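The boundary labeling rule reduces to a threshold on the grade. Below is a minimal sketch under the same hypothetical (grade, features) representation. The parameter name `lowest_positive` is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

Example = Tuple[int, Sequence[float]]  # hypothetical: (human grade, feature vector)


def boundary_examples(corpus: List[Example], lowest_positive: int) -> List[Tuple[int, Sequence[float]]]:
    """Training data for a boundary classifier; every example is used.

    lowest_positive=4 yields the 54_321 classifier,
    lowest_positive=3 yields the 543_21 classifier.
    """
    return [(1 if grade >= lowest_positive else -1, feats) for grade, feats in corpus]


# Toy corpus for illustration only.
corpus = [(5, [0.7]), (4, [0.5]), (3, [0.3]), (1, [0.1])]
print([label for label, _ in boundary_examples(corpus, 4)])  # [1, 1, -1, -1]
print([label for label, _ in boundary_examples(corpus, 3)])  # [1, 1, 1, -1]
```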

FIG. 5(A) shows how examples 202 for training the 5-vs-rest classifier are generated. Referring to FIG. 5(A), all data in the corpus 40 are used in training data generation 200. Each generated example 202 includes a binary label 204 (+1 if the human score is "5", -1 otherwise) and features 206. As shown in FIG. 3, the features 206 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 5(B) shows how examples 212 for training the 3-vs-rest classifier are generated. Referring to FIG. 5(B), all data in the corpus 40 are used in training data generation 210. Each generated example 212 includes a binary label 214 (+1 if the human score is "3", -1 otherwise) and features 216. The features 216 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 6(A) shows how examples 224 for training the 5_4 classifier are generated. Referring to FIG. 6(A), the data extraction process 220 extracts the examples whose human score is "5" or "4", and the extracted data are then used in training data generation 222. Each generated example 224 includes a binary label 226 (+1 if the human score is "5", -1 if it is "4") and features 228. The features 228 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 6(B) shows how examples 234 for training the 5_2 classifier are generated. Referring to FIG. 6(B), the data extraction process 230 extracts the examples whose human score is "5" or "2", and the extracted data are used in training data generation 232. Each generated example 234 includes a binary label 236 (+1 if the human score is "5", -1 if it is "2") and features 238. The features 238 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 7(A) shows how examples 244 for training the 3_2 classifier are generated. Referring to FIG. 7(A), the data extraction process 240 extracts the examples whose human score is "3" or "2", and the extracted data are then used in training data generation 242. Each generated example 244 includes a binary label 246 (+1 if the human score is "3", -1 if it is "2") and features 248. The features 248 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 7(B) shows how examples 254 for training the 3_1 classifier are generated. Referring to FIG. 7(B), the data extraction process 250 extracts the examples whose human score is "3" or "1", and the extracted data are used in training data generation 252. Each generated example 254 includes a binary label 256 (+1 if the human score is "3", -1 if it is "1") and features 258. The features 258 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 8(A) shows how examples 262 for training the 54_321 boundary classifier are generated. Referring to FIG. 8(A), all examples in the corpus 40 are used in training data generation 260. Each generated example 262 includes a binary label 264 (+1 if the human score is "5" or "4", -1 otherwise) and features 266. The features 266 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 8(B) shows how examples 272 for training the 543_21 boundary classifier are generated. Referring to FIG. 8(B), all examples in the corpus 40 are used in training data generation 270. Each generated example 272 includes a binary label 274 (+1 if the human score is "5", "4" or "3", -1 otherwise) and features 276. The features 276 consist of the automatic evaluation scores 122 and the internal index features 124.

FIG. 9 shows the coding matrix 48 used in this embodiment. Referring to FIG. 9, the classes ("1" through "5") are arranged in the leftmost column, and the binary classifiers are arranged along the top row. The second through sixth columns from the left ("5" through "1") correspond to the 5-, 4-, 3-, 2- and 1-vs-rest classifiers, respectively.

As FIG. 9 shows, each row of the coding matrix 48 forms a ternary vector whose elements are +1, -1 or 0. The meaning of each element depends on how the training examples of that row's class are used for that column's classifier:
+1: the examples serve as positive examples for training the classifier.
-1: the examples serve as negative examples for training the classifier.
0: the examples are not used in training the corresponding classifier.
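The rows of such a matrix can be generated directly from the three decomposition strategies. The Python sketch below builds the 17-column ternary matrix for grades 5 to 1. Following FIG. 9, it places the one-vs-rest columns first. The order of the remaining columns and the column names are assumptions, since the exact layout of FIG. 9 is not reproduced in the text.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_coding_matrix(classes: List[int]) -> Tuple[List[str], Dict[int, List[int]]]:
    """Return (column names, {class: ternary row}).

    +1: the class supplies positive examples for that column's classifier,
    -1: it supplies negative examples, 0: it is not used to train it.
    """
    columns: List[Tuple[str, Dict[int, int]]] = []
    # One-vs-rest columns, listed first as in the second to sixth columns of FIG. 9.
    for c in classes:
        columns.append((f"{c}-vs-rest", {k: 1 if k == c else -1 for k in classes}))
    # All-pairs columns: a_b uses only class a (+1) and class b (-1).
    for a, b in combinations(classes, 2):
        columns.append((f"{a}_{b}", {k: 1 if k == a else (-1 if k == b else 0) for k in classes}))
    # Boundary columns used in this embodiment: 54_321 and 543_21.
    for lowest_pos in (4, 3):
        pos = "".join(str(k) for k in classes if k >= lowest_pos)
        neg = "".join(str(k) for k in classes if k < lowest_pos)
        columns.append((f"{pos}_{neg}", {k: 1 if k >= lowest_pos else -1 for k in classes}))
    names = [name for name, _ in columns]
    rows = {c: [code[c] for _, code in columns] for c in classes}
    return names, rows


names, rows = build_coding_matrix([5, 4, 3, 2, 1])
print(len(names))  # 17
print(rows[3])     # [-1, -1, 1, -1, -1, 0, -1, 0, 0, -1, 0, 0, 1, 1, 0, -1, 1]
```

Each column also encodes how training data for its classifier are generated: examples of class c receive label M[c][j], or are skipped when the entry is 0.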

FIG. 10 is a flowchart of the optimization module 50 implemented as a computer program. As can be seen from FIG. 1, the decomposition stage 42 yields the binary classifiers 46 and the coding matrix 48. The following process is performed on the development set 52 shown in FIG. 1.

Referring to FIG. 10, the program includes step 290, which calculates the accuracy of each binary classifier 46. It also includes step 292, which follows step 290 and orders the binary classifiers 46 by classification accuracy.

The program further includes the following steps:
Step 294 evaluates the multi-class accuracy obtained when the remaining binary classifiers are used as the binary classifiers 54 in the application unit 32.
Step 296 then removes the worst-performing binary classifier.
Step 298 determines whether the accuracy of the coding matrix "after" removal of the worst-performing classifier is lower than the accuracy "before" the removal, and branches accordingly. If the result of step 298 is NO, control returns to step 294; otherwise, control proceeds to the next step.

The program further includes step 300, executed when the determination in step 298 is YES. Step 300 selects the best-performing set of binary classifiers among the sets that appeared during the iterations of steps 294 through 298. Step 302 then reconstructs the coding matrix according to the set of binary classifiers selected in step 300.
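The loop of steps 290 through 302 is a greedy backward elimination, sketched below in Python. The sketch assumes two inputs: a dictionary of per-classifier development-set accuracies, and a hypothetical callable `multiclass_acc` that returns the multi-class accuracy obtained when decoding with a given subset of columns. Neither name comes from the original text. The toy numbers in the usage example are illustrative only and are not experimental results.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_eliminate(
    binary_acc: Dict[str, float],
    multiclass_acc: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Greedy removal of binary classifiers, following steps 290-302 as described."""
    # Steps 290/292: order classifiers from lowest to highest individual accuracy.
    current = sorted(binary_acc, key=binary_acc.get)
    prev_acc = multiclass_acc(current)          # step 294 on the full set (ALL)
    best_set, best_acc = list(current), prev_acc
    while len(current) > 1:
        candidate = current[1:]                 # step 296: drop the worst remaining one
        acc = multiclass_acc(candidate)         # step 294 on the reduced set
        if acc < prev_acc:                      # step 298: accuracy dropped, so stop
            break
        current, prev_acc = candidate, acc
        if acc > best_acc:                      # ties keep the earlier, larger set
            best_set, best_acc = list(candidate), acc
    # Step 300: best set seen; step 302 would rebuild the matrix from these columns.
    return best_set, best_acc


# Toy values for illustration only.
toy_binary = {"a": 0.55, "b": 0.60, "c": 0.75, "d": 0.85}
toy_multi = {frozenset("abcd"): 0.56, frozenset("bcd"): 0.56,
             frozenset("cd"): 0.62, frozenset("d"): 0.50}
best, acc = backward_eliminate(toy_binary, lambda s: toy_multi[frozenset(s)])
print(best, acc)  # ['c', 'd'] 0.62
```

This is a greedy procedure, so it evaluates at most one subset per classifier rather than every combination.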

FIG. 11 illustrates an example result of the optimization process described above. Referring to FIG. 11, the sets of binary classifiers 46 are ordered along the horizontal axis, starting with the full set ("ALL") at the far left. Each classifier name on the horizontal axis denotes the classifier removed at that iteration. The vertical axis shows the multi-class accuracy evaluated in each repetition of step 294.

For example, with all of the binary classifiers 46, the accuracy is about 56%. Removing 3_2, 2_1, 4_1, 4_3 and 4_2 in that order (the classifiers with the lowest accuracies in the ordering of step 292) leaves the accuracy at about 56%. Removing 5_4, however, raises the accuracy to 62%, and removing further classifiers does not improve the result. In this example, therefore, the optimal classifier set consists of the following 11 classifiers: 1-vs-rest, 543_21, 3_1, 54_321, 5-vs-rest, 4-vs-rest, 5_2, 5_1, 5_3, 2-vs-rest and 3-vs-rest. The 3-vs-rest classifier does not appear in FIG. 11 because it is never removed from the set.

FIG. 12 shows the details of the optimized binary classifiers 54. Referring to FIG. 12, the binary classifiers 54 include:
the 5-, 4-, 3-, 2- and 1-vs-rest classifiers 320, 322, 324, 326 and 328;
the 5_3, 5_2, 5_1 and 3_1 pair classifiers 342, 344, 346 and 382; and
the 54_321 and 543_21 boundary classifiers 400 and 402.
The other classifiers 340, 360, 362, 364, 380 and 390 are not used in this example.

Note that the accuracy of each classifier depends on the corpus 40 and on the decomposition stage 42, as does the overall multi-class accuracy. Consequently, with a different corpus or decomposition, the optimized set of binary classifiers 54 would likely differ from the one shown in FIG. 12.

FIG. 13 shows the configuration of the optimized coding matrix 56. As FIG. 13 shows, the 5_4, 4_3, 4_2, 4_1, 3_2 and 2_1 classifiers have been removed from the coding matrix 56. The remaining columns are the 5-, 4-, 3-, 2- and 1-vs-rest classifiers and the 5_3, 5_2, 5_1, 3_1, 54_321 and 543_21 classifiers. Because the matrix has fewer columns, the cost of the computations on the binary vector 82 is reduced.
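Reconstructing the coding matrix for a selected classifier set (step 302) amounts to keeping only the selected columns. Here is a minimal sketch. It assumes the matrix is stored as a list of column names plus a dictionary of rows; the function name `prune_coding_matrix` and the toy three-grade matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def prune_coding_matrix(
    names: Sequence[str], rows: Dict[int, Sequence[int]], keep: Set[str]
) -> Tuple[List[str], Dict[int, List[int]]]:
    """Keep only the columns of the selected classifiers, preserving column order."""
    idx = [i for i, name in enumerate(names) if name in keep]
    return [names[i] for i in idx], {c: [row[i] for i in idx] for c, row in rows.items()}


# Toy three-grade matrix for illustration only.
names = ["3-vs-rest", "2-vs-rest", "1-vs-rest", "3_2", "3_1", "2_1"]
rows = {3: [1, -1, -1, 1, 1, 0], 2: [-1, 1, -1, -1, 0, 1], 1: [-1, -1, 1, 0, -1, -1]}
new_names, new_rows = prune_coding_matrix(
    names, rows, {"3-vs-rest", "2-vs-rest", "1-vs-rest", "3_1"})
print(new_names)    # ['3-vs-rest', '2-vs-rest', '1-vs-rest', '3_1']
print(new_rows[1])  # [-1, -1, 1, -1]
```

Each decoding step then compares the binary vector against rows of length len(new_names) instead of len(names), which is where the cost reduction comes from.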

FIG. 14 shows the feature-extraction scheme performed by the feature extraction module 76 shown in FIG. 1. Referring to FIG. 14, given the source text and the translated text 74, each automatic evaluator 410 rates the quality of the translated text 74, and each feature extractor 412 computes its internal index feature. The feature set 78 contains two groups of fields:
fields 430, 432, …, 434 for the automatic scores, which store the outputs of the automatic evaluators 410; and
fields 440, 442, … for the internal index features, which store the outputs of the feature extractors 412.
This feature set 78 is supplied to the binary classifiers 54 shown in FIG. 1.

Operation
The system 30 of this embodiment operates as follows. First, the learning module 44 trains the binary classifiers 46. The coding matrix 48 is also prepared when the multi-class classification is decomposed into the binary classifiers 46.

The optimization module 50 optimizes the binary classifiers 46 as follows. First, the accuracy of each binary classifier 46 on the development set 52 is calculated, and the binary classifiers 46 are ordered by accuracy.

Starting with the complete coding matrix (ALL), a multi-class evaluation is performed, and the worst-performing binary classifier is removed for the next generation. This process is repeated. The dashed box 310 in FIG. 11 indicates the subset of binary classifiers selected for the coding matrix in the example development-set evaluation. This subset is shown in FIG. 1 as the binary classifiers 54 and is incorporated into the application unit 32. At the same time, the optimization module 50 optimizes the coding matrix 48 according to the optimized binary classifiers 54. The comparison module 84 of the application unit 32 uses the resulting optimized coding matrix 56.

In the application stage, the feature extraction module 76 receives the source text 70 and the translated text 74, which is the translation of the source text 70 produced by the machine translation 72. The feature extraction module 76 calculates or extracts a feature set from the source text 70 and the translated text 74. It then supplies the resulting feature set 78 to the binary classifiers 54.

The binary classifiers 54, an optimized version of the binary classifiers 46, output binary decisions 80, which together form a binary vector 82.

The comparison module 84 compares the binary vector 82 with each of the ternary vectors that form the rows of the coding matrix 56. It then determines which ternary vector (row) is closest to the binary vector 82 in Hamming distance, and outputs the class identifier corresponding to that closest row as the evaluation 86. The resulting output is the estimated, or predicted, grade of the translated text 74.
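A minimal Python sketch of this nearest-row decoding follows. The text does not specify how a 0 entry in a ternary row is counted in the Hamming distance. The sketch therefore assumes a common convention for ternary coding matrices: each position contributes (1 - m*f)/2, which is 0 on agreement, 1 on disagreement, and 0.5 where the matrix entry is 0. An alternative would be to skip 0 entries entirely. The function names and the toy matrix are hypothetical.

```python
from typing import Dict, Sequence


def ternary_hamming(row: Sequence[int], bits: Sequence[int]) -> float:
    """Distance between a ternary matrix row and a +1/-1 decision vector.

    Assumed convention: each position contributes (1 - m*f)/2, i.e. 0 when the
    signs agree, 1 when they disagree, and 0.5 when the matrix entry m is 0.
    """
    return sum((1 - m * f) / 2 for m, f in zip(row, bits))


def decode(rows: Dict[int, Sequence[int]], bits: Sequence[int]) -> int:
    """Return the grade whose row is nearest; ties go to the first row listed."""
    return min(rows, key=lambda grade: ternary_hamming(rows[grade], bits))


# Toy three-grade matrix; columns: 3-vs-rest, 2-vs-rest, 1-vs-rest, 3_2.
rows = {3: [1, -1, -1, 1], 2: [-1, 1, -1, -1], 1: [-1, -1, 1, 0]}
print(decode(rows, [1, -1, -1, 1]))  # 3
print(decode(rows, [-1, -1, 1, 1]))  # 1
```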

Fluency and Adequacy
The embodiment described above is also applicable to other types of human rating, such as fluency or adequacy. FIG. 15 shows an example of system performance (multi-class accuracy) during the optimization process in the experiments on fluency (FIG. 15(A)) and adequacy (FIG. 15(B)).

FIG. 15 summarizes the iterative evaluation of binary-classifier combinations on the example development set 52 for fluency (FIG. 15(A)) and adequacy (FIG. 15(B)). Starting with the complete coding matrix (ALL), each iteration removes the worst-performing binary classifier. The dashed boxes 450 and 452 indicate the binary-classifier subsets selected for the coding matrices used in the fluency and adequacy test-set comparisons, respectively.

Evaluation
This embodiment was evaluated using the Basic Travel Expression Corpus (BTEC), compiled by the applicant, as the corpus 40. BTEC contains travel-related sentences similar to those commonly found in phrasebooks for travelers going abroad. In total, various types of MT systems translated 3,524 Japanese input sentences, producing 82,406 English translations. Of these, 54,302 translations were annotated with human scores for acceptability, and 36,302 with human scores for adequacy and fluency. FIG. 16 summarizes the distribution of human scores for the given translations. When several people judged a single translation output, the median of their scores was used in this experiment.

The annotated corpus was divided into three data sets:
(1) a training set of 25,988 translations for adequacy/fluency and 49,516 MT outputs for acceptability;
(2) a development set of 2,024 sentences (four MT outputs for each of 506 input sentences), used for all three metrics; and
(3) a test set taken from the IWSLT evaluation campaign (CSTAR03 data set, 506 input sentences; "CSTAR" stands for Consortium for Speech Translation Advanced Research).
For fluency and adequacy, 7,590 test sentences (15 MT outputs for each of the 506 input sentences) were available. For acceptability, 3,036 sentences (6 MT outputs for each input sentence) were used in the evaluation.

- Optimization of the coding matrix
The coding matrix is first created in the decomposition stage. The optimization module 50 then optimizes it to reflect the optimization of the binary classifiers.

- Classification accuracy
The baseline for the multi-class classification task is defined as always predicting the class that occurs most frequently in the training data set. Table 4 summarizes the baseline performance for all three subjective evaluation metrics.

FIG. 17 summarizes the classification accuracy for the multi-class task. It covers both the performance of a multi-class classifier trained directly on the training set and the performance of the binary classifiers.

The classification accuracy of this embodiment was 55.2% for fluency, 62.6% for adequacy and 62.3% for acceptability. This embodiment thus outperformed both the baseline and the multi-class classifier on all subjective evaluation metrics. Its gains over the baseline/multi-class performance were 22.7%/6.0% for fluency, 31.5%/6.6% for adequacy and 19.3%/1.2% for acceptability.

Furthermore, the performance of the binary classifiers varies widely with the classification task and the evaluation metric. The one-vs-rest classifiers achieve 80% to 90% accuracy, the boundary classifiers 75% to 81%, and the all-pairs classifiers 55% to 91%.

- Correlation with human ratings
To examine how well the metric of this embodiment correlates with human judgments at the sentence level, the Spearman rank correlation coefficient was calculated for the obtained results. The test sentences were also ranked with the automatic evaluation metrics listed in Table 2 and with the multi-class classifier, and the Spearman rank correlation of each ranking with the human ratings was calculated. FIG. 18 summarizes the correlation coefficients.

The results show that this embodiment outperformed all other metrics. Its correlation coefficients were 0.632 for fluency, 0.759 for adequacy and 0.769 for acceptability.

Computer Implementation
The embodiment described above can be implemented by a computer system and a computer program executed on that system. FIG. 19 shows the external appearance of a computer system 650 used in this embodiment, and FIG. 20 is a block diagram of the computer system 650. The computer system 650 shown here is merely an example, and various other configurations can be used.

Referring to FIG. 19, the computer system 650 includes a computer 660 together with a monitor 662, a keyboard 666, a mouse 668, a speaker 692 and a microphone 690, all connected to the computer 660. The computer 660 also includes a DVD (Digital Versatile Disc) drive 670 and a semiconductor memory port 672.

Referring to FIG. 20, the computer 660 further includes the following elements, all connected to the CPU 676 via the bus 686:
a bus 686 connected to the DVD drive 670 and the semiconductor memory port 672;
a CPU (Central Processing Unit) 676 that executes the computer program implementing the device described above;
a ROM (Read-Only Memory) 678 that stores the boot-up program of the computer 660;
a RAM (Random Access Memory) 680 that provides a work area used by the CPU 676 and a storage area for programs executed by the CPU 676; and
a hard disk drive (HDD) 674 that stores the corpus 40 (see FIG. 1), the source text 70, the translated text 74, a machine translation program for translating the source text 70, other machine translation programs used in the feature extraction module 76, all data required for machine translation, the binary classifiers 54 and the coding matrix 56.

When the computer 660 is used as the classifier preparation unit 34, the HDD 674 additionally stores:
the programs for the learning module 44 and the optimization module 50;
the binary classifiers 46 and 54;
the coding matrices 48 and 56; and
the development set 52 used to optimize the binary classifiers and the coding matrix.

The computer 660 further includes a network interface (I/F) 696, connected to the bus 686, which connects the computer 660 to a network 652.

The software implementing the system of the embodiment described above is distributed as object code stored on a storage medium such as a DVD 682 or a semiconductor memory 684. It is provided to the computer 660 through a reading device such as the DVD drive 670 or the semiconductor memory port 672, and is stored on the HDD 674. When the CPU 676 executes the program, the program is read from the HDD 674 and loaded into the RAM 680. The CPU 676 fetches each instruction from the address designated by its program counter (not shown) and executes it. The CPU 676 reads the data to be processed from its registers, the RAM 680 or the HDD 674, and stores the processing results in its registers, the RAM 680 or the HDD 674.

The general operation of the computer system 650 is well known and is therefore not described in detail here.

The software need not be distributed on a storage medium. For example, it may be transmitted to the computer 660 from another computer connected to the network 652. Alternatively, part of the software may be stored on the HDD 674, with the remainder downloaded to the HDD 674 over the network and integrated at run time.

Modern computers typically use general-purpose functions provided by the computer's operating system (OS) and execute these functions in a controlled manner according to the desired purpose. A program may omit general-purpose functions that the OS or a third party can provide and merely specify the order in which such functions are executed. Such a program clearly still falls within the scope of the present invention, as long as the program as a whole has a control structure that achieves the desired purpose.

Possible Modifications
The embodiment described above concerns the quality of Japanese-to-English translation, but the present invention is not limited to that language pair. The invention can be applied to any language pair for which a corpus usable as the corpus 40 is available.

In the embodiment described above, the optimization module 50 optimizes the binary classifiers 46 into the binary classifiers 54 by starting with all classifiers and successively removing the worst-performing one. The present invention is not limited to this optimization scheme. Any scheme may be used, provided the resulting classifiers 54 perform better than other combinations of classifiers. For example, all possible combinations of binary classifiers may be examined, and the combination with the best performance selected as the optimized set. Alternatively, the performance of each binary classifier 46 on the development set 52 may be calculated first, and only the binary classifiers whose accuracy exceeds a predetermined threshold used as the optimized binary classifiers 54.
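The two alternative schemes can be sketched in Python as follows. The sketch assumes a hypothetical callable `multiclass_acc` that scores a subset of classifiers on the development set, and a dictionary of per-classifier accuracies. The names and the toy scoring function are illustrative only.

The exhaustive search evaluates 2**n - 1 non-empty subsets. For the 17 classifiers of this embodiment, that is 131,071 development-set evaluations.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, Tuple


def exhaustive_search(
    names: Sequence[str], multiclass_acc: Callable[[Sequence[str]], float]
) -> Tuple[List[str], float]:
    """Evaluate every non-empty subset of classifiers and keep the best one."""
    best: List[str] = []
    best_acc = float("-inf")
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            acc = multiclass_acc(subset)
            if acc > best_acc:
                best, best_acc = list(subset), acc
    return best, best_acc


def threshold_select(binary_acc: Dict[str, float], threshold: float) -> List[str]:
    """Keep only classifiers whose development-set accuracy exceeds `threshold`."""
    return [name for name, acc in binary_acc.items() if acc > threshold]


# Toy scoring function for illustration only: rewards "a" and "c", penalizes others.
GOOD: Set[str] = {"a", "c"}


def toy_score(subset: Sequence[str]) -> float:
    return len(GOOD & set(subset)) - 0.5 * len(set(subset) - GOOD)


print(exhaustive_search(["a", "b", "c"], toy_score))           # (['a', 'c'], 2.0)
print(threshold_select({"a": 0.9, "b": 0.5, "c": 0.8}, 0.7))  # ['a', 'c']
```

The threshold variant needs no multi-class evaluations at all. However, it ignores how the surviving columns interact in the coding matrix.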

The embodiment disclosed herein is merely an example, and the present invention is not limited to it. The scope of the present invention is defined by each of the claims, read in light of the detailed description. It includes all modifications within the meaning and scope equivalent to the language of the claims.

A diagram showing the overall structure of the system 30 according to an embodiment of the present invention.
A diagram showing details of the corpus 40.
A diagram showing how the training translation set 100 is prepared.
A diagram showing the decomposition scheme of the binary classifiers used in this embodiment.
A diagram showing how examples are generated for training the one-versus-rest classifiers.
A diagram showing how examples are generated for training the 5_4 and 5_2 classifiers.
A diagram showing how examples are generated for training the 3_2 and 3_1 classifiers.
A diagram showing how examples are generated for training the 5_4 and 5_2 classifiers.
A diagram showing the encoding matrix 48 used in the first embodiment.
A flowchart of the optimization module implemented as a computer program.
A diagram showing an example result of the optimization process.
A diagram showing details of the optimized binary classifiers 54.
A diagram showing the structure of the optimized encoding matrix 56.
A diagram showing the feature extraction scheme used by the feature extraction module 76.
A diagram showing an example of system performance (multi-class accuracy) during the optimization process in the fluency and adequacy experiments.
A diagram summarizing the distribution of human scores for the translations used in the experiments.
A diagram summarizing the classification accuracy on the multi-class task and the performance of the binary classifiers.
A diagram summarizing the correlation coefficients of the different classifiers.
A front view of the computer system 650.
A block diagram of the computer system 650.
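The drawings refer to one-versus-rest classifiers and to pairwise classifiers labelled 5_4, 5_2, 3_2 and 3_1, together with an encoding matrix 48. A minimal sketch of how such a ternary matrix and its per-classifier training sets could be built is given here. It assumes a five-point grade scale (suggested by the classifier labels), one one-versus-rest column per grade plus one pairwise column for every pair of grades, and +1, -1 and 0 for the first, second and third values. The embodiment itself may use a different subset of columns, and all names here are hypothetical.

```python
from itertools import combinations

# Assumption: grades run from 5 (best) to 1, as the labels "5_4" and "3_1" suggest.
GRADES = [5, 4, 3, 2, 1]


def build_coding_matrix(grades):
    """Return (column_names, matrix) where matrix[grade][k] is +1, -1 or 0."""
    columns = []
    # One-versus-rest columns: grade g is the first class, every other grade the second.
    for g in grades:
        columns.append((f"{g}_rest", {h: (1 if h == g else -1) for h in grades}))
    # Pairwise columns "i_j": grade i is the first class, grade j the second,
    # and all remaining grades get the third value 0 (not used by this classifier).
    for i, j in combinations(grades, 2):
        columns.append((f"{i}_{j}", {h: (1 if h == i else -1 if h == j else 0) for h in grades}))
    names = [name for name, _ in columns]
    matrix = {g: [column[g] for _, column in columns] for g in grades}
    return names, matrix


def training_examples(names, matrix, classifier, graded_translations):
    """Select (translation, label) pairs for one binary classifier from its matrix column."""
    k = names.index(classifier)
    # Translations whose grade maps to 0 in this column are left out of the training set.
    return [(text, matrix[grade][k]) for text, grade in graded_translations if matrix[grade][k] != 0]
```

Reading a column of the matrix directly gives the training labels: rows holding 0 contribute no examples, so under these assumptions a 5_4 classifier sees only grade-5 and grade-4 translations.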

Explanation of symbols

30 translation quality prediction system
32 application unit
34 classifier preparation unit
40 corpus
44 learning module
46, 54 binary classifiers
48, 56 encoding matrix
50 optimization module
52 development set
70 source text
74 translated text
76 feature extraction module
78 feature set
80 binary decision values
82 binary vector
84 comparison module

Claims

An apparatus for estimating a human rating of machine translation quality, the rating being given as one of a set of predefined grades, the apparatus comprising:
Means for calculating a predetermined set of features for a given translation;
A set of binary classifiers, each for classifying the given translation into one of two predefined classes according to features selected from the set of features;
Means for storing an encoding matrix in which each of the grades is associated with a row of classification results of the set of binary classifiers; and
Means for determining a grade of the given translation according to the results of binary classification by the set of binary classifiers and the encoding matrix.
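Claim 1 requires each binary classifier to choose between two classes using features selected from the computed feature set. The claims do not fix the classifier type, so this sketch assumes a simple linear threshold over a chosen subset of feature indices. The class name, weights and bias are hypothetical placeholders for whatever the learning module would actually produce, and +1/-1 stand for the first and second classes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BinaryClassifier:
    """Hypothetical linear classifier that looks only at a selected subset of features."""
    name: str
    feature_ids: Sequence[int]  # indices of the features this classifier uses
    weights: Sequence[float]    # one weight per selected feature
    bias: float = 0.0

    def classify(self, features: Sequence[float]) -> int:
        score = self.bias + sum(w * features[i] for w, i in zip(self.weights, self.feature_ids))
        return 1 if score >= 0 else -1  # +1 = first class, -1 = second class


def binary_vector(classifiers: Sequence[BinaryClassifier], features: Sequence[float]) -> List[int]:
    """Run every binary classifier on one translation's feature set."""
    return [c.classify(features) for c in classifiers]
```

For example, a "5_rest" classifier reading only feature 0 and a "5_4" classifier reading features 0 and 1 together turn one feature set into a two-element binary vector.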

The output of the set of binary classifiers defines a binary vector, each of its elements being a first value or a second value different from the first value;
Each row of the encoding matrix defines a ternary vector, each element of which is the first value, the second value, or a third value different from the first and second values;
The first and second values indicate that the given translation should be classified into first and second classes, respectively, by the corresponding one of the set of binary classifiers;
The third value indicates that the given translation is not classified by the corresponding one of the set of binary classifiers;
The means for determining comprises:
Means for calculating a distance between the binary vector and each of the rows;
Means for finding the row of the encoding matrix closest to the binary vector in terms of the distance; and
Means for selecting the grade corresponding to that closest row as the estimated human rating of the quality of the given translation.

The apparatus according to claim 1 or 2, wherein the means for calculating the distance includes means for calculating a Hamming distance between the binary vector and each row.
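Claims 2 and 3 describe the decoding step: the binary vector is compared with every row of the encoding matrix by Hamming distance, and the grade of the closest row is returned. This sketch encodes the first, second and third values as +1, -1 and 0. The claims do not say how a 0 entry should count toward the Hamming distance, so the sketch assumes the common error-correcting-output-code convention of scoring it as half a mismatch. The function names and the three-grade matrix in the test are hypothetical.

```python
from typing import Dict, Sequence


def hamming_distance(binary_vector: Sequence[int], row: Sequence[int]) -> float:
    """Generalized Hamming distance: 0 per agreement, 1 per disagreement,
    0.5 where the row holds the third value (0)."""
    return sum((1 - b * r) / 2 for b, r in zip(binary_vector, row))


def decode_grade(binary_vector: Sequence[int], coding_matrix: Dict[int, Sequence[int]]) -> int:
    """Return the grade whose row is closest to the binary vector.
    Ties go to the grade listed first in the matrix."""
    return min(coding_matrix, key=lambda grade: hamming_distance(binary_vector, coding_matrix[grade]))
```

Because the decision is a nearest-row search, a few wrong binary decisions can still decode to the correct grade when the rows are far enough apart.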

The apparatus according to claim 1, wherein the means for calculating the predetermined set of features comprises a plurality of preselected automatic multi-class evaluation means, each for automatically evaluating the human rating of machine translation quality in terms of the set of grades.

The apparatus according to claim 4, wherein the means for calculating the predetermined set of features further includes automatic evaluation means for calculating predetermined internal-metric feature values.
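Claims 4 and 5 build the feature set from preselected automatic evaluation metrics plus internal metric values. This section does not list the metrics, so the sketch assumes a single BLEU-style metric computed against one reference translation and treats its clipped n-gram precisions and brevity penalty as the internal values. It is an illustrative simplification, not a reference BLEU implementation, and the feature names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = ngrams(hyp, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    ref_counts = ngrams(ref, n)
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def feature_set(hypothesis, reference, max_n=4):
    """Hypothetical feature set: one BLEU-style score plus its internal values."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = [modified_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    # BLEU-style brevity penalty: hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    # Geometric mean of the precisions, floored so a zero precision does not hit log(0).
    score = bp * math.exp(sum(math.log(max(p, 1e-9)) for p in precisions) / max_n)
    features = {"bleu_like": score, "brevity_penalty": bp}
    features.update({f"p{n}": p for n, p in enumerate(precisions, start=1)})
    return features
```

In a fuller system each preselected metric would contribute its own score and internal values, and the resulting dictionary would be flattened into the fixed-order vector that the binary classifiers read.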

A computerized method for estimating a human rating of machine translation quality, the rating being given as one of a set of predefined grades, the method comprising:
Calculating a predetermined set of features for a given translation;
Classifying the given translation, by each of a set of binary classifiers, into one of two predefined classes according to features selected from the set of features;
Storing, in a storage unit, an encoding matrix in which each of the grades is associated with a row of classification results of the classifying step; and
Determining the grade of the given translation according to the binary classification results of the classifying step and the encoding matrix.

A computer program that, when executed on a computer, causes the computer to function as:
Means for calculating a predetermined set of features for a given translation;
A set of binary classifiers, each for classifying the given translation into one of two predefined classes according to features selected from the set of features;
Means for storing an encoding matrix in which each of the grades is associated with a row of classification results of the set of binary classifiers; and
Means for determining a grade of the given translation according to the results of binary classification by the set of binary classifiers and the encoding matrix.