JP5684084B2

JP5684084B2 - Misclassification detection apparatus, method, and program

Info

Publication number: JP5684084B2
Application number: JP2011220337A
Authority: JP
Inventors: 昭典藤野; 具治岩田; 昌明永田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-10-04
Filing date: 2011-10-04
Publication date: 2015-03-11
Anticipated expiration: 2031-10-04
Also published as: JP2013080395A

Description

The present invention relates to a misclassification detection apparatus, method, and program, and more particularly to a misclassification detection apparatus, method, and program for detecting, from a sample set, samples of content that have been classified into an incorrect category.

Content is in many cases categorized by hand. Alternatively, a statistical classifier is designed using some manually categorized content as training data, and content is classified automatically by using that classifier to estimate the categories of new content.

However, manual classification always carries the risk of misclassification, in which content is assigned to an incorrect category. Content classified into an incorrect category also degrades the automatic classification performance of a statistical classifier. A misclassification detection technique that detects, among given classified samples, the samples assigned to an incorrect category is therefore important.

In the conventional technique, to estimate the misclassified samples in a set of classified samples, all of the classified samples are first used as training data, and the category of each sample is estimated with a statistical classifier trained using cross-validation. Next, each sample whose estimated category does not match the category into which it is classified is detected as a misclassified sample. To raise detection accuracy, the techniques of Non-Patent Documents 1 and 2 take a majority vote over the category estimates obtained from a plurality of statistical classifiers, which suppresses the adverse effect of the estimation bias that depends on the type of classifier. The techniques of Non-Patent Documents 3 and 4 address problems with only two categories: the category of one sample is replaced with the other category, a statistical classifier is trained on the result, and that classifier is used to estimate the category of another sample. The final decision is made from the multiple estimates collected by varying which sample has its category replaced, which raises the accuracy of category estimation.
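The conventional cross-validation approach described above can be sketched as follows. This is a minimal illustration only: the nearest-centroid classifier, the fold assignment by index, and the names `nearest_centroid_predict` and `flag_by_cross_validation` are assumptions made for the example and are not taken from the cited documents.

```python
from collections import defaultdict


def nearest_centroid_predict(train, x):
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1

    def sq_dist(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=sq_dist)


def flag_by_cross_validation(samples, n_folds=3):
    """Return indices whose held-out prediction disagrees with the given label."""
    flagged = []
    for fold in range(n_folds):
        # Hypothetical fold assignment: sample i belongs to fold i % n_folds.
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        for i, (x, y) in enumerate(samples):
            if i % n_folds == fold and nearest_centroid_predict(train, x) != y:
                flagged.append(i)
    return sorted(flagged)


samples = [
    ([0.0, 0.1], "a"), ([0.1, 0.0], "a"), ([0.2, 0.1], "a"),
    ([5.0, 5.1], "b"), ([5.1, 4.9], "b"), ([4.9, 5.0], "b"),
    ([0.1, 0.2], "b"),  # deliberately mislabeled: lies in the "a" cluster
]
flagged = flag_by_cross_validation(samples)
```

Note that in two of the three folds the mislabeled sample stays in the training data and pulls the "b" centroid toward the "a" cluster, which is the kind of contamination the invention aims to suppress.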

[Non-Patent Document 1] Carla E. Brodley and Mark A. Friedl. Identifying mislabeled training data. Journal of Artificial Intelligence Research, 11(11):131-166, 1999.
[Non-Patent Document 2] Sundara Venkataraman, Dimitris Metaxas, Dmitriy Fradkin, Casimir Kulikowski, and Ilya Muchnik. Distinguishing mislabeled data from correctly labeled data in classifier design. In Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'04), pages 668-672, 2004.
[Non-Patent Document 3] Andrea Mallosini, Enrico Blanzieri, and Raymond T. Ng. Detecting potential labeling errors in microarrays by data perturbation. Bioinformatics, 22(17):2114-2121, 2006.
[Non-Patent Document 4] Chen Zhang, Chunguo Wu, Enrico Blanzieri, You Zhou, Yan Wang, Wei Du, and Yanchun Liang. Methods for labeling error detection in microarrays based on the effect of data perturbation on the regression model. Bioinformatics, 25(20):2708-2714, 2009.

In the techniques of Non-Patent Documents 1 and 2, the statistical classifier used for category estimation is trained on the training data with the sample under estimation removed. That training data still contains misclassified samples. In general, a statistical classifier trained on a training data set that contains misclassified samples performs worse than one trained with the misclassified samples excluded. To improve the accuracy of the statistical classifier, a technique is therefore needed that suppresses the adverse effect of misclassified samples on classifier training.

Non-Patent Documents 3 and 4 improve the accuracy of category estimation by training with the categories assigned to samples in the training data swapped. However, the techniques described there assume problems with exactly two categories and cannot be applied to general classification problems with multiple (more than two) categories.

The present invention has been made in view of the above circumstances. Its object is to provide a misclassification detection apparatus, method, and program that suppress the adverse effect of misclassified samples on the training of the statistical classifier used for detection, and that can detect misclassified samples in general classification problems with multiple categories.

To achieve the above object, a misclassification detection apparatus according to the present invention detects misclassified samples in a sample set in which the category of each content item is known and which includes misclassified samples, that is, content classified into an incorrect category. The apparatus includes:
probability model generation means for calculating an estimate $\hat{\Theta}$ of the parameter $\Theta$ of a joint probability model $p(x, y; \Theta)$ of samples represented by content $x$ and category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ under leave-one-out cross-validation; and
misclassified sample detection means for calculating, for each sample $n$, the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which content $x_n$ is classified, based on the estimate $\hat{\Theta}$ calculated by the probability model generation means, and for detecting misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$.
The probability model generation means includes:
correct/incorrect prediction probability calculation means for calculating a correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$, which gives a prediction of a latent variable $z$ indicating whether each sample $n$ is correctly classified, using the leave-one-out estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ obtained from an estimate $\hat{W}$ of a weight parameter matrix, the matrix defining a weight $w_{0n}$ that should be large for an incorrectly classified sample $n$ and a weight $w_{1n}$ that should be large for a correctly classified sample $n$;
weight calculation means for calculating the estimate $\hat{W}$ of the weight parameter matrix using the correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$ calculated by the correct/incorrect prediction probability calculation means;
convergence determination means for calculating the amount of change in the estimate $\hat{W}$ of the weight parameter matrix and repeating the calculation by the correct/incorrect prediction probability calculation means and the calculation by the weight calculation means until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and
parameter calculation means for calculating, for each sample $n$, the leave-one-out estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$, using the estimate $\hat{W}$ of the weight parameter matrix obtained through the iteration by the convergence determination means.

A misclassification detection method according to the present invention detects misclassified samples in a sample set in which the category of each content item is known and which includes misclassified samples, that is, content classified into an incorrect category. The method includes:
a step in which probability model generation means calculates an estimate $\hat{\Theta}$ of the parameter $\Theta$ of a joint probability model $p(x, y; \Theta)$ of samples represented by content $x$ and category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ under leave-one-out cross-validation; and
a step in which misclassified sample detection means calculates, for each sample $n$, the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which content $x_n$ is classified, based on the estimate $\hat{\Theta}$ calculated by the probability model generation means, and detects misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$.
The step of calculating the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ includes:
a step in which correct/incorrect prediction probability calculation means calculates a correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$, which gives a prediction of a latent variable $z$ indicating whether each sample $n$ is correctly classified, using the leave-one-out estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ obtained from an estimate $\hat{W}$ of a weight parameter matrix, the matrix defining a weight $w_{0n}$ that should be large for an incorrectly classified sample $n$ and a weight $w_{1n}$ that should be large for a correctly classified sample $n$;
a step in which weight calculation means calculates the estimate $\hat{W}$ of the weight parameter matrix using the correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$ calculated by the correct/incorrect prediction probability calculation means;
a step in which convergence determination means calculates the amount of change in the estimate $\hat{W}$ of the weight parameter matrix and repeats the calculation by the correct/incorrect prediction probability calculation means and the calculation by the weight calculation means until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and
a step in which parameter calculation means calculates, for each sample $n$, the leave-one-out estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$, using the estimate $\hat{W}$ of the weight parameter matrix obtained through the iteration by the convergence determination means.

According to the present invention, the probability model generation means calculates the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ of samples represented by content $x$ and category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ under leave-one-out cross-validation.

The misclassified sample detection means then calculates, for each sample $n$, the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which content $x_n$ is classified, based on the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ calculated by the probability model generation means, and detects misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$.

In this way, the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ is calculated using a weight set for each sample $n$ so as to maximize the predictive likelihood of the joint probability model under leave-one-out cross-validation, and misclassified samples are detected using the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$ calculated from that estimate. This suppresses the adverse effect of misclassified samples on the training of the statistical classifier used for detection, and makes it possible to detect misclassified samples in general classification problems with multiple categories.

The probability model generation means according to the present invention may instead include: weight calculation means for calculating an estimate $\hat{W}$ of a weight parameter matrix that defines the weight $w_{1n}$, which should be large for a correctly classified sample $n$, so as to maximize the sum of the log-likelihoods of the samples $n$ under leave-one-out cross-validation; convergence determination means for calculating the amount of change in the estimate $\hat{W}$ and repeating the calculation by the weight calculation means until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and parameter calculation means for calculating, for each sample $n$, the leave-one-out estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$, using the estimate $\hat{W}$ obtained through the iteration by the convergence determination means.

A program according to the present invention is a program for causing a computer to function as each of the means of the misclassification detection apparatus described above.

As described above, according to the misclassification detection apparatus, method, and program of the present invention, the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ is calculated using a weight set for each sample $n$ so as to maximize the predictive likelihood of the joint probability model under leave-one-out cross-validation, and misclassified samples are detected using the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$ calculated from that estimate. The resulting effect is that the adverse effect of misclassified samples on the training of the statistical classifier used for detection is suppressed, and misclassified samples can be detected in general classification problems with multiple categories.

FIG. 1 is a schematic diagram showing the configuration of the misclassification detection apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the probability model generation unit in the misclassification detection apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the misclassification detection processing routine in the misclassification detection apparatus according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the probability model generation processing routine in the misclassification detection apparatus according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the configuration of the probability model generation unit in the misclassification detection apparatus according to the second embodiment of the present invention.
FIG. 6 is a flowchart showing the probability model generation processing routine in the misclassification detection apparatus according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. The description covers the case where the invention is applied to a misclassification detection apparatus that detects samples classified into an incorrect category, from a set of samples in which content that can be represented by a feature vector has been classified into categories denoting a type such as sports, music, or mathematics. Such content includes content consisting of text information, such as papers, patents, and other documents held in a database, online news data, and e-mail; content consisting of text information and link information, such as Web data and blog data; and content such as image data.

[First Embodiment]
<System configuration>
The misclassification detection apparatus 100 according to the first embodiment of the present invention receives as input a set of samples, each containing content labeled with the category to which it belongs, and detects and outputs the samples in that set that carry an incorrect category label. The misclassification detection apparatus 100 is implemented as a computer having a CPU, a RAM, and a ROM that stores a program for executing the misclassification detection processing routine described later. Functionally it is configured as follows. As shown in FIG. 1, the misclassification detection apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 30.

The input unit 10 receives the input set of samples, each containing content labeled with the category to which it belongs. Let the feature space formed by the words, pixels, links, or combinations of these contained in content be $T = \{t_1, \ldots, t_i, \ldots, t_V\}$. The feature vector $x$ of a content item is then expressed as $x = (x_1, \ldots, x_i, \ldots, x_V)^T$, where $x_i$ is the frequency of $t_i$ in the content. $V$ is the number of kinds of features that may appear in content. For example, when the content is text data, $V$ is the total size of the vocabulary that may appear in the content. Each sample in the sample set includes the feature vector $x$ of the content and the label $y$ of the category to which it belongs.
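As a small illustration of the frequency-based feature vector described above, the following sketch builds $x$ for a text content item. The vocabulary, the token list, and the function name `feature_vector` are hypothetical; tokenization is assumed to have been done already.

```python
from collections import Counter


def feature_vector(tokens, vocabulary):
    """Return x = (x_1, ..., x_V): the count of each feature t_i among the tokens."""
    counts = Counter(tokens)
    # Tokens outside the vocabulary are ignored; missing features count as 0.
    return [counts[t] for t in vocabulary]


# Hypothetical feature space T = {t_1, ..., t_V} with V = 4.
vocabulary = ["goal", "match", "guitar", "theorem"]
x = feature_vector(["goal", "match", "goal", "referee"], vocabulary)
```

Here `x` has one entry per vocabulary item, in vocabulary order.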

The input unit 10 also receives the various parameters described later: a hyperparameter vector $\eta$, the category prior probability $P(y)$, the prior probability of correctness $P(z)$, and the class-conditional probability $P(y \mid x, z_0)$ for the case where the category $y_n$ of the $n$-th sample is incorrect.

The calculation unit 20 includes a sample database 21, a probability model generation unit 22, a storage unit 23, and a misclassified sample detection unit 24.

The sample database 21 stores the sample set received by the input unit 10. The sample set subject to misclassification detection is denoted $D = \{(x_n, y_n)\}_{n=1}^{N}$.

For the sample set $D = \{(x_n, y_n)\}_{n=1}^{N}$ subject to misclassification detection, the probability model generation unit 22 calculates the leave-one-out cross-validation estimates $\{\hat{\Theta}_{-n}\}_{n=1}^{N}$ of the parameters $\Theta = [\theta_1, \ldots, \theta_k, \ldots, \theta_K]$ of the joint probability model $p(x, y; \theta_y)$. The calculated estimates $\{\hat{\Theta}_{-n}\}_{n=1}^{N}$ of the probability model parameters are stored in the storage unit 23.

Here, $n$ denotes the ID number of a sample in the sample set subject to misclassification detection, $x_n$ is the feature vector of the $n$-th sample, and $y \in \{1, \ldots, k, \ldots, K\}$ denotes the category to which a sample belongs. $\hat{\Theta}_{-n}$ is the estimate of the probability model parameters calculated using the subset $D_{-n} = \{(x_{n'}, y_{n'})\}_{n' \neq n}$ obtained by excluding the $n$-th sample $(x_n, y_n)$ from the sample set $D$, and $p$ denotes a probability density.
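The leave-one-out subsets $D_{-n}$ can be enumerated as in the sketch below. It only illustrates how each subset is formed; the function name `leave_one_out_subsets` and the toy sample set are assumptions for the example, and the weighted parameter estimation performed on each subset is not shown.

```python
def leave_one_out_subsets(samples):
    """Yield (n, D_minus_n), where D_minus_n is the sample list without the n-th sample."""
    for n in range(len(samples)):
        yield n, samples[:n] + samples[n + 1:]


# Toy sample set D = {(x_n, y_n)} with N = 3; features are placeholders.
D = [([2, 0], 1), ([0, 3], 2), ([1, 1], 1)]
subsets = dict(leave_one_out_subsets(D))
```

Each entry `subsets[n]` holds the N - 1 samples from which an estimate for sample n would be computed.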

The misclassified sample detection unit 24 uses the estimates $\hat{\Theta}_{-n}$ of the joint probability model parameters to calculate the predicted class posterior probability of each sample, $P(y_n \mid x_n; \hat{\Theta}_{-n}) = p(x_n, y_n; \hat{\theta}_{y_n,-n}) / \sum_{k=1}^{K} p(x_n, k; \hat{\theta}_{k,-n})$, and detects samples whose predicted class posterior probability is small as samples suspected of having been classified into an incorrect category. Alternatively, the misclassified sample detection unit 24 may calculate the ratio of the predicted class posterior probability to the maximum predicted posterior probability over the categories $y \neq y_n$ other than the category $y_n$ into which the sample is classified, $R_n = P(y_n \mid x_n; \hat{\Theta}_{-n}) / \max_{y \neq y_n} P(y \mid x_n; \hat{\Theta}_{-n})$, and detect samples for which $R_n$ is small. Here, $P$ denotes a probability value.
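The two detection scores described above can be sketched as follows, assuming the leave-one-out joint values $p(x_n, k; \hat{\theta}_{k,-n})$ are already available as a dictionary per sample. The numeric values, the function names, and the `threshold` cut-off are hypothetical; the text only says that samples with a small score are detected.

```python
def predicted_class_posterior(joint, label):
    """P(y_n | x_n): joint[label] divided by the sum of the joint over all categories."""
    return joint[label] / sum(joint.values())


def posterior_ratio(joint, label):
    """R_n: posterior of the assigned label over the largest posterior of any other label.

    The shared normalizer cancels, so the ratio of joint values is sufficient.
    """
    best_other = max(p for k, p in joint.items() if k != label)
    return joint[label] / best_other


def suspects(joints, labels, threshold):
    """Indices n whose predicted class posterior is below the (hypothetical) threshold."""
    return [n for n, (joint, label) in enumerate(zip(joints, labels))
            if predicted_class_posterior(joint, label) < threshold]


# Made-up joint values for two samples and K = 3 categories.
joints = [
    {"sports": 0.06, "music": 0.02, "math": 0.02},
    {"sports": 0.02, "music": 0.06, "math": 0.02},
]
labels = ["sports", "sports"]
flagged = suspects(joints, labels, threshold=0.5)
```

With real feature vectors the joint values are usually far too small to multiply directly, so an implementation would likely work with log probabilities; that detail is omitted here.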

The output unit 30 outputs the detection result for misclassified samples to the user.

As shown in FIG. 2, the probability model generation unit 22 includes a correct/incorrect prediction probability calculation unit 31, a weight calculation unit 32, a first convergence determination unit 33, a second convergence determination unit 34, and a parameter calculation unit 35.

The correct/incorrect prediction probability calculation unit 31 reads the sample set $D = \{(x_n, y_n)\}_{n=1}^{N}$ stored in the sample database 21 and calculates the correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}^{(t)}_{-n})$ of each sample $n$, using the initial value $W^{(0)}$ of the weight parameter matrix or the not-yet-converged weight parameter matrix $W^{(t)}$ input from the second convergence determination unit 34, together with the hyperparameter vector $\eta$, the category prior probability $P(y)$, the prior probability of correctness $P(z)$, and the class-conditional probability $P(y \mid x, z_0)$ for the case where the category $y_n$ of the $n$-th sample is incorrect. Here, $z \in \{z_1, z_0\}$ is a latent variable indicating whether the category $y$ into which the content $x$ of a sample is classified is correct: $z = z_1$ means the sample is correctly classified, and $z = z_0$ means it is incorrectly classified. The weight parameter matrix is $W = [w_1, w_0]$, composed of the weight vector $w_1 = (w_{11}, \ldots, w_{1n}, \ldots, w_{1N})^T$, whose elements are weights $w_{1n}$ that take larger values the more likely the $n$-th sample is to be correctly classified, and the weight vector $w_0 = (w_{01}, \ldots, w_{0n}, \ldots, w_{0N})^T$, whose elements are weights $w_{0n}$ that take larger values the more likely the $n$-th sample is to be incorrectly classified. $a^T$ denotes the transpose of the vector $a$.

The weight calculation unit 32 calculates the updated value $W^{(s+1)}$ of the weight parameter matrix, using the initial value $W^{(0)}$ of the weight parameter matrix, the not-yet-converged weight parameter matrix $W^{(t)}$ input from the second convergence determination unit 34, or the not-yet-converged weight parameter matrix $W^{(s)}$ input from the first convergence determination unit 33, together with the correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}^{(t)}_{-n})$ of each sample $n$.

The first convergence determination unit 33 calculates the amount of change $d(s)$ in the weight parameter matrix. If the convergence condition $d(s) < \epsilon_s$ is satisfied, it sets $W^{(t+1)} \leftarrow W^{(s+1)}$ and outputs the estimate $W^{(t+1)}$ of the weight parameter matrix to the second convergence determination unit 34. If the convergence condition is not satisfied, it advances the parameter learning step as $s \leftarrow s + 1$ and the processing of the weight calculation unit 32 is performed again. This processing is repeated until the convergence condition is satisfied or $s$ reaches a predetermined count $s_{max}$.

The second convergence determination unit 34 calculates the change d(t) in the weight parameter matrix. If the convergence condition d(t) < ε_t is satisfied, it sets ^W ← W^(t+1) and outputs the estimate ^W of the weight parameter matrix. If the condition is not satisfied, it advances the parameter learning step as t ← t+1, and the series of processes by the correct/incorrect prediction probability calculation unit 31, the weight calculation unit 32, and the first convergence determination unit 33 is performed again. This is repeated until the convergence condition is satisfied or t reaches a predetermined count t_max.

The parameter calculation unit 35 uses the estimate ^W of the weight parameter matrix to calculate and output the estimates {^Θ_{−n}}_{n=1}^{N} of the probability model parameters based on leave-one-out cross-validation.

The probability model used in this embodiment is now described. The description below takes as an example the case where a Naive Bayes model based on the multinomial distribution (hereinafter, NB model) is used as the probability model p(x, y; θ_y).

In the NB model based on the multinomial distribution, for content classified into the correct category, the joint probability model of category y and feature vector x is p(x, y; θ_y) = p(x|y; θ_y) P(y). The factor p(x|y; θ_y) is defined by equation (1) below, under the assumption that the occurrence probabilities θ_yi of the individual features t_i in category y are independent.

Here, θ_y = (θ_y1, ..., θ_yi, ..., θ_yV)^T, with θ_yi > 0 and ||θ_y||₁ = Σ_{i=1}^{V} θ_yi = 1. Θ = [θ₁, ..., θ_k, ..., θ_K]^T denotes the parameter matrix of the NB model. P(y) > 0 denotes the occurrence probability of category y and satisfies Σ_{k=1}^{K} P(k) = 1.
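As an illustration of the multinomial NB model described above, the following Python sketch evaluates log p(x|y; θ_y) up to the multinomial coefficient, which depends only on x. Equation (1) is not reproduced in this text, so the standard form p(x|y; θ_y) ∝ Π_i θ_yi^{x_i} is an assumption, as are the names `nb_log_likelihood`, `x`, and `theta_y`.

```python
import math

def nb_log_likelihood(x, theta_y):
    """Log of prod_i theta_y[i] ** x[i]: the multinomial NB likelihood
    without the count-dependent multinomial coefficient (assumed form)."""
    if len(x) != len(theta_y):
        raise ValueError("x and theta_y must both have length V")
    # theta_y must satisfy theta_yi > 0 and sum_i theta_yi = 1, as in the text.
    if any(t <= 0 for t in theta_y) or abs(sum(theta_y) - 1.0) > 1e-9:
        raise ValueError("theta_y must be positive and sum to 1")
    return sum(x_i * math.log(t_i) for x_i, t_i in zip(x, theta_y))

# Hypothetical example: V = 3 features, feature counts x, category parameters theta_y.
log_p = nb_log_likelihood([2, 0, 1], [0.5, 0.25, 0.25])
```

Working in log space avoids underflow when the feature vector holds many counts, which is typical for document data.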

In this embodiment, the probability model of the feature vector x of a sample classified into a wrong category is defined by equation (2) below, under the assumption that the occurrence probabilities θ_z0i of the individual features t_i are independent.

Here, θ_z0 = (θ_z01, ..., θ_z0i, ..., θ_z0V)^T, with θ_z0i > 0 and ||θ_z0||₁ = Σ_{i=1}^{V} θ_z0i = 1.

<Operation of misclassification detection device>
Next, the operation of the misclassification detection apparatus 100 according to the first embodiment is described. First, when a set of samples containing content labeled with the class to which it belongs is input to the misclassification detection apparatus 100, the apparatus stores the input sample set in the sample database 21. Then, when the various parameters (the hyperparameter vector η, the category prior probability P(y), the correct/incorrect prior probability P(z), and the class-conditional probability P(y|x, z₀) for the case where the category y_n of the n-th sample is wrong) are input to the misclassification detection apparatus 100, the apparatus executes the misclassification detection processing routine shown in FIG. 3.

First, in step S101, the probability model generation unit 22 reads the sample set D = {(x_n, y_n)}_{n=1}^{N} targeted for misclassification detection from the sample database 21 and, for each sample n, calculates the estimate ^Θ_{−n} of the probability model parameters based on leave-one-out cross-validation. Each element of ^Θ_{−n} is calculated by equation (3) below, using the weight parameter vector w₁, which is an element of the input weight parameter matrix W^(0) or of the calculated weight parameter matrix W^(t), and the input hyperparameter values η_y for all y.

Here, I_y(y_n') is an indicator function with I_y(y_n') = 1 when y_n' = y and I_y(y_n') = 0 when y_n' ≠ y. ||x_n'||₁ denotes the L1 norm of x_n'.
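The leave-one-out idea behind equation (3) can be sketched in Python as follows. Equation (3) itself is not reproduced in this text, so the formula in the code is an assumption: each remaining sample of the category contributes its L1-normalized feature vector scaled by its weight w_1n', and the hyperparameter is added to every feature as a smoothing term. The function name `loo_weighted_theta` and its arguments are hypothetical.

```python
def loo_weighted_theta(X, y, w1, eta, n, category):
    """Estimate theta for `category` from every sample except sample n.

    Assumed form, not taken from equation (3): a weighted sum of
    L1-normalized count vectors plus an additive smoothing term eta > 0.
    """
    V = len(X[0])
    acc = [eta] * V
    for m, (x_m, y_m) in enumerate(zip(X, y)):
        # Leave sample n out; the category test plays the role of I_y(y_n').
        if m == n or y_m != category:
            continue
        l1 = sum(x_m)  # L1 norm of a non-negative count vector
        if l1 == 0:
            continue
        for i in range(V):
            acc[i] += w1[m] * x_m[i] / l1
    total = sum(acc)
    return [a / total for a in acc]

# Hypothetical toy data: three samples, V = 2 features, two categories.
X = [[2, 0], [1, 1], [0, 4]]
y = [0, 0, 1]
theta_without_0 = loo_weighted_theta(X, y, [1.0, 1.0, 1.0], 0.5, 0, 0)
theta_without_1 = loo_weighted_theta(X, y, [1.0, 1.0, 1.0], 0.5, 1, 0)
```

Because sample n never contributes to its own estimate, a mislabeled sample cannot pull the parameters of its assigned category toward itself when that category is later scored against it.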

Similarly, each element of the leave-one-out estimate ^θ_{z0,−n} of the parameters of the probability model p(x|z₀; θ_z0) for the feature vector x of a sample classified into a wrong category is calculated by equation (4) below, using the weight parameter vector w₀, which is an element of the input weight parameter matrix W^(0) or of the calculated weight parameter matrix W^(t), and the input hyperparameter value η_z0.

The hyperparameter vector η = (η₁, ..., η_k, ..., η_K, η_z0) consists of constants set in advance for the parameter calculation.

The principle behind calculating the leave-one-out estimate ^Θ_{−n} of the probability model parameters is now described.

In this embodiment, the value of the weight parameter matrix W is obtained by maximizing the prediction likelihood, under leave-one-out cross-validation, of the joint probability model p(x, y) on the sample set D. Because the sample set also contains samples classified into wrong categories, the joint probability model p(x, y) = Σ_{j=0}^{1} p(x, y|z_j) P(z_j) is built from the joint probability model p(x, y, z) = p(x, y|z) P(z) of the content feature vector x, the category y, and the latent variable z ∈ {z₁, z₀} indicating correctness. For the case where the class of the sample is correct (z = z₁), p(x, y|z₁) is given by equation (5) below.

For the case where the class of the sample is wrong (z = z₀), p(x, y|z₀) is given by equation (6) below.

The value of the weight parameter matrix W is then obtained by maximizing the objective function shown in equation (7) below.

Note that L(W) in equation (7) corresponds to the prediction likelihood of the joint probability model p(x_n, y_n) under leave-one-out cross-validation.

The value of W that maximizes the objective function of equation (7) can be found by nesting two iterative computations of the expectation-maximization (EM) kind. The EM algorithm is described in the reference (A. P. Dempster, N. M. Laird, and D. B. Rubin: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1−38 (1977)), so a detailed description is omitted.

Letting W^(t) be the estimate of W at learning step (t), the inequality log b ≤ b − 1 allows a Q function satisfying L(W) − L(W^(t)) ≥ Q(W, W^(t)) − Q(W^(t), W^(t)) to be given by equation (8) below.

The Q function of equation (8) is obtained by weighting the leave-one-out log-likelihood of each sample by the correct/incorrect prediction probability and summing over the samples.

By repeatedly taking, at learning step (t+1), the value that maximizes Q(W, W^(t)) as the estimate W^(t+1), a W that locally maximizes L(W) can be found.

Then, based on the finally obtained estimate ^W of the weight parameter matrix, the leave-one-out estimates {^Θ_{−n}}_{n=1}^{N} of the probability model parameters are calculated according to equation (3).

The processing of step S101 is realized by the probability model generation processing routine shown in FIG. 4.

In step S111, the correct/incorrect prediction probability calculation unit 31 calculates, for each sample n, the probability P(z|x_n, y_n; ^Θ^(t)_{−n}) given by equations (9) and (10), as follows.

First, let ^Θ^(t)_{−n} = [^θ^(t)_{1,−n}, ..., ^θ^(t)_{k,−n}, ..., ^θ^(t)_{K,−n}, ^θ^(t)_{z0,−n}] be the parameter values obtained by substituting the input estimate W^(t) of the weight parameter matrix into equations (3) and (4). Next, substituting these values for θ_y in equation (1) and for θ_z0 in equation (2) gives the values of p(x_n|y_n; ^θ_{y_n,−n}(w₁^(t))) and p(x_n; ^θ_{z0,−n}(w₀^(t))). Substituting those values into equations (9) and (10) then gives P(z|x_n, y_n; ^Θ^(t)_{−n}). In other words, the correct/incorrect prediction probability calculation unit 31 calculates the correct/incorrect prediction probability using the joint probability model designed for samples classified into the correct category and the joint probability model designed for misclassified samples. The prior probabilities P(y) and P(z) and the class-conditional probability P(y|x, z₀) appearing in equations (9) and (10) could also be estimated as unknown parameters. In this embodiment, however, they are given externally, both to simplify the parameter calculation algorithm and to allow accuracy to be improved by tuning these values.
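The step from the two joint models to the correct/incorrect prediction probability is an application of Bayes' rule to the mixture p(x, y) = Σ_j p(x, y|z_j) P(z_j). The following Python sketch shows that step only. Equations (9) and (10) are not reproduced in this text, so this is a generic formulation rather than the exact expressions, and the name `correctness_posterior` and its arguments are hypothetical.

```python
import math

def correctness_posterior(log_joint_z1, log_joint_z0, p_z1):
    """Return (P(z1|x,y), P(z0|x,y)) from log p(x,y|z1), log p(x,y|z0)
    and the prior P(z1), using Bayes' rule evaluated in log space."""
    if not 0.0 < p_z1 < 1.0:
        raise ValueError("p_z1 must lie strictly between 0 and 1")
    a = log_joint_z1 + math.log(p_z1)
    b = log_joint_z0 + math.log(1.0 - p_z1)
    m = max(a, b)  # subtract the maximum for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

# Hypothetical values: the "correct" component explains the sample twice as well.
p_correct, p_wrong = correctness_posterior(math.log(0.02), math.log(0.01), 0.5)
```

The sketch excludes the degenerate priors P(z₁) = 0 and P(z₁) = 1, for which the posterior is trivially 0 or 1.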

The value of W that maximizes Q(W, W^(t)) is likewise found iteratively. Letting W^(s) be the estimate of W at learning step (s), the inequality log b ≤ b − 1 allows a function Q′ satisfying Q(W, W^(t)) − Q(W^(s), W^(t)) ≥ Q′(W, W^(s)|W^(t)) − Q′(W^(s), W^(s)|W^(t)) to be given by equation (11) below.

Therefore, by repeatedly taking, at learning step (s+1), the value that maximizes Q′(W, W^(s)|W^(t)) as the estimate W^(s+1), a W that locally maximizes Q(W, W^(t)) in the neighborhood of W^(t) can be found.

In step S112, the weight calculation unit 32 calculates the solution W^(s+1) that maximizes Q′(W, W^(s)|W^(t)) according to equations (14) and (15) below.

After the weight parameter matrix W^(s+1) at learning step (s+1) has been calculated, in step S113 the first convergence determination unit 33 determines whether a convergence condition, given for example by equation (16) below, is satisfied.

Here, ||w^(s)_j||₂ denotes the L2 norm of the vector w^(s)_j, and ε_s is a small value chosen by the designer. If step S113 determines that the convergence condition is satisfied, W^(s+1) is taken as W^(t+1) and processing proceeds to step S114. If it determines that the condition is not satisfied, s ← s+1 is set and the processing of steps S112 and S113 is repeated.

After the weight parameter matrix W^(t+1) at learning step (t+1) has been calculated, in step S114 the second convergence determination unit 34 determines whether a convergence condition, given for example by equation (17) below, is satisfied.

Here, ε_t is a small value chosen by the designer. If step S114 determines that the convergence condition is satisfied, W^(t+1) is taken as the estimate ^W of the weight parameter matrix and processing proceeds to step S115. If the condition is not satisfied, t ← t+1 is set and the processing of steps S111 to S114 is repeated.

In step S115, the parameter calculation unit 35 substitutes the estimate ^W of the weight parameter matrix into equation (3), calculates for each sample n the leave-one-out estimate ^Θ_{−n} of the probability model parameters, stores it in the storage unit 23, and ends the probability model generation processing routine.

The parameter calculation algorithm described above can be summarized as follows.

Step 1: Set the various parameters.
1. Set the hyperparameter η, the prior probabilities P(y) and P(z), and the class-conditional probability P(y|x, z₀) externally as predetermined values.
2. Set the convergence parameters ε_t and ε_s and the maximum iteration counts t_max and s_max.

Step 2: Set the learning step t and the initial value of the weight parameter matrix.
1. Set t to 0.
2. Set the value W^(t) of the weight parameter matrix.

Step 3: Calculate the estimate ^W of the weight parameter matrix.
1. Using W^(t), calculate the correct/incorrect prediction probability P(z|x_n, y_n; ^Θ^(t)_{−n}) for each sample n by equations (1) to (4), (9), and (10) (step S111, FIG. 4).
2. Set s to 0 and set W^(s) to W^(t).
3. Calculate the weight parameter matrix value W^(t+1) that maximizes Q(W, W^(t)).
(a) Using W^(s), calculate W^(s+1) by equations (12) to (15) (step S112, FIG. 4).
(b) Perform the convergence test using equation (16) (step S113, FIG. 4).
4. Perform the convergence test using equation (17) (step S114, FIG. 4).

Step 4: Substitute the converged estimate ^W of the weight parameter matrix into equation (3) and calculate, for each sample n, the leave-one-out estimate ^Θ_{−n} of the probability model parameters (step S115, FIG. 4).

Step 5: Output the parameter estimates {^Θ_{−n}}_{n=1}^{N} to the misclassified sample detection unit 24.
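The control flow of Steps 2 and 3 can be sketched as two nested loops in Python. This shows the loop structure only: the three callables stand in for equations (9) to (17), which are not reproduced in this text, and using one change measure for both convergence tests is an assumption. All names (`fit_weights`, `posterior_fn`, `update_fn`, `change_fn`) are hypothetical.

```python
def fit_weights(W0, posterior_fn, update_fn, change_fn,
                eps_t=1e-4, eps_s=1e-4, t_max=100, s_max=100):
    """Nested iteration over learning steps t (outer) and s (inner).

    posterior_fn(W)         -> correct/incorrect probabilities   (step S111)
    update_fn(W, post)      -> updated weights W^(s+1)           (step S112)
    change_fn(W_new, W_old) -> scalar change d           (steps S113, S114)
    """
    W_t = W0
    for _ in range(t_max):
        post = posterior_fn(W_t)   # held fixed while the inner loop runs
        W_s = W_t
        for _ in range(s_max):
            W_next = update_fn(W_s, post)
            converged_s = change_fn(W_next, W_s) < eps_s
            W_s = W_next
            if converged_s:
                break
        converged_t = change_fn(W_s, W_t) < eps_t
        W_t = W_s                  # W^(t+1) <- W^(s+1)
        if converged_t:
            break
    return W_t

# Toy check with a scalar "weight" whose update has the fixed point 2.0.
result = fit_weights(
    0.0,
    posterior_fn=lambda W: None,
    update_fn=lambda W, post: 0.5 * W + 1.0,
    change_fn=lambda new, old: abs(new - old),
)
```

The iteration caps t_max and s_max guarantee termination even if a convergence test is never met, matching the stopping rule stated for both convergence determination units.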

Then, in step S102 of the misclassification detection processing routine, the misclassified sample detection unit 24 uses the estimate ^Θ_{−n} of the joint probability model parameters to calculate the predicted class posterior probability of each sample n, P(y_n|x_n; ^Θ_{−n}) = p(x_n|y_n; ^θ_{y_n,−n}) P(y_n) / Σ_{k=1}^{K} p(x_n|k; ^θ_{k,−n}) P(k), and detects samples whose predicted class posterior probability is at or below a threshold as samples suspected of being classified into a wrong category. Alternatively, the misclassified sample detection unit 24 may calculate the ratio R_n = P(y_n|x_n; ^Θ_{−n}) / max_{y≠y_n} P(y|x_n; ^Θ_{−n}) of the predicted class posterior probability to the largest predicted posterior probability among the categories y ≠ y_n other than the assigned category y_n, and detect samples for which R_n is at or below a threshold.
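The two detection criteria of step S102 can be sketched in Python as follows. The sketch assumes that the per-category log-likelihoods log p(x_n|k; ^θ_{k,−n}) have already been computed. The function names `class_posteriors` and `suspicion_ratio` are hypothetical.

```python
import math

def class_posteriors(log_lik, prior):
    """P(k|x) for every category k from log p(x|k) and the priors P(k)."""
    scores = [ll + math.log(p) for ll, p in zip(log_lik, prior)]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def suspicion_ratio(posteriors, assigned):
    """R_n = P(y_n|x_n) / max over y != y_n of P(y|x_n); needs K >= 2."""
    rival = max(p for k, p in enumerate(posteriors) if k != assigned)
    return posteriors[assigned] / rival

# Hypothetical sample with K = 3 categories, currently assigned to category 0.
post = class_posteriors([math.log(0.1), math.log(0.3), math.log(0.6)],
                        [1 / 3, 1 / 3, 1 / 3])
r_n = suspicion_ratio(post, assigned=0)
```

A ratio well below 1 means some other category explains the sample better than its assigned one, so sorting samples by ascending R_n puts the most suspicious samples first.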

Since the processing of step S102 can be realized easily with a general sorting algorithm, further description is omitted.

As described above, the misclassification detection apparatus according to the first embodiment calculates the parameter value Θ of the joint probability model p(x_n, y_n) using weights set for each sample n so as to maximize the prediction likelihood of the joint probability model under leave-one-out cross-validation, and detects misclassified samples using the predicted class posterior probability P(y_n|x_n) of each sample n calculated from Θ. This suppresses the adverse effect that misclassified samples have on the learning of the statistical classifier used for detection, and allows misclassified samples to be detected in general classification problems with multiple categories.

Giving a weight to each sample in the training data set, and learning the parameter value Θ of the statistical classifier based on the probability model from the weighted training data set, reduces the adverse effect that misclassified samples in the training data set have on the learning of the probability model. In addition, the weights are set so as to maximize the likelihood of the probability model obtained under leave-one-out cross-validation. When misclassified samples are fewer than correctly classified samples, this yields a probability model that gives higher predicted probabilities to the categories of correctly classified samples than to those of misclassified samples. This effect improves the performance of an apparatus that detects misclassified samples using the predicted probabilities of categories.

[Second Embodiment]
<System configuration>
Next, a second embodiment is described. Parts with the same configuration as in the first embodiment are given the same reference numerals, and their description is omitted.

The second embodiment differs from the first embodiment in that the correct/incorrect prediction probability calculation unit and the second convergence determination unit are omitted.

As shown in FIG. 5, the probability model generation unit 222 of the misclassification detection apparatus according to the second embodiment includes a weight calculation unit 232, a first convergence determination unit 233, and a parameter calculation unit 35.

The weight calculation unit 232 calculates the updated value W^(s+1) of the weight parameter matrix using either the initial value W^(0) of the weight parameter matrix or the mid-convergence weight parameter matrix W^(s) input from the first convergence determination unit 233.

The first convergence determination unit 233 calculates the change d(s) in the weight parameter matrix. If the convergence condition d(s) < ε_s is satisfied, it sets ^W ← W^(s+1) and outputs the estimate ^W of the weight parameter matrix. If the condition is not satisfied, it advances the parameter learning step as s ← s+1 and the processing of the weight calculation unit 232 is performed again. This is repeated until the convergence condition is satisfied or s reaches a predetermined count s_max.

In this embodiment, for the joint probability model p(x, y, z) = p(x, y|z) P(z) of the content feature vector x, the category y, and the latent variable z ∈ {z₁, z₀} indicating correctness, P(z₁) = 1 and P(z₀) = 0 are set.

The value of the weight parameter matrix W is obtained by maximizing the objective function shown in equation (18) below. Because P(z₀) = 0, the weight vector w₀ = (w_01, ..., w_0n, ..., w_0N)^T of the weight parameter matrix W is not calculated.

As in the first embodiment, the value of W that maximizes the objective function of equation (18) can be found by an iterative computation of the expectation-maximization (EM) kind.

Letting W^(s) be the estimate of W at learning step (s), the inequality log b ≤ b − 1 allows a Q function satisfying L(W) − L(W^(s)) ≥ Q(W, W^(s)) − Q(W^(s), W^(s)) to be given by equation (19) below.

As described above, the value of the weight parameter matrix W is obtained by maximizing the sum of the leave-one-out log-likelihoods of the samples. Note that L(W) in equation (18) corresponds to the sum of the leave-one-out log-likelihoods of the samples n.

Then, based on the finally obtained estimate ^W of the weight parameter matrix, the leave-one-out estimates {^Θ_{−n}}_{n=1}^{N} of the probability model parameters are calculated according to equation (3).

<Operation of misclassification detection device>
First, when a set of samples containing content labeled with the class to which it belongs is input to the misclassification detection apparatus 100, the apparatus stores the input sample set in the sample database 21. Then, when the hyperparameter vector η is input to the misclassification detection apparatus 100, the apparatus executes the misclassification detection processing routine in the same way as in the first embodiment.

The probability model generation processing routine according to the second embodiment is described with reference to FIG. 6. Processing that is the same as in the first embodiment is given the same reference numerals, and its detailed description is omitted.

First, in step S211, the weight calculation unit 232 calculates the solution W^(s+1) that maximizes the Q function of equation (19).

After the weight parameter matrix W^(s+1) at learning step (s+1) has been calculated, in step S212 the first convergence determination unit 233 determines whether the convergence condition given by equation (16) is satisfied.

If step S212 determines that the convergence condition is satisfied, W^(s+1) is taken as ^W and processing proceeds to step S115. If it determines that the condition is not satisfied, s ← s+1 is set and the processing of step S211 is repeated.

In step S115, the estimate ^W of the weight parameter matrix is substituted into equation (3), the leave-one-out estimate ^Θ_{−n} of the probability model parameters is calculated for each sample n and stored in the storage unit 23, and the probability model generation processing routine ends.

The other configurations and operations of the misclassification detection apparatus according to the second embodiment are the same as in the first embodiment, so their description is omitted.

The estimate ^Θ_{−n} of the probability model parameters calculated above coincides with the estimate ^Θ_{−n} calculated in the first embodiment when the correct/incorrect prior probability P(z) is set to P(z₁) = 1, P(z₀) = 0. The probability model generation unit can thus be designed in a simplified form.

[Example]
Next, the results of experiments applying the methods of the above embodiments are described.

An evaluation experiment was conducted on detecting document data classified into a wrong subcategory, in a task where document data belonging to the top-level category "computer" is classified into one of five subcategories. The experiment used the 20 newsgroups database (20News), which is widely used for performance evaluation on text classification problems (see K. Nigam, A. McCallum, S. Thrun, and T. Mitchell: Text classification from labeled and unlabeled documents using EM. Machine Learning, Vol. 39, pp. 103−134, 2000).

To create the evaluation data set, 1000 samples were drawn at random from the document data belonging to the five subcategories. Then r_m % of the 1000 samples were selected at random, and misclassified samples were created by changing the subcategory of each selected document at random to one of the other four subcategories. The resulting data set, containing these misclassified samples, was used for performance evaluation as the sample set targeted for misclassification detection. The performance measure was average precision (AP), which is often used in information retrieval tasks and the like to measure the quality of a ranking of samples. With M denoting the total number of misclassified samples, average precision is calculated by equation (20) below.

A larger average precision indicates better ranking performance.
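The following Python sketch computes average precision for a ranked list. Equation (20) is not reproduced in this text, so the standard information-retrieval definition is used here on the assumption that it matches: the mean, over the M misclassified samples, of the precision at the rank where each one appears. The name `average_precision` and the boolean input format are hypothetical.

```python
def average_precision(ranked_is_misclassified):
    """Average precision of a ranking given as booleans, most suspicious
    sample first; True marks a sample that really is misclassified.
    Assumes every misclassified sample appears somewhere in the ranking."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_is_misclassified, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

# Hypothetical ranking of four samples, two of which are truly misclassified.
ap = average_precision([True, False, True, False])
```

In this example the two misclassified samples sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2, about 0.83. A ranking that places every misclassified sample first scores 1.0.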

Table 1 shows the average precision obtained in three ways: with the apparatus according to the present invention having the probability model generation unit 22 described in the first embodiment, with P(z₁) = 0.5 (Method 1); with the apparatus according to the present invention having the probability model generation unit 222 described in the second embodiment (Method 2); and by simply applying leave-one-out cross-validation to the NB model without introducing the weight parameter matrix (Method 3).

In the experiments, the parameter estimates $\{\hat{\Theta}_{-n}\}_{n=1}^{N}$ obtained by each method were used to compute $R_n = P(y_n \mid x_n; \hat{\Theta}_{-n}) / \max_{y \neq y_n} P(y \mid x_n; \hat{\Theta}_{-n})$, and samples were detected as suspected misclassifications in ascending order of $R_n$. As Table 1 shows, in every experiment, across all tested values of $r_m$, the average precision obtained by Methods 1 and 2 exceeded that obtained by Method 3. These results show that the apparatus according to the present invention is effective at detecting samples in descending order of their risk of being misclassified.
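The ranking step described above can be sketched as follows. The sketch assumes that the predicted class posteriors under the leave-one-out estimates have already been computed and are passed in as plain lists, one per sample. The function names `suspicion_ratio` and `rank_suspects` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def suspicion_ratio(posteriors: Sequence[float], assigned: int) -> float:
    """R_n: posterior of the assigned category divided by the best competing posterior."""
    best_other = max(p for k, p in enumerate(posteriors) if k != assigned)
    if best_other == 0.0:
        # No competing category has any probability mass: least suspect.
        return float("inf")
    return posteriors[assigned] / best_other


def rank_suspects(
    posteriors_per_sample: Sequence[Sequence[float]], assigned_labels: Sequence[int]
) -> List[int]:
    """Return sample indices in ascending order of R_n (most suspect first)."""
    ratios = [
        suspicion_ratio(p, y) for p, y in zip(posteriors_per_sample, assigned_labels)
    ]
    return sorted(range(len(ratios)), key=lambda n: ratios[n])


if __name__ == "__main__":
    # Three samples, three categories, all assigned to category 0 (toy values).
    posteriors = [
        [0.70, 0.20, 0.10],  # assigned category clearly favored
        [0.10, 0.80, 0.10],  # another category strongly favored
        [0.40, 0.35, 0.25],  # borderline
    ]
    print(rank_suspects(posteriors, [0, 0, 0]))
```

A ratio below 1 means some other category is more probable than the assigned one, so samples with the smallest $R_n$ are inspected first.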

The present invention is not limited to the embodiments described above; various modifications and applications are possible without departing from the gist of the invention.

For example, although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Calculation unit
22, 222 Probability model generation unit
24 Misclassified sample detection unit
30 Output unit
31 Correct/incorrect prediction probability calculation unit
32, 232 Weight calculation unit
33 First convergence determination unit
34 Second convergence determination unit
35 Parameter calculation unit
100 Misclassification detection apparatus
233 Convergence determination unit

Claims

1. A misclassification detection apparatus that detects misclassified samples in a sample set of content whose assigned categories are known, the sample set including misclassified samples, which are content classified into wrong categories, the apparatus comprising:
probability model generation means for calculating an estimate $\hat{\Theta}$ of the parameter $\Theta$ of a joint probability model $p(x, y; \Theta)$ of samples each represented by content $x$ and a category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation; and
misclassified sample detection means for calculating, for each sample $n$, based on the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ calculated by the probability model generation means, a predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which the content $x_n$ is classified, and for detecting misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$,
wherein the probability model generation means includes:
correct/incorrect prediction probability calculation means for calculating a correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$ that gives a prediction of a latent variable $z$ indicating whether each sample $n$ is correctly classified, using the estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ based on leave-one-out cross-validation, the estimate being based on an estimate $\hat{W}$ of a weight parameter matrix that defines a weight $w_{0n}$ to be set to a large value for a misclassified sample $n$ and a weight $w_{1n}$ to be set to a large value for a correctly classified sample $n$;
weight calculation means for calculating the estimate $\hat{W}$ of the weight parameter matrix using the correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$ calculated by the correct/incorrect prediction probability calculation means;
convergence determination means for calculating the amount of change in the estimate $\hat{W}$ of the weight parameter matrix and causing the calculation by the correct/incorrect prediction probability calculation means and the calculation by the weight calculation means to be repeated until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and
parameter calculation means for calculating, for each sample $n$, the estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation, using the estimate $\hat{W}$ of the weight parameter matrix obtained by the iterative processing by the convergence determination means.

2. A misclassification detection apparatus that detects misclassified samples in a sample set of content whose assigned categories are known, the sample set including misclassified samples, which are content classified into wrong categories, the apparatus comprising:
probability model generation means for calculating an estimate $\hat{\Theta}$ of the parameter $\Theta$ of a joint probability model $p(x, y; \Theta)$ of samples each represented by content $x$ and a category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation; and
misclassified sample detection means for calculating, for each sample $n$, based on the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ calculated by the probability model generation means, a predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which the content $x_n$ is classified, and for detecting misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$,
wherein the probability model generation means includes:
weight calculation means for calculating an estimate $\hat{W}$ of a weight parameter matrix that defines a weight $w_{1n}$ to be set to a large value for a correctly classified sample $n$, so as to maximize the sum of the log-likelihoods of the samples $n$ based on leave-one-out cross-validation;
convergence determination means for calculating the amount of change in the estimate $\hat{W}$ of the weight parameter matrix and causing the calculation by the weight calculation means to be repeated until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and
parameter calculation means for calculating, for each sample $n$, the estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation, using the estimate $\hat{W}$ of the weight parameter matrix obtained by the iterative processing by the convergence determination means.

3. The misclassification detection apparatus according to claim 1 or 2, wherein $p(x \mid y; \Theta)$ in the joint probability model $p(x, y; \Theta) = p(x \mid y; \Theta) P(y)$ is given by a Naive Bayes model whose parameters are represented by $\Theta$.

4. A misclassification detection method for detecting misclassified samples in a sample set of content whose assigned categories are known, the sample set including misclassified samples, which are content classified into wrong categories, the method comprising:
a step in which probability model generation means calculates an estimate $\hat{\Theta}$ of the parameter $\Theta$ of a joint probability model $p(x, y; \Theta)$ of samples each represented by content $x$ and a category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation; and
a step in which misclassified sample detection means calculates, for each sample $n$, based on the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ calculated by the probability model generation means, a predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which the content $x_n$ is classified, and detects misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$,
wherein the step of calculating the parameter $\Theta$ of the joint probability model $p(x_n, y_n)$ includes:
a step in which correct/incorrect prediction probability calculation means calculates a correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$ that gives a prediction of a latent variable $z$ indicating whether each sample $n$ is correctly classified, using the estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ based on leave-one-out cross-validation, the estimate being based on an estimate $\hat{W}$ of a weight parameter matrix that defines a weight $w_{0n}$ to be set to a large value for a misclassified sample $n$ and a weight $w_{1n}$ to be set to a large value for a correctly classified sample $n$;
a step in which weight calculation means calculates the estimate $\hat{W}$ of the weight parameter matrix using the correct/incorrect prediction probability $P(z \mid x_n, y_n; \hat{\Theta}_{-n})$ calculated by the correct/incorrect prediction probability calculation means;
a step in which convergence determination means calculates the amount of change in the estimate $\hat{W}$ of the weight parameter matrix and causes the calculation by the correct/incorrect prediction probability calculation means and the calculation by the weight calculation means to be repeated until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and
a step in which parameter calculation means calculates, for each sample $n$, the estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation, using the estimate $\hat{W}$ of the weight parameter matrix obtained by the iterative processing by the convergence determination means.

5. A misclassification detection method for detecting misclassified samples in a sample set of content whose assigned categories are known, the sample set including misclassified samples, which are content classified into wrong categories, the method comprising:
a step in which probability model generation means calculates an estimate $\hat{\Theta}$ of the parameter $\Theta$ of a joint probability model $p(x, y; \Theta)$ of samples each represented by content $x$ and a category $y$, using a weight $w_n$ set for each sample $n$, so as to maximize the predictive likelihood of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation; and
a step in which misclassified sample detection means calculates, for each sample $n$, based on the estimate $\hat{\Theta}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ calculated by the probability model generation means, a predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of the category $y_n$ into which the content $x_n$ is classified, and detects misclassified samples based on the predicted class posterior probability $P(y_n \mid x_n; \hat{\Theta})$ of each sample $n$,
wherein the step of calculating the parameter $\Theta$ of the joint probability model $p(x_n, y_n)$ includes:
a step in which weight calculation means calculates an estimate $\hat{W}$ of a weight parameter matrix that defines a weight $w_{1n}$ to be set to a large value for a correctly classified sample $n$, so as to maximize the sum of the log-likelihoods of the samples $n$ based on leave-one-out cross-validation;
a step in which convergence determination means calculates the amount of change in the estimate $\hat{W}$ of the weight parameter matrix and causes the calculation by the weight calculation means to be repeated until an estimate $\hat{W}$ satisfying a convergence condition is obtained or a predetermined number of iterations is reached; and
a step in which parameter calculation means calculates, for each sample $n$, the estimate $\hat{\Theta}_{-n}$ of the parameter $\Theta$ of the joint probability model $p(x, y; \Theta)$ based on leave-one-out cross-validation, using the estimate $\hat{W}$ of the weight parameter matrix obtained by the iterative processing by the convergence determination means.

6. The misclassification detection method according to claim 4 or 5, wherein $p(x \mid y; \Theta)$ in the joint probability model $p(x, y; \Theta) = p(x \mid y; \Theta) P(y)$ is given by a Naive Bayes model whose parameters are represented by $\Theta$.

7. A program for causing a computer to function as each means of the misclassification detection apparatus according to any one of claims 1 to 3.