JP2011221873A

JP2011221873A - Data classification method, apparatus and program

Info

Publication number: JP2011221873A
Application number: JP2010091773A
Authority: JP
Inventors: Toshiro Uchiyama (内山 俊郎); Takayuki Adachi (足立 貴行); Masashi Uchiyama (内山 匡)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-04-12
Filing date: 2010-04-12
Publication date: 2011-11-04

Abstract

PROBLEM TO BE SOLVED: To improve accuracy, in the sense that the correct answer is included, for the problem of determining a class group to one of whose classes input data belongs. SOLUTION: Each likelihood calculation means calculates likelihoods, a likelihood being the plausibility that predetermined input data belongs to each of a plurality of classes. For the predetermined input data, the N largest likelihoods calculated by one of the plurality of likelihood calculation means are extracted in descending order, and a certainty factor, which is the probability that the correct class for the input data is in the class group corresponding to those N likelihoods, is calculated. The class group corresponding to the N largest likelihoods calculated by the likelihood calculation means with the maximum certainty factor, or the class group corresponding to the N largest values obtained by converting the likelihoods so as to reflect the certainty factors, is determined as the class group to one of whose classes the input data may belong.

Description

The present invention relates to a data classification method, device, and program, and in particular to a data classification method, device, and program for classifying input data into one of a plurality of classes.

One known way to use the outputs of a plurality of likelihood calculation means simultaneously when classifying input data is to calculate, for each likelihood calculation means, a certainty factor indicating how probable it is that the class with the maximum likelihood is the correct class for the input data. The class to which the input data belongs is then taken to be either the class output by the likelihood calculation means with the maximum certainty factor, or the class that maximizes the sum of the likelihoods weighted according to the certainty factors (see, for example, Non-Patent Document 1).

Non-Patent Document 1: Toshiro Uchiyama, Katsuto Bessho, Masashi Uchiyama, Masahiro Oku, "Combination of multiple classifiers using certainty factor estimation," Japanese Society for Artificial Intelligence, Proceedings of the Intelligence-Based Systems Research Group, January 2009.

However, the technique of Non-Patent Document 1 integrates the outputs of a plurality of classifiers to determine a single class for the input data, and therefore cannot be applied to the problem of determining a group of classes (a class group) to one of which the input data belongs.

The present invention has been made in view of the above, and its object is to provide a data classification method, device, and program that can improve accuracy, in the sense that the correct class is included, for the problem of determining a class group to one of whose classes the input data belongs.

FIG. 1 is a principle configuration diagram of the present invention.

The present invention (claim 1) is a data classification device for determining a class group to one of whose classes predetermined input data belongs, comprising:
a plurality of likelihood calculation means 120 that differ from one another in classification method, components, or features, each having a function of calculating likelihoods, a likelihood being the plausibility that the predetermined input data belongs to each of a plurality of classes, and storing them in likelihood storage means 12;
certainty factor calculation means 130 for extracting, for the predetermined input data, the N largest likelihoods in descending order from among the likelihoods stored in the likelihood storage means 12 that were calculated by one of the plurality of likelihood calculation means, calculating a certainty factor, which is the probability that the correct class for the input data is in the class group corresponding to those N likelihoods, and storing it in certainty factor storage means 19; and
data class group determination means 150 for extracting the certainty factors of the likelihood calculation means 120 from the certainty factor storage means 19 and determining, as the class group to one of whose classes the input data belongs, either the class group corresponding to the N largest likelihoods calculated by the likelihood calculation means that yielded the maximum certainty factor, or the class group corresponding to the N largest values obtained by converting the likelihoods so as to reflect the certainty factors.

The present invention (claim 2) is the data classification device of claim 1, wherein
the certainty factor calculation means 130 includes
means for weighting the likelihood of belonging to each class according to the magnitude of the certainty factor and obtaining the sum of the weighted likelihoods over all the likelihood calculation means, and
the data class group determination means 150 includes
means for determining, as the class group to one of whose classes the input data belongs, the class group corresponding to the N largest of the values summed over all the likelihood calculation means.
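The weighted combination of claim 2 can be illustrated with a minimal Python sketch. It assumes that the certainty factor itself is used as the weight (the text only says the weighting follows the magnitude of the certainty factor); the function name `combine_weighted` and the example values are hypothetical.

```python
from typing import Dict, List


def combine_weighted(likelihoods: List[Dict[str, float]],
                     certainties: List[float],
                     n: int) -> List[str]:
    """Return the N classes with the largest certainty-weighted likelihood sum.

    likelihoods[i] maps class name -> likelihood from calculator i.
    certainties[i] is the certainty factor of calculator i, used directly
    as the weight (an assumption made for this sketch).
    """
    if len(likelihoods) != len(certainties):
        raise ValueError("one certainty factor per likelihood calculator is required")
    totals: Dict[str, float] = {}
    for per_class, weight in zip(likelihoods, certainties):
        for cls, likelihood in per_class.items():
            totals[cls] = totals.get(cls, 0.0) + weight * likelihood
    # Descending by summed value; ties are broken by class name for determinism.
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return ranked[:n]


# Hypothetical example: two calculators, three classes, N = 2.
example_likelihoods = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.2, "B": 0.3, "C": 0.5},
]
example_certainties = [0.9, 0.5]
top2 = combine_weighted(example_likelihoods, example_certainties, n=2)
```

Here class A accumulates 0.9 × 0.6 + 0.5 × 0.2 = 0.64, B accumulates 0.42, and C accumulates 0.34, so the resulting class group is A and B.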

FIG. 2 is a diagram for explaining the principle of the present invention.

The present invention (claim 3) is a data classification method for determining a class group to one of whose classes predetermined input data belongs, wherein, in a device having likelihood storage means, certainty factor storage means, and a plurality of likelihood calculation means that differ from one another in classification method, components, or features, the following steps are performed:
a likelihood calculation step (step 1) in which each likelihood calculation means calculates likelihoods, a likelihood being the plausibility that the predetermined input data belongs to each of a plurality of classes, and stores them in the likelihood storage means;
a certainty factor calculation step (step 2) in which certainty factor calculation means extracts, for the predetermined input data, the N largest likelihoods in descending order from among the likelihoods stored in the likelihood storage means that were calculated by one of the plurality of likelihood calculation means, calculates a certainty factor, which is the probability that the correct class for the input data is in the class group corresponding to those N likelihoods, and stores it in the certainty factor storage means 19; and
a data class group determination step (step 3) in which the certainty factors of the likelihood calculation means are extracted from the certainty factor storage means and either the class group corresponding to the N largest likelihoods calculated by the likelihood calculation means that yielded the maximum certainty factor, or the class group corresponding to the N largest values obtained by converting the likelihoods so as to reflect the certainty factors, is determined as the class group to one of whose classes the input data belongs.

The present invention (claim 4) is the data classification method of claim 3, wherein,
in the certainty factor calculation step (step 2),
the likelihood of belonging to each class is weighted according to the magnitude of the certainty factor, and the sum of the weighted likelihoods over all the likelihood calculation means is obtained, and,
in the data class group determination step (step 3),
the class group corresponding to the N largest of the values summed over all the likelihood calculation means is determined as the class group to one of whose classes the input data belongs.

The present invention (claim 5) is a data classification program for causing a computer to function as the means constituting the data classification device according to claim 1 or 2.

As described above, the present invention calculates a certainty factor, which is the probability that the correct class for the input data is in the class group corresponding to the N largest of the likelihoods calculated by a likelihood calculation means, and integrates the outputs of the plurality of likelihood calculation means to determine the class group to one of whose classes the input data belongs. In a data classification problem that requires such a class group, this makes it possible to improve accuracy in the sense that the correct class is included.

FIG. 1 is a principle configuration diagram of the present invention.
FIG. 2 is a diagram for explaining the principle of the present invention.
FIG. 3 is a hardware configuration diagram of the data classification device in one embodiment of the present invention.
FIG. 4 is a functional configuration diagram of the data classification device in one embodiment of the present invention.
FIG. 5 is a flowchart of the operation of the data classification device in one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows the hardware configuration of the data classification device in one embodiment of the present invention.

The data classification device shown in FIG. 3 comprises a CPU 11, a memory 12, a display 13, a keyboard 14, a processing program 15, a processing target storage unit 16, an OS (operating system) 17, a calculation parameter storage unit 18, a certainty factor storage unit 19, and a class group number storage unit 20.

The processing target storage unit 16, the calculation parameter storage unit 18, the certainty factor storage unit 19, and the class group number storage unit 20 are storage media such as hard disks.

The data classification processing described below is executed on the CPU 11 by the processing program 15. FIG. 4 shows the processing function configuration of the data classification device.

The data classification device shown in FIG. 4 has a likelihood calculation control unit 110, n (n ≥ 1) likelihood calculation units 120, a certainty factor calculation unit 130, a class group extraction unit 140, and a class group determination unit 150.

The likelihood calculation control unit 110 reads the features of the data to be processed from the processing target storage unit 16, stores them in the memory 12, and acquires the number of likelihood calculation units 120 from the keyboard 14.

There are n (n ≥ 1) likelihood calculation units 120, which differ from one another in classification method, components, or features. For each class, each likelihood calculation unit 120 computes the product of the probabilities with which the features of the input data supplied by the likelihood calculation control unit 110 appear in that class, takes this product as the likelihood of belonging to that class, and stores it in the memory 12.

The certainty factor calculation unit 130 acquires from the memory 12 the likelihoods obtained by each likelihood calculation unit 120, extracts a predetermined number of them in descending order, calculates a certainty factor using the calculation parameters acquired from the calculation parameter storage unit 18, and stores it in the certainty factor storage unit 19 together with the identifier (number) of the likelihood calculation unit 120 that calculated those likelihoods.

The class group extraction unit 140 reads from the memory 12 the likelihoods with which the input data belongs to each class, extracts the N classes with the largest likelihoods, treats them as a class group, and stores the numbers of that class group together with the likelihoods in the class group number storage unit 20.
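The top-N extraction performed by the class group extraction unit 140 can be sketched as follows. The function name `extract_class_group`, the use of integer class numbers as dictionary keys, and the tie-breaking rule are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def extract_class_group(likelihoods: Dict[int, float], n: int) -> List[Tuple[int, float]]:
    """Return the N (class number, likelihood) pairs with the largest likelihoods.

    Ties are broken by the smaller class number; this rule is an assumption,
    since the text does not say how ties are handled.
    """
    ranked = sorted(likelihoods.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical likelihoods for four classes, with N = 2.
class_group = extract_class_group({1: 0.05, 2: 0.40, 3: 0.30, 4: 0.25}, n=2)
```

The class numbers are kept together with their likelihoods, matching the description that both are stored in the class group number storage unit 20.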

The class group determination unit 150 reads from the certainty factor storage unit 19 the certainty factors and the identifiers (numbers) of the likelihood calculation units that produced them, and selects the likelihood calculation unit 120 with the maximum certainty factor. It then extracts from the class group number storage unit 20 the numbers of the class group corresponding to the N largest likelihoods calculated by the selected likelihood calculation unit 120, and outputs these numbers as the class group to one of whose classes the input data belongs.

Next, the series of operations is described.

FIG. 5 is a flowchart of the operation of the data classification device in one embodiment of the present invention.

Step 101) The likelihood calculation control unit 110 reads the features of the input data W to be processed from the processing target storage unit 16 into the memory 12.

Step 102) The likelihood calculation control unit 110 acquires the number n of likelihood calculation units 120 entered from the keyboard 14.

Step 103) The likelihood calculation control unit 110 initializes the number i of the likelihood calculation unit 120 to 1 (i = 1).

Step 104) If i ≤ n, the process proceeds to step 105; otherwise (i > n), it proceeds to step 111.

Step 105) The likelihood calculation control unit 110 inputs the features of the input data W to the i-th likelihood calculation unit 120 (LC_i). That likelihood calculation unit calculates the likelihoods P_i(W | C_k) (k = 1, ..., K) with which the input data W belongs to each class C_k (k = 1, ..., K), and stores them in the memory 12 together with its own identifier (number). In the likelihood calculation, the product of the probabilities with which the features of the input data W appear in a class is taken as the likelihood that the input data W belongs to that class. The probability with which a feature appears may be obtained by the naive Bayes model described in Reference 1 (Naonori Ueda and Kazumi Saito, "Probabilistic Models for Multi-Topic Text: The Forefront of Text Model Research (1)," Information Processing Society of Japan, journal "Joho Shori" (Information Processing), Vol. 45, No. 2, pp. 184-190, February 2004) and Reference 2 (Naonori Ueda and Kazumi Saito, "Probabilistic Models for Multi-Topic Text: The Forefront of Text Model Research (1)," Information Processing Society of Japan, journal "Joho Shori" (Information Processing), Vol. 45, No. 3, pp. 282-289, March 2004).
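The product-of-probabilities likelihood of step 105 can be sketched in Python as below. The sketch accumulates logarithms to avoid underflow and substitutes a small floor probability for features unseen in a class; both choices, the function names, and the toy model are assumptions for illustration and are not taken from References 1 and 2.

```python
import math
from typing import Dict, Sequence


def log_likelihood(features: Sequence[str],
                   feature_probs: Dict[str, float],
                   floor: float = 1e-9) -> float:
    """Sum of log feature probabilities, i.e. the log of their product.

    `floor` replaces the probability of a feature missing from the class
    model (an assumed stand-in for proper smoothing).
    """
    return sum(math.log(feature_probs.get(f, floor)) for f in features)


def likelihoods_per_class(features: Sequence[str],
                          model: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Likelihood P(W | C_k) for every class C_k in a toy per-class model."""
    return {cls: math.exp(log_likelihood(features, probs))
            for cls, probs in model.items()}


# Hypothetical per-class feature probabilities.
toy_model = {
    "sports": {"ball": 0.5, "goal": 0.3, "vote": 0.2},
    "politics": {"ball": 0.1, "goal": 0.1, "vote": 0.8},
}
scores = likelihoods_per_class(["ball", "goal"], toy_model)
```

For the features "ball" and "goal", the sports likelihood is 0.5 × 0.3 and the politics likelihood is 0.1 × 0.1, so sports ranks first.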

Step 106) Using the likelihoods P_i(W | C_k) with which the input data W belongs to each class, stored in the memory 12, the certainty factor calculation unit 130 calculates the certainty factor of the i-th likelihood calculation unit 120 (LC_i) as follows.

First, the likelihoods are ranked in descending order, from the maximum likelihood down to the (N+1)-th largest likelihood.

The certainty factor calculation parameters for the i-th likelihood calculation unit 120 (LC_i) are then read from the calculation parameter storage unit 18, and the certainty factor is obtained by equations (1) and (2).

[Equations (1) and (2) and the associated symbols are images in the original publication and are not reproduced in this text.]
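Equations (1) and (2) are not available in this text, so the following is only an illustrative sketch of a certainty factor of the kind the surrounding description suggests: a logistic function of the N+1 largest likelihoods. The functional form, the function name `certainty_factor`, and the parameter layout (intercept first, then one coefficient per ranked likelihood) are all assumptions and should not be read as the patent's equations.

```python
import math
from typing import Sequence


def certainty_factor(likelihoods: Sequence[float],
                     params: Sequence[float],
                     n: int) -> float:
    """Assumed logistic certainty factor over the N+1 largest likelihoods.

    params[0] is an intercept and params[1:] are coefficients for the
    largest, second largest, ..., (N+1)-th largest likelihood.
    """
    top = sorted(likelihoods, reverse=True)[: n + 1]
    if len(top) != n + 1:
        raise ValueError("at least N+1 likelihoods are required")
    if len(params) != n + 2:
        raise ValueError("expected one intercept plus N+1 coefficients")
    z = params[0] + sum(b * x for b, x in zip(params[1:], top))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical likelihoods and parameters, with N = 2.
example = certainty_factor([0.05, 0.5, 0.15, 0.3], [0.0, 2.0, 2.0, -4.0], n=2)
```

With these hypothetical parameters, a large gap between the top N likelihoods and the (N+1)-th likelihood raises the certainty factor, which is the intuition behind including the (N+1)-th value.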

The calculation parameters may be obtained by the logistic regression analysis described in Reference 3 (Toshiro Tango, Kazue Yamaoka, Haruyoshi Takagi, "Logistic Regression Analysis," Asakura Shoten). Each item of a data set whose correct classes are known is input to the likelihood calculation unit 120. The resulting likelihoods of belonging to each class, taken in descending order down to the (N+1)-th, serve as the explanatory variables, and whether the correct class is among the N largest likelihoods (= 1) or not (= 0) serves as the outcome variable. The logistic regression parameters are then estimated by the maximum likelihood method.
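The construction of explanatory and outcome variables described above can be sketched as follows, together with a very small maximum likelihood fit. Plain gradient ascent on the log-likelihood is used here only to keep the sketch dependency-free; it is a swapped-in optimizer, not the procedure of Reference 3. All function names, the step count, and the learning rate are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Tuple[List[float], int]


def make_training_row(likelihoods: Dict[str, float], correct: str, n: int) -> Row:
    """Explanatory variables: the N+1 largest likelihoods, in descending order.
    Outcome variable: 1 if the correct class is among the top N, else 0."""
    ranked = sorted(likelihoods.items(), key=lambda kv: (-kv[1], kv[0]))
    x = [value for _, value in ranked[: n + 1]]
    y = 1 if correct in [cls for cls, _ in ranked[:n]] else 0
    return x, y


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(rows: Sequence[Row], steps: int = 500, lr: float = 0.5) -> List[float]:
    """Gradient ascent on the mean logistic log-likelihood (assumed optimizer)."""
    beta = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in rows:
            err = y - predict(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(rows) for b, g in zip(beta, grad)]
    return beta


# Hypothetical labeled data with N = 2.
rows = [
    make_training_row({"A": 0.7, "B": 0.2, "C": 0.1}, "A", 2),
    make_training_row({"A": 0.6, "B": 0.3, "C": 0.1}, "B", 2),
    make_training_row({"A": 0.4, "B": 0.3, "C": 0.3}, "C", 2),
    make_training_row({"A": 0.35, "B": 0.33, "C": 0.32}, "C", 2),
]
beta = fit_logistic(rows)
```

In practice a statistics package would normally replace `fit_logistic`; the point of the sketch is the shape of the training rows, one per labeled input and one parameter set per likelihood calculation unit.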

Step 107) The certainty factor calculation unit 130 stores the certainty factor obtained above in the certainty factor storage unit 19 together with the identifier (number) of the likelihood calculation unit 120.

Step 108) The class group extraction unit 140 extracts the class group corresponding to the N largest of the likelihoods P_i(W | C_k) with which the input data W belongs to each class, stored in the memory 12, and denotes it h^i.

Step 109) The class group extraction unit 140 stores the class group h^i in the class group number storage unit 20.

Step 110) The number i of the likelihood calculation unit 120 is set to i + 1, and the process returns to step 104.

Step 111) When i > n in step 104, the class group to one of whose classes the input data belongs is determined. Specifically, the certainty factors of the likelihood calculation units 120 are extracted from the certainty factor storage unit 19 and compared with one another, and the likelihood calculation unit 120 that output the maximum certainty factor is selected. The numbers of the class group consisting of the N largest likelihoods of that likelihood calculation unit 120 are extracted from the class group number storage unit 20 and taken as the class group h to one of whose classes the input data belongs.

Step 112) The class group h is output as the class group to one of whose classes the input data belongs.
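The loop of steps 103 to 112 can be summarized in a short Python sketch. The certainty factor is passed in as a function because its equations are not available in this text; `stand_in_certainty` (the sum of the top N likelihoods) is a hypothetical placeholder and not the patent's certainty factor. The dictionaries stand in for the certainty factor storage unit 19 and the class group number storage unit 20.

```python
from typing import Callable, Dict, List, Tuple

Likelihoods = Dict[str, float]


def top_n(likelihoods: Likelihoods, n: int) -> List[str]:
    """Classes with the N largest likelihoods (ties broken by class name)."""
    return sorted(likelihoods, key=lambda c: (-likelihoods[c], c))[:n]


def determine_class_group(per_calculator: List[Likelihoods],
                          certainty_fn: Callable[[Likelihoods, int], float],
                          n: int) -> Tuple[int, List[str]]:
    """Return (number of the selected calculator, its top-N class group)."""
    certainty_store: Dict[int, float] = {}        # stands in for storage unit 19
    class_group_store: Dict[int, List[str]] = {}  # stands in for storage unit 20
    for i, likelihoods in enumerate(per_calculator, start=1):   # steps 103-110
        certainty_store[i] = certainty_fn(likelihoods, n)       # steps 106-107
        class_group_store[i] = top_n(likelihoods, n)            # steps 108-109
    best = max(certainty_store, key=lambda i: certainty_store[i])  # step 111
    return best, class_group_store[best]                           # step 112


def stand_in_certainty(likelihoods: Likelihoods, n: int) -> float:
    """Hypothetical placeholder: sum of the N largest likelihoods."""
    return sum(sorted(likelihoods.values(), reverse=True)[:n])


# Hypothetical outputs of two likelihood calculation units, with N = 2.
result = determine_class_group(
    [{"A": 0.4, "B": 0.35, "C": 0.25}, {"A": 0.1, "B": 0.6, "C": 0.3}],
    stand_in_certainty,
    n=2,
)
```

In this example the second calculator has the larger placeholder certainty, so its top two classes, B and C, form the output class group.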

The functions of the components of the data classification device shown in FIG. 4 can be implemented as a program, which can be installed and executed on a computer used as the data classification device or distributed via a network.

The program can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

11 CPU
12 Likelihood storage means, memory
13 Display
14 Keyboard
15 Processing program
16 Processing target storage unit
17 OS (operating system)
18 Calculation parameter storage unit
19 Certainty factor storage means, certainty factor storage unit
20 Class group number storage unit
100 Data classification device
110 Likelihood calculation control unit
120 Likelihood calculation means, likelihood calculation unit
130 Certainty factor calculation means, certainty factor calculation unit
140 Class group extraction unit
150 Data class group determination means, data class group determination unit

Claims

1. A data classification device for determining a class group to one of whose classes predetermined input data belongs, comprising:
a plurality of likelihood calculation means that differ from one another in classification method, components, or features, each calculating likelihoods, a likelihood being the plausibility that the predetermined input data belongs to each of a plurality of classes, and storing them in likelihood storage means;
certainty factor calculation means for extracting, for the predetermined input data, the N largest likelihoods in descending order from among the likelihoods stored in the likelihood storage means that were calculated by one of the plurality of likelihood calculation means, calculating a certainty factor, which is the probability that the correct class for the input data is in the class group corresponding to those N likelihoods, and storing it in certainty factor storage means; and
data class group determination means for extracting the certainty factors of the likelihood calculation means from the certainty factor storage means and determining, as the class group to one of whose classes the input data belongs, either the class group corresponding to the N largest likelihoods calculated by the likelihood calculation means that yielded the maximum certainty factor, or the class group corresponding to the N largest values obtained by converting the likelihoods so as to reflect the certainty factors.

2. The data classification device according to claim 1, wherein
the certainty factor calculation means includes
means for weighting the likelihood of belonging to each class according to the magnitude of the certainty factor and obtaining the sum of the weighted likelihoods over all the likelihood calculation means, and
the data class group determination means includes
means for determining, as the class group to one of whose classes the input data belongs, the class group corresponding to the N largest of the values summed over all the likelihood calculation means.

3. A data classification method for determining a class group to one of whose classes predetermined input data belongs, wherein, in a device having likelihood storage means, certainty factor storage means, and a plurality of likelihood calculation means that differ from one another in classification method, components, or features, the following steps are performed:
a likelihood calculation step in which each likelihood calculation means calculates likelihoods, a likelihood being the plausibility that the predetermined input data belongs to each of a plurality of classes, and stores them in the likelihood storage means;
a certainty factor calculation step in which certainty factor calculation means extracts, for the predetermined input data, the N largest likelihoods in descending order from among the likelihoods stored in the likelihood storage means that were calculated by one of the plurality of likelihood calculation means, calculates a certainty factor, which is the probability that the correct class for the input data is in the class group corresponding to those N likelihoods, and stores it in the certainty factor storage means; and
a data class group determination step in which the certainty factors of the likelihood calculation means are extracted from the certainty factor storage means and either the class group corresponding to the N largest likelihoods calculated by the likelihood calculation means that yielded the maximum certainty factor, or the class group corresponding to the N largest values obtained by converting the likelihoods so as to reflect the certainty factors, is determined as the class group to one of whose classes the input data belongs.

4. The data classification method according to claim 3, wherein,
in the certainty factor calculation step,
the likelihood of belonging to each class is weighted according to the magnitude of the certainty factor, and the sum of the weighted likelihoods over all the likelihood calculation means is obtained, and,
in the data class group determination step,
the class group corresponding to the N largest of the values summed over all the likelihood calculation means is determined as the class group to one of whose classes the input data belongs.

5. A data classification program for causing a computer to function as the means constituting the data classification device according to claim 1 or 2.