JP5353443B2

JP5353443B2 - Data classifier creation device, data classifier, data classifier creation method, data classification method, data classifier creation program, data classification program

Info

Publication number: JP5353443B2
Application number: JP2009125157A
Authority: JP
Inventors: 正明牧野; 一郎宍戸
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2009-05-25
Filing date: 2009-05-25
Publication date: 2013-11-27
Anticipated expiration: 2029-05-25
Also published as: JP2010272053A

Abstract

PROBLEM TO BE SOLVED: To make it possible to handle a plurality of evaluators or a plurality of decision criteria for a single piece of input data. SOLUTION: In a data classifier creation device 1, a 1:1 data creation part 20 first creates a 1:1 data set, i.e. a set of 1:1 data for classifier creation, from the data stored in an instance data storage part 11, and stores it in a 1:1 data storage part 12. A classifier creation part 30 generates a plurality of classifiers from the 1:1 data set stored in the 1:1 data storage part 12 and stores them in a classifier storage part 13. A classifier output part 40 reads the plurality of classifiers from the classifier storage part 13 and outputs them via an external connection part 100. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to techniques for classifying and searching digital content. More specifically, it relates to a data classifier creation device, a data classifier creation method, and a data classifier creation program for creating a classifier that classifies digital content into classes, and to a data classification device, a data classification method, and a data classification program for classifying digital content using the created classifier.

In recent years, driven by advances in compression technology for digital content and the spread of large-capacity storage media, it has become common to store and use large amounts of digital content on such media, on computers, and the like. As the amount of stored digital content grows, so does the demand for techniques for finding the desired content. One search approach is to classify digital content into categories in advance so that a user can retrieve the desired content using the category information. Digital content can be classified into categories either directly by the user or by a classifier that assigns categories by computation on the input signal. Furthermore, Patent Documents 1 and 2 disclose techniques for improving classification performance by combining a plurality of classifiers.

Patent Document 1 describes a technique in which the outputs of a plurality of rules/discriminant trees generated by a concept learning unit are fed into a majority decision unit, and the output of the majority decision unit is used as the final classification result.
Patent Document 2 describes a classifier creation system that constructs a plurality of decision trees from learning data and, when linearly combining those decision trees by the weighted majority method, learns so as to reduce the error probability estimated from the empirical error probability.

Patent Document 1: Japanese Patent Application Laid-Open No. H7-064793
Patent Document 2: Japanese Patent Application Laid-Open No. 2001-195379 (JP 2001-195379 A)

As described in the above prior art, creating a classifier such as a decision tree requires preparing in advance a learning data set that stores a plurality of pairs, each consisting of a piece of digital content and its desired classification category. When classifying digital content, so-called subjective classification, in which the correct classification result differs from evaluator to evaluator, is often required. This tendency is especially pronounced for classifications involving human sensibility and preference, where it is only natural for the correct result to differ between evaluators. For example, when labeling the impression given by a piece of digital content, one evaluator may consider the category “beautiful” correct while another considers “cute” correct. Because this arises from individual differences in the evaluators' judgment criteria, neither category can be called wrong.

However, the above prior art does not sufficiently consider which learning data should be used to create a classifier when the learning data set has been created by a plurality of evaluators and a plurality of “correct answers” exist for the same input data.

Accordingly, an object of the present invention is to provide a data classifier creation device, a data classifier creation method, and a data classifier creation program that can handle a plurality of evaluators or a plurality of judgment criteria for one piece of input data, as well as a data classification device, a data classification method, and a data classification program that can classify digital content while handling a plurality of evaluators or a plurality of judgment criteria for one piece of input data.

To solve the above problem, a data classifier creation device according to the present invention includes: a case data storage unit that stores a set of case data, in which input data is associated with correct answer data indicating the correct classification of that input data, the set including cases where a plurality of correct answer data given by a plurality of evaluators correspond to the same input data; a one-to-one data creation unit that reads the set of case data from the case data storage unit and creates, for each of the plurality of evaluators, a set of one-to-one data in which one item of input data corresponds to one item of correct answer data, thereby creating a plurality of such sets; and a classifier creation unit that, referring to the plurality of created one-to-one data sets, creates one classifier from each one-to-one data set, thereby creating a plurality of classifiers.
A data classification device according to another aspect of the invention includes: a classifier storage unit that stores the plurality of classifiers created by the above data classifier creation device; a classification execution unit that inputs input data to each of the plurality of classifiers stored in the classifier storage unit to obtain a plurality of classification results; and a classification result integration unit that outputs a final class using the plurality of classification results.
A data classifier creation method according to another aspect of the invention includes: a step of reading a set of case data, in which input data is associated with correct answer data indicating its correct classification and which includes cases where a plurality of correct answer data given by a plurality of evaluators correspond to the same input data, and creating, for each of the plurality of evaluators, a set of one-to-one data in which one item of input data corresponds to one item of correct answer data; and a step of creating a plurality of classifiers by creating one classifier from each one-to-one data set while referring to the plurality of created sets.
A data classification method according to another aspect of the invention includes: a step of storing the plurality of classifiers created by the above data classifier creation method; a step of inputting input data to each of the plurality of classifiers stored in the storing step to obtain a plurality of classification results; and a step of outputting a final class using the plurality of classification results.
A data classifier creation program according to another aspect of the invention causes a computer to execute the steps of the above data classifier creation method: creating, for each of the plurality of evaluators, a set of one-to-one data from the case data, and creating one classifier from each of those sets to obtain a plurality of classifiers.
A data classification program according to another aspect of the invention causes a computer to execute a step of storing the plurality of classifiers created by the above data classifier creation program, a step of inputting input data to each of the plurality of stored classifiers to obtain a plurality of classification results, and a step of outputting a final class using the plurality of classification results.

According to the present invention, a plurality of evaluators or a plurality of judgment criteria can be handled for one piece of input data, and digital content can be classified while handling a plurality of evaluators or a plurality of judgment criteria for one piece of input data.
As a result, a classifier with high classification accuracy can be created even from a data set in which a plurality of correct answer data are assigned to one item of input data.
Moreover, because an accurate classifier can be created even when the correct answer data for one item of input data come from a plurality of evaluators, a well-balanced classifier that is not biased toward any particular evaluator's judgments can be created.
Furthermore, by constructing a plurality of classifiers from learning data provided by a plurality of evaluators and integrating the classification results of those classifiers into a final result, more accurate classification results can be obtained than with conventional techniques.

FIG. 1 is a block diagram showing the configuration of the data classifier creation device 1 in Embodiments 1 and 2.
FIG. 2 is a flowchart showing the flow of processing of the data classifier creation device 1 in Embodiments 1 and 2.
FIG. 3 is a diagram showing an example of the case data storage unit in Embodiments 1 to 4.
FIG. 4 is a flowchart showing the flow of processing of the one-to-one data creation unit in Embodiments 1 and 2.
FIG. 5 is a diagram showing an example of the one-to-one data storage unit in Embodiments 1 and 2.
FIG. 6 is a flowchart showing the flow of the shuffle process in Embodiments 1 to 4.
FIG. 7 is a flowchart showing the flow of processing of the classifier creation unit in Embodiments 1 to 4.
FIG. 8 is a diagram showing an example of classifier pairs in Embodiments 1 to 4.
FIG. 9 is a diagram showing an example of the classifier storage unit in Embodiments 1 and 2.
FIG. 10 is a diagram showing an example of a gene individual in Embodiments 1 to 4.
FIG. 11 is a flowchart showing the flow of processing for creating a classifier by applying a genetic algorithm in Embodiments 1 to 4.
FIG. 12 is a flowchart showing the flow of the generation change process in Embodiments 1 to 4.
FIG. 13 is a diagram showing an example of one-point crossover in Embodiments 1 to 4.
FIG. 14 is a block diagram showing the configuration of the data classification device 2 in Embodiment 2.
FIG. 15 is a flowchart showing the flow of processing of the data classification device 2 in Embodiment 2.
FIG. 16 is a block diagram showing the configuration of the data classifier creation device 3 in Embodiments 3 and 4.
FIG. 17 is a flowchart showing the flow of processing of the one-to-one data creation unit in Embodiments 3 and 4.
FIG. 18 is a diagram showing an example of the one-to-one data storage unit in Embodiments 3 and 4.
FIG. 19 is a diagram showing an example of the classifier storage unit in Embodiments 3 and 4.
FIG. 20 is a block diagram showing the configuration of the data classification device 4 in Embodiment 4.
FIG. 21 is a flowchart showing the flow of processing of the data classification device 4 in Embodiment 4.

The present invention has the following five features.
[1] A case set in which the same input data has a plurality of correct answer data is divided into a plurality of sets such that, within each set, one item of input data corresponds to one item of correct answer data, and a classifier is created for each divided set.
[2] A different learning algorithm is applied to each divided set.
[3] Different learning parameters are applied to each divided set.
[4] Correct answer data from a plurality of evaluators are prepared, a set is created for each evaluator, and a plurality of classifiers, each corresponding to one evaluator, are created.
[5] Correct answer data from a plurality of evaluators are prepared, sets in which the correct answer data of the plurality of evaluators are mixed are created, and a plurality of classifiers are created.
An embodiment may have any one of these five features alone, or any two or more of them selected and combined as appropriate.

Hereinafter, modes for carrying out the present invention will be described using several examples.

Embodiment 1.
FIG. 1 shows a configuration example of the data classifier creation device according to Embodiment 1 of the present invention.

In FIG. 1, the data classifier creation device 1 of Embodiment 1 includes a data storage unit 10, a one-to-one data creation unit 20, a classifier creation unit 30, and a classifier output unit 40. The data storage unit 10 in turn includes a case data storage unit 11, a one-to-one data storage unit 12, and a classifier storage unit 13. Although FIG. 1 depicts the device as hardware in the form of a functional block diagram, the data classifier creation device 1 may instead be implemented in software on an ordinary computer having a CPU, memory, and a hard disk drive.

FIG. 2 is a flowchart showing an example of the operation of the data classifier creation device 1 shown in FIG. 1.

The operation of the data classifier creation device 1 shown in FIG. 1 will be described with reference to the flowchart of FIG. 2.

First, the one-to-one data creation unit 20 creates, from the data stored in the case data storage unit 11, one-to-one data sets, i.e. sets of one-to-one data for creating classifiers, and stores them in the one-to-one data storage unit 12 (step S10).

Next, the classifier creation unit 30 creates classifiers from the one-to-one data sets stored in the one-to-one data storage unit 12 and stores them in the classifier storage unit 13 (step S20). Finally, the classifier output unit 40 reads the plurality of classifiers from the classifier storage unit 13 and outputs them via the external connection unit 100 (step S30).

The data storage unit 10 is a large-capacity recording medium capable of high-speed access, such as a hard disk or memory. It comprises the case data storage unit 11, the one-to-one data storage unit 12, and the classifier storage unit 13.

The case data storage unit 11 stores a case data set, i.e. a set of case data in which each item of input data is associated with one or more items of correct answer data indicating its correct classification (class). Each item of correct answer data is also stored together with an evaluation ID identifying the evaluator who gave it. Input data is a set of continuous or discrete values. For example, if the source is acoustic data of a piece of music, the input data may consist of feature values quantifying beat strength, frequency characteristics, and the like, together with the tempo and a genre number indicating the music genre; if the source is image data, it may consist of the pixel values of the image after conversion to grayscale and reduction to 20×20 pixels. When the same evaluator gives a plurality of correct answers, a separate evaluation ID is assigned to each.

FIG. 3 is a diagram showing an example of m items of case data stored in the case data storage unit 11.

In FIG. 3, each item of case data stored in the case data storage unit 11 associates input data with correct answer data and with the evaluation ID of the evaluator who gave each correct answer. For example, the input data (A11, …, A1n), which has n variables, is assigned the correct answer data (C1, C1, C2), and the evaluation IDs of the evaluators who gave them are (M1, M2, F1), respectively.
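As a minimal sketch, assuming a Python in-memory representation (the class `CaseData`, the helper `has_unique_evaluation_ids`, and the numeric input values are hypothetical, not taken from the patent), one row of FIG. 3 could be held as an input vector plus a list of (correct class, evaluation ID) pairs:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaseData:
    """One item of case data: one input vector with one or more correct answers."""
    inputs: Tuple[float, ...]        # (A_i1, ..., A_in): continuous or discrete values
    labels: List[Tuple[str, str]]    # [(correct class, evaluation ID), ...]


def has_unique_evaluation_ids(case: CaseData) -> bool:
    """Each correct answer should carry its own evaluation ID, even for a repeat evaluator."""
    ids = [eval_id for _, eval_id in case.labels]
    return len(ids) == len(set(ids))


# Hypothetical input values; only the labels and IDs mirror the first row of FIG. 3.
case_1 = CaseData(
    inputs=(0.8, 120.0, 3.0),
    labels=[("C1", "M1"), ("C1", "M2"), ("C2", "F1")],
)

print([c for c, _ in case_1.labels])        # ['C1', 'C1', 'C2']
print(has_unique_evaluation_ids(case_1))    # True
```

Keeping the evaluation ID next to each answer, rather than in a separate table, is only one possible layout; it makes the per-evaluator split of FIG. 4 a single pass over the data.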

Returning to FIG. 1, the one-to-one data creation unit 20 creates, from the case data set stored in the case data storage unit 11, a plurality of one-to-one data sets, i.e. sets of one-to-one data in which one item of input data corresponds to one item of correct answer data, and stores the created sets in the one-to-one data storage unit 12.

Sets of one-to-one data, in which input data and correct answer data correspond one to one, are created because building a classifier from data in which the same input data has multiple correct answers, as in the case data, lowers the classifier's classification accuracy.

FIG. 4 is a flowchart showing an example of the processing by which the one-to-one data creation unit 20 creates a one-to-one data set for each evaluation ID.

First, the one-to-one data creation unit 20 reads one item of case data from the case data storage unit 11 (step S110).

Next, the one-to-one data creation unit 20 creates, from the read case data, as many items of one-to-one data as there are correct answers, each pairing the input data with one item of correct answer data (step S120). For the case data in FIG. 3 whose input data is {A11, …, A1n}, three items of one-to-one data are created: input data {A11, …, A1n} with correct answer {C1}, input data {A11, …, A1n} with correct answer {C1}, and input data {A11, …, A1n} with correct answer {C2}.

Next, the one-to-one data creation unit 20 stores each created item of one-to-one data in the one-to-one data set whose identification name is the same as the evaluation ID that gave its correct answer data (step S130).

If no one-to-one data set corresponding to that evaluation ID exists yet, the one-to-one data creation unit 20 creates one in the one-to-one data storage unit 12 with the evaluation ID as its identification name and stores the one-to-one data in it.

Next, the one-to-one data creation unit 20 determines whether all case data has been read from the case data storage unit 11 (step S140). If unread case data remains (step S140 “NO”), the process returns to step S110 and the processing from step S110 onward is repeated. If all case data has been read (step S140 “YES”), the one-to-one data creation unit 20 ends the processing.
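The loop of FIG. 4 (steps S110 to S140) can be sketched roughly as follows, assuming each case is a Python tuple of (input vector, list of (correct class, evaluation ID)). The function name `make_one_to_one_sets` and the sample values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Inputs = Tuple[float, ...]
Case = Tuple[Inputs, List[Tuple[str, str]]]   # (input vector, [(correct class, evaluation ID), ...])
Item = Tuple[Inputs, str]                     # one-to-one data: (input vector, correct class)


def make_one_to_one_sets(cases: List[Case]) -> Dict[str, List[Item]]:
    """Split case data into one-to-one data sets labeled by evaluation ID."""
    # defaultdict creates a missing set on first use, like the set creation in S130.
    sets: Dict[str, List[Item]] = defaultdict(list)
    for inputs, labels in cases:                 # S110: read one case; loop ends once all are read (S140)
        for label, eval_id in labels:            # S120: one one-to-one item per correct answer
            sets[eval_id].append((inputs, label))  # S130: store under the evaluation ID
    return dict(sets)


cases = [
    ((0.1, 0.2), [("C1", "M1"), ("C1", "M2"), ("C2", "F1")]),
    ((0.3, 0.4), [("C3", "M1"), ("C2", "M2"), ("C3", "F1")]),
]
sets = make_one_to_one_sets(cases)
print(sorted(sets))     # ['F1', 'M1', 'M2']
print(sets["F1"])       # [((0.1, 0.2), 'C2'), ((0.3, 0.4), 'C3')]
```

Because each evaluation ID contributes at most one answer per case, every resulting set pairs each input with exactly one class.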

FIGS. 5(a) to 5(c) each show an example of the state of the one-to-one data storage unit 12 storing a one-to-one data set, which consists of a one-to-one data label (the identification name of the set) and a plurality of items of one-to-one data.

Each one-to-one data set is given a one-to-one data label identical to an evaluation ID and stores the one-to-one data whose correct answer data was given under that evaluation ID.

FIG. 5(a) shows the one-to-one data set created only from evaluation ID “M1”, to which “M1” is assigned as the one-to-one data label.

Similarly, FIG. 5(b) shows the one-to-one data set created only from evaluation ID “M2”, and FIG. 5(c) shows the one-to-one data set created only from evaluation ID “F1”.

The above is the method by which the one-to-one data creation unit 20 creates a one-to-one data set for each evaluation ID. It is also possible to create one-to-one data sets in which evaluation IDs are mixed. In that case, after step S140 in FIG. 4, the one-to-one data creation unit 20 performs a shuffle process that exchanges one-to-one data items having the same input data between one-to-one data sets.

FIG. 6 is a flowchart showing an example of the shuffle process performed by the one-to-one data creation unit 20.
First, from the plurality of one-to-one data sets created per evaluation ID, the one-to-one data creation unit 20 selects a pair of one-to-one data sets that has not yet been shuffled (step S151).

Next, within the selected pair of one-to-one data sets, the one-to-one data creation unit 20 exchanges one-to-one data items that share the same input data with a predetermined probability p (0.0 < p < 1.0) (step S152). This is done for every item of input data.

Next, the one-to-one data creation unit 20 determines whether the shuffle process has been performed on all pairs of one-to-one data sets (step S420). If so (step S420 “Yes”), the shuffle process ends; if any pair has not yet been shuffled (step S420 “No”), the process returns to step S400. This completes the description of the shuffle process of step S150. The shuffle process may be repeated a plurality of times.
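A rough sketch of the shuffle of FIG. 6, assuming the one-to-one data sets are Python lists keyed by evaluation ID and that an input vector occurs at most once in each set (which holds for per-evaluator sets created as in FIG. 4). The function `shuffle_sets` is hypothetical; it visits each pair of sets once (S151) and, for every input shared by the pair, swaps the two items with probability p (S152).

```python
import itertools
import random
from typing import Dict, List, Tuple

Item = Tuple[Tuple[float, ...], str]  # (input vector, correct class)


def shuffle_sets(sets: Dict[str, List[Item]], p: float, rng: random.Random) -> None:
    """Exchange same-input items between every pair of sets with probability p (in place)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0.0 < p < 1.0")
    for a, b in itertools.combinations(sorted(sets), 2):      # S151: each pair of sets once
        # Position of each input vector in set b (inputs assumed unique within a set).
        where_in_b = {x: j for j, (x, _) in enumerate(sets[b])}
        for i, (x, _) in enumerate(sets[a]):                  # S152: every input in set a
            j = where_in_b.get(x)
            if j is not None and rng.random() < p:
                sets[a][i], sets[b][j] = sets[b][j], sets[a][i]


sets = {
    "M1": [((0.1,), "C1"), ((0.2,), "C3")],
    "M2": [((0.1,), "C1"), ((0.2,), "C2")],
    "F1": [((0.1,), "C2"), ((0.2,), "C3")],
}
shuffle_sets(sets, p=0.5, rng=random.Random(0))
# Each set still holds exactly one item per input, so it stays one-to-one;
# only the evaluator behind each label may have changed.
print({k: [label for _, label in v] for k, v in sets.items()})
```

Swapping items that share an input keeps the input order of every set intact, so calling the function again (the repeated shuffle mentioned above) needs no extra bookkeeping.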

The above is one method of creating one-to-one data sets with mixed evaluation IDs. Other methods may also be used; for example, a predetermined number of empty one-to-one data sets may be created in advance, and the one-to-one data created from the case data may be stored in those sets at random.
This completes the description of the one-to-one data creation unit 20.
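The alternative of storing one-to-one data at random into a predetermined number of empty sets might look like the sketch below. It adds one assumption the text does not state: the correct answers of a single case are sent to distinct sets (via `random.Random.sample`), so that no set receives two answers for the same input and every set stays one-to-one. The function name `allocate_randomly` is hypothetical.

```python
import random
from typing import List, Tuple

Inputs = Tuple[float, ...]
Case = Tuple[Inputs, List[Tuple[str, str]]]   # (input vector, [(correct class, evaluation ID), ...])
Item = Tuple[Inputs, str]                     # (input vector, correct class)


def allocate_randomly(cases: List[Case], num_sets: int, rng: random.Random) -> List[List[Item]]:
    """Distribute one-to-one items at random over `num_sets` initially empty sets."""
    sets: List[List[Item]] = [[] for _ in range(num_sets)]
    for inputs, labels in cases:
        if len(labels) > num_sets:
            raise ValueError("need at least as many sets as correct answers per case")
        # Distinct target sets for this case's answers (assumption: keeps each set one-to-one).
        targets = rng.sample(range(num_sets), len(labels))
        for (label, _eval_id), k in zip(labels, targets):
            sets[k].append((inputs, label))
    return sets


cases = [
    ((0.1,), [("C1", "M1"), ("C1", "M2"), ("C2", "F1")]),
    ((0.2,), [("C3", "M1"), ("C2", "M2"), ("C3", "F1")]),
]
mixed = allocate_randomly(cases, num_sets=3, rng=random.Random(42))
print([[label for _, label in s] for s in mixed])
```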

Next, the classifier creation unit 30 will be described.
The classifier creation unit 30 creates new classifiers by applying a learning algorithm to the one-to-one data sets created by the one-to-one data creation unit 20. A classifier takes data as input and classifies it into at least one class based on that data. Known learning algorithm techniques include decision trees, the k-nearest neighbor method (k-NN), neural networks, support vector machines, and Bayesian models. A learning algorithm here means either an algorithm that constructs the classifier itself from learning data or an algorithm that selects the variables of the learning data.

FIG. 7 is a flowchart showing an example of the operation by which the classifier creation unit 30 creates a plurality of classifiers using the one-to-one data sets.

First, the classifier creation unit 30 reads the group of one-to-one data sets created by the one-to-one data creation unit 20 from the one-to-one data storage unit 12 and selects from it one one-to-one data set that has not yet been used as a learning data set (step S21). The one-to-one data set group is the collection of one-to-one data sets stored in the one-to-one data storage unit 12.

For the one-to-one data sets in FIG. 5, the group consists of the three sets “M1”, “M2”, and “F1”.

Next, from the selected one-to-one data set, the classifier creation unit 30 selects a learning data set, used to construct the classifier itself, and an evaluation data set, used to evaluate the created classifier (step S22).

Specifically, the classifier creation unit 30 divides the selected one-to-one data set evenly into two halves, using one as the learning data set and the other as the evaluation data set. For example, the data can be split evenly by using the odd-numbered rows as learning data and the even-numbered rows as evaluation data. Alternatively, the evaluation data set may be taken from a one-to-one data set other than the learning data set: the learning data set is one one-to-one data set in the group that has not yet been selected as a learning data set, and the evaluation data set is chosen at random from the other one-to-one data sets in the group.

For example, suppose the one-to-one data set "M1" in the group of FIG. 5 is selected as the learning data set. One of the remaining two data sets, "M2" or "F1", is then chosen at random and used as the evaluation data set.
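The two splitting options of step S22 can be sketched in Python as follows. The sketch assumes that each one-to-one data set is a list of rows and that the group is a dict keyed by set identifier ("M1", "M2", "F1" as in FIG. 5). The function names are hypothetical.

```python
import random

def split_odd_even(rows):
    """Option 1: odd-numbered rows -> learning set, even-numbered rows -> evaluation set."""
    # rows[0] is row 1 (odd), so step-2 slices starting at 0 and 1 give odd and even rows.
    return rows[0::2], rows[1::2]

def pick_other_set(groups, learning_id, rng=random):
    """Option 2: evaluate on a different one-to-one data set chosen at random.

    Requires at least two data sets in `groups`.
    """
    candidates = [gid for gid in groups if gid != learning_id]
    eval_id = rng.choice(candidates)
    return groups[learning_id], groups[eval_id], eval_id
```

With the FIG. 5 group, `pick_other_set(groups, "M1")` returns the "M1" rows for learning and either "M2" or "F1" for evaluation.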

Next, the classifier creation unit 30 selects three things (step S23): the classifier to be newly created, the construction parameters for creating it, and the learning algorithm. For this purpose, the classifier creation unit 30 holds classifier pairs in advance, each combining a classifier, construction parameters, and a learning algorithm. Construction parameters are, for example, the functions and threshold values used to construct the classifier.

FIG. 8 shows an example of classifier pairs. Each combination of a classifier, its construction parameters, and a learning algorithm is stored under an identifier, the classifier pair ID.

From these classifier pairs, the classifier creation unit 30 selects one pair, in classifier pair ID order, for which no classifier has yet been created from the currently selected learning data set. If no construction parameters are specified, the classifier is built with the most standard construction parameters.

In FIG. 8, classifier pair IDs "AlgoID1" and "AlgoID2" both use the k-NN classifier and the same learning algorithm, a genetic algorithm. Their construction parameters differ, however: "k=3" and "k=5", respectively. Even with the same classifier, different construction parameters may be treated as different classifiers.

Likewise, classifier pair IDs "AlgoID2" and "AlgoID3" both use the k-NN classifier, but their learning algorithms differ: "genetic algorithm" and "principal component analysis", respectively. Even with the same classifier, different learning algorithms may be treated as different classifiers.
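The classifier pairs of FIG. 8 could be held as a small immutable table and picked in pair ID order, as in the sketch below. The representation is an assumption. The text gives no construction parameters for AlgoID3, so that entry is left empty. `DEFAULT_PARAMS` is a hypothetical stand-in for the unspecified "most standard" parameters.

```python
from dataclasses import dataclass

# Hypothetical "most standard" construction parameters, used when a pair specifies none.
DEFAULT_PARAMS = {"k-NN": {"k": 3}}

@dataclass(frozen=True)
class ClassifierPair:
    pair_id: str
    classifier: str
    params: tuple            # (name, value) pairs; empty means "use the defaults"
    learning_algorithm: str

    def resolved_params(self):
        return dict(self.params) or dict(DEFAULT_PARAMS.get(self.classifier, {}))

# Stored in classifier pair ID order.
PAIRS = [
    ClassifierPair("AlgoID1", "k-NN", (("k", 3),), "genetic algorithm"),
    ClassifierPair("AlgoID2", "k-NN", (("k", 5),), "genetic algorithm"),
    ClassifierPair("AlgoID3", "k-NN", (), "principal component analysis"),  # params not given in the text
]

def next_pair(built_ids):
    """Return the first pair (in ID order) not yet built for the current learning set."""
    for pair in PAIRS:
        if pair.pair_id not in built_ids:
            return pair
    return None
```

Because the dataclass compares all fields, AlgoID1 and AlgoID2 are distinct entries even though both are k-NN with a GA. This matches the rule that a different parameter makes a different classifier.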

Next, the classifier creation unit 30 creates a new classifier (step S24). It uses the learning data set and evaluation data set selected in step S22 and the classifier pair selected in step S23. It then checks whether classifiers have been created for all classifier pairs from the selected learning data set (step S25). If so (step S25 "YES"), the process proceeds to step S26; otherwise (step S25 "NO"), it returns to step S23.

Next, the classifier creation unit 30 stores the created classifiers in the classifier storage unit 13 (step S26). For each classifier, the classifier storage unit 13 stores three items together:

- the identifier of the one-to-one data set used as the learning data set;
- the correct answer rate, that is, the proportion of evaluation data whose classification result matches the correct answer data;
- the classifier construction information.

The classifier construction information is whatever is needed to rebuild the created classifier, such as the classifier type (for example, decision tree or k-NN), threshold values, weights, teacher data, and branch conditions.

FIG. 9 shows an example of six classifiers stored in the classifier storage unit 13.

In FIG. 9, the entry for each classifier combines four items: the identifier of the created classifier (classifier identifier), the identifier of the one-to-one data set used as the learning data set, the correct answer rate, and the classifier construction information. For example, "Classifier 1" was learned from the one-to-one data set "M1" and has a correct answer rate of "0.80".

Its classifier construction information holds the classifier name "k-NN" and the parameters needed to construct that classifier.

Next, the classifier creation unit 30 determines whether every one-to-one data set has been used as a learning data set (step S27). If some one-to-one data set has not yet been used (step S27 "NO"), the classifier creation unit 30 returns to step S21. If all have been used (step S27 "YES"), it ends the process.
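The loop of steps S21 to S27 can be summarized in the Python sketch below. It assumes the "different data set" evaluation option from step S22. `train_and_score` is a hypothetical callback standing in for step S24, and the record fields mirror the columns described for FIG. 9.

```python
import random
from dataclasses import dataclass

@dataclass
class StoredClassifier:
    # One row of the classifier storage unit 13 (cf. FIG. 9).
    classifier_id: str
    learning_set_id: str
    accuracy: float          # correct answer rate on the evaluation data
    construction_info: dict  # whatever is needed to rebuild the classifier

def create_classifiers(groups, pairs, train_and_score, rng=random):
    """Steps S21-S27. `train_and_score(learning, evaluation, pair)` is a hypothetical
    callback returning (construction_info, accuracy). Needs at least two data sets."""
    storage = []
    for set_id, learning in groups.items():                       # S21: next unused set
        eval_id = rng.choice([g for g in groups if g != set_id])  # S22: other-set option
        created = []
        for pair in pairs:                                        # S23: next pair in ID order
            created.append(train_and_score(learning, groups[eval_id], pair))  # S24
        # S25 "YES": every pair is done for this learning set, so store them (S26).
        for info, acc in created:
            storage.append(
                StoredClassifier(f"Classifier {len(storage) + 1}", set_id, acc, info))
    return storage                                                # S27 "YES": all sets used
```

With three data sets and two classifier pairs, this produces six stored classifiers, the same count as in FIG. 9. The text does not state how the FIG. 9 entries were actually produced.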

As a concrete example of the classifier creation in step S24, the following describes how to create a "k-NN" classifier. A genetic algorithm (GA) is used as the learning algorithm to select the input variables.

"k-NN" assigns a class to a node X whose class is unknown. It looks at the classes of the k nodes nearest to X and takes the most frequent of those classes as the class of X. The distance between nodes is the Euclidean distance between their input data. The Mahalanobis generalized distance or a similar measure may be used instead.
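A minimal Python sketch of the k-NN rule just described, using all variables and the Euclidean distance. The data layout (a list of `(features, label)` tuples) is an assumption. `Counter.most_common` orders equal counts by first appearance, so in this sketch a tie goes to the class of the nearer neighbour.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """train: list of (features, label). Returns the most frequent label among the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```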

However, using every variable of the input data in this distance calculation can reduce classification accuracy. Accuracy drops, for example, when some variables are noise that barely affects the classification, or when several variables describe the same phenomenon redundantly. It is therefore necessary to choose which variables to use. With N variables there are 2^N possible combinations. Suppose a computer takes 1 millisecond to evaluate one combination. Evaluating all combinations then finishes in about 17 minutes when the input data has 20 variables, but takes tens of thousands of years with 50 variables.
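The arithmetic behind these figures can be checked with a few lines of Python, assuming a 365-day year. Evaluating 2^20 subsets at 1 ms each takes roughly 17.5 minutes. Evaluating 2^50 subsets takes roughly 35,700 years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def exhaustive_seconds(n_vars, ms_per_eval=1.0):
    """Time to evaluate all 2**n_vars variable subsets at ms_per_eval milliseconds each."""
    return (2 ** n_vars) * ms_per_eval / 1000.0

minutes_20 = exhaustive_seconds(20) / 60              # 2**20 ms, in minutes
years_50 = exhaustive_seconds(50) / SECONDS_PER_YEAR  # 2**50 ms, in years
```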

Therefore, when the number of variables is reasonably large, a learning algorithm must be applied to choose which variables to use. Here, a genetic algorithm (GA) is used as the learning algorithm for variable selection. The GA is described in detail in David E. Goldberg, "Genetic Algorithms in Search, Optimization, and Machine Learning".

A genetic algorithm (GA) is inspired by biological evolution and uses mechanisms such as selection, crossover, mutation, and generational replacement. In a GA, a candidate solution to the problem is encoded as a gene, represented by a bit string.

FIG. 10 shows an example in which the variable selection problem is encoded as genes (bit strings).

FIG. 10 shows Z gene individuals. Each individual assigns [1: use, 0: do not use] to each of the n variables (A1, ..., An). The classifier creation unit 30 sets the initial population of these Z individuals at random. Z is the number of gene individuals per generation. There is no particular rule for its value, but 5 or more is desirable.

FIG. 11 is a flowchart showing an example of the GA learning procedure. As shown in FIG. 11, the steps are repeated from step S32, which evaluates the fitness function, through step S37, which performs generational replacement. In a GA, each such iteration is called a generation.

First, the classifier creation unit 30 generates Z initial gene individuals and sets each of their bits at random (step S31).
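Step S31 amounts to drawing Z random bit strings of length n. The list-of-ints representation in this Python sketch is an assumption. Bit i stands for variable A(i+1): 1 means the variable is used, 0 means it is not.

```python
import random

def initial_population(z, n_vars, rng=random):
    """Z individuals, each a list of n_vars random bits (1 = use the variable, 0 = do not)."""
    return [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(z)]
```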

Next, the classifier creation unit 30 runs "k-NN" on the evaluation data set with the variable setting of each of the Z gene individuals and computes each individual's fitness (step S32). The fitness is obtained by actually classifying the evaluation data with a "k-NN" built from the learning data set. Define the symbols as follows:

- T is the learning data set and E is the evaluation data set;
- G_i is the value of the i-th gene bit;
- t(x, i) is the i-th variable of the x-th learning datum;
- E(y, i) is the i-th variable of the y-th evaluation datum.

The Euclidean distance D(x, y) between learning datum x and evaluation datum y is then given by Equation 1:

$$D(x, y) = \sqrt{\sum_{i=1}^{n} G_i \bigl(t(x, i) - E(y, i)\bigr)^2}$$

The classifier creation unit 30 computes this distance for every learning datum x ∈ T. It then takes the k learning data with the smallest D(x, y) and computes the mode of their correct answer data. If that mode matches the correct answer data of the y-th evaluation datum, the correct-answer count c is incremented by 1. This is repeated for every evaluation datum y ∈ E to obtain the correct answer rate. That is, with M the total number of evaluation data, the correct answer rate R_j of the j-th gene individual is given by Equation 2:

$$R_j = \frac{c}{M}$$

The classifier creation unit 30 uses the correct answer rate R_j as the fitness of gene individual j. It computes this fitness for all Z gene individuals.
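Equations 1 and 2 can be combined into a fitness function for one gene individual, as in the Python sketch below. The data layout (lists of `(features, label)` tuples) and k=3 are assumptions. Ties among equally distant neighbours are broken by the order of the learning data.

```python
import math
from collections import Counter

def masked_distance(gene, t_x, e_y):
    """Equation 1: Euclidean distance over only the variables whose gene bit G_i is 1."""
    return math.sqrt(sum(g * (a - b) ** 2 for g, a, b in zip(gene, t_x, e_y)))

def fitness(gene, learning, evaluation, k=3):
    """Equation 2: R_j = c / M, the share of evaluation data whose k-NN mode is correct."""
    c = 0
    for e_features, e_label in evaluation:
        nearest = sorted(learning, key=lambda row: masked_distance(gene, row[0], e_features))[:k]
        mode = Counter(label for _, label in nearest).most_common(1)[0][0]
        if mode == e_label:
            c += 1
    return c / len(evaluation)
```

In the test data, only the first variable separates the classes. A gene that keeps it scores higher than one that keeps only the noisy second variable.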

Next, the classifier creation unit 30 checks whether the highest fitness of this generation, obtained in step S32, is the highest over all generations so far (step S33). If it is the highest so far, or if this is the first generation (step S33 "YES"), the process proceeds to step S34. Otherwise (step S33 "NO"), it proceeds to step S35.

Next, the classifier creation unit 30 temporarily stores two items together in the storage unit 20 (step S34): the bit string of the gene individual with the best fitness, and that fitness.

Next, the classifier creation unit 30 decides whether to end learning (step S35). Learning can end, for example, in any of these cases:

- the number of generations exceeds a predetermined threshold;
- the highest fitness exceeds a predetermined threshold;
- the average fitness of all gene individuals exceeds a specified threshold.

All of these conditions may be checked, or only one of them. If learning is to end (step S35 "YES"), the classifier creation process ends. Otherwise (step S35 "NO"), the process proceeds to step S36.
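Steps S33 to S35 can be sketched in Python as follows. The sketch checks all three example stopping criteria and stops if any one holds. The threshold values are hypothetical placeholders, since the text does not give them.

```python
def update_best(best, population, scores):
    """S33/S34: keep (bits, fitness) of the best individual seen in any generation so far."""
    j = max(range(len(scores)), key=scores.__getitem__)
    if best is None or scores[j] > best[1]:    # first generation, or a new record
        return (list(population[j]), scores[j])
    return best

def should_stop(generation, scores, max_generations=100,
                best_threshold=0.95, mean_threshold=0.9):
    """S35: stop when any example criterion is exceeded (thresholds are hypothetical)."""
    return (generation > max_generations
            or max(scores) > best_threshold
            or sum(scores) / len(scores) > mean_threshold)
```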

Next, the classifier creation unit 30 sets up the next generation of gene individuals (step S36). This step is the core of the GA. It performs crossover and mutation of gene individuals and selects which individuals survive into the next generation.

FIG. 12 is a flowchart showing an example of a typical generational replacement process in a GA.

First, the classifier creation unit 30 selects a pair of parent gene individuals from which to create child individuals (step S40). Next, it crosses over the parent pair selected in step S40 to create child gene individuals (step S41). It then applies mutation to the created children (step S42). Finally, it performs survival selection, deciding which gene individuals are carried into the next generation (step S43). This is the flow of the generational replacement process. Many methods exist for reproduction selection, crossover, mutation, and survival selection, and any method or combination may be used.

Generational replacement is now explained using the simple GA model as an example. In the simple GA model, roulette selection is used for reproduction selection. Roulette selection picks gene individuals with a probability proportional to their fitness. With R_j the fitness of gene individual j, its selection probability p(j) is given by Equation 3:

$$p(j) = \frac{R_j}{\sum_{l=1}^{Z} R_l}$$

The classifier creation unit 30 selects gene individuals stochastically, using random numbers so that the selection follows p(j). Because selected individuals are paired to create children, at least two are selected.
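Roulette selection maps directly onto the weighted sampling in Python's standard library, as this sketch shows. `random.choices` draws with replacement, so the same individual can be picked twice. An all-zero fitness vector is not handled here.

```python
import random

def selection_probabilities(fitnesses):
    """Equation 3: p(j) = R_j / sum over l of R_l."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_select(population, fitnesses, n=2, rng=random):
    """Draw n individuals with probability proportional to fitness (with replacement)."""
    return rng.choices(population, weights=fitnesses, k=n)
```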

Next, the classifier creation unit 30 performs one-point crossover to create child gene individuals from the pair selected in step S40. In one-point crossover, a single cut point is chosen in the bit string. The two parents then exchange the portions of their bit strings on one side of that point.

FIG. 13 shows how one-point crossover creates new gene individuals C and D. The parent pair is gene individual A, with bit string {100101}, and gene individual B, with bit string {000111}. The cut point is after the third bit.

That is, let A1 = {100} and A2 = {101} be the parts of A to the left and right of the cut, and let B1 = {000} and B2 = {111} be the corresponding parts of B. One-point crossover then produces gene individual C with bit string {A1}{B2} = {100111} and gene individual D with bit string {B1}{A2} = {000101}.
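One-point crossover is a tail swap, as the Python sketch below shows. The test reproduces the FIG. 13 example. Drawing the cut uniformly from 1 to length-1 in `random_cut` is an assumption; the text only says the cut point is arbitrary.

```python
import random

def one_point_crossover(a, b, cut):
    """Swap everything after the first `cut` bits; works on strings or lists."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def random_cut(length, rng=random):
    """Pick a cut strictly inside the bit string (needs length >= 2)."""
    return rng.randint(1, length - 1)
```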

Next comes an example of mutation in the simple GA model. Each bit of a newly created gene individual is inverted with a predetermined probability. This probability is set to a small value of about 0.1%.
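Bit-flip mutation with a per-bit probability of about 0.1% can be sketched in Python as follows.

```python
import random

def mutate(bits, rate=0.001, rng=random):
    """Flip each bit independently with probability `rate` (about 0.1% by default)."""
    return [1 - b if rng.random() < rate else b for b in bits]
```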

For survival selection in the simple GA model, only the newly created gene individuals are carried into the next generation, and all individuals of the previous generation are discarded. If there are fewer new individuals than the per-generation population Z, the shortfall is made up with new individuals whose bit strings are generated at random.
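Steps S40 to S43 of the simple GA model can be put together in one self-contained Python sketch. The number of parent pairs per generation (`n_pairs`) is a hypothetical parameter, since the text does not fix it. Bit strings must have at least two bits so that a cut point exists.

```python
import random

def simple_ga_generation(population, fitnesses, z, n_pairs, mutation_rate=0.001, rng=random):
    """One generational replacement of the simple GA model (S40-S43)."""
    n = len(population[0])                                            # bits per individual (>= 2)
    children = []
    for _ in range(n_pairs):
        mom, dad = rng.choices(population, weights=fitnesses, k=2)    # S40: roulette selection
        cut = rng.randint(1, n - 1)                                   # S41: one-point crossover
        for child in (mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]):
            child = [1 - b if rng.random() < mutation_rate else b for b in child]  # S42
            children.append(child)
    # S43: only the children survive; the whole parent generation is discarded.
    next_gen = children[:z]
    while len(next_gen) < z:                                          # fill any shortfall randomly
        next_gen.append([rng.randint(0, 1) for _ in range(n)])
    return next_gen
```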

This completes the example of generational replacement. Afterward, the process returns to step S32, and the fitness of the new generation of gene individuals is computed.

This completes the example of creating "k-NN" with GA as the learning algorithm.

The creation of "k-NN" has been described in detail, but classifiers can of course also be created with other classifier types or learning algorithms. For example, a decision tree can be created. CART and C4.5 are well-known learning algorithms for decision trees, and a decision tree can be built according to them. A decision tree can also be built by applying GA to the selection of learning-data variables.

A neural network such as a multilayer perceptron can also be created from the learning data set and the evaluation data set.

A support vector machine (SVM) can also be used as a classifier. For problems that are not linearly separable, non-linear SVMs using kernel functions are known. Common kernel functions include the polynomial kernel and the Gaussian kernel, and SVMs may be created with these kernels treated as different construction parameters.

The classifier may also be created after normalizing the input data.

In this embodiment, a genetic algorithm was used to select the variables used by "k-NN", but the invention is not limited to this. Other types of classifiers, such as decision trees or support vector machines, may also be created with a genetic algorithm selecting the variables that make up the input data.

The classifier output unit 40 selects a predetermined number of the classifiers stored in the classifier storage unit 13, in descending order of correct answer rate, and outputs them via the external connection unit 100. Take the classifiers stored in the classifier storage unit 13 shown in FIG. 9 as an example. Selecting the top three by correct answer rate gives "Classifier 5" (correct answer rate 0.85), "Classifier 1" (0.80), and "Classifier 6" (0.75).

Alternatively, for each learning data set, a predetermined number of the classifiers created from it may be taken in descending order of correct answer rate. In the example of FIG. 9, there are three one-to-one data set identifiers: "M1", "M2", and "F1". Selecting the single best classifier for each identifier yields:

- "Classifier 1" (0.80) among those created from "M1";
- "Classifier 4" (0.68) among those created from "M2";
- "Classifier 5" (0.85) among those created from "F1".

Alternatively, the classifiers whose correct answer rate is at or above a predetermined value may be selected. For example, selecting classifiers with a correct answer rate of 0.80 or more in the example of FIG. 9 gives Classifier 1 (0.80) and Classifier 5 (0.85).

The number of classifiers output is normally a predetermined number set in the classifier output unit 40. A number specified externally may be output instead, or all classifiers may be output.
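The three selection policies of the classifier output unit 40 can be sketched in Python as follows. The records are hypothetical sample data, not the full contents of FIG. 9. Each record is a dict with a data-set identifier and a correct answer rate.

```python
def top_n(store, n):
    """Policy 1: the n classifiers with the highest correct answer rate."""
    return sorted(store, key=lambda c: c["accuracy"], reverse=True)[:n]

def best_per_set(store, n=1):
    """Policy 2: the top n classifiers for each learning data set identifier."""
    by_set = {}
    for c in store:
        by_set.setdefault(c["set_id"], []).append(c)
    return {set_id: top_n(cs, n) for set_id, cs in by_set.items()}

def at_or_above(store, threshold):
    """Policy 3: every classifier whose correct answer rate is at least `threshold`."""
    return [c for c in store if c["accuracy"] >= threshold]

# Hypothetical sample records.
store = [
    {"id": "a", "set_id": "M1", "accuracy": 0.80},
    {"id": "b", "set_id": "M2", "accuracy": 0.68},
    {"id": "c", "set_id": "F1", "accuracy": 0.85},
    {"id": "d", "set_id": "M1", "accuracy": 0.75},
]
```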

The classifiers selected by the classifier output unit 40 are output via the external connection unit 100. The external connection unit 100 is a data communication path, such as a bus or a network cable.

This concludes the description of the data classifier creation device 1.

Thus, the data classifier creation device 1 of Embodiment 1 creates and stores multiple classifiers. It can therefore handle multiple evaluators or multiple judgment criteria for a single piece of input data, and it can classify digital content while taking those evaluators or criteria into account.

As a result, a classifier with good classification accuracy can be created even from a data set in which multiple pieces of correct answer data are assigned to a single piece of input data.

In addition, the data classifier creation device 1 of Embodiment 1 creates classifiers from one-to-one data sets, even when multiple evaluators give multiple pieces of correct answer data for a single piece of input data. Each such set collects one-to-one data in which each piece of input data corresponds to exactly one piece of correct answer data. The resulting classifiers are accurate and well balanced, and they are not biased toward the judgments of any particular evaluator.

Embodiment 2.

FIG. 14 is a block diagram showing a configuration example of the data classification device 2 according to Embodiment 2 of the present invention.

In FIG. 14, the data classification device 2 of Embodiment 2 includes a classifier acquisition unit 60, a classifier storage unit 70, a classification execution unit 80, and a classification result integration unit 90. The data classification device 2 is connected via the external connection unit 100 to the data classifier creation device 1 of Embodiment 1 shown in FIG. 1. Like the data classifier creation device 1 shown in FIG. 1, the data classification device 2 may also be implemented in software.

FIG. 15 is a flowchart showing an example of the operation of the data classification device 2 shown in FIG. 14.

The operation of the data classification device 2 of Embodiment 2, shown in FIG. 14, is described with reference to FIG. 15.

First, the classifier acquisition unit 60 of the data classification device 2 acquires classifiers from the data classifier creation device 1 via the external connection unit 100. It stores the acquired classifiers in the classifier storage unit 70 (step S200).

Next, the classification execution unit 80 of the data classification device 2 checks whether input data has been received from the external device 200 (step S210). If it has (step S210 "YES"), the process proceeds to step S220. Otherwise (step S210 "NO"), the unit waits until input data arrives. The external device 200 is connected via a cable such as a bus or LAN, or via an input device such as a mouse.

Next, the classification execution unit 80 classifies the received input data using the classifiers stored in the classifier storage unit 70 (step S220).

Next, the classification result integration unit 90 of the data classification device 2 integrates the classification results output by the multiple classifiers and determines the final classification result (step S230).

Next, the classification execution unit 80 outputs the final classification result to the external device 200 that supplied the data (step S240). This is the outline of the operation of the data classification device 2.

Each unit of the data classification device 2 is now described in detail.

The classifier acquisition unit 60 acquires classifiers from the data classifier creation device 1 via the external connection unit 100. It acquires as many as the number set in the data classifier creation device 1 and stores them in the classifier storage unit 70. The classifier storage unit 70 holds the same data as the classifier storage unit 13 of the data classifier creation device 1. That is, as shown in FIG. 9, it stores three items together: the identifier of the one-to-one data set used as the learning data set, the correct answer rate, and the classifier construction information.

The classifier acquisition unit 60 may instead tell the data classifier creation device 1 how many classifiers to provide and acquire that number. It may also acquire all classifiers from the data classifier creation device 1, or only those whose correct answer rate is at or above a predetermined value.

When input data arrives from outside, the classification execution unit 80 passes it to the multiple classifiers stored in the classifier storage unit 70 and obtains multiple classification results. For example, if the input data is given to six classifiers, one result is obtained from each, six in total, such as {C1, C1, C1, C2, C2, C3}.

The example results {C1, C1, C1, C2, C2, C3} contain three distinct classes: C1, C2, and C3. An integration step is therefore needed to combine these results into a single class as the final classification result.

The classification result integration unit 90 integrates the multiple classification results obtained by the classification execution unit 80 and determines a single class as the final classification result. It uses majority voting: the class output by the largest number of classifiers becomes the final result. Suppose the classification results are {C1, C1, C1, C2, C2, C3}. Majority voting gives C1 three votes, C2 two, and C3 one, so C1, with the most votes, is selected as the final classification result.

The vote count may also be computed with weighted votes, for example by making each classifier's vote worth its correct answer rate. If a classifier with a correct answer rate of 0.8 and a classifier with a correct answer rate of 0.7 output the same class, that class receives 0.8 + 0.7 = 1.5 votes.
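A weighted variant in the same style is sketched below, assuming each classifier's result arrives paired with its correct answer rate; `weighted_vote` is a hypothetical name. It reproduces the 0.8 + 0.7 = 1.5 example above.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def weighted_vote(results: Sequence[Tuple[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Each (class, correct_rate) pair adds correct_rate to that class's vote total."""
    totals: Dict[str, float] = defaultdict(float)
    for cls, rate in results:
        totals[cls] += rate
    # The class with the largest weighted total wins (ties: first class inserted).
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


winner, totals = weighted_vote([("C1", 0.8), ("C1", 0.7), ("C2", 0.9)])
print(winner, round(totals["C1"], 2))  # C1 1.5
print(weighted_vote([("C1", 0.4), ("C1", 0.4), ("C2", 0.9)])[0])  # C2
```

The second call shows where weighting matters: two weak classifiers voting C1 (total 0.8) lose to one stronger classifier voting C2 (0.9), whereas an unweighted majority vote would choose C1.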

This concludes the description of the data classification device 2.

The data classification device 2 of Embodiment 2 therefore acquires the classifiers created and stored by the data classifier creation device 1 of Embodiment 1, performs classification with each of them, and integrates their results to select the best final classification result. As a result, it can obtain classification results that are more accurate than those of conventional methods.

Embodiment 3
The data classifier creation device 3 of Embodiment 3 creates, in addition to classifiers of the same kind as those created by the data classifier creation device 1 of Embodiment 1 (general classifiers), a plurality of classifiers (dedicated classifiers) from partial one-to-one data sets that contain only a limited set of correct answer classes. Because a dedicated classifier is created from a partial one-to-one data set containing only limited kinds of correct answer data, its output is also limited to the same classes as the correct answer data used to create it. Such dedicated classifiers are used in a runoff vote that assigns the final class more accurately when, for example, classification by the general classifiers does not produce a vote margin of at least a predetermined size between the top-ranked classes. For example, when a runoff vote is held between classes C1 and C2, dedicated classifiers created from a one-to-one data set whose correct answer data contains only C1 and C2 are used. Because such a dedicated classifier was created from a one-to-one data set containing only C1 and C2 as correct answer data, its output is always either C1 or C2.

FIG. 16 is a block diagram showing an example configuration of the data classifier creation device 3 of Embodiment 3.

In FIG. 16, the data classifier creation device 3 has the same configuration as the data classifier creation device 1. However, the contents stored in the one-to-one data storage unit 12 and the classifier storage unit 13, and the operations of the one-to-one data creation unit 20 and the classifier creation unit 30, differ; in Embodiment 3 these are therefore called the one-to-one data storage unit 12A, the classifier storage unit 13A, the one-to-one data creation unit 20A, and the classifier creation unit 30A, respectively. Like the data classifier creation device 1 of Embodiment 1, the data classifier creation device 3 may also be implemented in software.

In addition to the one-to-one data set creation performed by the data classifier creation device 1, the one-to-one data creation unit 20A of the data classifier creation device 3 creates one or more partial one-to-one data sets, each consisting only of one-to-one data labeled with specific correct answer classes, and stores them in the one-to-one data storage unit 12A. In the basic form, each partial one-to-one data set takes a combination of two kinds of correct answer data as its elements, and one set is created for each such combination. The number of kinds need not be two; it may also be greater than two. Instead of creating partial one-to-one data sets for every combination, the designer may also specify in advance the combinations for which partial one-to-one data sets are created.

FIG. 17 is a flowchart showing an example of the operation of the one-to-one data creation unit 20A. Steps that perform the same processing as in the one-to-one data creation unit 20 are given the same step numbers.

First, the one-to-one data creation unit 20A creates, in the one-to-one data storage unit 12A, empty partial one-to-one data sets whose components are two classes (step S100). For every combination of two classes, one partial one-to-one data set is created per unique evaluation ID. For example, with 10 class combinations and 3 unique evaluation IDs, 10 × 3 = 30 partial one-to-one data sets are created. Each partial one-to-one data set is given an identifier combining its components and its evaluation ID; for example, a set whose component classes are “C1” and “C2” and whose evaluation ID is “M1” is given an identifier such as “M1{C1, C2}”.
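Step S100 can be sketched as follows, assuming five classes (which give the 10 two-class combinations of the example) and the evaluation IDs M1, M2, and F1 that appear in FIG. 18. The function name and the use of a plain dictionary as the store are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def create_empty_partial_sets(classes: Sequence[str],
                              eval_ids: Sequence[str]) -> Dict[str, List[tuple]]:
    """Create one empty partial one-to-one data set per (evaluation ID, class pair)."""
    partial_sets: Dict[str, List[tuple]] = {}
    for eval_id in eval_ids:
        for a, b in combinations(classes, 2):
            # Identifier combines evaluation ID and components, e.g. "M1{C1, C2}".
            partial_sets[f"{eval_id}{{{a}, {b}}}"] = []
    return partial_sets


partial_sets = create_empty_partial_sets(["C1", "C2", "C3", "C4", "C5"], ["M1", "M2", "F1"])
print(len(partial_sets))  # 30
```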

The following steps S110, S120, and S130 are the same as in the one-to-one data creation unit 20 of Embodiment 1.

Next, based on the correct answer data and the evaluation ID of the one-to-one data just created, the one-to-one data creation unit 20A adds that one-to-one data to the corresponding partial one-to-one data set (step S135). If there are several corresponding partial one-to-one data sets, the one-to-one data is added to all of them.
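The routing rule of step S135 can be sketched as below. The sketch assumes that partial sets are keyed by a tuple (evaluation ID, class pair) and that a one-to-one datum is an (input data, correct class) pair; `add_to_partial_sets` is a hypothetical name. With three classes, a datum labeled C1 by evaluator M1 lands in both M1{C1, C2} and M1{C1, C3}.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

PartialKey = Tuple[str, FrozenSet[str]]  # (evaluation ID, component classes)


def add_to_partial_sets(partial_sets: Dict[PartialKey, List[tuple]],
                        input_data: tuple, correct: str, eval_id: str) -> int:
    """Append the datum to every partial set of the same evaluator whose components include its class.

    Returns the number of partial sets the datum was added to.
    """
    added = 0
    for (set_eval_id, components), records in partial_sets.items():
        if set_eval_id == eval_id and correct in components:
            records.append((input_data, correct))
            added += 1
    return added


partial_sets = {("M1", frozenset(pair)): [] for pair in combinations(["C1", "C2", "C3"], 2)}
print(add_to_partial_sets(partial_sets, (0.2, 0.5), "C1", "M1"))  # 2
```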

FIGS. 18(a) to 18(c) show, for the evaluation IDs “M1”, “M2”, and “F1” respectively, the partial one-to-one data set whose only elements are classes C1 and C2, with one-to-one data stored in it.

The three partial one-to-one data sets in FIG. 18 store only one-to-one data assigned the correct answer data C1 or C2. FIG. 18(a) stores the data assigned by evaluation ID “M1”, FIG. 18(b) the data assigned by evaluation ID “M2”, and FIG. 18(c) the data assigned by evaluation ID “F1”.

The next step S140 is the same as in the one-to-one data creation unit 20 of the data classifier creation device 1. Through the above processing by the one-to-one data creation unit 20A, the one-to-one data storage unit 12A stores the partial one-to-one data sets shown in FIG. 18 in addition to the one-to-one data sets shown in FIG. 5.

As with the one-to-one data creation unit 20 of the data classifier creation device 1, a shuffle process can also be performed to create partial one-to-one data sets in which evaluation IDs are mixed. As in the one-to-one data creation unit 20, the shuffle process is performed after step S140. Its flow is almost the same as the flowchart in FIG. 6, except that the processing in step S151 differs slightly: to mix the evaluation IDs of the partial one-to-one data sets, a pair of partial one-to-one data sets that will exchange one-to-one data is selected from those not yet shuffled. The two sets of a pair have the same combination of correct answer classes and different evaluation IDs. The remaining steps S152 and S153 are the same as in Embodiment 1.
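Only the pairing rule of the modified step S151 is sketched here; the exchange itself (steps S152 and S153) follows FIG. 6 of Embodiment 1 and is not reproduced. The sketch assumes partial sets are keyed by (evaluation ID, components) and that "not yet shuffled" is tracked per pair; the key layout and the function name `next_shuffle_pair` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Set, Tuple

PartialKey = Tuple[str, FrozenSet[str]]  # (evaluation ID, component classes)
Pair = FrozenSet[PartialKey]


def next_shuffle_pair(keys: Iterable[PartialKey],
                      done: Set[Pair]) -> Optional[Tuple[PartialKey, PartialKey]]:
    """Return an unshuffled pair with identical components and different evaluation IDs, or None."""
    for a, b in combinations(list(keys), 2):
        same_components = a[1] == b[1]
        different_evaluators = a[0] != b[0]
        if same_components and different_evaluators and frozenset((a, b)) not in done:
            return a, b
    return None


c12 = frozenset({"C1", "C2"})
c13 = frozenset({"C1", "C3"})
keys = [("M1", c12), ("M2", c12), ("F1", c12), ("M1", c13)]
done: Set[Pair] = set()
while (pair := next_shuffle_pair(keys, done)) is not None:
    done.add(frozenset(pair))  # the data exchange of steps S152-S153 would happen here
print(len(done))  # 3
```

Only the three {C1, C2} sets can pair with one another; M1{C1, C3} has no partner with the same components.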

This concludes the description of the one-to-one data creation unit 20A.

The operation flow of the classifier creation unit 30A is the same as that of the classifier creation unit 30 of the data classifier creation device 1 shown in FIG. 7, except for the contents of steps S21, S22, and S26.

First, the classifier creation unit 30A selects one not-yet-selected one-to-one data set from the group of one-to-one data sets stored in the one-to-one data storage unit 12A, and also selects one partial one-to-one data set from each partial one-to-one data set group (step S21). A partial one-to-one data set group is a collection of partial one-to-one data sets that have the same components. For example, the partial one-to-one data sets M1{C1, C2}, M2{C1, C2}, and F1{C1, C2} shown in FIG. 18 all have the components {C1, C2}, so together they form one partial one-to-one data set group. There are as many partial one-to-one data set groups as there are combinations of components, and one partial one-to-one data set is selected from each group.
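The grouping in step S21 can be sketched as follows. How one member is chosen from each group is not specified in the text; the sketch draws one at random, which is an assumption, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

PartialKey = Tuple[str, FrozenSet[str]]  # (evaluation ID, component classes)


def group_by_components(keys: List[PartialKey]) -> Dict[FrozenSet[str], List[PartialKey]]:
    """Collect partial sets with identical components into one group."""
    groups: Dict[FrozenSet[str], List[PartialKey]] = defaultdict(list)
    for key in keys:
        groups[key[1]].append(key)
    return dict(groups)


def pick_one_per_group(groups: Dict[FrozenSet[str], List[PartialKey]],
                       rng: random.Random) -> Dict[FrozenSet[str], PartialKey]:
    """Select one partial set from each group (random choice is an assumption)."""
    return {components: rng.choice(members) for components, members in groups.items()}


c12, c13 = frozenset({"C1", "C2"}), frozenset({"C1", "C3"})
keys = [("M1", c12), ("M2", c12), ("F1", c12), ("M1", c13), ("M2", c13)]
groups = group_by_components(keys)
chosen = pick_one_per_group(groups, random.Random(0))
print(len(groups), len(chosen))  # 2 2
```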

Next, the classifier creation unit 30A selects a learning data set and an evaluation data set from the one-to-one data set and the partial one-to-one data sets selected in step S21 (step S22). The selection from the one-to-one data set is the same as in Embodiment 1.

The classifier creation unit 30A also selects a learning data set and an evaluation data set from each partial one-to-one data set. To do so, it divides the partial one-to-one data set selected in step S21 evenly into two halves, using one as the learning data set and the other as the evaluation data set. For example, taking the odd-numbered rows as learning data and the even-numbered rows as evaluation data divides the data evenly in two. This is done for every selected partial one-to-one data set.
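The odd/even split reduces to two slices. `split_odd_even` is a hypothetical helper; rows are counted from 1, so the 1st, 3rd, ... rows (list indices 0, 2, ...) become learning data.

```python
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def split_odd_even(rows: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Odd-numbered rows -> learning data; even-numbered rows -> evaluation data."""
    learning = list(rows[0::2])    # rows 1, 3, 5, ...
    evaluation = list(rows[1::2])  # rows 2, 4, 6, ...
    return learning, evaluation


learning, evaluation = split_odd_even(["r1", "r2", "r3", "r4", "r5"])
print(learning, evaluation)  # ['r1', 'r3', 'r5'] ['r2', 'r4']
```

With an odd number of rows the learning half ends up one row larger.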

Alternatively, the evaluation data set for a partial one-to-one data set may be one partial one-to-one data set chosen at random from the other members of the same partial one-to-one data set group as the set selected as the learning data set.
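This alternative can be sketched as a random draw from the rest of the group. Identifiers follow the “M1{C1, C2}” style of step S100; the function name and the error raised for a single-member group are assumptions.

```python
import random
from typing import Sequence


def pick_evaluation_set(group_members: Sequence[str], learning_id: str,
                        rng: random.Random) -> str:
    """Randomly pick a partial set from the same group, excluding the learning set."""
    candidates = [m for m in group_members if m != learning_id]
    if not candidates:
        raise ValueError("the group has no other partial one-to-one data set")
    return rng.choice(candidates)


group = ["M1{C1, C2}", "M2{C1, C2}", "F1{C1, C2}"]
print(pick_evaluation_set(group, "M1{C1, C2}", random.Random(1)))
```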

The following steps S23, S24, and S25 are the same as in Embodiment 1.

Next, the classifier creation unit 30A stores the created classifiers in the classifier storage unit 13A (step S26). It stores general classifiers, created using a one-to-one data set as learning data, separately from dedicated classifiers, created using a partial one-to-one data set as learning data.

FIG. 19 shows an example of the classifiers stored in the classifier storage unit 13A: FIG. 19(a) shows stored general classifiers and FIG. 19(b) shows stored dedicated classifiers. The format of the general classifiers in FIG. 19(a) is the same as that of FIG. 9 described in Embodiment 1. FIG. 19(b) stores a classifier identifier, a partial one-to-one data set identifier, components, a correct answer rate, and classifier construction information. The classifier identifier, correct answer rate, and classifier construction information are the same as for the general classifiers in (a). The partial one-to-one data set identifier identifies the learning data set used to create the dedicated classifier, and the components are the components of the partial one-to-one data set used to build it. For example, the components {C1, C2} are stored for all three dedicated classifiers shown in FIG. 19(b). This indicates that each is a dedicated classifier for distinguishing C1 from C2, created from a partial one-to-one data set containing only one-to-one data whose correct answer data is C1 or C2.
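A record mirroring the columns of FIG. 19(b), plus a lookup by components, might look like the sketch below. The field names, the `find_dedicated` helper, the string placeholder for construction information, and the example values are all illustrative assumptions; the figure does not define a data type.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class DedicatedClassifierEntry:
    classifier_id: str
    partial_set_id: str          # identifier of the learning set, e.g. "M1{C1, C2}"
    components: FrozenSet[str]   # classes the dedicated classifier can output
    correct_rate: float
    construction_info: str       # placeholder for the trained model's parameters


def find_dedicated(entries: Iterable[DedicatedClassifierEntry],
                   classes: Iterable[str]) -> List[DedicatedClassifierEntry]:
    """Return the dedicated classifiers whose components are exactly `classes`."""
    wanted = frozenset(classes)
    return [e for e in entries if e.components == wanted]


c12 = frozenset({"C1", "C2"})
entries = [
    DedicatedClassifierEntry("D1", "M1{C1, C2}", c12, 0.85, "..."),
    DedicatedClassifierEntry("D2", "M2{C1, C2}", c12, 0.80, "..."),
    DedicatedClassifierEntry("D3", "M1{C1, C3}", frozenset({"C1", "C3"}), 0.75, "..."),
]
print([e.classifier_id for e in find_dedicated(entries, ["C2", "C1"])])  # ['D1', 'D2']
```

Storing components as a `frozenset` makes the lookup independent of the order in which the two classes are named.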

The next step S27 is the same as in Embodiment 1.
This concludes the description of the data classifier creation device 3.

The data classifier creation device 3 of Embodiment 3 can therefore create multiple classifiers of the same kind as those created by the data classifier creation device 1 of Embodiment 1 (general classifiers), and can also create multiple classifiers (dedicated classifiers) from partial one-to-one data sets consisting only of correct answer data for some of the classes.

Embodiment 4
FIG. 20 is a block diagram showing an example configuration of the data classification device 4 according to Embodiment 4 of the present invention.

In FIG. 20, the data classification device 4 includes a classifier acquisition unit 60A, a classifier storage unit 70, a classification execution unit 80A, and a classification result integration unit 90, and is connected via the external connection unit 100 to the data classifier creation device 3 of Embodiment 3 shown in FIG. 16. The classifier storage unit 70 and the classification result integration unit 90 are the same as in the data classification device 2 of Embodiment 2, so their description is omitted.

In the data classification device 4 of Embodiment 4, the classifier acquisition unit and the classification execution unit operate differently from the classifier acquisition unit 60 and the classification execution unit 80 of the data classification device 2 of Embodiment 2, so they are called the classifier acquisition unit 60A and the classification execution unit 80A. The classifier acquisition unit 60A of the data classification device 4 is connected to the data classifier creation device 3 via the external connection unit 100, and the classification execution unit 80A is connected to the external device 200. Like the devices of Embodiments 1 to 3, the data classification device 4 may also be implemented in software.

Like the classifier acquisition unit 60 of the data classification device 2, the classifier acquisition unit 60A of the data classification device 4 acquires general classifiers from the data classifier creation device 3; in addition, it also acquires dedicated classifiers from the data classifier creation device 3.

Like the classification execution unit 80 of the data classification device 2, the classification execution unit 80A of the data classification device 4 performs classification with multiple general classifiers. If the classification result of the general classifiers does not satisfy a predetermined condition, it additionally has the classifier acquisition unit 60A acquire dedicated classifiers and holds a runoff vote with them.

FIG. 21 is a flowchart showing an example of the operation of the data classification device 4.

First, the classifier acquisition unit 60A of the data classification device 4 acquires general classifiers from the data classifier creation device 3 via the external connection unit 100 and stores them in the classifier storage unit 70 (step S200). This processing is the same as that of the classifier acquisition unit 60 of Embodiment 2.

The next step S210 is the same as in Embodiment 2.

Next, the classification execution unit 80A performs classification using the general classifiers acquired in step S200 (step S220). This processing is the same as that of the classification execution unit 80 of Embodiment 2.

Next, the classification execution unit 80A determines whether the number of votes received by the class with the most votes exceeds a predetermined value (step S221). If it does, the process proceeds to step S230; otherwise, it proceeds to step S222.

If the maximum number of votes does not exceed the predetermined value, the classifier acquisition unit 60A acquires from the data classifier creation device 3 the dedicated classifiers created for the two classes with the most votes (step S222). For example, if the two classes with the most votes are C1 and C2, it acquires dedicated classifiers created from learning data sets whose only components are C1 and C2. As with general classifiers, the number of dedicated classifiers acquired is the number set in the data classifier creation device 3. Alternatively, the number of dedicated classifiers to acquire may be specified, and that many acquired.

Next, the classification execution unit 80A classifies the same input data again using the dedicated classifiers acquired above (step S223). That is, step S223 performs the same processing as step S220, except that the classifiers used are dedicated classifiers. When the C1/C2 dedicated classifiers are used, each outputs either C1 or C2.
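Steps S220 to S230 can be combined into one function, sketched below under several assumptions: classifiers are plain callables returning a class label, dedicated classifiers are looked up by the frozenset of their two components, the final integration is an unweighted majority vote, and if no dedicated classifier exists for the pair the general vote's winner is kept. `classify_with_runoff` and `const` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Sequence

Classifier = Callable[[Sequence[float]], str]


def classify_with_runoff(x: Sequence[float],
                         general: List[Classifier],
                         dedicated: Dict[FrozenSet[str], List[Classifier]],
                         threshold: int) -> str:
    # Step S220: every general classifier votes.
    ranked = Counter(clf(x) for clf in general).most_common()
    top_class, top_votes = ranked[0]
    # Step S221: a clear winner (votes > threshold), or a single voted class, is final.
    if top_votes > threshold or len(ranked) < 2:
        return top_class
    # Step S222: dedicated classifiers built for the two most-voted classes.
    pair = frozenset((ranked[0][0], ranked[1][0]))
    runoff = dedicated.get(pair, [])
    if not runoff:
        return top_class  # assumption: fall back to the general vote
    # Steps S223 and S230: rerun on the same input, then take a majority vote.
    return Counter(clf(x) for clf in runoff).most_common(1)[0][0]


def const(label: str) -> Classifier:
    # Stand-in classifier that always returns `label`, for demonstration only.
    return lambda x: label


general = [const(c) for c in ["C1", "C1", "C2", "C2", "C3"]]
dedicated = {frozenset({"C1", "C2"}): [const("C2"), const("C2"), const("C1")]}
print(classify_with_runoff([0.0], general, dedicated, threshold=2))  # C2
```

Here the general classifiers tie C1 and C2 at two votes each, which does not exceed the threshold of 2, so the C1/C2 dedicated classifiers decide the result.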

Next, the classification result integration unit 90 integrates the classification results, determines the final classification result, and returns it to the classification execution unit 80A (step S230). This processing is the same as in Embodiment 2.

The next step S230 is the same as in Embodiment 2.

This concludes the operation flow of the data classification device 4 of Embodiment 4.

Alternatively, in step S200 the classifier acquisition unit 60A may acquire all dedicated classifiers and store them in the classifier storage unit 70, and in step S222 read out the classifiers needed for the runoff vote from the dedicated classifiers stored there.

Although the runoff vote above is held between the two classes with the most votes, it may also be held among two or more classes selected in other ways, for example among all classes whose share of the votes exceeds a predetermined rate.
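Selecting runoff candidates by vote share, as in the variant just described, can be sketched as follows; `runoff_candidates` and the strict "exceeds" comparison are assumptions that mirror the wording above.

```python
from collections import Counter
from typing import List, Sequence


def runoff_candidates(results: Sequence[str], min_share: float) -> List[str]:
    """Classes whose share of all votes exceeds `min_share`, most-voted first."""
    total = len(results)
    return [cls for cls, votes in Counter(results).most_common() if votes / total > min_share]


results = ["C1", "C1", "C1", "C2", "C2", "C3"]
print(runoff_candidates(results, 0.3))  # ['C1', 'C2']
```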

The data classification device 4 of Embodiment 4 therefore provides the same effects as the data classification device 2 of Embodiment 2, and further improves classification accuracy by using dedicated classifiers, created by the data classifier creation device 3 of Embodiment 3, that specialize in distinguishing specific classes from one another.

DESCRIPTION OF SYMBOLS
1, 3 Data classifier creation device
2, 4 Data classification device
10 Storage unit
11 Case data storage unit
12, 12A One-to-one data storage unit
13, 13A Classifier storage unit
20, 20A One-to-one data creation unit
30, 30A Classifier creation unit
40 Classifier output unit
60, 60A Classifier acquisition unit
70 Classifier storage unit
80, 80A Classification execution unit
90 Classification result integration unit
100 External connection unit
200 External device

Claims

1. A data classifier creation apparatus comprising:
a case data storage unit that stores a set of case data, each item of which associates input data with correct answer data indicating the correct classification of that input data, the set including cases in which a plurality of correct answer data given by a plurality of evaluators correspond to the same input data;
a one-to-one data creation unit that reads the set of case data from the case data storage unit and creates, for each of the plurality of evaluators, a set of one-to-one data in which one input data corresponds to one correct answer data, thereby creating a plurality of one-to-one data sets; and
a classifier creation unit that refers to the plurality of one-to-one data sets and creates a plurality of classifiers, performing control so that one classifier is created from one one-to-one data set.

2. The data classifier creation apparatus according to claim 1, wherein the classifier creation unit creates the plurality of classifiers by applying two or more types of learning algorithms to the plurality of created one-to-one data sets.

3. The data classifier creation apparatus according to claim 1 or 2, wherein the classifier creation unit creates the plurality of classifiers by applying a learning algorithm with two or more different learning parameters to the plurality of created one-to-one data sets.

4. The data classifier creation apparatus according to any one of claims 1 to 3, wherein the classifier creation unit further creates a classifier by selecting, with a genetic algorithm, the variables that constitute the input data.

5. A data classification device comprising:
a classifier storage unit that stores a plurality of classifiers created by the data classifier creation apparatus according to any one of claims 1 to 4;
a classification execution unit that inputs input data to each of the plurality of classifiers stored in the classifier storage unit to obtain a plurality of classification results; and
a classification result integration unit that outputs a final class using the plurality of classification results.

6. The data classification device according to claim 5, wherein the classifier storage unit stores a dedicated classifier capable of outputting only some of the types of classes that the classification result integration unit may output, and a general classifier capable of outputting all of those types of classes.

7. The data classification device according to claim 5 or 6, wherein
the classifier storage unit further stores a correct answer rate of the classification results of each of the plurality of classifiers, and
the classification result integration unit weights the plurality of classification results according to the correct answer rates and outputs the final class.

8. A data classifier creation method comprising:
reading a set of case data, each item of which associates input data with correct answer data indicating the correct classification of that input data, the set including cases in which a plurality of correct answer data given by a plurality of evaluators correspond to the same input data, and creating, for each of the plurality of evaluators, a set of one-to-one data in which one input data corresponds to one correct answer data, thereby creating a plurality of one-to-one data sets; and
referring to the plurality of created one-to-one data sets and creating a plurality of classifiers, with control performed so that one classifier is created from one one-to-one data set.

9. A data classification method comprising:
storing a plurality of classifiers created by the data classifier creation method according to claim 8;
inputting input data to each of the plurality of classifiers stored in the storing step to obtain a plurality of classification results; and
outputting a final class using the plurality of classification results.

10. A data classifier creation program that causes a computer to execute:
reading a set of case data, each item of which associates input data with correct answer data indicating the correct classification of that input data, the set including cases in which a plurality of correct answer data given by a plurality of evaluators correspond to the same input data, and creating, for each of the plurality of evaluators, a set of one-to-one data in which one input data corresponds to one correct answer data, thereby creating a plurality of one-to-one data sets; and
referring to the plurality of created one-to-one data sets and creating a plurality of classifiers, with control performed so that one classifier is created from one one-to-one data set.

11. A data classification program that causes a computer to execute:
storing a plurality of classifiers created by the data classifier creation program according to claim 10;
inputting input data to each of the plurality of classifiers stored in the storing step to obtain a plurality of classification results; and
outputting a final class using the plurality of classification results.