JP2008204190A

JP2008204190A - Classification evaluation device

Info

Publication number: JP2008204190A
Application number: JP2007039875A
Authority: JP
Inventors: Yoshitaka Hamaguchi; 佳孝濱口
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2007-02-20
Filing date: 2007-02-20
Publication date: 2008-09-04

Abstract

PROBLEM TO BE SOLVED: To provide a classification evaluation device that gives a user an index for selecting a target classification from the classifications obtained by automatic classification. SOLUTION: This classification evaluation device, which evaluates the result of classifying a plurality of data on the basis of the characteristics of the data, is provided with: a similarity calculation part 104 for calculating, for each data item, the similarity between the characteristics of the data item and the characteristics representing the data belonging to the classification to which the data item belongs; and a classification evaluation part 106 for evaluating the classification on the basis of the similarity calculated for each data item belonging to the classification. Thus, when narrowing down the classifications to be used for data retrieval from a plurality of classifications, the validity of each classification can be evaluated and displayed automatically, so a user can find his or her target classification more quickly and easily. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a classification evaluation apparatus, and more particularly to a classification evaluation apparatus that evaluates the result of classifying a data group consisting of a plurality of data.

Various methods for automatically classifying data have been devised. Methods have also been devised for evaluating and presenting the usefulness of a classification, that is, whether a target item can be found efficiently among the automatically classified data.

For example, Patent Document 1 discloses an automatic classification method that calculates word vectors for the documents to be classified and automatically places similar data in the same classification. It also describes a method of displaying, as keywords representing a classification, one or more words taken in descending order of their values in the word vector corresponding to that classification. With this method, the overall structure of the document database can be grasped from the keywords. Furthermore, by looking at the keywords and judging them, a person can easily select a classification suited to the purpose and narrow down the data.

However, the method of Patent Document 1 has the following two problems. The first is that when the number of classifications is very large, it is difficult to check the keywords of each classification visually.

The second is that when the documents to be classified are not papers or patent publications but Web pages, e-mails, and the like, the documents often contain few words that represent their content, or none at all. In such cases, the keyword representing a classification is not necessarily a word that represents the content. In addition, when many documents share no common keywords, documents may be placed in the same classification on the basis of a very general word that tends to appear in any document.

Patent Document 1: JP-A-8-263514

Thus, when an automatically classified database is used, the difference in quality between appropriate and inappropriate classifications becomes large, but because both kinds are displayed together, it becomes difficult to find the target classification efficiently.

The present invention has been made in view of the above problems, and an object of the present invention is to provide a new and improved classification evaluation apparatus capable of providing a user with an index for selecting a target classification from the classifications obtained by automatic classification.

In order to solve the above problem, according to one aspect of the present invention, there is provided a classification evaluation apparatus for evaluating the result of classifying a plurality of data based on features of the data, the apparatus comprising: a similarity calculation unit that calculates, for each data item, the similarity between the feature of the data item and a feature representing the data belonging to the classification to which the data item belongs; and a classification evaluation unit that evaluates the classification based on the similarity calculated for each data item belonging to the classification.

With this configuration, when the classifications used for data retrieval or the like are to be narrowed down from a plurality of classifications, the effectiveness of each classification can be evaluated and displayed automatically, and the user can find the target classification more quickly and easily.

The classification evaluation unit may further evaluate the classification based on the number of data items belonging to the classification. For example, the classification may be evaluated by the product of the similarity value calculated by the similarity calculation unit and the number of data items. As a result, a classification that contains more data and makes it easier to get an overview of the characteristics of the data as a whole receives a higher evaluation.

The classification evaluation unit may also evaluate the classification based on the average value of the similarities of the data belonging to the classification. This makes it possible to perform an evaluation that takes into account how cohesive the data belonging to the classification is as a whole.

The classification may consist of a plurality of cells having different features, and the similarity calculation unit may calculate the similarity between the feature of a data item and the feature of the cell to which the data item belongs. Thus, for example, when classification is performed using a self-organizing map consisting of a plurality of cells, the characteristics of the classification method can be used in the evaluation.

The classification evaluation unit may further evaluate the classification based on the number of cells belonging to the classification. Alternatively, the classification evaluation unit may evaluate the classification based on the number of cells, among the cells belonging to the classification, to which one or more data items belong. This makes it possible to evaluate a classification according to whether the data belonging to it is dispersed throughout the classification. For example, if a high proportion of the cells belonging to a classification contain one or more data items, the data can be judged to be dispersed throughout the classification.

The classification evaluation unit may also evaluate the classification based on the value obtained by dividing the sum of the similarities of all data belonging to the classification by the number of cells. As a result, a classification whose data has higher similarity receives a higher evaluation, and a classification with more data receives a higher evaluation. In addition, a classification whose data is concentrated in part of the classification, rather than dispersed throughout it, receives a higher evaluation.
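The per-cell evaluation described above can be illustrated with a minimal Python sketch. The function name `evaluate_per_cell` and the sample numbers are assumptions made for illustration only; the text does not fix whether the divisor is the number of all cells in the classification or only the cells that hold data, so the divisor is simply passed in.

```python
def evaluate_per_cell(similarities, cell_count):
    """Sum of the similarities of all data in a classification, divided by a cell count.

    `cell_count` is whichever cell count the evaluator chooses to use
    (an assumption: the text allows more than one choice).
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return sum(similarities) / cell_count


# Hypothetical data: the same four similarities spread over 2 cells or over 4 cells.
concentrated = evaluate_per_cell([0.5, 0.5, 0.5, 0.5], cell_count=2)
dispersed = evaluate_per_cell([0.5, 0.5, 0.5, 0.5], cell_count=4)
```

With these hypothetical inputs the classification spread over fewer cells scores higher, which matches the stated preference for data concentrated in part of a classification.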

As described above, according to the present invention, it is possible to provide a user with an index for selecting a target classification from the classifications obtained by automatic classification.

Preferred embodiments of the present invention will be described below in detail with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

(First embodiment)
First, a classification evaluation apparatus according to a first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram illustrating a schematic configuration of a classification evaluation apparatus 100 according to the present embodiment.

Next, details of the classification evaluation apparatus 100 according to the present embodiment will be described. As shown in FIG. 1, the classification evaluation apparatus 100 includes a feature vector extraction unit 101, an automatic classification unit 102, a reference vector acquisition unit 103, a similarity calculation unit 104, a data number calculation unit 105, a classification evaluation unit 106, and a display unit 107. The classification evaluation apparatus 100 takes as input a database 110 in which a plurality of data is stored, evaluates the result of classifying the data in the database 110, and presents the evaluation to the user. Each unit of the classification evaluation apparatus 100 is described below.

(Feature vector extraction unit 101)
The feature vector extraction unit 101 is a functional unit for vectorizing the features of the data in the database 110 to be classified. The feature vector extraction unit 101 extracts one or more feature elements from the data in the database 110 and, for each data item, obtains a vector composed of the extracted feature elements. Hereinafter, the vector obtained for each data item is referred to as a feature vector.

(Automatic classification unit 102)
The automatic classification unit 102 is a functional unit for classifying each data item in the database 110. Using the feature vectors obtained by the feature vector extraction unit 101, the automatic classification unit 102 calculates the similarity between the data items in the database 110 and places similar data items in the same classification. A classification technique such as the k-means method can be used in the automatic classification unit 102.

(Reference vector acquisition unit 103)
The reference vector acquisition unit 103 is a functional unit for acquiring, for each classification obtained by the automatic classification unit 102, a feature vector that represents that classification. The feature vector representing each classification is referred to as a reference vector. The reference vector can be obtained, for example, by calculating the average of the feature vectors of all data included in the classification.
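As a rough illustration of the averaging approach mentioned above, the following Python sketch computes a reference vector as the element-wise mean of the feature vectors in one classification. The function name `reference_vector` and the sample vectors are hypothetical; the text names averaging only as one possible method.

```python
def reference_vector(feature_vectors):
    """Return the element-wise mean (centroid) of equal-length feature vectors."""
    if not feature_vectors:
        raise ValueError("a classification must contain at least one data item")
    count = len(feature_vectors)
    dims = len(feature_vectors[0])
    return [sum(vec[i] for vec in feature_vectors) / count for i in range(dims)]


# A classification with a single data item: the reference vector is that item's vector.
single = reference_vector([[2.0, 3.0]])

# A classification with four hypothetical data items: the reference vector is their centroid.
centroid = reference_vector([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
```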

(Similarity calculation unit 104)
The similarity calculation unit 104 is a functional unit for calculating, for each classification, the similarity between the feature vector of each data item it contains and the reference vector of that classification. The similarity is obtained, for example, by calculating the Euclidean distance or the inner product between the feature vector of each data item and the reference vector of the classification. The calculated similarity values are passed from the similarity calculation unit 104 to the classification evaluation unit 106.

(Data number calculation unit 105)
The data number calculation unit 105 is a functional unit for obtaining the number of data items included in each classification. The calculated number of data items for each classification is passed from the data number calculation unit 105 to the classification evaluation unit 106.

(Classification evaluation unit 106)
The classification evaluation unit 106 is a functional unit for evaluating each classification by calculating an evaluation value that indicates the effectiveness of the data included in it. The evaluation value is calculated from the similarities calculated by the similarity calculation unit 104 and the number of data items in each classification calculated by the data number calculation unit 105.

For example, the classification evaluation unit 106 may calculate the evaluation value as the average or the maximum of the similarities of the data included in each classification. In this case, the higher the similarity of the data in a classification (the smaller the variation in the data), the higher the evaluation value. Alternatively, the evaluation value may be calculated by multiplying the average similarity of the data in the classification by the number of data items, which is equivalent to summing the similarities. In this case, the larger the number of data items, the higher the evaluation value.
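The three evaluation options named above (average, maximum, and average multiplied by the data count) might be sketched in Python as follows. The function names and the sample similarities are illustrative assumptions, not part of the described apparatus.

```python
def evaluate_mean(similarities):
    """Average similarity: favors classifications with little variation."""
    return sum(similarities) / len(similarities)


def evaluate_max(similarities):
    """Maximum similarity among the data in the classification."""
    return max(similarities)


def evaluate_total(similarities):
    """Average similarity multiplied by the data count, i.e. the sum of similarities."""
    return evaluate_mean(similarities) * len(similarities)


# Hypothetical similarities for one classification.
sims = [0.9, 0.6, 0.3]
mean_value = evaluate_mean(sims)
max_value = evaluate_max(sims)
total_value = evaluate_total(sims)
```

The last variant grows with the number of data items, whereas the first two depend only on how close the data is to the reference vector.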

Alternatively, when the number of data items is large, the evaluation value may be calculated using the logarithm of the number of data items instead of the number itself. This makes it possible to calculate the evaluation value so that the similarity of the data is emphasized when the number of data items is large and the number of data items is emphasized when it is small.
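One possible reading of the logarithmic variant is sketched below in Python. The exact formula is not given in the text, so the choice of the natural logarithm and of multiplying it by the average similarity are both assumptions, as is the function name `evaluate_log_weighted`.

```python
import math


def evaluate_log_weighted(similarities):
    """Average similarity multiplied by the natural log of the data count.

    Assumption: the log replaces the raw count in the product of
    average similarity and data count. Under this assumption a
    classification with a single data item scores 0, because log(1) == 0.
    """
    count = len(similarities)
    mean = sum(similarities) / count
    return mean * math.log(count)


# Hypothetical classifications: many loosely similar items vs. few closely similar items.
large_loose = evaluate_log_weighted([0.5] * 100)
small_tight = evaluate_log_weighted([0.9] * 10)
```

Under these assumptions the two scores are of similar magnitude, whereas the plain product of average and count would differ by a factor of more than five, which illustrates how the logarithm dampens the influence of large data counts.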

Note that the evaluation value calculation methods shown here are merely examples, and the evaluation value can be calculated by various methods that use the similarity of the data and the number of data items.

(Display unit 107)
The display unit 107 is a functional unit for displaying each classification obtained by the automatic classification unit 102 according to the evaluation value obtained by the classification evaluation unit 106. For example, the classifications may be listed in descending order of evaluation value, or they may be color-coded according to the level of the evaluation value. Classifications with high evaluation values may also be displayed in a darker color, for example, so that the user can easily find them.

The configuration of the classification evaluation apparatus 100 has been described above. Each unit of the classification evaluation apparatus 100 may be implemented as software, by installing program modules capable of executing the functions described above in an information processing apparatus such as a computer, or as hardware such as a processor capable of executing those functions.

Next, an example of the classification evaluation process executed by the classification evaluation apparatus 100 according to the present embodiment will be described with reference to FIGS. 2 to 4. FIG. 2 is a flowchart of the classification evaluation process executed by the classification evaluation apparatus 100. FIG. 3 is a diagram schematically illustrating an example of data classified by the automatic classification unit 102 of the classification evaluation apparatus 100. FIG. 4 is a diagram showing, for the classification example of FIG. 3, the similarity between each classification and its data, the average similarity, the number of data items, and the evaluation value.

Referring to FIG. 2, first, in step S120, the feature vector extraction unit 101 extracts one or more feature elements from the data in the database 110. Next, in step S122, a feature vector is obtained for each data item based on the extracted feature elements. Then, in step S124, the automatic classification unit 102 classifies the data in the database 110 using the feature vectors.

Assume that, through the above processing, the data in the database 110 has been classified as shown in FIG. 3, for example. FIG. 3 is drawn in two dimensions for the sake of explanation, but the feature space may in general be multidimensional. In FIG. 3, the white circles are the feature vectors of data D1 to D12 in the database 110 plotted in the space. The solid straight lines represent the classification boundaries obtained by the operation of the automatic classification unit 102 in step S124. Here, it is assumed that the classification processing in step S124 has produced four classifications, A11, A12, A13, and A14. Only these four classifications are used in the explanation, but the present invention is not limited to this and is applicable even when a larger number of classifications is obtained.

The classification evaluation process shown in FIG. 2 is described below with reference to the example classification result of FIG. 3. In step S126, the reference vector acquisition unit 103 acquires the reference vector of each classification obtained in step S124. In FIG. 3, the reference vector of each classification is represented by a cross (×). For example, classification A11 contains only one data item, D1, so the feature vector of that data item becomes the reference vector C11. Classification A12 contains four data items, D2 to D5, so the vector at their center of gravity is calculated to obtain the reference vector C12 shown in FIG. 3. For classifications A13 and A14, the reference vectors C13 and C14 are obtained in the same way by calculating the center of gravity of the data.

Next, in step S128, the similarity between each classification and the data belonging to it is calculated. Specifically, the distance d between the feature vector of each data item and the reference vector of its classification is calculated, and the value 1/(1+d) is used as the similarity. Taking classification A12 in FIG. 3 as an example, the distances between the reference vector C12 of classification A12 and each of D2 to D5 are calculated first. Suppose that the distances between C12 and D2 to D5 are 1.0, 0.2, 1.0, and 1.1, respectively. Applying 1/(1+d) to these distances gives similarities of 0.5, 0.83, 0.5, and 0.48. The same calculation is performed for classifications A11, A13, and A14 to obtain the similarity between each data item and its classification.
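The arithmetic for classification A12 can be reproduced with a short Python sketch. The distances are the ones assumed in the text; the function name `similarity` and the dictionary layout are illustrative choices.

```python
def similarity(distance):
    """Convert a distance d into a similarity using 1 / (1 + d)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


# Distances between reference vector C12 and data D2 to D5, as assumed in the text.
distances = {"D2": 1.0, "D3": 0.2, "D4": 1.0, "D5": 1.1}

similarities = {name: similarity(d) for name, d in distances.items()}
rounded = {name: round(value, 2) for name, value in similarities.items()}

# Sum of the unrounded similarities, rounded to two decimal places.
total = round(sum(similarities.values()), 2)
```

A distance of 0 maps to a similarity of 1, and the similarity decreases toward 0 as the distance grows, so a data item that coincides with the reference vector receives the highest possible value.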

FIG. 4 shows the similarity values calculated in step S128 between each of the data items D1 to D12 and the classifications A11 to A14, together with the average similarity of each classification. The average similarity indicates the degree of variation of the data in each classification: the higher the average, the smaller the variation, and the more the classification consists of mutually similar data. Referring to FIG. 4, among classifications A11 to A14, classification A12 is the most cohesive and classification A13 has the greatest variation in its data.

Next, in step S130, the number of data items in each classification is calculated. FIG. 4 shows the numbers calculated for classifications A11 to A14 in the example of FIG. 3: 1, 4, 5, and 2, respectively.

Next, in step S132, the evaluation value of each classification is calculated. Here, the evaluation value is the sum of the similarities, obtained in step S128, of the data belonging to each classification. FIG. 4 shows the evaluation values calculated for classifications A11 to A14 in the example of FIG. 3: 1.0, 2.31, 1.46, and 1.06, respectively.

In step S134, the classifications are displayed in descending order of the evaluation values obtained in step S132. The display order is therefore A12 → A13 → A14 → A11. To improve visibility, each classification may be displayed by showing the large elements of its reference vector or by showing the data item closest to its reference vector.
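The display order can be checked with a minimal Python sketch that sorts the classifications by the evaluation values given for FIG. 4. The variable names are illustrative; only the four values come from the text.

```python
# Evaluation values of classifications A11 to A14 as stated for FIG. 4.
evaluation_values = {"A11": 1.0, "A12": 2.31, "A13": 1.46, "A14": 1.06}

# Sort classification names by evaluation value, highest first.
display_order = sorted(evaluation_values, key=evaluation_values.get, reverse=True)
```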

As described above, a classification such as A12, in which the feature vectors of the data are close to one another, that is, a classification made up of closely similar data, receives a high evaluation value, while a classification such as A13, which contains many data items but with large variation, receives a lower evaluation value. A classification made up of similar data can be expected to be more coherent as information, so its data can be judged to be more useful.

In addition, a classification such as A14, in which the data items are close to one another but number only two, receives a lower evaluation value than a classification such as A13, in which the variation between data items is large but the number of data items is also large. This makes it easier to check first the classifications that contain more data and give a better overview of the characteristics of the entire database.

In this way, classifications whose data is estimated to be useful and of good quality can be identified, and information can be provided that lets the user grasp the overall picture of the data more efficiently than displaying all classifications as a map.

(Second embodiment)
Next, a classification evaluation apparatus according to a second embodiment of the present invention will be described with reference to FIG. 5. FIG. 5 is a block diagram illustrating a schematic configuration of a classification evaluation apparatus 200 according to the present embodiment.

Next, details of the classification evaluation apparatus 200 according to the present embodiment will be described. As shown in FIG. 5, the classification evaluation apparatus 200 includes a feature vector extraction unit 201, a self-organizing map creation unit 202, a cell classification unit 203, a similarity calculation unit 204, a data number calculation unit 205, a cell number calculation unit 206, a classification evaluation unit 207, and a display unit 208. The classification evaluation apparatus 200 evaluates the classifications obtained when the data in a database 210 is classified by a self-organizing map, and performs the evaluation in a way suited to the characteristics of the self-organizing map. Each unit of the classification evaluation apparatus 200 is described below.

(Feature vector extraction unit 201)
The feature vector extraction unit 201 has substantially the same function as the feature vector extraction unit 101 of the first embodiment described above. The feature vector extraction unit 201 extracts one or more feature elements from the data in the database 210 and obtains a feature vector for each data item.

(Self-organizing map creation unit 202)
The self-organizing map creation unit 202 is a functional unit that takes the feature vector of each data item in the database 210 as input and creates a self-organizing map. A commonly used technique can be used to create the self-organizing map. The created self-organizing map consists of a plurality of cells, each of which has a different feature vector. The feature vector of each cell is referred to as a reference vector.

(Cell classification unit 203)
The cell classification unit 203 is a functional unit for classifying the cells of the self-organizing map. The cell classification unit 203 classifies the cells based on the reference vector of each cell. In the present embodiment, the feature element that represents the reference vector of each cell is taken as its representative element, and cells having the same representative element are placed in the same classification. The representative element of a cell may be, for example, the element with the largest value among the feature elements that make up the cell's reference vector.
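The grouping of cells by representative element can be sketched in Python as follows, using the largest-value rule given above as an example. The function names, the cell identifiers, and the sample reference vectors are hypothetical; ties are broken by taking the first largest element, which is an assumption the text does not address.

```python
def representative_element(reference_vector):
    """Index of the largest element of a cell's reference vector (first one on ties)."""
    return max(range(len(reference_vector)), key=reference_vector.__getitem__)


def classify_cells(cell_reference_vectors):
    """Group cell ids by representative element; each group is one classification."""
    classes = {}
    for cell_id, vector in cell_reference_vectors.items():
        classes.setdefault(representative_element(vector), []).append(cell_id)
    return classes


# Hypothetical self-organizing map with three cells and three feature elements.
cells = {
    "c0": [0.9, 0.1, 0.0],
    "c1": [0.7, 0.2, 0.1],
    "c2": [0.1, 0.2, 0.7],
}
cell_classes = classify_cells(cells)
```

Here cells c0 and c1 share feature element 0 as their representative element and therefore fall into the same classification, while c2 forms its own.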

(Similarity calculation unit 204)
The similarity calculation unit 204 is a functional unit for calculating the similarity between each data item in the database 210 and each cell of the self-organizing map, and for assigning each data item to one of the cells. The similarity between a data item and a cell is obtained, for example, by calculating the Euclidean distance or the inner product between the feature vector of the data item and the reference vector of the cell. The similarity calculation unit 204 then finds the cell with the highest similarity to the data item, that is, the cell most similar to it, and assigns the data item to that cell. The similarity to the assigned cell (the maximum similarity) is taken as the similarity between the data item and its classification, and is passed to the classification evaluation unit 207.
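A minimal Python sketch of assigning a data item to its most similar cell is shown below. It assumes the Euclidean distance converted with 1/(1+d), the conversion used in the first embodiment; the text equally allows other measures such as the inner product. The function names and the sample cells are hypothetical.

```python
import math


def similarity(feature_vector, reference_vector):
    """Similarity 1 / (1 + d), where d is the Euclidean distance (an assumed choice)."""
    return 1.0 / (1.0 + math.dist(feature_vector, reference_vector))


def best_matching_cell(feature_vector, cell_reference_vectors):
    """Return (cell id, similarity) for the cell most similar to the data item."""
    best = max(
        cell_reference_vectors,
        key=lambda cell_id: similarity(feature_vector, cell_reference_vectors[cell_id]),
    )
    return best, similarity(feature_vector, cell_reference_vectors[best])


# Hypothetical two-cell map.
cells = {"c0": [0.0, 0.0], "c1": [3.0, 4.0]}
exact_match = best_matching_cell([3.0, 4.0], cells)
near_origin = best_matching_cell([0.0, 1.0], cells)
```

The returned similarity is the maximum over all cells, which is the value that would be passed on as the similarity between the data item and its classification.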

(Data number calculation unit 205)
The data number calculation unit 205 is a functional unit that calculates the number of data items included in each classification. It does so by summing the numbers of data items belonging to the cells that the cell classification unit 203 has grouped into the same classification. The calculated number of data items for each classification is passed from the data number calculation unit 205 to the classification evaluation unit 207.

(Cell number calculation unit 206)
The cell number calculation unit 206 is a functional unit that calculates the number of cells included in each classification. In addition to the number of all cells included in a classification, the cell number calculation unit 206 may calculate the number of only those cells into which one or more data items have been classified by the similarity calculation unit 204. The calculated number of cells for each classification (or the number of cells to which data belong) is passed from the cell number calculation unit 206 to the classification evaluation unit 207.

(Classification evaluation unit 207)
The classification evaluation unit 207 evaluates each classification by calculating an evaluation value that indicates the effectiveness of the data included in the classification. The calculation uses, for example, the similarity between each data item and its classification obtained by the similarity calculation unit 204, the number of data items calculated by the data number calculation unit 205, and the number of cells calculated by the cell number calculation unit 206.

Like the classification evaluation unit 106 according to the first embodiment described above, the classification evaluation unit 207 may calculate the evaluation value by obtaining the average or maximum of the similarities of the data included in each classification, the sum of the similarities, or the product of the similarity and the number of data items. The logarithm of the number of data items may be used instead of the number of data items.

Alternatively, the evaluation value may be calculated by taking the evaluation value obtained from the similarity and the number of data items in the same manner as the classification evaluation unit 106 according to the first embodiment, and dividing it by the number of cells belonging to the classification. The number of cells to which one or more data items belong may be used instead of the number of all cells included in the classification. With this calculation, the fewer the cells included in the classification, or the fewer the cells to which one or more data items belong, the higher the evaluation value. A high value can therefore be taken to mean that the data vary little within the classification.
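One possible reading of this calculation, using the product of the average similarity and the number of data items as the base value, can be written as follows. The symbols are introduced here for illustration and do not appear in the original text: E_k is the evaluation value of classification k, s_i is the similarity of data item i to its classification, N_k is the number of data items in classification k, and C_k is the number of cells in classification k (either all cells or only those to which one or more data items belong).

```latex
E_k = \frac{\bar{s}_k \, N_k}{C_k},
\qquad
\bar{s}_k = \frac{1}{N_k} \sum_{i \in k} s_i
\quad\Longrightarrow\quad
E_k = \frac{1}{C_k} \sum_{i \in k} s_i
```

Under this reading, the evaluation value equals the sum of the similarities divided by the number of cells, so for a fixed sum of similarities it rises as the data concentrate in fewer cells.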

(Display unit 208)
The display unit 208 has substantially the same function as the display unit 107 according to the first embodiment described above.

The configuration of the classification evaluation apparatus 200 has been described above. Each unit of the classification evaluation apparatus 200 may be implemented as software, by installing program modules capable of executing the functions described above in an information processing apparatus such as a computer, or as hardware such as a processor capable of executing those functions.

Next, an example of the classification evaluation process executed by the classification evaluation apparatus 200 according to the present embodiment will be described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart of the classification evaluation process executed by the classification evaluation apparatus 200. FIG. 7 is an explanatory diagram showing an example of a self-organizing map created by the self-organizing map creation unit 202 of the classification evaluation apparatus 200. FIG. 8 is an explanatory diagram showing the number of data items, the number of cells, and the evaluation value of each classification calculated from the example self-organizing map shown in FIG. 7.

Referring to FIG. 6, first, in step S220, the feature vector extraction unit 201 extracts one or more feature elements from the data in the database 210. Next, in step S222, the feature vector of each data item is obtained based on the extracted feature elements.

Next, in step S224, the self-organizing map creation unit 202 creates a map such as that shown in FIG. 7 based on the obtained feature vectors. The map shown in FIG. 7 consists of 6 × 6 = 36 cells, A1 to F6, each of which has a different feature vector. The feature vector held by a cell is called a reference vector.

Next, in step S226, the cell classification unit 203 acquires, as the representative element, the feature element having the largest value among the feature elements included in the reference vector of each cell. In the example of FIG. 7, each cell is assumed to have a feature vector (the reference vector) of seven or more dimensions that includes the feature elements X1 to X7, and the element having the largest value among the feature elements including X1 to X7 is acquired. In FIG. 7, the representative element acquired for each cell in step S226 is shown in the upper row of the cell.

Next, in step S228, the cell classification unit 203 classifies the cells by grouping cells having the same representative element into the same classification. In FIG. 7, the boundaries between the classifications obtained in step S228 are indicated by thick lines. Each group of cells delimited by the thick lines in FIG. 7 thus corresponds to one classification, and the data classified into the cells of a classification are the data belonging to that classification.

In general, visualization by a self-organizing map ends here, and when there are many classifications whose quality varies widely, it is difficult to select an appropriate one. In the present embodiment, the following steps make it possible to present the user with an index for easily selecting an appropriate classification.

Next, in step S230, the similarity calculation unit 204 calculates the similarity between each data item and each of the cells A1 to F6, and classifies the data item into the cell having the highest similarity. For each data item, the similarity calculation unit 204 calculates, for example, the distance between the feature vector of the data item and the reference vector of every cell, and derives the similarity from it. The data item is then classified into the cell having the highest similarity. The number of data items belonging to each cell as a result of this classification is shown in the lower row of each cell in FIG. 7.

Next, in step S232, the data number calculation unit 205 calculates the number of data items belonging to each classification. This is done by summing the numbers of data items classified in step S230 into the cells belonging to each classification. In the example of FIG. 7, the classification whose representative element is X2 consists of the cells D1, D2, E1, E2, E3, F1, F2, and F3, so the numbers of data items belonging to those cells (the values in the lower row of each cell in FIG. 7) are summed to give 0 + 0 + 2 + 0 + 0 + 6 + 3 + 0 = 11. The number of data items is calculated in the same way for the other classifications. FIG. 8 shows the calculated number of data items for each classification.

Next, in step S234, the cell number calculation unit 206 calculates the number of cells belonging to each classification. In the example of FIG. 7, the classification whose representative element is X2 contains eight cells: D1, D2, E1, E2, E3, F1, F2, and F3. Cells to which no data belong may be excluded from the count. In that case, the number of cells for the X2 classification is three: E1, F1, and F2. The number of cells is calculated in the same way for the other classifications. FIG. 8 shows the calculated number of cells for each classification.
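The counts obtained in steps S232 and S234 for the X2 classification can be reproduced with the short sketch below. The per-cell data counts are those given in the text for FIG. 7. The dictionary layout and the function names are assumptions made for illustration.

```python
from typing import Dict

# Number of data items per cell in the X2 classification, as given in the text.
x2_cells: Dict[str, int] = {
    "D1": 0, "D2": 0, "E1": 2, "E2": 0,
    "E3": 0, "F1": 6, "F2": 3, "F3": 0,
}


def data_count(cells: Dict[str, int]) -> int:
    """Step S232: total number of data items over all cells of a classification."""
    return sum(cells.values())


def cell_count(cells: Dict[str, int], occupied_only: bool = False) -> int:
    """Step S234: number of cells, optionally counting only cells that hold data."""
    if occupied_only:
        return sum(1 for n in cells.values() if n > 0)
    return len(cells)


x2_data = data_count(x2_cells)                          # 11
x2_all_cells = cell_count(x2_cells)                     # 8
x2_occupied = cell_count(x2_cells, occupied_only=True)  # 3
```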

Next, in step S236, the classification evaluation unit 207 calculates the evaluation value of each classification. For example, if the average similarity of the classification whose representative element is X2 is 0.8, the evaluation value of this classification is calculated as (average similarity) × (number of data items) ÷ (number of cells to which one or more data items belong) = 0.8 × 11 ÷ 3 = 2.93.
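The arithmetic of step S236 can be checked with the following sketch. The function name and its parameters are hypothetical. The input values 0.8, 11, and 3 are those stated in the text for the X2 classification.

```python
def evaluation_value(average_similarity: float, n_data: int, n_cells: int) -> float:
    """(average similarity) x (number of data items) / (number of cells)."""
    if n_cells <= 0:
        raise ValueError("a classification must contain at least one counted cell")
    return average_similarity * n_data / n_cells


# X2 classification: average similarity 0.8, 11 data items, 3 occupied cells.
x2_value = evaluation_value(0.8, 11, 3)
print(round(x2_value, 2))  # prints 2.93
```

Passing the number of all cells (8 for X2) instead of the number of occupied cells gives the other variant described for the classification evaluation unit 207.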

For comparison, FIG. 8 shows the evaluation values calculated by the above formula for the example of FIG. 7 when the average similarity of every classification is 0.8. The X2 and X4 classifications each contain 11 data items, the largest number, and thus appear to be the classifications that represent the data as a whole, but X2 has the higher evaluation value. This indicates that in the X4 classification the data are spread over many cells: they are not necessarily gathered by the same feature and could be divided into finer categories within the classification. In the X2 classification, by contrast, the data are concentrated in a few of the cells, which shows that the data items are closely related and that the classification is highly effective.

Finally, in step S238, the classifications are displayed based on their evaluation values. For example, they are displayed in descending order of evaluation value, that is, in the order X7 → X2 → X3 → X5 → X6 → X4 → X1. X7 is thereby displayed with the greatest emphasis, and the user can easily identify the highly effective classifications.
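Ordering the classifications for display amounts to a descending sort on the evaluation value, as in the sketch below. Only the X2 value (2.93) comes from the text. The values for X4 and X1 are placeholders chosen for illustration and are not the values of FIG. 8. The function name is hypothetical.

```python
from typing import Dict, List


def rank_classifications(evaluation_values: Dict[str, float]) -> List[str]:
    """Return classification names in descending order of evaluation value."""
    return sorted(evaluation_values, key=evaluation_values.get, reverse=True)


# X2 is taken from the text; X4 and X1 are placeholder values, not FIG. 8 data.
example_values = {"X1": 0.5, "X2": 2.93, "X4": 1.1}
display_order = rank_classifications(example_values)
```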

As described above, when data are classified into a plurality of classifications and the classifications to be used for data retrieval or the like are then narrowed down from among them, automatically evaluating and displaying the classifications allows the user to find the desired classification more quickly and easily.

Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is of course not limited to these examples. It is clear that those skilled in the art can conceive of various changes and modifications within the scope of the claims, and these are naturally understood to belong to the technical scope of the present invention.

FIG. 1 is a block diagram showing the schematic configuration of the classification evaluation apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart of the classification evaluation process executed by the classification evaluation apparatus according to the first embodiment.
FIG. 3 is an explanatory diagram showing an example of data classified by the automatic classification unit according to the first embodiment.
FIG. 4 is an explanatory diagram showing the similarity to the data of each classification, the average similarity, the number of data items, and the evaluation value calculated from the classification example shown in FIG. 3.
FIG. 5 is a block diagram showing the schematic configuration of the classification evaluation apparatus according to the second embodiment of the present invention.
FIG. 6 is a flowchart of the classification evaluation process executed by the classification evaluation apparatus according to the second embodiment.
FIG. 7 is an explanatory diagram showing an example of a self-organizing map created by the self-organizing map creation unit according to the second embodiment.
FIG. 8 is an explanatory diagram showing the number of data items, the number of cells, and the evaluation value of each classification calculated from the example self-organizing map shown in FIG. 7.

Explanation of symbols

100, 200 Classification evaluation apparatus
101 Feature vector extraction unit
102 Automatic classification unit
103 Reference vector acquisition unit
104 Similarity calculation unit
105 Data number calculation unit
106 Classification evaluation unit
107 Display unit
110, 210 Database
201 Feature vector extraction unit
202 Self-organizing map creation unit
203 Cell classification unit
204 Similarity calculation unit
205 Data number calculation unit
206 Cell number calculation unit
207 Classification evaluation unit
208 Display unit

Claims

1. A classification evaluation apparatus for evaluating a result of classifying a plurality of data based on features of the data, comprising:
a similarity calculation unit that calculates, for each of the data, a similarity between the feature of the data and a feature representing the data belonging to the classification to which the data belongs; and
a classification evaluation unit that evaluates the classification based on the similarity calculated for each of the data belonging to the classification.

2. The classification evaluation apparatus according to claim 1, wherein the classification evaluation unit further evaluates the classification based on the number of the data belonging to the classification.

3. The classification evaluation apparatus according to claim 1, wherein the classification evaluation unit evaluates the classification based on an average value of the similarities of the data belonging to the classification.

4. The classification evaluation apparatus according to claim 1, wherein the classification is composed of a plurality of cells having different features, and the similarity calculation unit calculates a similarity between the feature of the data and the feature of the cell to which the data belongs.

5. The classification evaluation apparatus according to claim 4, wherein the classification evaluation unit further evaluates the classification based on the number of the cells belonging to the classification.

6. The classification evaluation apparatus according to claim 4, wherein the classification evaluation unit evaluates the classification based on the number of the cells to which one or more of the data belong among the cells belonging to the classification.

7. The classification evaluation unit evaluates the classification based on a value obtained by dividing the sum of the similarities of all the data belonging to the classification by the number of the cells. Classification evaluation apparatus in any one of.