JP2002329188A

JP2002329188A - Data analyzer

Info

Publication number: JP2002329188A
Application number: JP2001133662A
Authority: JP
Inventors: Hirotsugu Kashimura; 洋次鹿志村; Hitoshi Ikeda; 仁池田; Sukeji Kato; 典司加藤
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2001-04-27
Filing date: 2001-04-27
Publication date: 2002-11-15

Abstract

PROBLEM TO BE SOLVED: To provide a data analyzer that can form clusters achieving practical classification without manual operation, even when the number of clusters is not known in advance or there is no prior knowledge of the data to be classified, and that can be used for classification processing. SOLUTION: The data analyzer generates a degenerate map by reducing the dimensionality of the weight vectors of a lattice space map formed by SOM learning. It then sets classification boundaries in the lattice space map based on the degenerate map and classifies the data according to those boundaries.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a data analyzer that classifies and recognizes data such as images and taste data, and more particularly to an improvement in the classification procedure.

[0002]

[Description of the Related Art] In recent years, new information processing technologies that imitate information processing in the human brain have been developed. Neural networks and hierarchical module networks are known as systems that imitate the human brain, and much of the research concerns visual information processing, that is, fields such as image recognition. Within this research, the process of judging the similarity of a series of image portions, input from an image data acquisition unit that corresponds to the retina, based on their physical and image features is called cluster classification.

[0003] The cerebral neocortex, which governs information processing for the complex system of the brain (object recognition, inference, learning, and so on), consists of 1000 to 2000 neurons. What is particularly characteristic of visual information processing is that columns having a hierarchical structure are arranged in the visual cortex, which receives information from the retina. The column is considered the basic unit of visual information processing in the brain, and each column is known to respond to a specific visual pattern. Furthermore, columns that respond to similar patterns are located close to one another. In other words, the neocortex is selective for similar input stimuli, and a specific region of the neocortex that realizes this selectivity is called a "cortical area".

[0004] Specifically, the information transmission pathways in the visual cortex, the cortical area responsible for visual information processing, fall broadly into two groups: the dorsal pathway, which handles spatio-temporal analysis and passes through V1 (primary visual cortex), V2 (secondary visual cortex), V3 (tertiary visual cortex), and MT (Middle Temporal); and the ventral pathway, which performs shape analysis and passes through V1, V2, V4 (quaternary visual cortex), PIT (Posterior Inferior Temporal), and AIT (Anterior Inferior Temporal). Anatomical studies show that visual information is guided through each pathway to the parietal association area, where it is integrated with the motor system. More specifically, in the ventral pathway, early vision (the retina and V1 and V2 of the visual cortex) provides functions such as determining receptive fields (the region of the visual field covered by a column) and analyzing color and line segments, and passes that information to higher layers. This early-vision function corresponds, so to speak, to preprocessing in image processing.

[0005] Input information that has passed through these lower-order cortical areas is guided to the area called V4 and, based on the receptive field covered by each column, is converted into combination information of basic patterns (textures). This combination information undergoes feature analysis as partial images in PIT, which has wider receptive fields than V4, and is analyzed and recognized as an object in AIT, which has still wider receptive fields.

[0006] That is, realizing brain-imitating information processing on a computer system requires at least a first system for spatio-temporal analysis and a second system for shape analysis. Conventionally, for these purposes, a brain-imitating data analysis device has needed to realize the following three functions.

[0007]
1. Realization of a column function model serving as the basic unit
2. Realization of a hierarchical cortical-area model with columns as the basic unit
3. Extraction of learning data and recognition data for the column function model

[0008]

[Problems to Be Solved by the Invention] However, in research aimed at realizing each of these three functions, some degree of functionality can be achieved by restricting the recognition target to specific image data (such as face data) and tuning for it, but modeling each function is not easy when the functions are to be realized without restricting the target.

[0009] [Related Art] As a simple way to provide functions equivalent to those above, there is a method that classifies and organizes input image data on a lattice space map. This classification uses, for example, self-organizing feature mapping (hereinafter abbreviated as SOM) (T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, 1982).

[0010] That is, partial image data for each recognition target (for face image data, data divided into the eyes, nose, and mouth) is generated from the image data to be processed, and the definition of the lattice space map is changed for each piece of partial image data. A data processing device using this SOM computes a plurality of feature quantities from the partial image data and defines, as the lattice space map, the weight vectors between the feature set made up of these feature quantities and the lattice points arranged in the lattice space. Here, a weight is defined for each feature quantity in the feature set, and the collection of these per-component weights is called a weight vector.

[0011] Initially, the weight vectors are initialized as follows. As shown in the Kohonen reference above, an input vector I to be referenced at that learning step is selected at random from the group of input vectors to be learned (corresponding to the feature sets here), and the weight vector of each lattice point is initialized with it. Alternatively, also according to Kohonen, each weight vector may be initialized with random numbers.

[0012] Next, the weight vectors are learned. In this learning process, a learning feature set is generated, and a predetermined measure (for example, the Euclidean measure) between the learning feature set and the weight vector of each lattice point in the lattice space is computed. Among the lattice points, the one with the strongest relationship (the smallest measure) is found. Then, for each lattice point in its neighborhood in the lattice space, the weight vector is corrected so that its measure to the learning feature set becomes smaller. By repeating learning with such corrections, the lattice points with the smallest measure for feature sets made up of mutually similar feature quantities come to concentrate in specific regions, and the map becomes applicable to classifying image data.
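The learning step described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the dictionary layout of `weights` (lattice coordinates mapped to weight vectors), and the Gaussian neighborhood function are all assumptions; the text only requires that lattice points near the best match be pulled toward the learning feature set.

```python
import math


def best_matching_unit(weights, x):
    """Return the lattice point whose weight vector has the smallest
    squared Euclidean distance to the feature set x."""
    return min(weights, key=lambda p: sum((w - v) ** 2 for w, v in zip(weights[p], x)))


def som_update(weights, x, lr, radius):
    """Apply one learning step in place and return the best-matching lattice point.

    weights: dict mapping (row, col) -> list of floats (hypothetical layout).
    lr: learning rate in (0, 1]; radius: neighborhood width on the lattice.
    """
    bmu = best_matching_unit(weights, x)
    for (r, c), w in weights.items():
        d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
        # Gaussian neighborhood weighting (an assumption, not stated in the text).
        h = math.exp(-d2 / (2.0 * radius ** 2))
        for k in range(len(w)):
            # Move the weight toward x, so the measure to x becomes smaller.
            w[k] += lr * h * (x[k] - w[k])
    return bmu
```

In a full training loop this step would be repeated over all learning feature sets, typically while shrinking `lr` and `radius`; that schedule is likewise an assumption here.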

[0013] However, when classifying data with the learned lattice space map, it is appropriate to form boundary lines that serve as the classification criterion in the lattice space, and to classify a given datum according to which boundary encloses the lattice point with the smallest measure for its feature set (a region of the lattice space delimited by such boundaries is hereinafter called a cluster). A method for determining cluster boundaries is therefore required.

[0014] To determine cluster boundaries, a method that refers to each component of the weight vector of each lattice point has been considered. Specifically, for face image data, quantities such as energy, entropy, and correlation are defined as features. The weight component corresponding to energy, for example, is examined, and the lattice points are split into a group whose component is at or below a predetermined threshold and a group whose component exceeds it. For the group exceeding the threshold, the same process is repeated with the weight component corresponding to entropy, and clusters are thereby formed in the lattice space.
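The successive thresholding described in this paragraph can be illustrated with a short sketch. The function name, the `rules` argument, and the example thresholds are hypothetical; the text specifies only that lattice points at or below a threshold form one group and that the remainder is split again on the next component.

```python
def split_by_thresholds(weights, rules):
    """Partition lattice points by successive per-component thresholds.

    weights: dict mapping lattice point -> weight vector (hypothetical layout).
    rules: list of (component_index, threshold) pairs applied in order,
           e.g. energy first, then entropy.
    Returns a list of sets; the last set holds the points that exceeded
    every threshold.
    """
    clusters = []
    remaining = set(weights)
    for component, threshold in rules:
        # Points at or below the threshold form a cluster and drop out.
        low = {p for p in remaining if weights[p][component] <= threshold}
        clusters.append(low)
        # Only the points above the threshold are examined by the next rule.
        remaining -= low
    clusters.append(remaining)
    return clusters
```

The need to pick `rules` by hand is exactly the manual step that the following paragraphs identify as a drawback.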

[0015] With this method, which feature quantities to use for the image data to be classified must be considered in advance. As general techniques for forming clusters from multiple sample data, maximum likelihood estimation, the K-means method, the LBG (Linde-Buzo-Gray) method, and the MDS (Multi-Dimensional Scaling) method are known. However, these methods require manual operations such as fixing the number of clusters in advance or tuning threshold settings appropriately.

[0016] In addition, the simple cluster formation methods above could be applied because the number of elements in the feature set obtained from the data under analysis was relatively small and analysis of the learned map was easy. It is known, however, that when the classification target is broadened and the number of elements in the feature set grows, simply applying these methods does not yield accurate boundaries.

[0017] When the number of elements in the feature set grows in this way, the boundaries become intricately entangled. Techniques for determining boundaries as a post-process on a lattice space map obtained by SOM or the like include the U-Matrix method (Unified Distance Matrix Methods) and the potential method. The U-Matrix method is described in detail in A. Ultsch et al., "Knowledge Extraction from Artificial Neural Networks and Applications", Proc. Transputer Anwender Treffen / World Transputer Congress TAT/WTC 93 Aachen, Springer 1993, and the potential method in D. Coomans, D. L. Massart; Anal. Chim. Acta, 5-3, 225-239 (1981). The details are therefore omitted and only an outline is given here.

[0018] First, the U-Matrix method obtains the distance between an arbitrary lattice point in the lattice space map and each adjacent lattice point, and remaps those values three-dimensionally between the lattice points. As the inter-lattice distance, the sum of the absolute component-wise differences between the weight vectors prototyped at two adjacent lattice points, the mean square of those differences, or the like is used. In short, the basic idea of U-Matrix is that lattice points having the smallest measure for mutually similar feature sets lie a short distance apart and so belong, three-dimensionally, to a "valley". Conversely, lattice points having the smallest measure for mutually different feature sets lie a large distance apart and so form a relative "mountain". Boundaries are then drawn along these mountains. In other words, U-Matrix compensates for the fact that SOM maps by the similarity of vector quantities (feature sets and the like) and does not preserve distances in the input space. When the contrast between mountains and valleys is clear, the U-Matrix method can determine relatively appropriate boundaries without manual intervention. When the boundaries change gently, however, or when the heights and depths of the mountains and valleys differ from cluster to cluster, manual intervention is required, and in this respect the method does not go beyond the related art described above.
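As a rough illustration of the U-Matrix computation outlined above, the sketch below assigns a distance to every pair of horizontally or vertically adjacent lattice points, using the sum of absolute component-wise differences mentioned in the text. The function name, the rectangular 4-neighbor lattice, and the dictionary layout are assumptions made for the example.

```python
def u_matrix_edges(weights, rows, cols):
    """Return a dict mapping each pair of adjacent lattice points to the
    sum of absolute component-wise differences of their weight vectors.

    weights: dict mapping (row, col) -> weight vector (hypothetical layout).
    Large values correspond to "mountains" (candidate boundaries),
    small values to "valleys" (cluster interiors).
    """
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    edges = {}
    for r in range(rows):
        for c in range(cols):
            # Visit the right and lower neighbors so each pair appears once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    edges[((r, c), (nr, nc))] = l1(weights[(r, c)], weights[(nr, nc)])
    return edges
```

Choosing which edge values count as a "mountain" still needs a threshold or visual inspection, which is the manual intervention the paragraph refers to.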

[0019] The potential method can be described as a boundary determination method based on so-called unsupervised learning. Using a predetermined function (the potential function), it approximates each input datum of the whole input data group (the group of feature sets) by a combination of functions and estimates the resulting probability density function. Portions where the probability density functions overlap little (valleys) are then used as boundaries. A Gaussian form is often used as the potential function. Specifically, given an input data group of N input vectors, each of dimension K, the average potential Ψ_l that the l-th input datum receives from the other input data (the contribution of the l-th input to the whole input set) is defined by the following equations (1) and (2).

[0020]

[Equation 1]

[0021] Here, x_kl denotes the k-th component of the l-th input. α is a smoothing parameter and affects the number of clusters obtained. Consequently, the potential method requires, for each set of input vectors, optimization of the distribution function that assumes the distribution shape and optimization of various parameters. In short, it requires prior knowledge of the characteristics of the data to be classified, and manual tuning is indispensable.
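Equations (1) and (2) are not reproduced in this text, so the following sketch should be read only as one plausible form of the average potential. It assumes an isotropic Gaussian potential function of width α and averages over the other N-1 inputs; both choices, and the function name, are assumptions rather than the patent's definition.

```python
import math


def average_potentials(data, alpha):
    """Return, for each input vector, the mean Gaussian potential it
    receives from all other input vectors.

    data: list of N vectors, each of dimension K.
    alpha: smoothing parameter (kernel width); smaller values tend to
           separate the data into more clusters.
    """
    n = len(data)
    k = len(data[0])
    # Normalizing constant of a K-dimensional isotropic Gaussian (assumed form).
    norm = (2.0 * math.pi) ** (k / 2.0) * alpha ** k
    potentials = []
    for l in range(n):
        total = 0.0
        for i in range(n):
            if i == l:
                continue  # only potentials from the *other* inputs are counted
            d2 = sum((a - b) ** 2 for a, b in zip(data[l], data[i]))
            total += math.exp(-d2 / (2.0 * alpha ** 2)) / norm
        potentials.append(total / (n - 1))
    return potentials
```

Inputs in dense regions receive high values and isolated inputs low ones, so low-potential regions act as the "valleys" used for boundaries; the result depends strongly on `alpha`, which illustrates why manual tuning is needed.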

[0022] Furthermore, building on these studies, Japanese Patent Application Laid-Open No. H7-234854 "Cluster Classification Device", Japanese Patent Application Laid-Open No. H8-36557 "Cluster Classification Device", and "Unsupervised Cluster Classification Using Data Density Histograms on Self-Organizing Feature Maps", IEICE Transactions D-II Vol. J79-DII No. 7, pp. 1280-1290, July 1996, among others, disclose techniques that develop the U-Matrix method by using the distances between adjacent lattice points and the distribution, or degree of concentration, of the number of lattice points that respond to inputs (that have the smallest measure for an input). These techniques focus on the tendency of SOM for the smallest-measure lattice points to concentrate within clusters. They can be expected to work well when the distance between cluster centroids is large, but they are not sufficiently effective when clusters are intricately intertwined on the lattice space map and the boundaries take complex shapes.

[0023] In the end, within the related art described above, cluster classification becomes difficult for data with many elements, or for data whose element components are not guaranteed to be independent, as encountered in image classification, classification of sensory quantities such as the taste of wine, Web access log analysis, and the like. As things stand, manual parameter tuning is indispensable and the range of applications is limited.

[0024] The present invention has been made in view of the above circumstances. Its object is to provide a data analyzer that can form clusters achieving practical classification without manual operation, even when the number of clusters is not known in advance or there is no prior knowledge of the data to be classified, and that can thereby contribute to classification processing.

[0025]

[Means for Solving the Problems] To solve the problems of the conventional examples above, the present invention provides a data analyzer comprising means for computing, for each of a plurality of input data, a feature set consisting of n (n is 2 or more) feature quantity data corresponding to that input datum, and means for generating a lattice space map that defines mapping coefficients from each component of the feature set to each lattice point of a lattice space. The data analyzer performs learning in advance using a plurality of learning input data, holds the resulting lattice space map, which is defined so that mutually similar feature sets are mapped in concentrated fashion to predetermined regions of the lattice space, and classifies data using that lattice space map. The data analyzer is characterized by including: map degeneration means for generating a degenerate mapping to each lattice point of the lattice space from m composite quantities, obtained by combining, under predetermined combining conditions, the per-lattice-point mapping coefficients corresponding to each component of the feature set; and boundary setting means for setting classification boundaries for data on the lattice space map based on the degenerate mapping.

[0026] Here, the composite feature quantities are generated with reference to the lattice space map formed by learning. That is, a norm is defined for the weight vector (corresponding to the mapping coefficients) of each lattice point of the lattice space map, and a set of composite quantities that preserves this norm distribution is generated.

[0027] The map degeneration means preferably determines the combining conditions for the composite quantities based on principal component analysis of the mapping coefficients. Furthermore, the number m of composite quantities is preferably determined by comparing the contribution ratios of the components in the principal component analysis or the changes in those contribution ratios, or according to how well the original lattice space map can be reproduced from the composite quantities.

[0028] Furthermore, the number m of composite quantities is also preferably set to the smaller of a number m1, determined by comparing the contribution ratios of the components in the principal component analysis or the changes in those ratios, and a number m2, determined according to how well the original lattice space map can be reproduced from the composite quantities.

[0029] Furthermore, the feature subset may be selected for the learning input data. That is, to solve the problems of the conventional examples above, the present invention provides a data analyzer comprising means for computing a feature set consisting of n (n is 2 or more) feature quantity data based on a plurality of input data, and means for generating a lattice space map that defines a mapping from the feature set to each lattice point of a lattice space. The data analyzer performs learning in advance using a plurality of learning input data, holds the resulting lattice space map, which is defined so that mutually similar feature sets are mapped in concentrated fashion to predetermined regions of the lattice space, and classifies data using that lattice space map. The data analyzer is characterized by including: combining means for computing m (m is 1 or more and less than n) learning composite feature quantities, obtained by combining, under predetermined combining conditions, the components of the feature sets based on the learning input data; means for forming, by learning, a degenerate lattice space map using the sets of learning composite feature quantities; and boundary setting means for setting classification boundaries on the degenerate lattice space map held as the result of that learning.

[0030] Furthermore, to solve the problems of the conventional examples above, the present invention provides a data analysis method comprising a step of computing, for each of a plurality of input data, a feature set consisting of n (n is 2 or more) feature quantity data corresponding to that input datum, and a step of generating a lattice space map that defines mapping coefficients from each component of the feature set to each lattice point of a lattice space. The method performs learning in advance using a plurality of learning input data and classifies data using the resulting lattice space map, which is defined so that mutually similar feature sets are mapped in concentrated fashion to predetermined regions of the lattice space. The method is characterized by including: a map degeneration step of generating a degenerate mapping to each lattice point of the lattice space from m (m is 1 or more and less than n) composite quantities, obtained by combining, under predetermined combining conditions, the per-lattice-point mapping coefficients corresponding to each component of the feature set; and a boundary setting step of setting classification boundaries for data on the lattice space map based on the degenerate mapping.

[0031] Furthermore, to solve the problems of the conventional examples above, the present invention provides a data analysis program that causes a computer to execute a procedure for computing a feature set consisting of n (n is 2 or more) feature quantity data based on a plurality of input data, and a procedure for generating a lattice space map that defines a mapping from the feature set to a lattice space. The program causes the computer to classify data using the lattice space map, which is defined, as the result of learning performed in advance using a plurality of learning input data, so that mutually similar feature sets are mapped in concentrated fashion to predetermined regions of the lattice space. The program is characterized by including: a map degeneration procedure for generating a degenerate mapping obtained by degenerating the lattice space map under predetermined conditions; and a procedure for setting classification boundaries for data on the lattice space map based on the degenerate mapping.

[0032]

[Embodiments of the Invention] [First Embodiment] A first embodiment of the present invention is described with reference to the drawings. In the following description, image data is used as an example of the data to be classified. In actual use, however, the data is not limited to image data: it may be other sensory data such as taste data, or measurement data such as experimental results.

[0033] As shown in FIG. 1, the data analyzer 1 according to this embodiment basically comprises a CPU 11, a RAM 12, a ROM 13, a hard disk 14, an image input interface 15, a display 16, and an external storage unit 17, which are connected by a bus. That is, the data analyzer 1 of this embodiment is implemented in software on a general personal computer. The software is generally distributed on a recording medium such as a CD-ROM or DVD-ROM, or downloaded via a network (the network connection interface is not shown). When distributed on a recording medium, it is read by the external storage unit 17 and stored on the hard disk 14 by a predetermined installation process. When downloaded via a network, it is likewise installed on the hard disk 14.

[0034] The CPU 11 operates according to the programs stored on the hard disk 14 and, basically under the management of an operating system such as Windows (registered trademark), executes the data analysis software and other programs that embody the data analyzer 1 of this embodiment. Specifically, as shown in FIG. 2, the data analysis program of this embodiment consists of an image processing unit 21, a selection processing unit 22, and a classification processing unit 23. The image processing unit 21 further includes a whitening processing unit 31 and a Gabor filter processing unit 32. The selection processing unit 22 includes an image extraction processing unit 35 and a selection signal generation processing unit 36. The classification processing unit 23 includes a feature quantity generation unit 41, a learning processing unit 42, a boundary determination processing unit 43, and a classification execution unit 44. Here each of these units is implemented as a software module, but they may instead be built in hardware from logic circuits. The processing performed by the CPU 11 is described in detail later.

[0035] The RAM 12 is used as the work memory of the CPU 11 and stores various parameters and data during CPU 11 processing. The ROM 13 mainly stores the programs needed when the data analyzer 1 starts up, such as the process that loads the operating system. The contents of this startup program are widely known, so their description is omitted.

[0036] The operating system itself and various programs are installed on the hard disk 14. In the present embodiment, the data analysis program is also installed on the hard disk 14, as already described. Although storage on a hard disk is illustrated here, the program may instead be installed in, for example, an SRAM (Static Random Access Memory) or a non-volatile memory such as an EEPROM. Nor does the storage have to be in the same housing as the CPU 11, as it is in FIG. 1: the program may be installed in another computer connected via a network interface (not shown).

[0037] An image input device such as a scanner is connected to the image input interface 15, which receives image data from the image input device and outputs it to the CPU 11. The display 16 displays images in accordance with instructions from the CPU 11.

[0038] [Details of Processing] The data analysis processing of the CPU 11 is described below with reference to the functional blocks of FIG. 2. This processing can be divided into a learning process and an actual classification process, and the same functional blocks are used in both.

[0039] First, the processing of each block is outlined. The image processing unit 21 converts the image data input via the image input interface 15 into grayscale data, applies a predetermined transformation so that the features of the images to be learned and classified become clear, and outputs the result to the selection processing unit 22. The selection processing unit 22 divides the image data input from the image processing unit 21, extracts the partial image data to be learned, and outputs it to the classification processing unit 23. In the learning process, the classification processing unit 23 performs SOM learning on the input partial image data to form a lattice space map. In the classification process, it classifies the input partial image data according to the learned lattice space map without modifying that map. When the device moves from the learning process to the classification process, boundary determination processing is performed and boundaries are set in the lattice space map.

[0040] Next, the operation of each unit is described in more detail. The whitening processing unit 31 of the image processing unit 21 applies a Fourier transform to the input grayscale image data to decompose it into frequency components, removes the low-frequency components, equalizes the magnitude of the component at each frequency, applies an inverse Fourier transform, and outputs the result to the Gabor filter processing unit 32. This processing imitates the information processing performed in the ganglion cells of the human retina: it makes the contribution of each frequency component of the image uniform and cuts off the low band, which reduces the influence of shadows caused by illumination. As a result, the fine features of the image become clear.
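The whitening steps described above (transform, remove the low band, equalize magnitudes, inverse transform) can be sketched as follows. This is only an illustrative one-dimensional version using a naive DFT from the Python standard library. The function names, the `cutoff` parameter, and the choice of setting every retained magnitude to 1 are assumptions, not details taken from the patent, which operates on two-dimensional images.

```python
import cmath

def dft(values):
    """Naive O(n^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)) / n
            for t in range(n)]

def whiten(signal, cutoff=1, eps=1e-12):
    """Drop frequencies below `cutoff` and give the rest unit magnitude.

    `cutoff` and `eps` are hypothetical parameters chosen for illustration.
    """
    n = len(signal)
    flattened = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # index k and n - k represent the same frequency
        if freq < cutoff or abs(coeff) < eps:
            flattened.append(0j)                   # low band (or empty bin) removed
        else:
            flattened.append(coeff / abs(coeff))   # keep phase, equalize magnitude
    return [v.real for v in idft(flattened)]

if __name__ == "__main__":
    print(whiten([10, 12, 9, 15, 11, 8, 14, 10]))
```

With `cutoff=1` only the constant (DC) term is removed, so the output has zero mean, which loosely corresponds to suppressing slowly varying illumination.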

[0041] The Gabor filter processing unit 32 performs Gabor filter processing according to equation (3) below and outputs the result to the selection processing unit 22.

【００４２】[0042]

[Math. 2]

Here,

[Math. 3]

is the position vector indicating the point of interest in the image data, and

[Math. 4]

is the position vector of each pixel. σ denotes the variance. Applying a Gabor function, obtained by modulating a harmonic function with a Gaussian function, imitates human vision, in which the region around the point of interest appears in more detail than the periphery. In other words, the luminance variance in the peripheral part of the image data being processed is reduced, which lessens the influence of the background on the processing.

[0043] The image extraction processing unit 35 of the selection processing unit 22 extracts partial image data from the image data input from the image processing unit 21, in accordance with the selection signal input from the selection signal generation processing unit 36, and outputs it to the classification processing unit 23. The selection signal generation processing unit 36 outputs a signal for dividing the image data input from the image processing unit 21 into predetermined image blocks. The image blocks may be, for example, square blocks of 8 × 8 pixels, divided so that adjacent blocks overlap by, for example, half that size, that is, 4 pixels.
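The overlapping block division described above can be illustrated with a short sketch. The function name and the nested-list image representation are assumptions made for this example. The block size of 8 and overlap of 4 follow the example values in the text.

```python
def split_into_blocks(image, block=8, overlap=4):
    """Split a 2-D image (list of rows) into square blocks that overlap.

    Adjacent blocks share `overlap` pixels, so the window advances by
    block - overlap pixels at each step.
    """
    step = block - overlap
    rows, cols = len(image), len(image[0])
    blocks = []
    for top in range(0, rows - block + 1, step):
        for left in range(0, cols - block + 1, step):
            blocks.append([row[left:left + block] for row in image[top:top + block]])
    return blocks

if __name__ == "__main__":
    image = [[r * 16 + c for c in range(16)] for r in range(16)]
    print(len(split_into_blocks(image)))  # 3 x 3 window positions on a 16 x 16 image
```

Partial blocks at the right and bottom edges are simply dropped in this sketch. The patent does not say how edges are handled.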

[0044] The selection processing unit 22 preferably extracts only the meaningful blocks from the divided partial image data using equation (4) below and outputs them to the classification processing unit 23. Equation (4) corresponds to an entropy calculation: it takes a small value for partial image data corresponding to the background and, for a face, a large value for target parts (characteristic parts) such as the outline, eyes, nose, and mouth. By extracting only the blocks with a large IE in equation (4) and outputting them to the classification processing unit 23, the characteristic parts of the image data can be extracted.

【００４５】[0045]

[Math. 5]

Here, P_L is the number of times a specific luminance level L appears among the pixels contained in the partial image data.
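Equation (4) itself is not reproduced in this text, so the following is only a sketch of an entropy-style block score under stated assumptions: it uses the standard Shannon entropy of the luminance histogram, with the counts P_L normalized to relative frequencies. The function names and the threshold parameter are hypothetical.

```python
import math
from collections import Counter

def block_entropy(block):
    """Shannon entropy (in bits) of the luminance levels in one block.

    Assumption: counts per luminance level are normalized to frequencies;
    the patent's exact equation (4) may differ.
    """
    pixels = [p for row in block for p in row]
    total = len(pixels)
    counts = Counter(pixels)  # counts[L] plays the role of P_L
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_informative(blocks, threshold):
    """Keep only blocks whose entropy exceeds a chosen threshold."""
    return [b for b in blocks if block_entropy(b) > threshold]

if __name__ == "__main__":
    flat = [[5, 5], [5, 5]]       # uniform background-like block
    varied = [[0, 1], [2, 3]]     # every pixel at a different level
    print(block_entropy(flat), block_entropy(varied))
```

A uniform block scores zero and a block with many distinct levels scores high, which matches the behavior the text attributes to background and characteristic regions.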

[0046] The feature amount generation unit 41 of the classification processing unit 23 computes and outputs a plurality of feature amounts for each of the partial image data items input from the selection processing unit 22, according to predetermined processing. This collection of feature amounts constitutes a feature amount set. When each feature amount is a scalar, for example, the feature amount set is a vector of feature amounts. The description below assumes that the feature amounts are arranged as a vector in this way (a feature amount vector).

[0047] The learning processing unit 42 has two operation modes, a learning mode and a classification mode, and performs different processing in each. In the learning mode, the learning processing unit 42 manages, for each of the nodes (lattice points) arranged in an M × M′ lattice, a weight vector of the same dimension as the feature amount vector. The two-dimensional M × M′ arrangement is merely for convenience. The nodes may instead be arranged in n dimensions (n > 2).

[0048] That is, in the learning mode the learning processing unit 42 uses the input feature amount vectors as learning data to form a lattice space map on the M × M′ lattice space by SOM (self-organizing map) learning. The distance between the input feature amount vector and the weight vector assigned to each node is computed with a predetermined measure. For simplicity, the Euclidean measure is used here. The node c for which this distance is smallest (the best-matching node) is then found, and the weight vectors of the nodes in its neighborhood are updated using the input feature amount vector. Specifically, this update is performed by equation (5) below.

【００４９】[0049]

[Math. 6]

[0050] α(t) and σ(t) are monotonically decreasing functions of time, and ||r_c − r_j|| denotes the Euclidean distance between the best-matching node c and node j. t is the "time", which is incremented each time a feature amount vector is input (see the left-hand side of equation (5)).
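Equation (5) is not reproduced in this text. The sketch below therefore uses the commonly cited Kohonen update with a Gaussian neighborhood, w_j(t+1) = w_j(t) + α(t)·exp(−||r_c − r_j||² / (2σ(t)²))·(x − w_j(t)), which is consistent with the symbols α(t), σ(t), and ||r_c − r_j|| defined above but is an assumption about the exact form. The exponential decay schedules and all parameter names are likewise illustrative.

```python
import math

def best_matching_node(weights, x):
    """Return the node whose weight vector is closest to x (Euclidean measure)."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def som_step(weights, x, t, alpha0=0.5, sigma0=2.0, tau=100.0):
    """One SOM learning step; `weights` maps (row, col) -> weight vector.

    alpha0, sigma0 and tau are hypothetical schedule parameters.
    """
    alpha = alpha0 * math.exp(-t / tau)   # learning rate, decreasing in t
    sigma = sigma0 * math.exp(-t / tau)   # neighborhood width, decreasing in t
    c = best_matching_node(weights, x)
    for node, w in weights.items():
        d2 = (node[0] - c[0]) ** 2 + (node[1] - c[1]) ** 2   # ||r_c - r_j||^2
        h = alpha * math.exp(-d2 / (2 * sigma ** 2))
        weights[node] = [wi + h * (xi - wi) for wi, xi in zip(w, x)]
    return c

if __name__ == "__main__":
    grid = {(r, c): [0.0, 0.0] for r in range(3) for c in range(3)}
    grid[(1, 1)] = [0.9, 0.9]
    print(som_step(grid, [1.0, 1.0], t=0), grid[(1, 1)])
```

Nodes far from the best-matching node still receive a small update here. A hard neighborhood cut-off would be an equally plausible reading of "nodes in its neighborhood".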

[0051] Repeating the processing of equation (5) forms the lattice space map, and the best-matching nodes for mutually similar feature amount vectors come to form continuous regions. In other words, a nonlinear projection from the feature amount vectors, which are multidimensional input signals, onto a two-dimensional map is formed in this lattice space while preserving topology. Through the weight updates, the characteristic parts of the data are organized, and as a result of the learning, nodes that respond to similar data come to lie close to one another.

[0052] When the setting is changed from the learning mode to the classification mode, the learning processing unit 42 outputs to the boundary determination processing unit 43 an instruction to determine the boundaries. In the classification mode, the learning processing unit 42 searches for the best-matching node for the input feature amount vector using the lattice space map obtained as the learning result, and outputs the best-matching node to the classification execution unit 44. At this time, the learning processing unit 42 does not perform the self-organization of equation (5).

[0053] The boundary determination processing unit 43 classifies the nodes into clusters in accordance with the boundary-setting instruction input from the learning processing unit 42. Its operation is described in detail later.

[0054] The classification execution unit 44 operates when the learning processing unit 42 is in the classification mode. It receives and holds the data to be classified (partial image data and the like), receives the best-matching node information from the learning processing unit 42, and outputs the held data to the display 16 or the like as the classification result, together with information indicating which of the clusters delimited by the boundary determination processing unit 43 the best-matching node belongs to.
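The classification-mode flow described above (find the best-matching node on the frozen map, then report the cluster that node belongs to) can be sketched as a simple lookup. The dictionary-based representation of the map and of the cluster assignment, and the function name, are assumptions for illustration.

```python
def classify(weights, cluster_of, x):
    """Classify x using a trained map without updating any weights.

    weights:    dict mapping node -> weight vector (the learned map)
    cluster_of: dict mapping node -> cluster label set by boundary determination
    Returns (best-matching node, cluster label).
    """
    node = min(weights,
               key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))
    return node, cluster_of[node]

if __name__ == "__main__":
    weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
    cluster_of = {(0, 0): "000000", (0, 1): "110110"}
    print(classify(weights, cluster_of, [0.9, 0.8]))
```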

[0055] [Boundary Determination Processing] The operation of the boundary determination processing unit 43 is described here. As shown in FIG. 3, the boundary determination processing unit 43 basically comprises a principal component analysis unit 51, a filter unit 52, and a boundary analysis unit 53. The principal component analysis unit 51 acquires the information of the lattice space map (the weight vector of each node), performs principal component analysis (PCA) on the weight vectors, sorts the components of the composite weight vector, which are obtained as linear combinations of the weight vector components, in descending order of contribution ratio, and outputs the result. Since PCA is widely known, a detailed description is omitted. The contribution ratio is a standard quantity defined in principal component analysis. Specifically, when the weight vector is expressed as equation (6),

[Math. 7]

and the composite weight vector obtained from it by linear combination is expressed as equation (7),

[Math. 8]

the sets of coefficients of the linear combinations are obtained as the eigenvectors of the variance-covariance matrix based on the weight vectors of the nodes, and each corresponding eigenvalue divided by the sum of the eigenvalues is the contribution ratio.

[0056] From the composite weight vector components sorted by contribution ratio, the filter unit 52 outputs the component with the largest contribution ratio and every further component whose contribution ratio is at least a predetermined fraction (for example, 10%) of that largest contribution ratio. That is, the filter unit 52 selectively outputs the m composite weight vector components (m is at least 1 and at most n, the number of components of the weight vector) whose contribution ratio exceeds a predetermined value. The selected components correspond to the composite amounts of the present invention, and the principal component analysis unit 51 and the filter unit 52 together implement the map degeneration means that generates the degenerate mapping. Although the contribution ratio is used here, the eigenvalues themselves, which relate to the variance, may be used in the same way.
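As a minimal sketch of this filtering rule, the following computes contribution ratios from a list of eigenvalues and counts how many components reach the stated fraction of the largest ratio. The function names are hypothetical and the eigenvalues in the demo are made-up numbers. Computing the eigenvalues from the variance-covariance matrix is outside the scope of this example.

```python
def contribution_ratios(eigenvalues):
    """Contribution ratio of each principal component: eigenvalue / sum of eigenvalues."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def select_component_count(eigenvalues, fraction=0.10):
    """Number m of components whose contribution ratio is at least
    `fraction` times the largest contribution ratio."""
    ratios = sorted(contribution_ratios(eigenvalues), reverse=True)
    limit = fraction * ratios[0]
    return sum(1 for r in ratios if r >= limit)

if __name__ == "__main__":
    eigenvalues = [5.0, 2.0, 1.0, 0.4, 0.1]   # illustrative values only
    print(contribution_ratios(eigenvalues))
    print(select_component_count(eigenvalues))
```

Because every ratio is divided by the same total, applying the threshold to the raw eigenvalues selects the same components, in line with the remark that the eigenvalues may be used directly.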

[0057] For each of the composite weight vector components selectively output by the filter unit 52, the boundary analysis unit 53 takes a statistical value (an intermediate value, mean, median, or the like) as a threshold and assigns different codes to the values above and below that threshold. With PCA, however, the mean is 0. For SOMs trained on random or high-dimensional learning data, the median is preferable. As an example of the codes, for the first component, which has the largest contribution ratio, "11" is assigned to nodes whose first-component value is larger than the median of the distribution of that component over the lattice, and "00" to nodes at or below the median (FIG. 4(a)). Similarly, for the second component, which has the second-largest contribution ratio, "01" is assigned to nodes whose value is above its median and "00" to nodes at or below it (FIG. 4(b)). For the third component, "10" is assigned to nodes whose value is above the median and "00" to nodes at or below it (FIG. 4(c)). This processing amounts to binarizing the lattice space. The filter unit 52 may also set the boundary at the median of the sum of a combination of the selected components.

[0058] The boundary analysis unit 53 then concatenates, for each node, the codes for the first to third components to generate a 6-bit code. For example, when a node has the code "11" for the first component, "01" for the second, and "10" for the third, the code "110110" is generated. The boundary analysis unit 53 classifies nodes that share the same concatenated code into one cluster. That is, boundaries are set in the lattice space wherever the concatenated codes of adjacent nodes differ.
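The median thresholding and code concatenation described above can be sketched as follows. The per-component codes "11", "01", and "10" and the "00" fallback follow the example in the text. The data layout (a dictionary from node to its three selected component values) and the function names are assumptions.

```python
from statistics import median

# Code given to a node when its k-th component exceeds the median.
CODES = ["11", "01", "10"]

def node_codes(components):
    """components: dict mapping node -> [c1, c2, c3] (selected composite amounts).

    Returns dict mapping node -> concatenated 6-bit code string.
    """
    nodes = list(components)
    thresholds = [median(components[n][k] for n in nodes) for k in range(len(CODES))]
    return {
        n: "".join(CODES[k] if components[n][k] > thresholds[k] else "00"
                   for k in range(len(CODES)))
        for n in nodes
    }

def clusters_by_code(codes):
    """Group nodes that share the same concatenated code into one cluster."""
    groups = {}
    for node, code in codes.items():
        groups.setdefault(code, []).append(node)
    return groups

if __name__ == "__main__":
    comps = {(0, 0): [1.0, 1.0, 1.0], (0, 1): [-1.0, 1.0, -1.0],
             (1, 0): [1.0, -1.0, -1.0], (1, 1): [-1.0, -1.0, 1.0]}
    print(clusters_by_code(node_codes(comps)))
```

Since each component contributes a distinct non-zero code, two nodes receive the same 6-bit string only when they fall on the same side of the median for all three components.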

[0059] By combining the weights so that the contribution ratio becomes large, and setting boundaries using the components selected according to contribution ratio, a practical classification can be achieved on a more accurate basis than by using the weight vectors that define the original lattice space map directly.

[0060] The filter unit 52 may instead determine the components of the composite weight vector to be used for boundary analysis as follows. Starting from the component with the largest contribution ratio, the rate of change of the contribution ratio is computed in order, and the components up to the one whose rate of change is at least a predetermined value (for example, 5%) are output. This method is advantageous when the contribution ratios or eigenvalues do not decrease monotonically. For example, if the contribution ratio of the second component is 30% of that of the first component, that of the third component is 25% of the first, and that of the fourth is 10%, then the components whose rate of change is at least 5%, namely those up to the third component, are used for boundary analysis.

[0061] Furthermore, the filter unit 52 may determine the components to be used for boundary analysis by the following method. Initially, l is set to 1. The norm (magnitude under a predetermined measure) of the original weight vector,

[Math. 9]

and the norm of the vector obtained by selecting, as candidates, the l composite amounts with the largest contribution ratios and inverse-transforming the vector consisting of these l composite amounts,

[Math. 10]

are computed, and from these the degree to which the original weight vector is reproduced (the reproduction rate of the original weight vector) is calculated using equation (8) below.

【００６２】[0062]

[Math. 11]

[0063] Then l is incremented until the reproduction rate of the original weight vector exceeds a predetermined value. With this method the reproduction rate is high, which makes classification easier.

[0064] Further, letting m1 be the number of composite amounts to be selected according to the contribution ratio, m2 the number to be selected according to the change in the contribution ratio, and m3 the number determined according to the reproduction rate of the weight vector, it is also preferable to take the smallest of these as the number m to be selected. The m composite amounts so determined are selected in descending order of contribution ratio.

[0065] [Operation] In the data analysis device according to the present embodiment, for the n-dimensional weight vector of each node in the lattice space obtained by SOM learning, m-dimensional quantities (m is at least 1 and less than n), defined so as to reproduce the norm of the original weight vector at a predetermined rate, are generated by combination or the like, and the nodes are classified on the basis of these quantities. As a result, even when the number of clusters is not known in advance or there is no prior knowledge of the data to be classified, clusters can be formed that achieve a practical classification without manual intervention, which aids classification processing.

[0066] [Second Embodiment] In the description so far, the composite amounts used for boundary setting were selected by principal component analysis of a lattice space map already formed by learning. It is also preferable to perform principal component analysis on the feature amount vectors generated from the learning data, likewise generate and select m composite feature amounts, fewer than the dimension n of the feature amount vector, and use them to form a lattice space map for boundary formation (a boundary map).

[0067] Furthermore, rather than generating this boundary map separately, the boundary map itself may be used as the lattice space map. In that case, for the feature amount vector obtained from the data to be classified that is input in the classification mode, the composite feature amounts defined at learning time are generated, and classification is performed with reference to these composite feature amounts and the lattice space map that doubles as the boundary map.

【００６８】[0068]

[Example] FIG. 5 shows (a) an example in which wines produced at three wineries A to C were classified by the data analysis device 1 according to the first embodiment of the present invention, each wine being represented as a set of taste feature amounts (a feature amount vector), and (b) an example in which they were classified manually using the MDS method.

[0069] Note that the vertical and horizontal coordinates in FIGS. 5(a) and 5(b) have different meanings, so corresponding positions are not necessarily correlated. As shown in FIG. 5(a), in the classification result of the data analysis device 1 the wines were classified into the three wineries on the basis of their taste feature amounts, whereas, as shown in FIG. 5(b), the MDS method divided the group of feature amount vectors into four regions. Looked at in detail, the wines of winery C were classified as having a common part but also differing characteristics.

[0070] In contrast, the data analysis device 1 also yields the same decomposition into four clusters as the manual work using MDS, with winery C divided into two. Note that in FIG. 5(a) the lattice is defined so that its top and bottom edges are continuous and its left and right edges are continuous (specifically, in the 37 × 37 lattice space, the node at row 1, column 1 is adjacent to the node at row 37, column 1 and to the node at row 1, column 37).
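The wrap-around adjacency described for FIG. 5(a) can be expressed with modular arithmetic. The sketch below uses 1-based row and column indices to match the text. The function name and the choice of four-way (up, down, left, right) adjacency are assumptions.

```python
def torus_neighbors(row, col, size=37):
    """Four neighbors of a node on a size x size lattice whose opposite
    edges are joined. Indices are 1-based, as in the description."""
    def wrap(i):
        return (i - 1) % size + 1   # maps 0 -> size and size + 1 -> 1
    return [
        (wrap(row - 1), col),   # above (wraps to the bottom row)
        (wrap(row + 1), col),   # below
        (row, wrap(col - 1)),   # left (wraps to the last column)
        (row, wrap(col + 1)),   # right
    ]

if __name__ == "__main__":
    print(torus_neighbors(1, 1))
```

For the node at row 1, column 1 this yields the nodes at row 37, column 1 and row 1, column 37 among its neighbors, as stated in the text.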

【００７１】[0071]

[Effect of the Invention] According to the present invention, there is provided a data analysis device that computes, for a plurality of input data items, a feature amount set consisting of n (n is 2 or more) feature amount data items corresponding to each input data item; generates a lattice space map defining mapping coefficients from each component of the feature amount set to each node of the lattice space; performs learning in advance using a plurality of learning input data items; holds the lattice space map that is defined, as a result of the learning, so that mutually similar feature amount sets are mapped concentratedly into a predetermined region of the lattice space; and classifies data using that lattice space map. The device generates a degenerate mapping onto each node of the lattice space from m composite amounts (m is at least 1 and less than n) obtained by combining, under predetermined combining conditions, the per-node mapping coefficients corresponding to the components of the feature amount set, and sets data classification boundaries on the lattice space map on the basis of this degenerate mapping. Consequently, even when the number of clusters is not known in advance or there is no prior knowledge of the data to be classified, clusters can be formed that achieve a practical classification without manual intervention, which aids classification processing.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a data analysis device according to an embodiment of the present invention.

FIG. 2 is a functional block diagram of the data analysis processing program in the CPU 11.

FIG. 3 is a functional block diagram of the boundary determination processing unit 43.

FIG. 4 is an explanatory diagram illustrating the boundary determination processing in the data analysis device.

FIG. 5 is an explanatory diagram showing an example of analysis by the data analysis device of the present invention.

[Explanation of symbols]

1 data analysis device, 11 CPU, 12 RAM, 13 ROM, 14 hard disk, 15 image input interface, 16 display, 17 external storage unit, 21 image processing unit, 22 selection processing unit, 23 classification processing unit, 31 whitening processing unit, 32 Gabor filter processing unit, 35 image extraction processing unit, 36 selection signal generation processing unit, 41 feature amount generation unit, 42 learning processing unit, 43 boundary determination processing unit, 44 classification execution unit, 51 principal component analysis unit, 52 filter unit, 53 boundary analysis unit.

Continued from the front page: (72) Inventor: Noriji Kato, c/o Fuji Xerox Co., Ltd., Green Tech Nakai, 430 Sakai, Nakai-machi, Ashigarakami-gun, Kanagawa. F-term (reference): 5L096 FA19 GA55 GA59

Claims


1. A data analysis device comprising: means for calculating, for a plurality of input data items, a feature amount set consisting of n (n is 2 or more) feature amount data items corresponding to each input data item; and means for generating a grid space map that defines mapping coefficients from each component of the feature amount set to each grid of a grid space, the device performing learning in advance using a plurality of learning input data items, holding the grid space map that is defined, as a result, so that mutually similar feature amount sets are mapped concentratedly in a predetermined region of the grid space, and classifying data using the grid space map, the data analysis device further comprising: map degeneration means for generating a degenerate mapping to each grid of the grid space from m composite amounts obtained by combining, under predetermined combining conditions, the per-grid mapping coefficients corresponding to the components of the feature amount set; and boundary setting means for setting data classification boundaries on the grid space map on the basis of the degenerate mapping.

2. The data analysis device according to claim 1, wherein the map degeneration means determines the combining conditions for the composite amounts on the basis of principal component analysis of the mapping coefficients.

3. The data analysis device according to claim 2, wherein the number m of combined quantities is determined by comparing the contribution ratios of the components, or the changes in those contribution ratios, obtained in the principal component analysis.

4. The data analysis device according to claim 2, wherein the number m of combined quantities is determined according to the reproducibility of the original grid space map from the combined quantities.

5. The data analysis device according to claim 2, wherein the number m of combined quantities is the smaller of a number m1, determined by comparing the contribution ratios of the components, or the changes in those contribution ratios, obtained in the principal component analysis, and a number m2, determined according to the reproducibility of the original grid space map from the combined quantities.
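The selection of m described in claims 3 to 5 might be sketched as follows. Two choices here are assumptions made for illustration and are not stated in the claims: m1 counts the components whose contribution ratio is at least a threshold `min_ratio`, and reproducibility is measured by the cumulative contribution ratio, which for principal component analysis equals the fraction of the coefficients' variance recovered from the first m components. The function names and default thresholds are hypothetical, and `ratios` is assumed to be sorted largest first.

```python
def choose_m_by_contribution(ratios, min_ratio=0.05):
    """m1: number of components whose contribution ratio is at least min_ratio."""
    n = len(ratios)
    m1 = sum(1 for r in ratios if r >= min_ratio)
    return max(1, min(m1, n - 1))    # keep 1 <= m < n

def choose_m_by_reproducibility(ratios, target=0.9):
    """m2: smallest m whose cumulative contribution ratio reaches target."""
    n = len(ratios)
    total = 0.0
    for m, r in enumerate(ratios, start=1):
        total += r
        if total >= target:
            return min(m, n - 1)     # keep m < n
    return n - 1

def choose_m(ratios, min_ratio=0.05, target=0.9):
    """Use the smaller of m1 and m2."""
    m1 = choose_m_by_contribution(ratios, min_ratio)
    m2 = choose_m_by_reproducibility(ratios, target)
    return min(m1, m2)

# Hypothetical contribution ratios for n = 5 components.
example_ratios = [0.6, 0.25, 0.1, 0.04, 0.01]
m = choose_m(example_ratios, min_ratio=0.05, target=0.8)
```

With these example values, three components pass the contribution threshold (m1 = 3), while two components already reach 80 % cumulative contribution (m2 = 2), so the smaller value m = 2 is used.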

6. A data analysis device comprising: means for calculating a feature quantity set consisting of n (n is 2 or more) feature quantity data based on a plurality of input data; and means for generating a grid space map that defines a mapping from the feature quantity set to each grid forming a grid space, the device holding, as a result of learning performed in advance using a plurality of learning input data, the grid space map defined so that mutually similar feature quantity sets are mapped intensively onto a predetermined area in the grid space, and classifying data using the grid space map, the device being characterized by further comprising: means for generating a learning combined feature quantity set consisting of m (m is 1 or more and less than n) combined quantities obtained by combining, under predetermined combining conditions, the components of the feature quantity sets based on the learning input data; means for forming a degenerate grid space map by learning using the learning combined feature quantity set; and boundary setting means for setting a classification boundary on the degenerate grid space map held as a result of the learning formation.

7. A data analysis method comprising: a step of calculating, for each of a plurality of input data, a feature quantity set consisting of n (n is 2 or more) feature quantity data corresponding to that input data; and a step of generating a grid space map that defines mapping coefficients from each component of the feature quantity set to each grid forming a grid space, the method classifying data using the grid space map defined, as a result of learning performed in advance using a plurality of learning input data, so that mutually similar feature quantity sets are mapped intensively onto a predetermined area in the grid space, the method being characterized by further comprising: a map degeneration step of generating a degenerate mapping to each grid in the grid space from m (m is 1 or more and less than n) combined quantities obtained by combining, under predetermined combining conditions, the mapping coefficients for each grid corresponding to the components of the feature quantity set; and a boundary setting step of setting a classification boundary of data on the grid space map based on the degenerate mapping.

8. The data analysis method according to claim 7, wherein, in producing the combined quantities, the combining conditions are determined based on principal component analysis of the mapping coefficients, and the number m of combined quantities is determined by comparing the contribution ratios of the combined quantities, or the changes in those contribution ratios, obtained in the principal component analysis.

9. The data analysis method according to claim 7, wherein, in producing the combined quantities, the combining conditions are determined based on principal component analysis of the mapping coefficients, and the number m of combined quantities is determined according to the reproducibility of the original grid space map from the combined quantities.

10. The data analysis method according to claim 7, wherein, in producing the combined quantities, the combining conditions are determined based on principal component analysis of the mapping coefficients, and the number m of combined quantities is set to the smaller of a number m1, determined by comparing the contribution ratios of the combined quantities, or the changes in those contribution ratios, obtained in the principal component analysis, and a number m2, determined according to the reproducibility of the original grid space map from the combined quantities.

11. A data analysis program that causes a computer to execute: a procedure of calculating a feature quantity set consisting of n (n is 2 or more) feature quantity data based on a plurality of input data; and a procedure of generating a grid space map that defines a mapping from the feature quantity set to a grid space, the program classifying data using the grid space map defined, as a result of learning performed in advance using a plurality of learning input data, so that mutually similar feature quantity sets are mapped intensively onto a predetermined area in the grid space, the program being characterized by further causing the computer to execute: a map degeneration procedure of generating a degenerate mapping obtained by degenerating the grid space map under predetermined conditions; and a procedure of setting a classification boundary of data on the grid space map based on the degenerate mapping.

12. A computer-readable recording medium storing the data analysis program according to claim 11.