JP2004062482A

JP2004062482A - Data classifier

Info

Publication number: JP2004062482A
Application number: JP2002219128A
Authority: JP
Inventors: Hitoshi Ikeda; 池田　仁; Hirotsugu Kashimura; 鹿志村　洋次; Sukeji Kato; 加藤　典司
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-07-29
Filing date: 2002-07-29
Publication date: 2004-02-26

Abstract

PROBLEM TO BE SOLVED: To provide a data classifier capable of classifying face image data by photographing conditions or by properties of the subject.

SOLUTION: A map generation part 21 provisionally determines the cluster to which each prototype of a map belongs, selects each prototype in turn as the prototype under consideration (the one to be classified into a cluster), computes a measure between the prototype under consideration and at least one prototype belonging to each cluster, and changes the cluster to which the prototype under consideration belongs, as needed, on the basis of the computed measure. By repeating this processing on face image data in which a person's face is photographed, a self-organizing map divided into clusters defined by the photographing conditions and the properties of the subject is obtained.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、自己組織化マップを用いて、データの分類を行うデータ分類装置、データ分類方法、及びそのプログラムに関する。
【０００２】
【従来の技術】
近年、情報機器の普及・発展に伴い、人々が受取る情報量はますます増加している。このような背景の下では、必要な情報を選びやすくするため、人間の介入なしに情報を認識し分類する技術の開発が要望されている。
【０００３】
こうした要望に対し、分類対象のデータについて比較して類似性のあるもの同士を一群のデータとして分類する、クラスタリング方法が考えられている。ここで類似性の判断に際しては、最尤度推定法、Ｋ−ｍｅａｎｓ法、併合法、ＭＤＳ（Ｍｕｌｔｉ−Ｄｉｍｅｎｓｉｏｎａｌ　Ｓｃａｌｉｎｇ）法などが知られている。
【０００４】
【発明が解決しようとする課題】
しかしながら、上記従来のクラスタリング方法は、パラメータの設定等、人間の介入が不可欠で、自律的にクラスタリング処理を実行できないという問題点があった。
【０００５】
また、従来のクラスタリング方法では、人物の顔が撮影された顔画像データを分類する場合、撮影時のフラッシュの有無や日射の方向などの撮影条件、人物の表情などの被写体の性状を細かく分類可能なマップを得ることができなかった。
【０００６】
［関連技術］
一方、比較的自律的にクラスタリング処理を行うための方法として、パターンデータの１つとしての入力画像データを格子空間マップ上で分類整理するものがある。この分類整理のためには、例えば自己組織化特徴マッピング（以下、ＳＯＭと略す）（Ｔ．　コホーネン　Ｓｅｌｆ−ｏｒｇａｎｉｚｉｎｇ　ｆｏｒｍａｔｉｏｎ　ｏｆ　ｔｏｐｏｌｏｇｉｃａｌｌｙ　ｃｏｒｒｅｃｔ　ｆｅａｔｕｒｅ　ｍａｐｓ．　Ｂｉｏｌｏｇｉｃａｌ　Ｃｙｂｅｒｎｅｔｉｃｓ，　１９８２）を利用している。このＳＯＭは、データが入力される入力層と、格子空間マップの競合層とからなる２階層のネットワークで、入力はある重みづけがされ、各格子に入力される。入力各成分についての重みづけをまとめたものを重みベクトルと称する。
【０００７】
当初、重みベクトルは次の方法で初期化される。すなわち、上記コホーネンの文献に示されるように、学習の対象となる複数の入力ベクトル（ここでの特徴量セットに相当する）の中からプロトタイプ数と同数の入力ベクトルＩをランダムに選び出し、各格子の重みベクトルの初期化を行う。また、同じく、コホーネンによれば、各重みベクトルに乱数で初期値を設定することとしても構わない。
【０００８】
次に、重みベクトルの学習を行う。この学習課程においては、学習用の特徴量セットが生成され、当該学習用特徴量セットと格子空間上の各格子の重みベクトルとの所定測度（例えばユークリッド距離）が演算される。そして各格子のうち、関係が最大（測度が最小）となる格子（勝利ノード）を見いだす。そして格子空間上、その格子（勝利ノード）に対し近傍に存在する各格子について、学習用特徴量セットとの測度が小さくなるように、その重みベクトルを補正する。このような重みベクトルの補正を行いつつ学習を繰り返すことで、互いに類似する特徴量からなる特徴量セットに対し、最小測度を有する格子が特定の領域に集中するようになり、データの分類に適用可能な状態となる。ここで、重みベクトルを補正する対象となる格子を選ぶにあたり、勝利ノードからのマップ上の距離を用いる。また、補正量は、勝利ノードｃからの距離に応じて変化させることが好ましく、補正量の大きさも変更可能としておくことが好ましい。一般的には、次の（１）式のように近隣のノードの重みベクトルＩに近づくよう重みベクトルｗを補正する。
【０００９】
【数１】

なお、
【数２】

ここで、α（ｔ）は、補正量の大きさを支配する量で、学習係数と呼ばれる。また、σ（ｔ）は、重みベクトルを修正する範囲の変化を決定する関数であり、近傍関数と呼ばれる。これらは時間ｔに対し、ともに単調減少する関数である。勝者ノードを中心にマップ上のノード間距離Ｒｍａｘが、
【数３】

の範囲に属する格子について（１）式による補正が行われるが、学習の繰り返しの中で、Ｒｍａｘはσ（ｔ）によって減少する。近傍関数σ（ｔ）としては、トライアングル型、レクトアングル型（四角）、メキシカンハット型等の関数を用いる。この近傍関数σ（ｔ）の選択によっても、学習結果に影響があることが知られている。なお、ｔは、「時刻」であり、特徴量セットが入力されるごとにインクリメントされる。また、｜｜ｒｃ−ｒｊ｜｜は、勝利ノードと、重みベクトルの補正対象ノードの間のノルム（距離）である。
【００１０】
しかし、上記技術をそのまま適用したのでは、直ちに自律的なデータ分類を行うことはできない。自律的なデータ分類を実現するには、まず、学習後の格子空間マップが適切なものであるかの判断が必要である。すなわち、（１）最適な格子空間マップを獲得する方法が必要である。また、当該学習後の格子空間マップを利用してデータ分類を行うときには、分類の基準となる境界線を上記格子空間上に形成し、分類対象として与えられたデータについての特徴量セットに対して最小測度を有する格子が、どの境界線内に属するか（この境界線で区切られた格子空間上の領域を以下、クラスタと呼ぶ）に基づき、当該データを分類することが適切である。すなわち、（２）クラスタの境界を決定する方法も求められる。
【００１１】
このうち、（１）最適な格子空間マップを獲得する方法として、コホーネンは、平均量子化誤差が最小となるマップを選択するという方法を提案している。つまり、学習条件を互いに異ならせて形成した複数の格子空間マップのうち、平均量子化誤差が最小のものを選択し、これを近似的に最適な格子空間マップとするのである。この方法によると、入力される特徴量セットの空間のトポロジーがマップのトポロジーに反映されない。いわば、トポロジーの保存度が低い。これは、クラスタリングの方法によっては誤分類に結びつくこともある。
【００１２】
トポロジーの保存に配慮したものとして、トポロジー関数（ｔｏｐｏｇｒａｐｈｉｃ　ｆｕｎｃｔｉｏｎ）と呼ばれる所定の指標を学習中にモニタし、これにより学習条件を制御して適切なマップを形成する技術（Ａｕｔｏ−ＳＯＭ法）も開発されている。しかし、トポロジー関数の演算自体が負荷の高い処理であるため、学習時間が長くなる問題点がある。
【００１３】
次に（２）クラスタの境界を自律的に決定する方法としては、Ｕ−Ｍａｔｒｉｘ（Ｕｎｉｆｉｅｄ　Ｄｉｓｔａｎｃｅ　Ｍａｔｒｉｘ　Ｍｅｔｈｏｄｓ）法と呼ばれる方法や、ポテンシャル法と呼ばれる方法が研究されている。ここで、Ｕ−Ｍａｔｒｉｘ法については、Ａ．Ｕｌｔｓｃｈ　ｅｔ．　ａｌ．，　”Ｋｎｏｗｌｅｄｇｅ　Ｅｘｔｒａｃｔｉｏｎ　ｆｒｏｍ　Ａｒｔｉｆｉｃｉａｌ　Ｎｅｕｒａｌ　Ｎｅｔｗｏｒｋｓ　ａｎｄ　Ａｐｐｌｉｃａｔｉｏｎｓ”，Ｐｒｏｃ．Ｔｒａｎｓｐｕｔｅｒ　Ａｎｗｅｎｄｅｒ　Ｔｒｅｆｆｅｎ／　Ｗｏｒｌｄ　Ｔｒａｎｓｐｕｔｅｒ　Ｃｏｎｇｒｅｓｓ　ＴＡＴ／ＷＴＣ　９３　Ａａｃｈｅｎ，　Ｓｐｒｉｎｇｅｒ　１９９３に詳しく開示されている。Ｕ−Ｍａｔｒｉｘでは、マップ上で隣接する２つの格子間の距離を次のように定義する。すなわち、当該２つの格子の各重みベクトルの成分毎の差について、その絶対値を総和したものや、当該差の二乗平均などを距離として定義するのである。すると、類似性の高い特徴量セットにそれぞれ強く結合（重みベクトルが特徴量セットに近い値を持つもの、このようなものを以下、「特徴量セットにプロトタイピングされている」と表現する）している隣接格子間、つまり、類似性の高い２つの特徴量セットのそれぞれにプロトタイピングされている隣接格子間の上記距離は小さくなり、類似性の低い２つの特徴量セットのそれぞれにプロトタイピングされている隣接格子間の距離は大きくなる。そこでこの距離の大きさを高さとした３次元的な面を考えると、互いに類似する特徴量セットにプロトタイピングされた格子間に対応する面の高さは低くなり「谷」を形成するのに対し、互いに異なる特徴量セットにプロトタイピングされた格子間に対応する面の高さは高くなり「山」を形成する。従ってこの「山」に沿って境界線を形成すれば、類似性の高い特徴量セットにプロトタイピングされている格子の集合（クラスタ）を規定できる。Ｕ−Ｍａｔｒｉｘは、いわば、自己組織化マップでは入力空間での距離が保存されない点を補強したものであるということができる。
【００１４】
しかしＵ−Ｍａｔｒｉｘは、「山」と「谷」との高低差が明瞭であれば境界を規定できるものの、現実の情報処理では「山」と「谷」との高低差は期待されるほど明瞭にならず、３次元面の高さはゆるやかに変化することも多い。この場合には、人為的に境界線を設定する必要があって、必ずしも自律的に境界が決定できるわけではない。
【００１５】
一方のポテンシャル法は、Ｄ．Ｃｏｏｍａｎｓ，　Ｄ．Ｌ．Ｍａｓｓａｒｔ，　Ａｎａｌ．Ｃｈｅｍ．Ａｃｔａ．，　５−３，　２２５−２３９（１９８１）に開示されているもので、事前に定めたポテンシャル関数を用いて、入力データに対する関数の値を重ね合わせて入力データを近似的に表現する母集団の確率密度関数を推定し、重なりあいの少ない部分を境界として決定するというものである。ポテンシャル関数としてはガウシアン型の関数とすることが多い。具体的には、Ｎ個の入力ベクトルからなる入力データ群があるとき、それぞれＫ次元の大きさを持つとするとｌ番目の入力データが他の入力データから受ける平均的なポテンシャル（ｌ番目入力が全体の入力集合に対する寄与率）Ψｌを次の（２），（３）式によって定義する。
【００１６】
【数４】

【００１７】
尚、ｘｋｌはｌ番目入力のｋ番目の成分を意味する。また、αはスムージングパラメータで分類されるクラスタの数に影響を与える。従って、ポテンシャル法では、その分布形状を仮定する分布関数の最適化や、各種パラメータの最適化が入力ベクトル集合ごとに求められ、要するに分類対象となるデータの特性について事前に知識が必要であるうえ、人為的調整が不可欠となる。また、このポテンシャル法では、入力データから得られる特徴量セットが高次元になると、それについて適切な確率密度分布を求めるにはサンプルが多数なければならず、少数の格子からなるマップに対しての適用が困難であるという問題点がある。つまり、ポテンシャル法についても、必ずしも自律的に境界が決定できるわけではない。
【００１８】
これらの問題点を解決するため、例えば特開平７−２３４８５４号公報、特開平８−３６５５７号公報、「自己組織化特徴マップ上のデータ密度ヒストグラムを用いた教師無しクラスタ分類法」，電子情報通信学会論文誌Ｄ−ＩＩ　Ｖｏｌ．Ｊ７９−ＤＩＩＮｏ．７　ｐｐ．１２８０−１２９０，　１９９６年７月などに開示された技術が研究されている。しかしながら、どの技術においても、入力されるデータの構成自体や、マッピングの結果において、分類に使いたい特徴が十分な距離をあけて各格子にプロトタイピングされることを前提としており、画像データの分類において例えば見られるような、分類してほしい特徴毎の分布形状の差異や重なり、その特徴にプロトタイピングされている格子のマップ上の位置の重心間の距離にばらつきがある場合などでは、マップ上でクラスタの境界が複雑に入り組むため、適切なクラスタリングができなくなる。
【００１９】
さらに、関連技術においては、マップ上の格子の数については研究の過程で経験的に決定するだけで、実際の用途に適合した適切な格子の数を決定するといったことは配慮されていなかった。しかしながら、適切な数よりも格子の数が少ない場合、クラスタ境界部の格子と、別のクラスタに属するべき特徴量セットが強く結合されてしまう場合があり、この場合は分類誤りが多くなる。この点について、格子の数を追加／削減して平均量子化誤差が所定量を下回るようにするという技術が、Ｊａｍｅｓ　Ｓ．　Ｋｉｒｋ　ｅｔ．　ａｌ．　”Ａ　Ｓｅｌｆ−Ｏｒｇａｎｉｚｅｄ　Ｍａｐ　ｗｉｔｈ　Ｄｙｎａｍｉｃ　Ａｒｃｈｉｔｅｃｔｕｒｅ　ｆｏｒ　Ｅｆｆｉｃｉｅｎｔ　Ｃｏｌｏｒ　Ｑｕａｎｔｉｚａｔｉｏｎ”，　ＩＪＣＮＮ’０１，　２１２８−２１３２に開示されている。尤も、この技術では、入力データに対応する特徴量セットの空間でのデータ分布を写像した格子が追加等されるだけなので、データ分類において重要となる、クラスタ境界付近の格子の数を増大させるというようなことには配慮されていない。そこで例えば当初から格子の数を多くしておくこととしてもよいが、この場合、計算時間が長くなって実用的でない。
【００２０】
また、プロトタイプを用いることなく、入力されるデータ（パターンデータ）を直接、クラスタに分類する場合も同様に、パターンデータの集合の確率的性質に基づき、パターンデータの集合をクラスタに分類する方法がある。なお、ここで確率的性質は、例えばベイズ学習により確率分布パラメータを逐次的に推定する方法や、ポテンシャル関数を用いる方法が知られている。しかし、この場合に確率的性質を推定するにあたっては、入力されるパターンデータについて、予めクラスタリングのヒントとなる情報（例えばラベル）が付与されている必要がある。これは、当該ヒントとなる情報ごとに、パターンデータを仮に分類した上で、各分類での確率分布を推定する演算を要するからである。
【００２１】
そこで、個々のパターンデータ間の類似度を所定の関数により演算し、パターンデータ空間の構造を分析し、その分析結果としての構造に従ってクラスタリングを行うこともできる。この種の方法には、Ｋ−ｍｅａｎｓ法や分割併合法（いわゆるＩＳＯＤＡＴＡ法）として知られている方法があるが、いずれも人為的にパラメータを設定する必要がある。すなわち、Ｋ−ｍｅａｎｓ法では、パターンデータ群をいくつのクラスタに分割したいかを表す最終クラスタ数を設定しておかなければならない。また、クラスタ中心値と呼ばれるパラメータの設定に、クラスタリングの結果が敏感に依存してしまい、設定する値に応じてクラスタリング結果の良否が決ってしまう問題点もある。
【００２２】
一方、分割併合法においても、クラスタ除去閾値、クラスタ分割閾値、クラスタ併合閾値といった数多くのパラメータ設定が必要であり、クラスタリング結果は、これらの値の設定に大きく依存している。
【００２３】
本発明は、上記実情に鑑みて為されたもので、自律的にクラスタリングを行うことのできるデータ分類装置を提供することを目的とする。
【００２４】
また、顔画像データを、撮影条件または被写体の性状により分類することが可能なデータ分類装置を提供することを目的とする。
【００２５】
【課題を解決するための手段】
上記従来例の問題点を解決するための本発明に係る顔画像データ分類装置は、複数の顔画像データを用いて、任意の顔画像データと、撮影条件と被写体の性状の少なくとも一つで定義される複数のクラスタとの対応関係を学習形成する学習形成手段と、学習形成された前記対応関係を用いて、顔画像データが属するクラスタを決定するクラスタ決定手段と、を有することを特徴とする。
【００２６】
また、本発明に係る顔画像データ分類方法は、複数の顔画像データを用いて、任意の顔画像データと、撮影条件と被写体の性状の少なくとも一つで定義される複数のクラスタとの対応関係を学習形成する学習形成ステップと、学習形成された前記対応関係を用いて、顔画像データが属するクラスタを決定するクラスタ決定ステップと、を含む方法である。
【００２７】
また、本発明に係る顔画像データ分類プログラムは、複数の顔画像データを用いて、任意の顔画像データと、撮影条件と被写体の性状の少なくとも一つで定義される複数のクラスタとの対応関係を学習形成する学習形成工程と、学習形成された前記対応関係を用いて、顔画像データが属するクラスタを決定するクラスタ決定工程と、を含むプログラムである。
【００２８】
上記顔画像データ分類装置、方法又はプログラムにおいて、各クラスタに属する顔画像データから、共通の撮影条件で撮影された顔画像データ、または共通の被写体の性状をもつ顔画像データを抽出することが好ましい。
【００２９】
顔画像データとクラスタとの対応関係の学習形成は、（ａ）各プロトタイプの属するクラスタを仮に決定し、（ｂ）各プロトタイプを順次、クラスタに分類する対象となるべき注目プロトタイプとして選択し、（ｃ）各クラスタごとに、各クラスタに属している少なくとも１つのプロトタイプと、前記注目プロトタイプとの間の測度を演算し、（ｄ）前記演算された測度に基づき、前記注目プロトタイプの属するべきクラスタを必要に応じて変更し、前記（ｂ）、（ｃ）、（ｄ）の処理を、各プロトタイプの属するべきクラスタの変更がなくなるまで繰返して行って、各プロトタイプをクラスタに分類することが好ましい。
【００３０】
または、顔画像データとクラスタとの対応関係の学習形成は、（ａ）前記各顔画像データについて、その属するクラスタを決定し、（ｂ）各顔画像データを順次、クラスタに分類する対象となるべき注目顔画像データとして選択し、（ｃ）各クラスタごとに、当該クラスタに属している少なくとも１つの顔画像データと、クラスタに分類する対象となった注目顔画像データとの間で所定の相関値を演算し、（ｄ）前記相関値に基づき、前記注目顔画像データの属するべきクラスタを決定し、前記（ｂ）、（ｃ）、（ｄ）の処理を、各顔画像データの属するべきクラスタの変更がなくなるまで繰返して行って、各顔画像データをクラスタに分類することが好ましい。
【００３１】
【発明の実施の形態】
本発明の実施の形態について図面を参照しながら説明する。以下に説明するデータ分類装置は、人物の顔が撮影された顔画像データを、その撮影条件と被写体の少なくとも一つで定義されるクラスタに分類するデータ分類装置である。
【００３２】
本実施の形態に係るデータ分類装置１は、図１に示すように、ＣＰＵ１１と、ＲＡＭ１２と、ＲＯＭ１３と、ハードディスク１４と、画像入力用インタフェース１５と、ディスプレイ１６と、外部記憶部１７とから基本的に構成され、これら各部はバス接続されている。すなわち、本実施の形態のデータ分類装置１は、一般的なパーソナルコンピュータによってソフトウエア的に実現される。このソフトウエアは、一般的にはＣＤＲＯＭやＤＶＤＲＯＭなどの記録媒体に格納された状態で頒布され、またはネットワークを介してダウンロードされる（ネットワークに対する接続インタフェースは図示を省略した）。そして、当該記録媒体によって頒布される場合には外部記憶部１７にて読み出されて、所定のインストール処理により、ハードディスク１４に格納される。また、ネットワークを介してダウンロードされた場合も同様に、ハードディスク１４にインストールされる。
【００３３】
ＣＰＵ１１は、このハードディスク１４に格納されているプログラムに従って動作し、基本的にはＷｉｎｄｏｗｓ（登録商標）等のオペレーティングシステムの管理下で本実施の形態のデータ分類装置１を具現化するデータ分類プログラム等を実行する。
【００３４】
ＲＡＭ１２は、ＣＰＵ１１のワークメモリとして利用されるもので、ＣＰＵ１１の処理中に各種パラメータやデータを記憶するために用いられる。ＲＯＭ１３は、主としてオペレーティングシステムの読み込みの処理など、データ分類装置１が起動する際に必要となるプログラムが格納されている。この起動用プログラムの内容は広く知られているので、その説明を省略する。
【００３５】
ハードディスク１４は、オペレーティングシステムの本体や、種々のプログラムがインストールされている。また、本実施の形態においては、このハードディスク１４には、既に説明したように、データ分類プログラムがインストールされている。尚、ここではハードディスク内に格納されている場合について例示したが、例えばＳＲＡＭ（Ｓｔａｔｉｃ　Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）や、ＥＥＰＲＯＭ等の不揮発性メモリにインストールしても構わないし、図１に示したように、ＣＰＵ１１と同一筐体に含まなくても、図示しないネットワークインタフェースを介して接続される別のコンピュータ内にインストールされていてもよい。
【００３６】
画像入力用インタフェース１５には、スキャナ等の画像入力装置が接続され、当該画像入力装置から画像データの入力を受けて、ＣＰＵ１１に出力する。ディスプレイ１６は、ＣＰＵ１１からの指示に従って、画像を表示する。
【００３７】
［実施の形態１］
ここで第１の実施の形態として、具体的に入力されるパターンデータ（顔画像データ）について、各パターンデータを代表するプロトタイプを生成し、そのプロトタイプをマップ上で分類した上で、当該マップ上で分類されたプロトタイプ群を、入力される各パターンデータの分類に供するものについて、第１の実施の形態として説明する。本実施の形態のデータ分類プログラムは、マップ生成部２１と、クラスタ境界決定部２２とを含み、マップ生成部２１は、ＳＯＭ学習部３１と、マップ選択部３２と、学習条件設定部３３と、プロトタイプ追加部３４とを備える。ここでは、これらの各部が、それぞれソフトウエアモジュールとして実現されることとしているが、ハードウエア的に論理回路によって構成されても構わない。このＣＰＵ１１における処理については後に詳しく述べる。
【００３８】
［処理の詳細］
ここでＣＰＵ１１が実行するデータ分類プログラムの詳細について、図２を参照しながら説明する。マップ生成部２１は、例えば既に説明した自己組織化マッピング（ＳＯＭ）により、プロトタイプマップを形成し、形成したプロトタイプマップの情報をクラスタ境界決定部２２に出力する。クラスタ境界決定部２２は、マップ生成部２１から入力されるプロトタイプマップに対し、各プロトタイプをクラスタに分類する。以下、これらの各部についての動作を分けて詳しく述べる。
【００３９】
［マップ生成］
まず、マップ生成部２１のＳＯＭ学習部３１は、学習条件設定部３３から入力される複数（例えばＭセット（Ｍは２以上の整数））の学習条件のセットの各々に対応するＭ個のプロトタイプマップ候補を生成する。各マップ候補は、各プロトタイプを特定する情報に対して、そのプロトタイプと特徴量セットの成分の各々との関係重みの情報を関連づけたものである。本実施の形態においては、マップを構成するプロトタイプは必ずしも格子点状に配列されている必要はない（この場合、プロトタイプを特定する情報にプロトタイプのマップ上の座標情報が含まれてもよい）が、以下の説明では簡単のため、格子点状に配列されているものとして説明する。
【００４０】
マップ選択部３２は、各マップ候補を量子化誤差（ＱＥ）と、トポロジカル・プロダクト（以下、ＴＰと呼ぶ）とを演算して、これらに基づき、クラスタ決定に適したマップを一つ、チャンピオンマップとして選択する。ここで、量子化誤差は、次の（４）式で演算される。
【００４１】
【数５】

【００４２】
（４）式において、Ｐはマップ学習に用いる特徴量セットの数（つまり学習パターン数）であり、Ｅｊは、ｊ番目の特徴量セットベクトルであり、Ｗｃは、ｊ番目の特徴量セットベクトルに対しての勝利ノードの重みベクトルである。なお、この量子化誤差については、コホーネンらにより広く知られたものであるので、その詳細な説明を省略する。
【００４３】
また、ＴＰは、次の（５）式で演算される。
【００４４】
【数６】

【００４５】
このＴＰは、入力層での空間（特徴量セットの空間）と、競合層での空間（プロトタイプの空間）との相対的位置関係が一致するほど小さい値となるもので、バウアー（Ｂａｕｅｒ）らによって、Ｂａｕｅｒ，　Ｈ．Ｕ．，　ａｎｄ　Ｐａｗｅｌｚｉｋ，　Ｋ．Ｒ．，（１９９２），　”Ｑｕａｎｔｉｆｙｉｎｇ　ｔｈｅ　ｎｅｉｇｈｂｏｒｈｏｏｄ　ｐｒｅｓｅｒｖａｔｉｏｎ　ｏｆ　ｓｅｌｆ−ｏｒｇａｎｉｚｉｎｇ　ｆｅａｔｕｒｅ　ｍａｐｓ．”　ＩＥＥＥ　Ｔｒａｎｓ．　Ｎｅｕｒａｌ　Ｎｅｔｗｏｒｋｓ，　３，　５７０−５７９などの論文で提案されているものである。
【００４６】
マップ選択部３２は、これらＱＥとＴＰとの値を用いて、次の（６）式にて演算されるスコア値が小さいものをチャンピオンマップＭＡＰｃとして選択し、その選択結果を出力する。
【００４７】
【数７】

つまり、
【数８】

である。
【００４８】
また、マップ選択部３２は、当初は、このチャンピオンマップの選択結果を後段のクラスタ境界決定部２２には出力せず、少なくとも一度、学習条件設定部３３に出力する。そして事前に設定された回数だけ繰返してチャンピオンマップの選択を行った後に、その時点での選択結果をクラスタ境界決定部２２に出力する。
【００４９】
学習条件設定部３３は、学習条件として例えば学習用の入力データの数（学習の回数）Ｎと、近傍距離σ（ｔ）と、学習係数α（ｔ）とのセットをＭセット出力する。この学習条件設定部３３は、当初はこれらの値や関数（Ｎ，σ（ｔ），α（ｔ））をランダムなパラメータに基づいて決定するか、事前に定められたセット（プリセット）として決定する。また、この学習条件設定部３３は、マップ選択部３２からチャンピオンマップの選択結果の入力を受けて、当該選択結果のマップ候補に対応する学習条件のセットを取出す。そして、この取出した学習条件のセットを基準として、さらにＭ個の学習条件のセットを生成して設定し、ＳＯＭ学習部３１に出力する。
【００５０】
なお、プロトタイプ追加部３４は、クラスタ境界決定がされた後に、プロトタイプマップの所定の位置にプロトタイプを追加してさらに学習を行わせるものであるが、クラスタ境界決定部２２の動作に関係するので、後に詳しく説明する。
【００５１】
ここで、マップ生成部２１における学習の動作について説明する。当初、学習条件設定部３３がランダムな、又は事前に定められたパラメータを用いて学習条件のセットを複数（例えばＭセット）生成して出力する。ＳＯＭ学習部３１は、学習条件設定部３３が出力する各学習条件のセットに応じてＭ個のプロトタイプマップの候補（マップ候補）を生成し、マップ選択部３２に出力する。マップ選択部３２は、これらのマップ候補の中から、量子化誤差とＴＰとの双方を用いて学習状態がクラスタリングに対して好適となっているマップ（チャンピオンマップ）を選択し、その選択結果を学習条件設定部３３に出力する。すると、学習条件設定部３３が当該チャンピオンマップの生成に用いられた学習条件に基づき、新たな学習条件のセットを複数生成し、再度ＳＯＭ学習部３１に出力して複数のマップ候補を生成させる。
【００５２】
このようにして、マップ候補の生成、チャンピオンマップの選択、学習条件の再設定という動作を所定の回数だけ繰返し、その結果得られたチャンピオンマップがクラスタの境界設定対象マップとしてクラスタ境界決定部２２に出力される。
【００５３】
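[Cluster boundary determination]
The cluster boundary determination unit 22 performs the processing shown in FIG. 3 on the boundary-setting target map input from the map generation unit 21. Specifically, the cluster boundary determination unit 22 assigns a unique number to each prototype contained in the input map to produce a provisional clustering result (S1). The numbers may simply run from "1" to "P" (where P is the number of prototypes) in a predetermined order. These numbers serve as provisional cluster numbers; that is, each prototype is initially classified into its own cluster.
[0054]
Next, the cluster boundary determination unit 22 extracts prototype pairs and calculates the similarity between the weight vectors of the prototypes in each extracted pair, from which the parameter Cd is derived (S2). The results are stored in the RAM 12 as a similarity table. Here, the prototype pairs are obtained by selecting each prototype in turn as the prototype of interest and pairing it with every other prototype; in other words, they are all combinations of two prototypes. The similarity used here is the sum of the squared component-wise differences between the weight vectors (a distance).
[0055]
These similarities are sorted into classes (predetermined numerical ranges), and the frequency of occurrence of each class is tabulated (FIG. 4). The distance with the highest frequency is taken as Cd, and a small value δ close to "0" (for example, δ = 0.01) is also fixed. Alternatively, Cd may be the largest distance that is shorter than the most frequent distance and at which the frequency turns from decreasing to increasing.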
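The histogram step of paragraph [0055] can be sketched in Python as follows. This is only an illustration under stated assumptions: the bin width is a free parameter that the text does not specify, the centre of the most frequent bin is reported as Cd, ties are broken toward the smaller distance, and the names `squared_distance` and `estimate_cd` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def squared_distance(a, b):
    """Sum of squared component-wise differences (the similarity used in S2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_cd(vectors, bin_width):
    """Return the centre of the most frequent bin of pairwise squared distances.

    bin_width is an assumed parameter; the source only says that the
    similarities are sorted into predetermined numerical ranges.
    """
    counts = Counter(
        int(squared_distance(a, b) // bin_width)
        for a, b in combinations(vectors, 2)
    )
    # Highest frequency wins; ties go to the smaller distance (assumption).
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width


if __name__ == "__main__":
    weights = [[0.0], [1.0], [2.0], [10.0]]
    print(estimate_cd(weights, bin_width=2.0))
```

The alternative rule mentioned in the text (the largest distance below the mode at which the frequency turns from decreasing to increasing) is not covered by this sketch.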
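[0056]
Next, the cluster number update process is started (S3). This process is shown in FIG. 5; for simplicity, the description here assumes that the prototype map whose cluster boundaries are to be determined is a 3 x 3 lattice map. In step S1, the 3 x 3 = 9 prototypes of the initial map have been assigned the unique numbers "1" to "9" as shown in FIG. 6(a).
[0057]
The cluster boundary determination unit 22 selects each prototype in turn as the prototype of interest (S11). It then selects the clusters to which the prototypes adjacent to the prototype of interest belong (those within a predetermined distance on the provisionally clustered prototype map) (S12), and extracts the prototypes belonging to the selected clusters (S13).
[0058]
In the example of FIG. 6, with the lower-left prototype "1" as the prototype of interest, the prototypes belonging to each of the adjacent cluster numbers "1", "4", "5", and "2" are selected. The cluster boundary determination unit 22 then computes a correlation amount, as the measure between the prototype of interest and each prototype belonging to the clusters selected in step S12, using the following equation (7) (a function that approaches "0" asymptotically, faster than a predetermined derivative, as the similarity decreases) (S14), and determines the cluster to which the prototype of interest belongs on the basis of this correlation amount.
[0059]
[Equation 9]

[0060]
Here, y* is the weight vector of the prototype of interest, and yi is the weight vector of the i-th prototype. χ is the set of prototype vectors, and χ(c) is the set of prototype vectors with cluster number c. Cd and δ, which are used to determine α, are those obtained in step S2, and Ln denotes the natural logarithm. That is, equation (7) is the sum of the distances between the prototype of interest and the prototypes belonging to cluster number c, divided by the overall average, and represents the correlation amount between the prototype of interest and cluster c. The more prototypes in cluster c whose weight vectors correlate strongly with that of the prototype of interest, the larger the value of equation (7).
[0061]
The cluster boundary determination unit 22 provisionally adopts the number of the cluster for which equation (7) is largest as the cluster number of the prototype of interest (S15), and stores this provisional decision (S16).
[0062]
When the prototype classified into cluster "1" in FIG. 6(a) is the prototype of interest, no calculation is performed for cluster "1" among the adjacent clusters, because initially it contains no other prototype. The correlation amounts with the prototypes belonging to "4", "5", and "2" are calculated, and if, for example, the distance to the prototype belonging to "4" is the shortest, the cluster of the prototype of interest is changed from "1" to "4" (FIG. 6(b)). The calculation may be performed against all prototypes rather than only the adjacent ones. Doing so makes it possible to gather into one cluster prototypes that are far apart on the prototype map but whose weight vectors are close. However, this increases the computation time, which is why a map in which distances on the prototype map do not differ greatly from distances between weight vectors was selected beforehand by including the evaluation based on the topological product (TP).
[0063]
The cluster boundary determination unit 22 then checks whether every prototype has been selected as the prototype of interest (S17). If any prototype has not yet been selected (No), processing returns to step S11 and continues. If all prototypes have been selected in step S17 (Yes), the cluster number update process ends.
[0064]
Returning to the processing shown in FIG. 3, the cluster boundary determination unit 22 compares the provisional decisions with the cluster numbers before the update process and checks whether any cluster number has changed (that is, whether the cluster numbers have not yet converged) (S4). If there was a change (Yes), the provisional decisions become the new provisional clustering result and step S3 is repeated. If there was no change in step S4 (No), that is, if the numbers have converged, the current clustering result is output.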
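The update loop of steps S1, S3, S4 and S11 to S17 can be sketched as below. Equation (7) is not reproduced in this text, so the sketch assumes a Gaussian-type kernel exp(-α d²) with α = -Ln(δ)/Cd (so that the kernel equals δ at squared distance Cd), summed over the members of a cluster and normalised by the sum over all other prototypes. That form, the 8-neighbourhood used for "adjacent", the in-place update of labels, the iteration cap, and the name `cluster_prototypes` are all assumptions made for illustration rather than the patent's definitive method.

```python
import math


def sq_dist(a, b):
    """Sum of squared component-wise differences between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_prototypes(weights, rows, cols, cd, delta=0.01, max_iters=100):
    """Assign a cluster number to every node of a rows x cols prototype map.

    weights maps (row, col) -> weight vector.
    """
    alpha = -math.log(delta) / cd  # ASSUMED: kernel falls to delta at distance cd
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    labels = {n: i + 1 for i, n in enumerate(nodes)}  # S1: one cluster each
    for _ in range(max_iters):  # cap added for safety (assumption)
        changed = False
        for n in nodes:  # S11: prototype of interest
            kernel = {m: math.exp(-alpha * sq_dist(weights[n], weights[m]))
                      for m in nodes if m != n}
            total = sum(kernel.values())
            if total == 0.0:
                continue  # every kernel value underflowed; keep the label
            # S12: clusters of the map neighbours (8-neighbourhood) and its own
            cands = {labels[m] for m in nodes
                     if max(abs(m[0] - n[0]), abs(m[1] - n[1])) <= 1}

            # S13-S14: correlation between the prototype and a candidate cluster
            def score(c):
                return sum(k for m, k in kernel.items() if labels[m] == c) / total

            # S15: largest correlation wins; ties keep the current label
            best = max(sorted(cands), key=lambda c: (score(c), c == labels[n]))
            if best != labels[n]:  # S16: record the change
                labels[n] = best
                changed = True
        if not changed:  # S4: converged
            break
    return labels


if __name__ == "__main__":
    w = {(0, 0): [0.0], (0, 1): [0.1], (0, 2): [5.0], (0, 3): [5.1]}
    print(cluster_prototypes(w, rows=1, cols=4, cd=1.0))
```

Labels are updated in place here because, with every prototype starting in its own cluster, two mutually nearest prototypes could otherwise keep swapping numbers; storing provisional results and applying them after a full pass is the other reading of steps S15 and S16.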
【００６５】
なお、上述の処理Ｓ２でのＣｄの決定方法に代えて、各注目プロトタイプについて、他のプロトタイプとの類似度の統計量を演算し、各注目プロトタイプごとの統計量に対して、さらに所定の統計処理を行った結果を用いて、Ｃｄを決定してもよい。この場合、Ｃｄは、次の（９）式で決められる。
【００６６】
【数１０】

【００６７】
ここで、ｋは、クラスタ決定の対象となるプロトタイプが現在属するクラスタに隣接するクラスタであり、Ｃ１は、「１」より大きな正の定数である。この（９）式によりＣｄを決めることで、隣接クラスタの中で、少なくとも１つのクラスタに属するすべてのプロトタイプが、クラスタ決定の対象となるプロトタイプに影響する。また、個々のプロトタイプごとに適したＣｄを適応的に決定することが可能となる。
【００６８】
［プロトタイプの追加］
なお、本実施形態では、クラスタ境界決定部２２がクラスタリング結果を直ちに最終的な結果として出力せず、少なくとも１度、マップ生成部２１のプロトタイプ追加部３４に戻して出力する処理を行ってもいる。この処理を行うか否かは、設計者またはユーザの任意である。プロトタイプ追加部３４は、クラスタリング結果を参照して、クラスタの境界部に新規プロトタイプを生成して、再度学習を行わせるべく、ＳＯＭ学習部３１に当該新規プロトタイプ追加後のプロトタイプマップを出力する。この際の学習は、微調整を目的とするものなので、例えばクラスタリング前の学習ではα（ｔ）＝０．２、σ（ｔ）＝２．０と初期設定して、７００パターンを１００００回学習するよう学習条件パラメータが設定されていた場合でも、新規プロトタイプ追加後の学習では、α（ｔ）＝０．００２、σ（ｔ）＝１．０、パターンの繰返し入力が１００といった学習条件パラメータで構わない。
【００６９】
具体的に、図６（ａ）のように当初仮にクラスタリングされていたプロトタイプマップに対して、クラスタ境界決定部２２が、クラスタ図６（ｃ）のようなクラスタリング結果を出力したとすると、この「４」と「６」との境界に新規プロトタイプを形成する（図６（ｄ））。ここで図６（ｄ）では、理解のために先のクラスタリング結果を括弧つきで示したが、このようにプロトタイプを追加した後は、先のクラスタリング結果は無意味なものである。
【００７０】
なお、新規プロトタイプは、必ずしもクラスタ境界に沿って全体的に追加しなければならないものではなく、その少なくとも一部に追加するものであっても構わない。この場合において、追加する部分としては、学習入力ベクトル（パターン）に対して最も距離の短い、最近接プロトタイプとなった回数（パターンの数）に基づいて決定することが好ましい。ＳＯＭ学習やＶＱ学習といった学習方法では、Ｕ−Ｍａｔｒｉｘ法が利用するように、クラスタの中心部ではプロトタイプの密度が大きくなり、クラスタ境界部ではプロトタイプの密度が小さくなる。従って、学習入力パターンに対して最近接プロトタイプとなる機会が少なく、所定の閾値以下の場合、つまりプロトタイプの密度が所定のしきい値よりも低い部分は、クラスタ境界近傍のプロトタイプとみなすことができる。そこで、当該部分に新規プロトタイプを追加するようにすれば、境界全体に沿って追加することがなくなり、再度の学習や、再度のクラスタリングにかかる効率を向上できる。
【００７１】
また、追加する新規プロトタイプの重みベクトルを決定するには、追加しようとする位置（例えば境界部分）近傍の既存プロトタイプの重みベクトルに対する所定の統計演算結果（例えば算術平均値）により重みベクトルを決定する。
【００７２】
［動作］
次に、本実施の形態に係るデータ分類装置１の動作について説明する。まず、学習条件設定部３３が複数の学習条件パラメータのセットＳ１，Ｓ２，…ＳＭを出力して、ＳＯＭ学習部３１において当該学習条件パラメータのセットの数に対応した（Ｍ個の）プロトタイプマップが生成される。ＳＯＭ学習部３１は、外部から入力される学習画像データに基づき所定の特徴量ベクトルを生成し、各プロトタイプマップの各プロトタイプと当該特徴量ベクトルの各成分との結合重みを調整する。ＳＯＭ学習部３１のこの動作は、コホーネンらの記述により広く知られたものである。本実施形態において、学習画像データとは、人物の顔が撮影された顔画像データであり、この顔画像データとしては、フラッシュ光を受けたもの／受けないもの、横から／上から光（日射）を受けたものなどの撮影条件が異なる顔画像データや、口を開けたもの／閉じたもの、眼鏡をかけたもの／かけないもの、笑った／怒った表情のもの、正面を向いたもの／うつむいたものなど被写体の性状が異なる複数の顔画像データである。
【００７３】
ＳＯＭ学習部３１により生成された複数のプロトタイプマップは、マップ選択部３２に出力され、マップ選択部３２が各マップに含まれるプロトタイプに関する演算から、量子化誤差（ＱＥ）及びトポロジカル・プロダクト（ＴＰ）に基づき、量子化誤差が低く、ＴＰにより示される入力層での空間（特徴量セットの空間）と、競合層での空間（プロトタイプの空間）との相対的位置関係の一致度、すなわち、重みベクトル間の距離と、競合層での距離との一致度が高いマップを選択する。これにより、類似する画像データに反応するプロトタイプ間のマップ上の距離が小さくなる。
【００７４】
そして選択されたマップの学習に用いられた学習条件パラメータのセットに基づき、学習条件設定部３３が再度学習条件パラメータのセットを複数生成してＳＯＭ学習部３１に出力し、複数のマップが再度生成され、その中から、ＱＥ及びＴＰに基づくマップ選択が行われる。こうして、学習条件パラメータが再帰的に調整され、マップの学習形成が再帰的に行われる。
【００７５】
このような再帰的学習の結果得られたマップについて、クラスタ境界決定部２２が、マップ上のプロトタイプを順次選択し、その選択したプロトタイプとそれに隣接するプロトタイプとの間の相関量が大きいもの同士を一つのクラスタにまとめる。つまり、プロトタイプのマップ上での隣接関係及び相関量によって各プロトタイプの属するクラスタが決定される。そして、この処理を繰返し実行して、クラスタリングの結果が収束したところで、そのクラスタリングの結果をプロトタイプ追加部３４に出力する。
【００７６】
プロトタイプ追加部３４がクラスタの境界部分に新規プロトタイプを追加したマップを生成して、このマップをＳＯＭ学習部３１に出力し、所定の学習条件を設定して再度学習を行わせる。この際は学習条件パラメータのセットは１つだけでよく、従ってマップは一つだけで構わない。そこで、この一つのマップの学習処理が完了すると、当該マップを（マップ選択部３２を介することなく）そのままクラスタ境界決定部２２に出力し、クラスタ境界決定部２２が改めてクラスタリングの処理を行う。
【００７７】
このクラスタリングの処理を行い、複数の顔画像データを用いて学習形成した場合のマップの例を図９に示す。図９では、各プロトタイプ５１にはクラスタの番号が付されている。ここで、番号１は図１０（ａ）に示すような「フラッシュ有り、且つ、口を閉じた顔」の顔画像データが属するクラスタの番号であり、番号２は図１０（ｂ）に示すような「フラッシュ有り、且つ、口を開けた顔」の顔画像データが属するクラスタの番号であり、番号３は図１０（ｃ）に示すような「フラッシュ無し、且つ、口を閉じた顔」の顔画像データが属するクラスタの番号であり、番号４は図１０（ｄ）に示すような「フラッシュ無し、且つ、口を開けた顔」の顔画像データが属するクラスタの番号である。各クラスタは、「フラッシュの有無」、「口の開閉」という２つの条件で定義されている。このように、本実施形態では、一つ以上の撮影条件または被写体の性状で定義される複数のクラスタに分割された自己組織化マップが学習形成される。なお、このマップは説明の便宜のための簡単な一例であり、別の例では、他の様々な撮影条件、被写体の性状ごとにクラスタリングされ、クラスタの番号が付されたマップが学習形成される。
【００７８】
そしてこのクラスタリングの処理の結果として得られたマップが分類処理に供される。すなわち、分類対象として入力された画像データに対して特徴量ベクトルを生成し、この特徴量ベクトルに対して最も結合重みの大きいプロトタイプ（入力された画像データに反応するプロトタイプ）を見いだす。そして当該プロトタイプの属するクラスタの番号が、当該画像データの分類番号となる。これにより、互いに類似する画像データに対して特定の分類番号が決定され、互いに異なる画像データに対しては、異なる分類番号が決定される。例えば、「フラッシュ有り、且つ、口を開けた顔」の顔画像データが入力されると、この顔画像データの特徴量ベクトルが図９に示されるマップの番号２が付されたプロトタイプと結合重みが大きいことが判別される。この結果、入力された顔画像データには分類番号２が決定される。そして、この自己組織化マップが出力され、その結果は、ディスプレイ１６に表示され、また図示しないプリンタ等により印字される。
【００７９】
本実施形態のデータ分類装置では、このように形成された自己組織化マップを用いて顔画像データを分類処理することにより、撮影条件や被写体の性状の細かな条件で、顔画像データを分類することができる効果がある。例えば、フラッシュ光を受けたもの／受けないもの、横から／上から光（日射）を受けたものなどの撮影条件が異なる顔画像データや、口を開けたもの／閉じたもの、眼鏡をかけたもの／かけないもの、笑った／怒った／困った表情のもの、正面を向いたもの／うつむいたもの／上を向いたものなど被写体の性状に応じて、顔画像データを分類することができる効果がある。なお、撮影条件、被写体の性状は、ここに挙げたものに限られない。
【００８０】
また、本実施形態では、例えば「フラッシュ有り、且つ、口を開けた顔」に分類された顔画像データと、「フラッシュ有り、且つ、口を閉じた顔」に分類された顔画像データは、「フラッシュ有り」という点では共通しているが、別の分類番号が決定されてしまう。これを回避するために、データ分類装置に、次に説明する共通の撮影条件の顔画像データまたは共通の被写体の性状の顔画像データを抽出する機能を追加してもよい。
【００８１】
ハードディスクには１４には、前もって、１つの撮影条件または１つの被写体の性状に対応して、その特徴を含むクラスタの分類番号が関連付けられた対応表を記憶されている。図９に示すマップの対応表を例として説明すると、「口を開けた顔」という被写体の性状に対して分類番号２，４が設定されており、「フラッシュ有り」という撮影条件に対して分類番号１，２が設定されている。この分類番号の設定は、人が各分類番号に属する顔画像データを目視で確認して、共通の特徴を持つ分類番号を選別して設定してもよいし、他の方法で行ってもよい。
【００８２】
ＣＰＵ１１は、複数の顔画像データに対して自己組織化マップを用いて分類処理を行った後、それらの顔画像データをその分類番号と関連付けてハードディスク１４に記憶する。ここで、ユーザからの１つの撮影条件または被写体の性状の画像データを要求する指令があると、ＣＰＵ１１はその撮影条件または被写体の性状に対応したクラスタの分類番号を対応表から読み出し、それらの分類番号が関連付けられた顔画像データを全て抽出する。そして、これらの抽出された顔画像データをユーザに提供する。このようにして共通の特徴をもつ顔画像データが抽出される。例えば、「フラッシュ有り」の顔画像データを要求する指令があると、ＣＰＵ１１は対応表から分類番号１，２を読み出し、これらの分類番号が関連付けられた図１０（ａ），（ｂ）を抽出する。
【００８３】
また、共通の特徴をもつ顔画像データは次のようにして抽出してもよい。この方法では、まず、上述したマップの学習形成方法で複数の自己組織化マップを用意する。これらの複数のマップは、顔画像データに対する分類条件が異なる。例えば、第１のマップは分類条件が細かく、図９に示すマップのようにフラッシュの有無および口の開閉が判別されるが、第２のマップは分類条件が粗く、フラッシュの有無のみが判別され、口の開閉は判別されない。このような分類条件の異なるマップは、マップの学習形成時に特徴量やパラメータを異なるものとしたり、マップの投影する座標系を変えたり、またはクラスタの階層構造を得る手段であるメルティングアルゴリズム（Ｋｅｎｎｅｔｈ　Ｒｏｓｅ　ｅｔ　ａｌ．，”Ｓｔａｔｉｓｔｉｃａｌ　Ｍｅｃｈａｎｉｃｓａｎｄ　Ｐｈａｓｅ　Ｔｒａｎｓｉｔｉｏｎ　ｉｎ　Ｃｌｕｓｔｅｒｉｎｇ”，Ｐｈｙｓ．Ｒｅｖ．Ｌｅｔｔ．６５，ｐｐ．９４５−９４８（１９９０））を用いればよい。
【００８４】
ＣＰＵ１１は、まず分類条件の細かい第１の自己組織化マップを用いて顔画像データの分類を行う。例えば、第１の自己組織化マップでは、「フラッシュ有り、且つ、口を開けた顔」の顔画像データと、「フラッシュ有り、且つ、口を閉じた顔」の顔画像データに対して、互いに異なるクラスタの分類番号が決定される。ＣＰＵ１１は、その後、ユーザの指示があると、さらに、分類条件の異なる第２の自己組織化マップを用いて顔画像データの分類を行う。これにより上記２つの顔画像データは、「フラッシュ有り」の顔画像データとして同じクラスタの分類番号が決定される。このような処理を行うことにより、同種の顔画像データが抽出され、ユーザに提供される。この再分類された顔画像データは、第１の自己組織化マップで決定された分類番号と、第２の自己組織化マップで決定された分類番号とともに、ハードディスク１４に保存すればよい。
【００８５】
なお、ここまでの説明では、学習条件パラメータを再帰的に調整して学習し、プロトタイプ間の相関度を用いてクラスタを決定し、クラスタ決定後にプロトタイプを追加して再学習、再度のクラスタ決定を行うこととしているが、プロトタイプを追加する技術については、既に用いられているプロトタイプマップの学習形成と、クラスタリング技術に独立して適用してもよい。この場合、プロトタイプマップの学習には、ＳＯＭだけでなく、ＶＱ学習なども用いてもよい。
【００８６】
［第２の実施の形態］
次に、本発明の第２の実施形態に係るデータ分類装置を説明する。第２の実施形態では、第１の実施形態と、顔画像データとクラスタとの対応関係を学習形成する方法が異なる。第２の実施形態では、パターンデータを直接的にクラスタリングしている。本実施の形態のデータ分類プログラムは、図１１に示すように、クラスタ決定部４１と、分類部４２とを含んで構成されている。
【００８７】
クラスタ決定部４１は、クラスタリングのための学習を行うときに動作し、後に詳しく説明する、クラスタリング処理を実行してクラスタリング結果（いわゆるクラスタフィルタ）を生成して、分類部４２に出力する。分類部４２は、実際の分類処理を実行するときに動作し、入力されたクラスタリング結果を記憶して（例えばハードディスク１４に格納して）、入力されるパターンデータについて、当該クラスタフィルタを参照しながら、どのクラスタに分類されるべきデータであるかを決定し、その決定の結果を分類結果として出力する。この分類部４２の処理の詳細についても後に詳しく述べる。
【００８８】
［処理の詳細］
ここでＣＰＵ１１が実行するデータ分類プログラムにおけるクラスタ決定部４１及び分類部４２の処理の詳細について説明する。まず、クラスタ決定部４１の処理の詳細について説明する。ＣＰＵ１１は、クラスタ決定部４１の処理として、図１２に示す処理が行われ、入力されたＮ個のパターンデータをＲＡＭ１２又はハードディスク１４に記憶し、入力された順番に、「１」から「Ｎ」までの番号をクラスタ番号として仮に割当てる（Ｓ２１）。また、各パターンデータについて、その性状を特徴づける量をパターンベクトルとして演算する。ＣＰＵ１１は、これら（仮の）クラスタ番号と、パターンベクトルとを、それぞれ対応するパターンデータに関連づけて格納しておく。当該パターンベクトル間の類似度を所定の関数によって演算する（Ｓ２２）。ここで類似度の演算に用いられる関数は、例えばパターンベクトル間の測度、具体的にはパターンベクトル間の成分ごとの差の二乗和を用いる。すなわち、複数のパターンデータから２つのパターンデータの組み合せを選択し、各組み合せに係る２つのパターンデータの２つのパターンベクトル間の測度を類似度として演算し、当該類似度を類似度テーブルとしてＲＡＭ１２に記憶する。
【００８９】
またここで（８）式により、後に相関値の演算に用いるパラメータ、αを演算しておく。具体的には、処理Ｓ２２で演算した類似度をクラス（所定数値範囲）ごとに分類して各クラスごとの出現頻度の情報を生成して、当該出現頻度が最大となった距離をＣｄとし、「０」に近い、所定の微小量δを決定して、αを演算する。この処理は、既に説明したプロトタイプ間の類似度を演算して、αを決定する処理と同様のものである。
【００９０】
なお、この場合にも、Ｃｄについては、当該出現頻度が最大となった距離の代わりに、各注目パターンデータについて、他のパターンデータとの類似度の統計量を演算し、各注目パターンデータごとの統計量に対して、さらに所定の統計処理（各統計量の最小値）を行った結果を用いて、すなわち（９）式にてＣｄを決定してもよい。この（９）式によりＣｄを決めることで、隣接クラスタの中で、少なくとも１つのクラスタに属するすべてのパターンデータが、クラスタ決定の対象となるパターンデータに影響する。また、個々のパターンデータごとに適したＣｄを適応的に決定することが可能となる。
【００９１】
そしてＣＰＵ１１は、クラスタ番号の更新処理を開始する（Ｓ２３）。このクラスタ番号の更新処理は、図１３に示すようなものであり、後に詳しく説明する。クラスタ番号の更新処理が完了すると、ＣＰＵ１１は、処理Ｓ２３の前後で、クラスタ番号に変化がないか（クラスタリングの収束が完了したか）を調べ（Ｓ２４）、変化があった場合（収束していない場合；Ｎｏの場合）には、処理Ｓ２３を繰返して実行する。また、処理Ｓ２４において、変化がなかった場合（クラスタリングが収束した場合、Ｙｅｓの場合）、処理を完了して、当該クラスタリングの結果（パターンデータとクラスタ番号とを関連づけた情報を含むもの）をハードディスク１４に格納する。
【００９２】
ここで、処理Ｓ２３のクラスタ番号の更新処理について図１３を参照しながら説明する。ＣＰＵ１１は、まず各パターンデータを順次、注目パターンデータとして選択する（Ｓ３１）。この選択の順序は、例えば処理Ｓ２１において仮のクラスタ番号を割当てた順序（例えば入力順）であってもよい。そしてＣＰＵ１１は、注目パターンデータに対して現在割当てられているクラスタ番号を取得し、その近傍クラスタを決定する（Ｓ３２）。ここで近傍クラスタ番号は、例えば類似度テーブルを参照しながら、注目パターンデータとの間の類似度の高い順に複数のパターンデータを取り出し、当該取り出した複数のパターンデータの各々に割当てられているクラスタ番号とする。なお、取り出すパターンデータの個数は、予め「８個」等と決めておいてもよいし、近傍クラスタとして決定されたクラスタ番号が複数（例えば「４個」等の所定個数）となるまでと決めておいてもよい。また、ここで近傍クラスタには、注目パターンデータ自身が属しているクラスタが含まれる。
【００９３】
ＣＰＵ１１は、近傍クラスタとして決定されたクラスタの番号を用いて、各近傍クラスタに現在属しているパターンデータを、決定された近傍クラスタごとに取り出し（Ｓ３３）、近傍クラスタごとに取出されたパターンデータと、注目パターンデータとの間の相関量を（７）式（類似度の低下に伴い、所定微分値より急速に「０」に漸近する関数）によって演算する（Ｓ３４）。すなわち、ｙ＊を注目パターンデータのパターンベクトルとし、取出されたパターンデータのうち、ｉ番目のパターンデータのパターンベクトルをｙｉとし、（７）式により、クラスタ番号ｃに属するパターンデータのパターンベクトルと、注目パターンデータのパターンベクトルとの間の距離（類似度）の総和を全体平均で除した値を演算する。そしてこの値が、注目パターンデータと、クラスタ番号ｃに属するパターンデータ群との間の相関量とするのである。
【００９４】
ＣＰＵ１１は、近傍クラスタごとに、そのクラスタに属するパターンデータ群と、注目パターンデータとの相関量を演算しておき、近傍クラスタのうち、相関量が最大となったものを選びだす（Ｓ３５）。そして注目パターンデータと当該選び出した近傍クラスタの番号とを関連づけて、仮の更新結果としてＲＡＭ１２に記憶しておく（Ｓ３６）。
【００９５】
そしてＣＰＵ１１は、すべてのパターンデータを注目パターンデータとして選択したか否かを調べ（Ｓ３７）、すべてのパターンデータを選択していなければ（未選択のものがあれば、Ｎｏであれば）、処理Ｓ３１に戻って処理を続ける。また、処理Ｓ３７において、未選択のものがなければ（Ｙｅｓであれば）、ＲＡＭ１２に記憶された仮の更新結果に基づいて、各パターンデータに現在関連づけられているクラスタ番号を更新し（Ｓ３８）、クラスタ番号の更新処理を完了する。ここで、一旦仮の更新結果として保持した上で、最後に更新を実行しているのは、ある注目パターンデータに対する更新処理により、後で注目パターンデータとして選択されるパターンデータのクラスタ番号の決定に影響を及ぼさないようにしたものである。
【００９６】
このように本実施の形態では、ＣＰＵ１１は、パターンデータごとに定義される、パターンベクトル間の距離により、クラスタ間の隣接関係を規定しながら、各クラスタに属しているパターンデータ群と、クラスタリング（分類学習）の対象となっている注目パターンデータとの相関量に基づいて、より相関性の高いクラスタに注目パターンデータを分類するという処理をクラスタリングの結果に変化がなくなるまで繰返して行う。なお、ここでは、近傍クラスタを決定した上で、当該近傍クラスタとの関係において相関量の演算を行うようにしているが、ＣＰＵ１１の処理速度が十分であれば、すべてのクラスタについて総当り的に、上記相関量の演算を行うようにしても構わない。この場合、処理Ｓ３２、Ｓ３３は、必ずしも必要でなくなり、処理Ｓ３４において、すべてのクラスタについて、各クラスタに属しているパターンデータ群と、注目パターンデータとの相関量を演算することとなる。
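[0097]
Next, the process of actually classifying data using the clustering result generated in this way (the processing of the classification unit 42) is described. When pattern data to be classified (target pattern data) is input, the CPU 11 computes the pattern vector (target vector) of the target pattern data, refers to the clustering result stored on the hard disk 14 (for example, pattern data associated with its pattern vector and cluster number), and computes the distance between the target vector and each pattern vector (reference vector) contained in the clustering result. It then finds the reference vector with the shortest distance (the reference vector most similar to the target vector) and outputs the cluster number associated with that reference vector as the classification result.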
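The classification step of paragraph [0097] amounts to a nearest-reference-vector lookup. A minimal Python sketch, assuming the stored clustering result is available as a list of (pattern vector, cluster number) pairs and that the distance is Euclidean (the data layout and the name `classify` are illustrative assumptions):

```python
import math


def classify(target, reference):
    """Return the cluster number of the reference vector closest to target.

    reference is a list of (pattern_vector, cluster_number) pairs taken
    from a stored clustering result.
    """
    _, cluster = min(reference, key=lambda item: math.dist(item[0], target))
    return cluster


if __name__ == "__main__":
    stored = [([0.0, 0.0], 1), ([1.0, 1.0], 2)]
    print(classify([0.9, 0.8], stored))
```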
【００９８】
上述した本実施形態においても、画像データとクラスタとの対応関係が学習形成されるため、この対応関係を用いて顔画像データを分類処理することができる。
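[Brief description of the drawings]
[FIG. 1] A block diagram showing the configuration of the data classification device according to the first embodiment of the present invention.
[FIG. 2] A block diagram showing the configuration of the data classification device according to the first embodiment of the present invention.
[FIG. 3] A flowchart showing the clustering process.
[FIG. 4] An explanatory diagram showing an example of a detected histogram of distances between prototypes.
[FIG. 5] A flowchart showing an example of the cluster update process within the clustering process.
[FIG. 6] An explanatory diagram showing an example of the operation of the clustering process.
[FIG. 7] An explanatory diagram showing an example of a clustering result for a prototype map.
[FIG. 8] An explanatory diagram showing an example of the state after prototypes are added and of the subsequent clustering result.
[FIG. 9] An explanatory diagram showing an example of a clustering result for a self-organizing map of face image data.
[FIG. 10] An explanatory diagram showing examples of several kinds of face image data.
[FIG. 11] A block diagram showing the configuration of the data classification device according to the second embodiment of the present invention.
[FIG. 12] A flowchart showing the clustering process.
[FIG. 13] A flowchart showing the clustering process.
[Explanation of reference numerals]
1 data classification device, 11 CPU, 12 RAM, 13 ROM, 14 hard disk, 15 image input interface, 16 display, 17 external storage unit, 21 map generation unit, 22 cluster boundary determination unit, 31 SOM learning unit, 32 map selection unit, 33 learning condition setting unit, 34 prototype addition unit, 41 cluster determination unit, 42 classification unit, 50 map, 51 prototype.
[0001]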
[Technical field of the invention]
The present invention relates to a data classification device that classifies data using a self-organizing map, to a data classification method, and to a program therefor.
[0002]
[Prior art]
In recent years, with the spread and development of information devices, the amount of information that people receive has continued to grow. Against this background, there is demand for technology that recognizes and classifies information without human intervention, so that the necessary information can be selected easily.
[0003]
In response to this demand, clustering methods have been devised that compare the data to be classified and group similar items together as one set of data. Known techniques for judging similarity include maximum likelihood estimation, the K-means method, the merging method, and the MDS (Multi-Dimensional Scaling) method.
[0004]
[Problems to be solved by the invention]
However, the conventional clustering methods described above require human intervention, such as parameter setting, and therefore cannot execute the clustering process autonomously.
[0005]
In addition, when face image data in which a person's face is photographed is classified with the conventional clustering methods, it has not been possible to obtain a map that can finely classify photographing conditions, such as whether a flash was used and the direction of sunlight, and properties of the subject, such as the person's facial expression.
[0006]
[Related technology]
On the other hand, one method for performing clustering relatively autonomously classifies and arranges input image data, as one kind of pattern data, on a lattice space map. This classification uses, for example, self-organizing feature mapping (hereinafter abbreviated as SOM) (T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, 1982). The SOM is a two-layer network consisting of an input layer, to which data is input, and a competitive layer formed by the lattice space map. Each input is weighted and fed to every lattice node. The set of weights for the components of the input is called a weight vector.
[0007]
Initially, the weight vectors are initialized as follows. As described in the Kohonen reference above, as many input vectors I as there are prototypes are selected at random from the input vectors to be learned (which correspond to the feature sets here), and they are used to initialize the weight vector of each lattice node. Also according to Kohonen, the initial value of each weight vector may instead be set with random numbers.
[0008]
Next, the weight vectors are learned. In this learning process, a learning feature set is generated, and a predetermined measure (for example, the Euclidean distance) is calculated between that feature set and the weight vector of each lattice node in the lattice space. The node with the strongest relationship (the smallest measure), called the winning node, is then found. For each node located near the winning node in the lattice space, the weight vector is corrected so that its measure to the learning feature set becomes smaller. By repeating learning while correcting the weight vectors in this way, the nodes having the minimum measure for feature sets made up of mutually similar features become concentrated in a particular region, and the map becomes usable for data classification. The nodes whose weight vectors are to be corrected are chosen according to their distance on the map from the winning node. The correction amount is preferably varied according to the distance from the winning node c, and its magnitude is preferably adjustable as well. In general, the weight vector w of each neighboring node is corrected so as to approach the input vector I, as in the following equation (1).
[0009]
[Equation 1]

where

[Equation 2]

Here, α(t) is a quantity that governs the magnitude of the correction and is called the learning coefficient. σ(t) is a function that determines how the range over which weight vectors are corrected changes, and is called the neighborhood function. Both decrease monotonically with time t. The correction of equation (1) is applied to the lattice nodes whose inter-node distance on the map from the winning node falls within the range Rmax given by

[Equation 3]

and Rmax decreases with σ(t) as learning is repeated. Triangle, rectangle, or Mexican-hat functions, among others, are used as the neighborhood function σ(t). The choice of the neighborhood function σ(t) is also known to affect the learning result. Note that t is the "time" and is incremented each time a feature set is input. ||rc - rj|| is the norm (distance) between the winning node and the node whose weight vector is to be corrected.
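As a rough illustration of the learning procedure in paragraphs [0007] to [0009], the following Python sketch trains a small SOM. Equations (1) to (3) are not reproduced in this text, so the update used here, w_j ← w_j + α(t) · exp(-||rc - rj||² / 2σ(t)²) · (I - w_j), is the conventional Gaussian-neighborhood form and is an assumption. The linear decay schedules, the default values, sampling the initial weights with replacement, and the name `train_som` are likewise assumptions. The sketch weights every node by the Gaussian factor instead of cutting off at Rmax.

```python
import math
import random


def train_som(inputs, rows, cols, steps, alpha0=0.2, sigma0=2.0, seed=0):
    """Train a rows x cols SOM on a list of equal-length feature vectors.

    Returns a dict mapping (row, col) -> weight vector.
    """
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(inputs))
               for r in range(rows) for c in range(cols)}
    for t in range(steps):
        frac = 1.0 - t / steps
        alpha = alpha0 * frac               # learning coefficient, decreasing in t
        sigma = max(sigma0 * frac, 1e-3)    # neighbourhood width, decreasing in t
        x = inputs[t % len(inputs)]         # one feature set per time step
        # Winning node c: smallest Euclidean distance to the input.
        win = min(weights, key=lambda n: math.dist(weights[n], x))
        for node, w in weights.items():
            # Squared lattice distance ||r_c - r_j||^2 from the winning node.
            d2 = (node[0] - win[0]) ** 2 + (node[1] - win[1]) ** 2
            h = alpha * math.exp(-d2 / (2.0 * sigma ** 2))
            for k in range(len(w)):
                w[k] += h * (x[k] - w[k])   # move w towards the input vector
    return weights


if __name__ == "__main__":
    som = train_som([[0.0, 0.0], [1.0, 1.0]], rows=2, cols=2, steps=200)
    for node, w in sorted(som.items()):
        print(node, [round(v, 3) for v in w])
```

Because the step factor h never exceeds alpha0 (below 1), each weight stays inside the range spanned by the inputs.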
[0010]
However, applying the above technique as it stands does not immediately yield autonomous data classification. To achieve autonomous data classification, it is first necessary to judge whether the lattice space map obtained by learning is appropriate. That is, (1) a method of obtaining an optimal lattice space map is required. In addition, when data is classified using the learned lattice space map, it is appropriate to form boundary lines that serve as the basis for classification on the lattice space, and to classify given data according to which boundary line encloses the lattice node having the minimum measure for the feature set of that data (a region of the lattice space delimited by such a boundary line is hereinafter called a cluster). That is, (2) a method of determining cluster boundaries is also required.
[0011]
Of these, for (1) obtaining an optimal lattice space map, Kohonen proposed selecting the map that minimizes the average quantization error. That is, among several lattice space maps formed under mutually different learning conditions, the one with the smallest average quantization error is selected and treated as an approximately optimal lattice space map. With this method, however, the topology of the space of input feature sets is not reflected in the topology of the map; in other words, the degree of topology preservation is low. Depending on the clustering method, this can lead to misclassification.
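The selection rule attributed to Kohonen can be sketched as follows. The sketch assumes that each candidate map is given simply as a list of weight vectors and that the quantization error of an input is its Euclidean distance to the nearest weight vector; the function names are illustrative and not taken from the source.

```python
import math


def mean_quantization_error(inputs, prototypes):
    """Average distance from each input vector to its nearest prototype."""
    return sum(min(math.dist(x, w) for w in prototypes) for x in inputs) / len(inputs)


def select_map(inputs, candidate_maps):
    """Return the index of the candidate map with the smallest mean error."""
    errors = [mean_quantization_error(inputs, m) for m in candidate_maps]
    return min(range(len(errors)), key=errors.__getitem__)


if __name__ == "__main__":
    data = [[0.0], [1.0]]
    maps = [[[0.5]], [[0.0], [1.0]]]
    print(select_map(data, maps))
```

Note that this criterion looks only at how closely the prototypes fit the inputs; it says nothing about whether neighboring inputs land on neighboring nodes, which is the limitation the paragraph above points out.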
[0012]
In consideration of topology preservation, a technique (the Auto-SOM method) has also been developed that monitors a predetermined index called a topological function during learning and controls the learning conditions accordingly so as to form an appropriate map. However, since the calculation of the topological function itself is a heavily loaded process, there is a problem that the learning time becomes long.
[0013]
Next, regarding (2), as methods of autonomously determining the boundaries of clusters, a method called the U-Matrix (Unified Distance Matrix Methods) method and a method called the potential method are being studied. The U-Matrix method is disclosed in detail in A. Ultsch et al., "Knowledge Extraction from Artificial Neural Networks and Applications", Proc. Transputer Anwender Treffen / World Transputer Congress TAT / WTC 93 Aachen, Springer 1993. In U-Matrix, the distance between two adjacent lattices on the map is defined as follows: for the differences between the components of the weight vectors of the two lattices, the sum of their absolute values, their root mean square, or the like is taken as the distance. Then the distance between adjacent lattices that are each strongly coupled to feature quantity sets of high mutual similarity (a lattice that has a weight vector close to a feature quantity set is hereinafter said to be “prototyped to the feature quantity set”), that is, between adjacent lattices prototyped to two highly similar feature quantity sets, is small, whereas the distance between adjacent lattices prototyped to two feature quantity sets of low similarity is large. Therefore, when a three-dimensional surface is considered whose height is this distance, the surface is low between lattices prototyped to mutually similar feature quantity sets, forming a “valley”, and high between lattices prototyped to dissimilar feature quantity sets, forming a “mountain”. Accordingly, if boundary lines are formed along the “mountains”, sets (clusters) of lattices prototyped to highly similar feature quantity sets can be defined.
The U-Matrix can thus be said to compensate for the fact that distances in the input space are not preserved in the self-organizing map.
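The height surface described above may be illustrated by the following Python sketch. It assumes a rectangular lattice, uses the sum of absolute component differences as the distance, and takes the height at each lattice as the mean distance to its four adjacent lattices; that averaging, and the names vector_distance and u_matrix, are assumptions made for illustration only.

```python
def vector_distance(u, v):
    """Sum of absolute component differences between two weight vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def u_matrix(weights, rows, cols):
    """Return, for each lattice position, the mean distance to its 4-neighbors.

    weights: dict mapping (row, col) -> list of floats (weight vector)
    """
    heights = {}
    for r in range(rows):
        for c in range(cols):
            dists = [
                vector_distance(weights[(r, c)], weights[(nr, nc)])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            # A lattice with no neighbors (1 x 1 map) gets height 0.
            heights[(r, c)] = sum(dists) / len(dists) if dists else 0.0
    return heights
```

Positions with large heights correspond to the “mountains” along which a boundary line would be drawn, and positions with small heights to the “valleys”.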
[0014]
However, although U-Matrix can define the boundaries when the height difference between the “mountains” and “valleys” is clear, in actual information processing the height difference is often not as clear as expected, and the height of the three-dimensional surface changes only gradually. In such cases the boundary lines must be set manually, so the boundaries cannot always be determined autonomously.
[0015]
The potential method, on the other hand, is disclosed in D. Coomans and D. L. Massart, Anal. Chim. Acta, 5-3, 225-239 (1981). It approximates the population by superimposing the values of a predetermined potential function placed on the input data; that is, it estimates a probability density function and takes portions where the overlap is small as boundaries. A Gaussian function is often used as the potential function. Specifically, for an input data group consisting of N input vectors, each of K dimensions, the average potential (the contribution Ψl of the l-th input to the entire input set) is defined by the following equations (2) and (3).
[0016]
(Equation 4)

[0017]
Here, xkl denotes the k-th component of the l-th input, and α is a smoothing parameter that affects the number of clusters into which the data is classified. In the potential method, therefore, the distribution function, whose shape must be assumed, and the various parameters have to be optimized for each set of input vectors. In short, prior knowledge of the characteristics of the data to be classified is required, and manual adjustment is indispensable. Moreover, in the potential method, when the feature quantity set obtained from the input data is high-dimensional, a large number of samples is needed to obtain an appropriate probability density distribution, so the method is difficult to apply to a map consisting of a small number of lattices. That is, the potential method, too, cannot always determine the boundaries autonomously.
[0018]
To address these problems, techniques such as those disclosed in JP-A-7-234854, JP-A-8-36557, and "Unsupervised cluster classification method using data density histogram on self-organizing feature map", Transactions of the Institute of Electronics, Information and Communication Engineers D-II, Vol. J79-D-II, No. 7, pp. 1280-1290, July 1996, are being studied. However, each of these techniques assumes that, in the structure of the input data itself and in the mapping result, the features to be used for classification are prototyped onto lattices that are sufficiently far apart from one another. When, as is seen in the classification of image data, the features to be classified differ or overlap in their distribution shapes or in the distances between the centers of gravity of the prototyped lattice positions on the map, the cluster boundaries become complicated and appropriate clustering cannot be performed.
[0019]
Further, in the related art, the number of lattices on the map has been determined only empirically in the course of research, and no consideration has been given to determining a number of lattices appropriate for actual use. If the number of lattices is smaller than the appropriate number, however, a lattice at a cluster boundary may become strongly coupled to a feature quantity set that should belong to another cluster, in which case classification errors increase. In this regard, a technique that adds or removes lattices so that the average quantization error becomes smaller than a predetermined amount is disclosed in James S. K. Kirk et al., "A Self-Organized Map with Dynamic Architecture for Efficient Color Quantization", IJCNN'01, 2128-2132. In this technique, however, lattices are added only so as to map the distribution of the data in the space of the feature quantity sets corresponding to the input data; increasing the number of lattices near cluster boundaries, which is what matters for data classification, is not considered. One could instead simply use a large number of lattices from the beginning, but the calculation time would then become long, which is not practical.
[0020]
Similarly, when input data (pattern data) is classified into clusters directly, without using prototypes, there is also a method of classifying the set of pattern data into clusters based on its probabilistic properties. For estimating such probabilistic properties, a method of sequentially estimating probability distribution parameters by Bayesian learning and a method using a potential function are known, for example. In this case, however, estimating the probabilistic properties requires that information serving as a clue for clustering (for example, a label) be attached to the input pattern data in advance. This is because the pattern data must first be provisionally classified according to that information, and the probability distribution within each class must then be estimated.
[0021]
Accordingly, it is also conceivable to calculate the similarity between individual pattern data with a predetermined function, analyze the structure of the pattern data space, and perform clustering according to the structure obtained by the analysis. Methods of this type include the K-means method and the division / merging method (the so-called ISODATA method), but each requires parameters to be set manually. In the K-means method, the final number of clusters, indicating into how many clusters the pattern data group is to be divided, must be set. In addition, the clustering result depends sensitively on the setting of parameters called cluster center values, and the quality of the clustering result is determined by the values set.
[0022]
On the other hand, also in the division and merging method, many parameter settings such as a cluster removal threshold, a cluster division threshold, and a cluster merging threshold are required, and the clustering result largely depends on the setting of these values.
[0023]
The present invention has been made in view of the above circumstances, and has as its object to provide a data classification device capable of performing autonomous clustering.
[0024]
It is another object of the present invention to provide a data classification device capable of classifying face image data according to shooting conditions or properties of a subject.
[0025]
[Means for Solving the Problems]
The face image data classification device according to the present invention, which solves the problems of the conventional examples described above, includes: learning means for using a plurality of face image data to learn and form a correspondence relationship between arbitrary face image data and a plurality of clusters defined by at least one of shooting conditions and properties of a subject; and cluster determining means for determining the cluster to which face image data belongs using the learned and formed correspondence relationship.
[0026]
In addition, the face image data classification method according to the present invention includes: a learning step of using a plurality of face image data to learn and form a correspondence relationship between arbitrary face image data and a plurality of clusters defined by at least one of shooting conditions and properties of a subject; and a cluster determining step of determining the cluster to which face image data belongs using the learned and formed correspondence relationship.
[0027]
In addition, the face image data classification program according to the present invention causes a computer to execute: a learning step of using a plurality of face image data to learn and form a correspondence relationship between arbitrary face image data and a plurality of clusters defined by at least one of shooting conditions and properties of a subject; and a cluster determining step of determining the cluster to which face image data belongs using the learned and formed correspondence relationship.
[0028]
In the above face image data classification device, method, or program, it is preferable to extract, from the face image data belonging to each cluster, face image data captured under a common shooting condition or face image data having a common property of the subject.
[0029]
It is preferable that the learning formation of the correspondence relationship between face image data and clusters be performed by (a) provisionally determining the cluster to which each prototype belongs, (b) sequentially selecting each prototype as a target prototype to be classified into a cluster, (c) calculating, for each cluster, a measure between at least one prototype belonging to that cluster and the target prototype, and (d) changing, as needed, the cluster to which the target prototype belongs on the basis of the calculated measure, with the processes (b), (c), and (d) repeated until the cluster to which each prototype belongs no longer changes, whereby the prototypes are classified into clusters.
[0030]
Alternatively, it is preferable that the learning formation of the correspondence relationship between face image data and clusters be performed by (a) provisionally determining the cluster to which each face image data belongs, (b) sequentially selecting each face image data as target face image data to be classified into a cluster, (c) calculating, for each cluster, a predetermined correlation value between at least one face image data belonging to that cluster and the target face image data, and (d) determining the cluster to which the target face image data belongs on the basis of the correlation value, with the processes (b), (c), and (d) repeated until the cluster to which each face image data belongs no longer changes, whereby the face image data are classified into clusters.
[0031]
[Best Mode for Carrying Out the Invention]
An embodiment of the present invention will be described with reference to the drawings. The data classification device described below classifies face image data, obtained by photographing a person's face, into clusters defined by at least one of the shooting conditions and the properties of the subject.
[0032]
As shown in FIG. 1, the data classification device 1 according to the present embodiment basically includes a CPU 11, a RAM 12, a ROM 13, a hard disk 14, an image input interface 15, a display 16, and an external storage unit 17, which are connected to one another by a bus. That is, the data classification device 1 of the present embodiment is implemented as software on a general personal computer. This software is generally distributed stored on a recording medium such as a CD-ROM or DVD-ROM, or is downloaded via a network (the connection interface to the network is not shown). When distributed on a recording medium, the software is read out via the external storage unit 17 and stored on the hard disk 14 through a predetermined installation process. When downloaded via a network, it is likewise installed on the hard disk 14.
[0033]
The CPU 11 operates according to the programs stored in the hard disk 14 and, basically under the control of an operating system such as Windows (registered trademark), executes the data classification program and other programs that embody the data classification device 1 of the present embodiment.
[0034]
The RAM 12 is used as a work memory of the CPU 11 and is used for storing various parameters and data during the processing of the CPU 11. The ROM 13 stores a program that is necessary when the data classification device 1 is started, such as a process of reading an operating system. Since the contents of the start-up program are widely known, a description thereof will be omitted.
[0035]
The hard disk 14 has the operating system itself and various programs installed on it. In the present embodiment, the data classification program is installed on the hard disk 14 as described above. Although the case where the program is stored on the hard disk is illustrated here, it may instead be installed in a nonvolatile memory such as an SRAM (Static Random Access Memory) or an EEPROM, or, as shown in FIG., it may be installed in another computer connected via a network interface (not shown).
[0036]
The image input interface 15, to which an image input device such as a scanner is connected, receives image data input from the image input device and outputs it to the CPU 11. The display 16 displays images according to instructions from the CPU 11.
[0037]
[Embodiment 1]
Here, as a first embodiment, an example is described in which prototypes representing the input pattern data (specifically, face image data) are generated, the prototypes are classified on a map, and the prototype groups classified in this way are used to classify the input pattern data. The data classification program according to the present embodiment includes a map generation unit 21 and a cluster boundary determination unit 22. The map generation unit 21 includes a SOM learning unit 31, a map selection unit 32, a learning condition setting unit 33, and a prototype adding unit 34. Each of these units is assumed here to be realized as a software module, but may instead be implemented in hardware as a logic circuit. The processing in the CPU 11 will be described later in detail.
[0038]
[Details of processing]
Here, the data classification program executed by the CPU 11 will be described in detail with reference to FIG. The map generation unit 21 forms a prototype map by, for example, the self-organizing mapping (SOM) described above, and outputs information of the formed prototype map to the cluster boundary determination unit 22. The cluster boundary determining unit 22 classifies each prototype into clusters for the prototype map input from the map generating unit 21. Hereinafter, the operation of each of these units will be separately described in detail.
[0039]
[Map generation]
First, the SOM learning unit 31 of the map generation unit 21 generates M prototype map candidates (map candidates), one for each of a plurality of (for example, M, where M is an integer of 2 or more) learning condition sets input from the learning condition setting unit 33. Each map candidate associates information specifying each prototype with information on the connection weights between that prototype and each component of the feature quantity set. In the present embodiment, the prototypes constituting the map do not necessarily have to be arranged at lattice points (in that case, the information specifying a prototype may include its coordinates on the prototype map), but for simplicity the following description assumes that they are arranged at lattice points.
[0040]
The map selection unit 32 calculates a quantization error (QE) and a topological product (hereinafter referred to as TP) for each map candidate and, based on these, selects one map suitable for cluster determination as the champion map. Here, the quantization error is calculated by the following equation (4).
[0041]
(Equation 5)

[0042]
In equation (4), P is the number of feature quantity sets used for map learning (that is, the number of learning patterns), Ej is the j-th feature quantity set vector, and Wc is the weight vector of the winning node for the j-th feature quantity set vector. Since the quantization error is widely known from Kohonen et al., a detailed description is omitted.
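Because equation (4) is not reproduced in this text, the following Python sketch assumes the commonly used form of the average quantization error, namely the mean over the P feature quantity set vectors Ej of the Euclidean distance between Ej and the weight vector Wc of its winning node. The function name quantization_error and the choice of the Euclidean norm are assumptions.

```python
import math


def quantization_error(feature_sets, prototype_weights):
    """Mean distance from each feature vector to its closest weight vector.

    feature_sets:      list of P feature quantity set vectors E_j
    prototype_weights: list of weight vectors, one per prototype
    """
    total = 0.0
    for e in feature_sets:
        # The winning node is the prototype with the smallest distance to E_j.
        total += min(math.dist(e, w) for w in prototype_weights)
    return total / len(feature_sets)
```

Under this assumed form, a smaller value means that the prototypes approximate the learning patterns more closely.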
[0043]
TP is calculated by the following equation (5).
[0044]
(Equation 6)

[0045]
This TP becomes smaller as the relative positional relationship in the space of the input layer (the space of the feature quantity sets) agrees better with that in the space of the competitive layer (the space of the prototypes). It was proposed by Bauer et al. in Bauer, H.-U., and Pawelzik, K. R. (1992), "Quantifying the neighborhood preservation of self-organizing feature maps", IEEE Trans. Neural Networks, 3, 570-579, and elsewhere.
[0046]
Using the values of QE and TP, the map selection unit 32 selects a map having a small score calculated by the following equation (6) as the champion map MAPc, and outputs the selection result.
[0047]
(Equation 7)

That is, the following holds:
(Equation 8)

[0048]
The map selection unit 32 does not initially output the selection result to the subsequent cluster boundary determination unit 22; instead, it outputs the result of the champion map selection to the learning condition setting unit 33 at least once. After the selection of the champion map has been repeated a preset number of times, the selection result at that point is output to the cluster boundary determination unit 22.
[0049]
The learning condition setting unit 33 outputs, as learning conditions, M sets of, for example, the number N of learning input data (the number of learning iterations), the neighborhood distance σ(t), and the learning coefficient α(t). Initially, the learning condition setting unit 33 either determines these values and functions (N, σ(t), α(t)) from random parameters or uses predetermined (preset) sets. The learning condition setting unit 33 also receives the champion map selection result from the map selection unit 32 and extracts the learning condition set corresponding to the selected map candidate. Then, based on the extracted learning condition set, it generates a further M learning condition sets and outputs them to the SOM learning unit 31.
[0050]
The prototype adding unit 34 adds prototypes at predetermined positions on the prototype map after the cluster boundaries have been determined, so that learning can be performed further. Since its operation is related to that of the cluster boundary determination unit 22, it will be described in detail later.
[0051]
Here, the learning operation in the map generation unit 21 will be described. Initially, the learning condition setting unit 33 generates and outputs a plurality of learning condition sets (for example, M sets) using random or predetermined parameters. The SOM learning unit 31 generates M prototype map candidates (map candidates), one for each learning condition set output from the learning condition setting unit 33, and outputs them to the map selection unit 32. Using both the quantization error and the TP, the map selection unit 32 selects from these map candidates the map whose learning state is suitable for clustering (the champion map) and outputs the selection result to the learning condition setting unit 33. The learning condition setting unit 33 then generates a plurality of new learning condition sets based on the learning conditions used to generate the champion map and outputs them to the SOM learning unit 31, which again generates a plurality of map candidates.
[0052]
In this way, the operations of generating map candidates, selecting a champion map, and resetting the learning conditions are repeated a predetermined number of times, and the resulting champion map is output to the cluster boundary determination unit 22 as the map for which cluster boundaries are to be set.
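The repetition described above may be sketched in Python as follows. The training routine train_map and the scoring routine score_map are supplied by the caller and are hypothetical stand-ins for the SOM learning unit 31 and for the score computed from QE and TP, whose exact form is not reproduced in this text. Representing a learning condition set as a tuple of N and the initial values of σ(t) and α(t), and generating new sets by random perturbation of the champion's set, are likewise assumptions.

```python
import random


def perturb(cond, rng, spread=0.2):
    """Return a new learning condition set near cond (assumed scheme)."""
    n, sigma0, alpha0 = cond

    def scale():
        return 1.0 + rng.uniform(-spread, spread)

    return (max(1, round(n * scale())), sigma0 * scale(), alpha0 * scale())


def search_champion(train_map, score_map, initial_conditions, rounds, m, rng=None):
    """Repeat candidate generation, champion selection, and condition reset.

    train_map(cond) -> map candidate; score_map(candidate) -> float, lower is better.
    """
    rng = rng or random.Random(0)
    conditions = list(initial_conditions)
    champion = None
    for _ in range(rounds):
        # Generate one map candidate per learning condition set.
        candidates = [(cond, train_map(cond)) for cond in conditions]
        # Select the candidate with the smallest score as the champion map.
        best_cond, champion = min(candidates, key=lambda pair: score_map(pair[1]))
        # Derive M new learning condition sets from the champion's conditions.
        conditions = [perturb(best_cond, rng) for _ in range(m)]
    return champion
```

The champion returned after the last round corresponds to the map handed to the cluster boundary determination unit 22.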
[0053]
[Cluster boundary determination]
The cluster boundary determination unit 22 performs the processing shown in FIG. 3 on the boundary-setting target map input from the map generation unit 21. Specifically, the cluster boundary determining unit 22 assigns a unique number to each of the prototypes included in the input map and thereby generates a provisional clustering result (S1). The numbers may be assigned in a predetermined order from “1” to “P” (where P is the number of prototypes). These numbers serve as provisional cluster numbers. That is, initially, each prototype is classified into a different cluster.
[0054]
Next, the cluster boundary determining unit 22 extracts prototype pairs and calculates the similarity (Cd) between the weight vectors of the two prototypes of each extracted pair (S2). The results are stored in the RAM 12 as a similarity table. Here, the prototype pairs are obtained by sequentially selecting each prototype as a target prototype and pairing it with every other prototype; that is, they are all combinations of two prototypes. The similarity here is the sum of squares of the differences between the components of the two weight vectors (a distance).
[0055]
The similarities are divided into classes (one for each predetermined numerical range), and information on the frequency of appearance of each class is generated (FIG. 4). The distance at which the frequency of appearance is maximal is taken as Cd, and a small quantity δ is determined in advance (e.g., δ = 0.01). Alternatively, Cd may be taken as the largest distance that is shorter than the distance at which the frequency of appearance is maximal and at which the frequency changes from decreasing to increasing.
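The determination of Cd from the frequency of appearance may be illustrated by the following Python sketch. It computes the sum of squared component differences for every pair of prototypes, counts the pairs falling into classes of equal width, and returns the center of the most frequent class. The class width, the use of the class center as the representative distance, the tie-breaking toward the shorter distance, and the name estimate_cd are all assumptions.

```python
from itertools import combinations


def squared_distance(u, v):
    """Sum of squares of the component differences of two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def estimate_cd(weight_vectors, class_width):
    """Return the center of the distance class with the highest frequency."""
    counts = {}
    for u, v in combinations(weight_vectors, 2):
        k = int(squared_distance(u, v) // class_width)  # class index
        counts[k] = counts.get(k, 0) + 1
    # Most frequent class; ties are resolved in favor of the shorter distance.
    mode_class = max(counts, key=lambda k: (counts[k], -k))
    return (mode_class + 0.5) * class_width
```

For four one-dimensional weight vectors 0, 1, 2, and 10 with a class width of 2, the two pairs at squared distance 1 form the most frequent class, and the sketch returns 1.0.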
[0056]
Next, a cluster number update process is started (S3). This cluster number updating process is as shown in FIG. 5, but for simplicity, a description will be given here assuming that there is a 3 × 3 grid map as a prototype map for determining the boundaries of clusters. Initially, in process S1, 3 × 3 = 9 prototypes are assigned unique numbers “1” to “9” as shown in FIG. 6A for the prototype map.
[0057]
The cluster boundary determination unit 22 sequentially selects each prototype as a target prototype (S11). Then, a cluster to which a prototype adjacent to the target prototype (within a predetermined distance on a temporarily clustered prototype map) belongs is selected (S12), and a prototype belonging to the selected cluster is extracted (S13).
[0058]
In the example of FIG. 6, when, for example, the prototype “1” at the lower left is the target prototype, the prototypes belonging to each of the clusters numbered “1”, “4”, “5”, and “2” in the neighborhood of “1” are selected. The cluster boundary determining unit 22 then calculates, by the following equation (7), the correlation amount as a measure between the prototypes belonging to each cluster selected in process S12 and the target prototype (S14), and determines the cluster to which the target prototype belongs based on this correlation amount.
[0059]
(Equation 9)

[0060]
Here, y* is the weight vector of the target prototype, and yi is the weight vector of the i-th prototype. χ is the set of prototype vectors, and χ(c) is the set of prototype vectors of cluster number c. Cd and δ, which are used to determine α, are those obtained in process S2, and Ln denotes the natural logarithm. That is, equation (7) is obtained by dividing the sum of the distances between the target prototype and the prototypes belonging to cluster number c by the overall average, and represents the correlation amount between the target prototype and cluster c. Equation (7) takes a larger value the more prototypes cluster c contains whose weight vectors have a large correlation amount with that of the target prototype.
[0061]
The cluster boundary determining unit 22 tentatively determines the number of the cluster having the largest value of the expression (7) as the cluster number of the prototype of interest (S15), and stores the contents of the tentative determination (S16).
[0062]
Here, if the prototype classified into cluster “1” in FIG. 6A is the target prototype, no calculation is made at first for cluster “1” itself, because no other prototype belongs to it; the calculation is made for the prototype belonging to “4”, the prototype belonging to “5”, and the prototype belonging to “2”. If, for example, the distance to the prototype belonging to “4” is the shortest, the cluster of the target prototype is changed from “1” to “4” (FIG. 6B). The calculation may be performed not only for adjacent prototypes but for all prototypes. In that case, prototypes that are far apart on the prototype map but whose weight vectors are close can be grouped into the same cluster, although the calculation time becomes longer. In the present embodiment, a map in which the distance on the prototype map and the distance between weight vectors do not greatly differ has already been selected through the evaluation using TP, so the calculation for adjacent prototypes is sufficient.
[0063]
Then, the cluster boundary determination unit 22 checks whether or not all prototypes have been selected as target prototypes (S17). If there is a prototype that has not been selected (if No), the process returns to step S11 to continue the process. If all prototypes have been selected in step S17 (Yes), the cluster number update process ends.
[0064]
The cluster boundary determining unit 22 then returns to the processing shown in FIG. 3, compares the provisionally determined content with the cluster numbers before the update process, and checks whether any cluster number has changed (that is, whether the cluster numbers have not yet converged) (S4). If there is a change (Yes), the provisionally determined content is set as the new provisional clustering result and process S3 is executed again. If there is no change in process S4 (No), that is, if the result has converged, the current clustering result is output.
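The flow of processes S1 to S4 and S11 to S17 may be sketched in Python as follows. Because equation (7) is not reproduced in this text, the sketch substitutes an assumed measure: the sum, over the prototypes of a candidate cluster, of exp(-α d), where d is the sum of squared component differences and α = -Ln(δ) / Cd. Normalization by an overall average is omitted on the assumption that it is common to all candidate clusters and so does not change which cluster scores highest. Updating the cluster numbers in place during a sweep, the eight-neighbor adjacency, the max_sweeps guard, and all function names are also assumptions.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def neighbors(pos, rows, cols):
    """Yield the lattice positions adjacent to pos (eight-neighborhood)."""
    r, c = pos
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc)


def cluster_prototypes(weights, rows, cols, cd, delta=0.01, max_sweeps=100):
    """weights: dict mapping (row, col) -> weight vector. Returns cluster numbers."""
    alpha = -math.log(delta) / cd  # assumed relation between alpha, delta, and Cd
    order = sorted(weights)
    labels = {pos: n for n, pos in enumerate(order, start=1)}  # S1
    for _ in range(max_sweeps):
        changed = False
        for pos in order:  # S11: select each prototype as the target
            # S12: clusters of adjacent prototypes, plus the target's own cluster
            candidates = {labels[q] for q in neighbors(pos, rows, cols)}
            candidates.add(labels[pos])
            best_label, best_score = labels[pos], 0.0
            for c in sorted(candidates):
                # S13: prototypes belonging to cluster c, excluding the target
                members = [q for q in order if labels[q] == c and q != pos]
                # S14: assumed correlation measure between target and cluster c
                score = sum(
                    math.exp(-alpha * squared_distance(weights[pos], weights[q]))
                    for q in members
                )
                if score > best_score:
                    best_label, best_score = c, score
            if best_label != labels[pos]:  # S15, S16
                labels[pos] = best_label
                changed = True
        if not changed:  # S4: stop once no cluster number changes
            break
    return labels
```

With a 2 x 4 map whose left four weight vectors lie near 0 and whose right four lie near 10, this sketch assigns one cluster number to the left half and another to the right half.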
[0065]
Note that, instead of the method of determining Cd in process S2 described above, Cd may be determined as follows: for each target prototype, a statistic of its similarities to the other prototypes is calculated, and a predetermined statistical operation is then applied to the statistics of the target prototypes. In this case, Cd is determined by the following equation (9).
[0066]
(Equation 10)

[0067]
Here, k is a cluster adjacent to the cluster to which the prototype whose cluster is to be determined currently belongs, and C1 is a positive constant larger than “1”. By determining Cd according to the equation (9), all prototypes belonging to at least one cluster among the adjacent clusters affect the prototype for which the cluster is determined. In addition, it becomes possible to adaptively determine a suitable Cd for each prototype.
[0068]
[Add prototype]
In the present embodiment, the cluster boundary determination unit 22 does not immediately output the clustering result as the final result; at least once, it returns the clustering result to the prototype adding unit 34 of the map generation unit 21. Whether to perform this processing is left to the designer or the user. The prototype adding unit 34 refers to the clustering result, generates new prototypes at the cluster boundaries, and outputs the prototype map with the new prototypes added to the SOM learning unit 31 so that learning is performed again. Since this learning is aimed at fine adjustment, even if, for example, the learning before clustering used learning condition parameters with initial values α(t) = 0.2 and σ(t) = 2.0 and learned 700 patterns 10,000 times, the learning after the addition of the new prototypes may use learning condition parameters such as α(t) = 0.002, σ(t) = 1.0, and 100 repeated inputs of the patterns.
[0069]
More specifically, assuming that the cluster boundary determination unit 22 outputs the clustering result shown in FIG. 6C for the prototype map that was initially provisionally clustered as shown in FIG., a new prototype is formed at the boundary between clusters “4” and “6” (FIG. 6D). In FIG. 6D, the previous clustering result is shown in parentheses to aid understanding, but once prototypes have been added in this way, the previous clustering result no longer has any meaning.
[0070]
The new prototypes do not necessarily have to be added along the entire cluster boundary; they may be added along at least part of it. In this case, it is preferable to determine the portion where prototypes are added based on the number of times (the number of patterns) each prototype was the closest prototype, that is, the one at the shortest distance, to a learning input vector (pattern). In learning methods such as SOM learning and VQ learning, as in the case of the U-Matrix method, the density of prototypes is high at the center of a cluster and low at cluster boundaries. A prototype near a boundary therefore has few chances of being the closest prototype to a learning input pattern, so a prototype for which this number of times is equal to or less than a predetermined threshold, that is, a portion where the density of prototypes is lower than a predetermined threshold, can be regarded as lying near a cluster boundary. If new prototypes are added only to such portions, they are not added along the entire boundary, and the efficiency of re-learning and re-clustering can be improved.
[0071]
Further, the weight vector of each new prototype to be added is determined from the result of a predetermined statistical operation (for example, the arithmetic mean) on the weight vectors of the existing prototypes near the position (for example, the boundary portion) where it is to be added.
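The two steps described above, finding sparsely used prototypes and giving a new prototype an averaged weight vector, may be sketched in Python as follows. The sketch counts how often each prototype is the closest one to a learning pattern, lists the prototypes whose count is at or below a threshold, and computes the arithmetic mean of a set of weight vectors. The function names, the squared Euclidean distance used to find the closest prototype, and the way the neighboring weight vectors are passed in are assumptions.

```python
def hit_counts(weights, patterns):
    """Count, per prototype, how many patterns have it as closest prototype."""
    counts = {pos: 0 for pos in weights}
    for x in patterns:
        winner = min(
            weights,
            key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
        )
        counts[winner] += 1
    return counts


def boundary_candidates(weights, patterns, threshold):
    """Return positions whose hit count is at or below the threshold."""
    counts = hit_counts(weights, patterns)
    return [pos for pos in sorted(weights) if counts[pos] <= threshold]


def mean_weight(vectors):
    """Arithmetic mean of weight vectors, component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]
```

A new prototype placed between two existing prototypes would then receive, for example, mean_weight applied to the weight vectors of those two prototypes.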
[0072]
[Operation]
Next, the operation of the data classification device 1 according to the present embodiment will be described. First, the learning condition setting unit 33 outputs a plurality of learning condition parameter sets S1, S2, ..., SM, and the SOM learning unit 31 generates as many prototype maps (M maps) as there are learning condition parameter sets. The SOM learning unit 31 generates predetermined feature quantity vectors from learning image data input from the outside and adjusts the connection weights between each prototype of each prototype map and each component of the feature quantity vectors. This operation of the SOM learning unit 31 is widely known from the description by Kohonen et al. In the present embodiment, the learning image data is face image data obtained by photographing a person's face. It comprises a plurality of face image data with different shooting conditions, such as images taken with flash light, without flash light, or with light (sunlight) from the side or from above, and a plurality of face image data with different properties of the subject, such as images of opened / closed subjects, with or without glasses, with laughing or angry expressions, and facing forward or facing down.
[0073]
The plurality of prototype maps generated by the SOM learning unit 31 are output to the map selection unit 32. The map selection unit 32 calculates a quantization error (QE) and a topological product (TP) from the prototypes included in each map. Based on these values, it selects a map with a high degree of agreement, as indicated by TP, between relative positional relationships in the input-layer space (the space of the feature set) and in the competitive-layer space (the space of the prototypes), that is, a map in which the distances between weight vectors agree well with the distances in the competitive layer. This reduces the distance on the map between prototypes that respond to similar image data.
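As an illustration of the first of these two quantities, the sketch below computes a quantization error under a commonly used definition: the average distance from each input vector to its closest prototype. Whether the embodiment uses exactly this definition is an assumption, and the topological product is not reproduced here. The function name and sample data are hypothetical.

```python
import math
from typing import Sequence


def quantization_error(
    inputs: Sequence[Sequence[float]],
    prototypes: Sequence[Sequence[float]],
) -> float:
    """Mean Euclidean distance from each input vector to its closest prototype."""
    return sum(
        min(math.dist(x, w) for w in prototypes) for x in inputs
    ) / len(inputs)


# Hypothetical 2-D feature vectors and two candidate prototype sets.
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
map_a = [[0.0, 0.0], [1.0, 1.0]]
map_b = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

qe_a = quantization_error(inputs, map_a)  # two inputs lie at distance 1 from map_a
qe_b = quantization_error(inputs, map_b)  # every input coincides with a prototype
print(qe_a > qe_b)  # True
```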
[0074]
Then, based on the set of learning condition parameters used for learning the selected map, the learning condition setting unit 33 generates a plurality of sets of learning condition parameters again and outputs them to the SOM learning unit 31, and the plurality of maps are generated again. Then, a map selection based on the QE and the TP is performed. Thus, the learning condition parameters are adjusted recursively, and learning formation of the map is performed recursively.
[0075]
For the map obtained as a result of such recursive learning, the cluster boundary determining unit 22 sequentially selects prototypes on the map and groups each selected prototype with the adjacent prototypes that have a large correlation amount with it into one cluster. That is, the cluster to which each prototype belongs is determined based on the adjacency of the prototypes on the map and their correlation amounts. This process is executed repeatedly, and when the clustering result converges, the result is output to the prototype adding unit 34.
[0076]
The prototype adding unit 34 generates a map in which a new prototype is added to the boundary of the cluster, outputs this map to the SOM learning unit 31, sets predetermined learning conditions, and performs learning again. In this case, only one set of learning condition parameters is required, and therefore only one map is required. Therefore, when the learning process for this one map is completed, the map is directly output to the cluster boundary determining unit 22 (without passing through the map selecting unit 32), and the cluster boundary determining unit 22 performs the clustering process again.
[0077]
FIG. 9 shows an example of a map formed by learning from a plurality of face image data and subjected to the clustering process. In FIG. 9, each prototype 51 is assigned a cluster number. Number 1 is the number of the cluster to which face image data of "a face with flash and a closed mouth" belongs, as shown in FIG. 10A. Number 2 is the number of the cluster to which face image data of "a face with flash and an open mouth" belongs, as shown in FIG. 10B. Number 3 is the number of the cluster to which face image data of "a face without flash and with a closed mouth" belongs, and number 4 is the number of the cluster to which face image data of "a face without flash and with an open mouth" belongs (see FIG. 10). Each cluster is thus defined by the two conditions "presence or absence of flash" and "mouth open or closed". As described above, in the present embodiment, a self-organizing map divided into a plurality of clusters defined by one or more shooting conditions or subject properties is formed by learning. Note that this map is a simple example for convenience of explanation; in other examples, the map is clustered according to various other shooting conditions and subject properties, and a map with cluster numbers is formed by learning.
[0078]
Then, the map obtained as a result of the clustering process is used for the classification process. That is, a feature amount vector is generated for image data input as a classification target, and the prototype having the largest connection weight with respect to this feature amount vector (the prototype that responds to the input image data) is found. The number of the cluster to which that prototype belongs becomes the classification number of the image data. As a result, the same classification number is determined for mutually similar image data, and different classification numbers are determined for mutually different image data. For example, when face image data of "a face with flash and an open mouth" is input, the feature amount vector of that face image data is determined to have the largest connection weight with a prototype numbered 2 in the map, so classification number 2 is determined for the input face image data. Then, this self-organizing map is output, and the result is shown on the display 16 and printed by a printer (not shown).
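The classification step can be sketched in Python as follows. The sketch assumes that "the prototype that responds to the input" is the prototype whose weight vector is closest to the feature amount vector in squared Euclidean distance, which is the usual best-matching-unit rule for self-organizing maps; the embodiment may define the response differently. The two-component toy vectors (flash, mouth open) and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Prototype = Tuple[Sequence[float], int]  # (weight vector, cluster number)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(feature: Sequence[float], prototypes: Sequence[Prototype]) -> int:
    """Return the cluster number of the prototype closest to `feature`."""
    _, cluster_number = min(
        prototypes, key=lambda p: squared_distance(feature, p[0])
    )
    return cluster_number


# Hypothetical map: component 0 ~ "flash", component 1 ~ "mouth open".
prototypes = [
    ([1.0, 0.0], 1),  # flash, mouth closed
    ([1.0, 1.0], 2),  # flash, mouth open
    ([0.0, 0.0], 3),  # no flash, mouth closed
    ([0.0, 1.0], 4),  # no flash, mouth open
]

print(classify([0.9, 0.8], prototypes))  # 2
```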
[0079]
In the data classification device of the present embodiment, face image data is classified using the self-organizing map formed as described above, so that face image data can be classified according to fine-grained shooting conditions and subject properties. For example, face image data can be classified by shooting conditions, such as with or without flash light and lit (by sunlight) from the side or from above, and by subject properties, such as eyes open or closed, glasses worn or not worn, laughing, angry, or troubled expressions, and faces directed forward, downward, or upward. Note that the shooting conditions and subject properties are not limited to those described here.
[0080]
In the present embodiment, for example, face image data classified as "a face with flash and an open mouth" and face image data classified as "a face with flash and a closed mouth" share the feature "with flash" yet are given different classification numbers. To handle this, the data classification device may be provided with a function, described below, for extracting face image data with a common shooting condition or a common subject property.
[0081]
The hard disk 14 stores in advance a correspondence table that associates each single shooting condition or subject property with the classification numbers of the clusters that include that feature. Taking the correspondence table for the map shown in FIG. 9 as an example, classification numbers 2 and 4 are set for the subject property "face with open mouth", and classification numbers 1 and 2 are set for the shooting condition "with flash". The classification numbers may be set by a person who visually checks the face image data belonging to each classification number and selects the classification numbers that share a feature, or they may be set by another method.
[0082]
After performing the classification process on a plurality of face image data using the self-organizing map, the CPU 11 stores each face image data in the hard disk 14 in association with its classification number. When the user issues a command requesting image data for one shooting condition or one subject property, the CPU 11 reads the cluster classification numbers corresponding to that shooting condition or subject property from the correspondence table and extracts all face image data associated with those classification numbers. The extracted face image data are then provided to the user. In this way, face image data having a common feature is extracted. For example, when there is a command requesting face image data "with flash", the CPU 11 reads classification numbers 1 and 2 from the correspondence table and extracts the face image data of FIGS. 10A and 10B, with which these classification numbers are associated.
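A minimal Python sketch of this table lookup follows. The table contents mirror the example given for the map of FIG. 9 ("with flash" maps to numbers 1 and 2, "open mouth" to numbers 2 and 4); the file names, the dictionary layout, and the function name are hypothetical.

```python
from typing import Dict, List, Set

# Correspondence table: one shooting condition or subject property -> classification numbers.
CORRESPONDENCE: Dict[str, Set[int]] = {
    "with flash": {1, 2},
    "open mouth": {2, 4},
}


def extract(feature: str, stored: Dict[str, int]) -> List[str]:
    """Return the names of all stored images whose classification number matches `feature`."""
    numbers = CORRESPONDENCE[feature]
    return sorted(name for name, number in stored.items() if number in numbers)


# Hypothetical stored results: image name -> classification number from the map.
stored = {"a.png": 1, "b.png": 2, "c.png": 3, "d.png": 4}

print(extract("with flash", stored))  # ['a.png', 'b.png']
print(extract("open mouth", stored))  # ['b.png', 'd.png']
```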
[0083]
Face image data having common features may also be extracted as follows. In this method, a plurality of self-organizing maps are first prepared by the map learning method described above. These maps have different classification conditions for face image data. For example, the first map has fine classification conditions and distinguishes both the presence or absence of flash and whether the mouth is open or closed, as in the map shown in FIG. 9, whereas the second map has coarse classification conditions and distinguishes only the presence or absence of flash, not whether the mouth is open or closed. Maps with such different classification conditions are obtained by using different feature amounts or parameters when the map is formed by learning, by changing the coordinate system onto which the map projects, or by using a melting algorithm (Kenneth Rose et al., "Statistical Mechanics and Phase Transitions in Clustering", Phys. Rev. Lett. 65, pp. 945-948 (1990)) as a means of obtaining a hierarchical structure of clusters.
[0084]
The CPU 11 first classifies face image data using the first self-organizing map, which has fine classification conditions. For example, with the first self-organizing map, face image data of "a face with flash and an open mouth" and face image data of "a face with flash and a closed mouth" are given the classification numbers of different clusters. After that, on the user's instruction, the CPU 11 classifies the face image data again using the second self-organizing map, which has different classification conditions. As a result, the two face image data are given the classification number of the same cluster as face image data "with flash". By such processing, face image data of the same type is extracted and provided to the user. The re-classified face image data may be stored in the hard disk together with the classification number determined by the first self-organizing map and the classification number determined by the second self-organizing map.
[0085]
In the description so far, learning is performed by recursively adjusting the learning condition parameters, clusters are determined using the degree of correlation between prototypes, and after the clusters are determined, prototypes are added, learning is performed again, and the clusters are determined again. However, the technique of adding prototypes may also be applied on its own to conventional methods of learning a prototype map and conventional clustering techniques. In this case, not only SOM but also VQ learning may be used for learning the prototype map.
[0086]
[Second embodiment]
Next, a data classification device according to a second embodiment of the present invention will be described. The second embodiment is different from the first embodiment in the method of learning and forming the correspondence between face image data and clusters. In the second embodiment, pattern data is directly clustered. As shown in FIG. 11, the data classification program according to the present embodiment includes a cluster determination unit 41 and a classification unit 42.
[0087]
The cluster determining unit 41 operates during learning for clustering: it performs a clustering process, described in detail later, generates a clustering result (a so-called cluster filter), and outputs the result to the classification unit 42. The classification unit 42 operates during actual classification: it stores the input clustering result (for example, on the hard disk 14), refers to the cluster filter to determine the cluster into which input pattern data should be classified, and outputs the determination as the classification result. The processing of the classification unit 42 is also described in detail later.
[0088]
[Details of processing]
Here, the processing of the cluster determination unit 41 and the classification unit 42 in the data classification program executed by the CPU 11 will be described in detail, beginning with the cluster determination unit 41. As the processing of the cluster determination unit 41, the CPU 11 performs the processing illustrated in FIG. 12: it stores the N input pattern data in the RAM 12 or on the hard disk 14 and provisionally assigns the numbers "1" to "N" to the pattern data, in input order, as cluster numbers (S21). For each pattern data, a quantity characterizing its properties is calculated as a pattern vector, and the CPU 11 stores the (provisional) cluster number and the pattern vector in association with the corresponding pattern data. The similarity between pattern vectors is then calculated by a predetermined function (S22). The function used here is, for example, a measure between pattern vectors, specifically the sum of squares of the component-wise differences between them. That is, each combination of two pattern data is selected from the plurality of pattern data, the measure between the two pattern vectors of each combination is calculated as the similarity, and the results are stored in the RAM 12 as a similarity table.
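Processes S21 and S22 can be sketched in Python as below. The sketch uses the sum of squared component-wise differences named in the text, so a smaller value means a higher similarity. The dictionary layout of the similarity table, the function names, and the sample pattern vectors are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def sum_of_squares(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squares of the component-wise differences between two pattern vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_similarity_table(
    vectors: Sequence[Sequence[float]],
) -> Dict[Tuple[int, int], float]:
    """Measure for every unordered pair (i, j), i < j, of pattern vectors (S22)."""
    return {
        (i, j): sum_of_squares(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


# Hypothetical pattern vectors for N = 3 pattern data.
vectors: List[List[float]] = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]

# S21: provisional cluster numbers 1..N in input order.
cluster_numbers = list(range(1, len(vectors) + 1))

table = build_similarity_table(vectors)
print(cluster_numbers)  # [1, 2, 3]
print(table[(0, 1)])    # 25.0
```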
[0089]
Here, a parameter α used later for calculating the correlation value is also calculated by equation (8). Specifically, the similarities calculated in process S22 are sorted into classes (predetermined numerical ranges) to generate appearance-frequency information for each class, the distance at which the appearance frequency is maximum is taken as Cd, a predetermined minute amount δ close to "0" is chosen, and α is calculated. This is similar to the process, described above, of calculating the similarity between prototypes and determining α.
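The determination of Cd from the appearance frequencies can be sketched as follows. The sketch only finds the most frequent class and returns its center as Cd; the class width, the use of the class center, and the tie-breaking rule are assumptions, and the calculation of α by equation (8) is not reproduced because that equation is not restated in this passage.

```python
from collections import Counter
from typing import Sequence


def most_frequent_distance(distances: Sequence[float], bin_width: float) -> float:
    """Center of the class (bin) that contains the most distance values.

    Ties are broken in favour of the class with the smaller distances.
    """
    counts = Counter(int(d // bin_width) for d in distances)
    best_bin = min(counts, key=lambda b: (-counts[b], b))
    return (best_bin + 0.5) * bin_width


# Hypothetical similarity (distance) values taken from a similarity table.
distances = [0.2, 1.1, 1.3, 1.4, 2.7, 1.9]
cd = most_frequent_distance(distances, bin_width=1.0)
print(cd)  # 1.5
```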
[0090]
Here too, instead of taking Cd as the distance at which the appearance frequency is maximum, a statistic of the similarity to the other pattern data may be calculated for each pattern data of interest, and Cd may be determined by applying a further predetermined statistical process (the minimum of the per-pattern statistics) to those statistics, that is, by equation (9). When Cd is determined by equation (9), all the pattern data belonging to at least one of the adjacent clusters influence the pattern data being judged. It also becomes possible to determine a suitable Cd adaptively for each pattern data.
[0091]
Then, the CPU 11 starts the process of updating the cluster numbers (S23). This process is shown in FIG. 13 and is described in detail later. When the update process is completed, the CPU 11 checks whether the cluster numbers are unchanged before and after process S23, that is, whether the clustering has converged (S24). If there is a change (No), process S23 is executed again. If there is no change in process S24 (the clustering has converged; Yes), the processing ends, and the clustering result (including information associating the pattern data with cluster numbers) is stored on the hard disk 14.
[0092]
Here, the cluster number update process of process S23 will be described with reference to the flowchart. First, the CPU 11 sequentially selects each pattern data as the pattern data of interest (S31). The selection order may be, for example, the order in which the provisional cluster numbers were assigned in process S21 (for example, the input order). The CPU 11 then obtains the cluster number currently assigned to the pattern data of interest and determines the neighboring clusters (S32). The neighboring cluster numbers are obtained, for example, by referring to the similarity table, extracting a plurality of pattern data in descending order of similarity to the pattern data of interest, and taking the cluster numbers assigned to the extracted pattern data. The number of pattern data to be extracted may be fixed in advance, for example at "8", or pattern data may continue to be extracted until the number of clusters determined to be neighboring clusters reaches a predetermined number of two or more (for example, "4"). Here, the neighboring clusters include the cluster to which the pattern data of interest itself belongs.
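The determination of neighboring clusters in process S32 can be sketched in Python as below, using the fixed-count variant (the k most similar pattern data). Because the similarity here is a distance-like measure, "descending order of similarity" is implemented as ascending order of distance. The full distance matrix, the function name, and the sample values are assumptions made for illustration.

```python
from typing import List, Set


def neighbouring_clusters(
    target: int,
    distance: List[List[float]],
    cluster_of: List[int],
    k: int = 8,
) -> Set[int]:
    """Cluster numbers of the k pattern data most similar to `target`, plus its own cluster."""
    others = [j for j in range(len(cluster_of)) if j != target]
    others.sort(key=lambda j: distance[target][j])  # most similar (smallest distance) first
    nearest = others[:k]
    return {cluster_of[target]} | {cluster_of[j] for j in nearest}


# Hypothetical one-dimensional pattern vectors and their squared-difference distances.
values = [0.0, 0.1, 0.2, 5.0]
distance = [[(a - b) ** 2 for b in values] for a in values]
cluster_of = [1, 2, 3, 4]  # provisional cluster numbers

print(sorted(neighbouring_clusters(0, distance, cluster_of, k=2)))  # [1, 2, 3]
```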
[0093]
Using the numbers of the clusters determined to be neighboring clusters, the CPU 11 extracts, for each neighboring cluster, the pattern data currently belonging to it (S33), and calculates the correlation amount between the pattern data extracted for each neighboring cluster and the pattern data of interest by equation (7) (a function that, beyond a predetermined differential value, gradually approaches "0" as the similarity decreases) (S34). That is, with y* as the pattern vector of the pattern data of interest and yi as the pattern vector of the i-th extracted pattern data, equation (7) is evaluated over the pattern vectors of the pattern data belonging to cluster number c, and a value is obtained by dividing the total of the distances (similarities) between the pattern vector of the pattern data of interest and those pattern vectors by the overall average. This value is used as the correlation amount between the pattern data of interest and the pattern data group belonging to cluster number c.
[0094]
The CPU 11 thus calculates, for each neighboring cluster, the correlation amount between the pattern data group belonging to that cluster and the pattern data of interest, and selects the neighboring cluster with the largest correlation amount (S35). It then associates the pattern data of interest with the number of the selected neighboring cluster and stores the pair in the RAM 12 as a provisional update result (S36).
[0095]
Then, the CPU 11 checks whether all the pattern data have been selected as the pattern data of interest (S37). If some pattern data has not yet been selected (No), the processing returns to process S31 and continues. If no unselected pattern data remains in process S37 (Yes), the cluster number currently associated with each pattern data is updated based on the provisional update results stored in the RAM 12 (S38), and the cluster number update process ends. The provisional update results are stored first and the actual update is performed last so that the update for one pattern data of interest does not affect the determination of the cluster number of pattern data selected later as the pattern data of interest.
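The buffered update of processes S31 to S38 and the convergence check of process S24 can be sketched in Python as follows. Several parts are assumptions and not the method of the embodiment: equation (7) is not restated here, so the `correlation` function is a stand-in (the mean of exp(-α·d) over the other members of a cluster); every cluster is evaluated rather than only the neighboring ones; the `max_passes` guard is added for safety; and the example starts from a hand-picked assignment rather than from the provisional numbers 1 to N.

```python
import math
from typing import List


def correlation(target: int, members: List[int], values: List[float], alpha: float) -> float:
    """Stand-in for equation (7): mean of exp(-alpha * d) over the other cluster members."""
    others = [j for j in members if j != target]
    if not others:
        return 0.0
    return sum(
        math.exp(-alpha * (values[target] - values[j]) ** 2) for j in others
    ) / len(others)


def update_pass(values: List[float], clusters: List[int], alpha: float) -> List[int]:
    """One pass of S31-S38: decide every new number from the OLD assignment only."""
    provisional: List[int] = []          # S36: provisional update results
    for i in range(len(values)):         # S31: each pattern data in turn
        best = max(                      # S34, S35: cluster with the largest correlation
            sorted(set(clusters)),
            key=lambda c: correlation(
                i, [j for j, cj in enumerate(clusters) if cj == c], values, alpha
            ),
        )
        provisional.append(best)
    return provisional                   # S38: applied by the caller in one step


def cluster(values: List[float], clusters: List[int],
            alpha: float = 1.0, max_passes: int = 100) -> List[int]:
    """Repeat the update pass until no cluster number changes (S23, S24)."""
    for _ in range(max_passes):
        updated = update_pass(values, clusters, alpha)
        if updated == clusters:
            break
        clusters = updated
    return clusters


# Hypothetical 1-D pattern vectors; pattern 2 (value 0.2) starts in the wrong cluster.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(cluster(values, [1, 1, 2, 2, 2, 2]))  # [1, 1, 1, 2, 2, 2]
```

Because `update_pass` reads only the old assignment and returns a separate list, the result does not depend on the order in which the pattern data are visited, which is the purpose of the provisional buffer.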
[0096]
As described above, in the present embodiment, the CPU 11 repeats, until the clustering result no longer changes, the process of classifying the pattern data of interest into the cluster with the higher correlation, based on the correlation amount between the pattern data group belonging to each cluster and the pattern data of interest that is the subject of clustering (classification learning). Although the neighboring clusters are determined first here and the correlation amount is calculated only for them, if the processing speed of the CPU 11 is sufficient, the correlation amount may instead be calculated for all clusters by brute force. In that case, processes S32 and S33 are not necessarily required, and in process S34 the correlation amount between the pattern data group belonging to each cluster and the pattern data of interest is calculated for all clusters.
[0097]
Next, the process of actually classifying data using the clustering result generated as described above (the processing of the classification unit 42) will be described. When pattern data to be classified (target pattern data) is input, the CPU 11 calculates a pattern vector (target vector) for the target pattern data, refers to the stored clustering result (for example, the pattern data, the pattern vectors of the pattern data, and the cluster numbers), and calculates the distance between each pattern vector (reference vector) included in the clustering result and the target vector. It then finds the reference vector at the shortest distance (the reference vector with the highest similarity to the target vector) and outputs the cluster number associated with that reference vector as the classification result.
[0098]
In the embodiment described above as well, the correspondence between image data and clusters is formed by learning, so face image data can be classified using this correspondence.
[Brief description of the drawings]
FIG. 1 is a configuration block diagram of a data classification device according to a first embodiment of the present invention.
FIG. 2 is a configuration block diagram of a data classification device according to the first embodiment of the present invention.
FIG. 3 is a flowchart illustrating clustering processing.
FIG. 4 is an explanatory diagram illustrating an example of detection of a histogram of the distance between prototypes.
FIG. 5 is a flowchart illustrating an example of a cluster update process in the clustering process.
FIG. 6 is an explanatory diagram illustrating an operation example of a clustering process.
FIG. 7 is an explanatory diagram illustrating an example of a clustering result of a prototype map.
FIG. 8 is an explanatory diagram illustrating an example of a prototype addition state and a subsequent clustering result.
FIG. 9 is an explanatory diagram illustrating an example of a clustering result of a self-organizing map of face image data.
FIG. 10 is an explanatory diagram showing an example of a plurality of types of face image data.
FIG. 11 is a configuration block diagram of a data classification device according to a second embodiment of the present invention.
FIG. 12 is a flowchart illustrating a clustering process.
FIG. 13 is a flowchart illustrating a clustering process.
[Explanation of symbols]
1 data classification device, 11 CPU, 12 RAM, 13 ROM, 14 hard disk, 15 image input interface, 16 display, 17 external storage unit, 21 map generation unit, 22 cluster boundary determination unit, 31 SOM learning unit, 32 map selection unit, 33 learning condition setting unit, 34 prototype addition unit, 41 cluster determination unit, 42 classification unit, 50 map, 51 prototype.

Claims

A face image data classification device that classifies face image data in which a person's face is captured,
Using a plurality of face image data, learning forming means for learning and forming a correspondence relationship between arbitrary face image data and a plurality of clusters defined by at least one of the shooting conditions and the properties of the subject,
Cluster determining means for determining a cluster to which face image data belongs, using the learned and formed correspondence;
A face image data classification device, comprising:

The data classification device according to claim 1,
A face image data classifying device comprising extraction means for extracting, from face image data belonging to each cluster, face image data shot under a common shooting condition or face image data having a property of a common subject.

The data classification device according to claim 1 or 2,
The learning forming means,
(A) Provisionally determine the cluster to which each prototype belongs,
(B) sequentially selecting each prototype as a target prototype to be classified into a cluster,
(C) for each cluster, calculating a measure between at least one prototype belonging to each cluster and said prototype of interest;
(D) changing the cluster to which the prototype of interest belongs as necessary based on the calculated measure;
A data classification apparatus, wherein the processing of (b), (c), and (d) is repeated until there is no change in the cluster to which each prototype belongs, and the prototypes are classified into clusters.

The data classification device according to claim 1 or 2,
The learning forming means,
(A) For each of the face image data, determine a cluster to which the face image data belongs;
(B) Each face image data is sequentially selected as attention face image data to be classified into a cluster,
(C) calculating, for each cluster, a predetermined correlation value between at least one piece of face image data belonging to the cluster and the face image data of interest to be classified into the cluster;
(D) determining a cluster to which the face image data of interest belongs, based on the correlation value;
A data classification apparatus characterized by repeating the processes (b), (c), and (d) until there is no change in the cluster to which each face image data belongs, and classifying each face image data into clusters.

A face image data classification method for classifying face image data obtained by capturing a face of a person,
Using a plurality of face image data, learning formation step of learning and forming a correspondence relationship between arbitrary face image data and a plurality of clusters defined by at least one of the shooting condition and the property of the subject,
A cluster determining step of determining a cluster to which the face image data belongs using the learned and formed correspondence;
A face image data classification method characterized by including:

The data classification method according to claim 5, wherein
A face image data classification method including an extraction step of extracting, from face image data belonging to each cluster, face image data shot under a common shooting condition or face image data having a property of a common subject.

The data classification method according to claim 5, wherein:
In the learning forming step,
(A) Provisionally determine the cluster to which each prototype belongs,
(B) sequentially selecting each prototype as a target prototype to be classified into a cluster,
(C) for each cluster, calculating a measure between at least one prototype belonging to each cluster and said prototype of interest;
(D) changing the cluster to which the prototype of interest belongs as necessary based on the calculated measure;
A data classification method characterized by repeating the processes (b), (c), and (d) until there is no change in the cluster to which each prototype belongs, and classifying each prototype into a cluster.

The data classification method according to claim 5, wherein:
In the learning forming step,
(A) For each of the face image data, determine a cluster to which the face image data belongs;
(B) Each face image data is sequentially selected as attention face image data to be classified into a cluster,
(C) calculating, for each cluster, a predetermined correlation value between at least one piece of face image data belonging to the cluster and the face image data of interest to be classified into the cluster;
(D) determining a cluster to which the face image data of interest belongs, based on the correlation value;
A data classification method characterized by repeating the processes (b), (c), and (d) until there is no change in the cluster to which each face image data belongs, and classifying each face image data into clusters.

A face image data classification program for classifying face image data obtained by capturing a face of a person,
A learning forming step of learning and forming, using a plurality of face image data, a correspondence relationship between arbitrary face image data and a plurality of clusters defined by at least one of the shooting conditions and the properties of the subject,
A cluster determining step of determining a cluster to which the face image data belongs by using the correspondence formed by learning;
A face image data classification program characterized by including:

The data classification program according to claim 9,
A face image data classification program including an extraction step of extracting, from face image data belonging to each cluster, face image data shot under a common shooting condition or face image data having a property of a common subject.

The data classification program according to claim 9 or 10,
In the learning forming step,
(A) Provisionally determine the cluster to which each prototype belongs,
(B) sequentially selecting each prototype as a target prototype to be classified into a cluster,
(C) for each cluster, calculating a measure between at least one prototype belonging to each cluster and said prototype of interest;
(D) changing the cluster to which the prototype of interest belongs as necessary based on the calculated measure;
A data classification program characterized by repeatedly performing the processes (b), (c), and (d) until there is no change in a cluster to which each prototype belongs, and classifying each prototype into a cluster.

The data classification program according to claim 9 or 10,
In the learning forming step,
(A) For each of the face image data, determine a cluster to which the face image data belongs;
(B) Each face image data is sequentially selected as attention face image data to be classified into a cluster,
(C) calculating, for each cluster, a predetermined correlation value between at least one piece of face image data belonging to the cluster and the face image data of interest to be classified into the cluster;
(D) determining a cluster to which the face image data of interest belongs, based on the correlation value;
A data classification program characterized by repeating the processes (b), (c), and (d) until there is no change in the cluster to which each face image data belongs, and classifying each face image data into clusters.