JP2002511175A

JP2002511175A - Image recognition method

Info

Publication number: JP2002511175A
Application number: JP54819999A
Authority: JP
Inventors: 孝美里中; 康二浅利; 孝明馬場
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1998-03-23
Filing date: 1998-03-23
Publication date: 2002-04-09
Also published as: WO1999049414A1

Abstract

(57) [Abstract] Features of a target in three-dimensional space are extracted from a luminance distribution. This distribution is quantized from the luminance signal of a two-dimensional image divided into rectangular regions. The amount of information in these features is then reduced by an orthogonal transform with a fixed basis, and the resulting feature input vector (16) is identified by a neural network. As shown in FIG. 4, the neural network of the invention consists of the following parts:
- registers that store a reference vector (20) and a variance vector (21), which are the target features;
- a difference unit (17);
- a multiplier (18);
- a squaring accumulator (19).

The network identifies the attribute of a target from a two-dimensional image of that target. It does so using a distance that takes the distribution of the image feature patterns into account. To identify objects in three-dimensional space with little training, the invention provides image synthesis, learning of rotations, and learning optimization that accounts for the variance of the image feature patterns.

Description

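[Detailed Description of the Invention] Image recognition method

Technical field: The present invention relates to an image recognition method that identifies an object (target) in three-dimensional space by the features of a two-dimensional image.

Background art: In conventional methods of recognizing objects in three-dimensional space, the representative features of a three-dimensional target were constructed as a set of two-dimensional images. The type of object was then identified from features represented by that set of two-dimensional image data. When an image of a three-dimensional object is represented by the luminance signals of two-dimensional images, the dimensionality of the information expressed as luminance becomes very large, so it had to be represented in a low-dimensional space.

One conventional neural network was proposed in Face Recognition: Hybrid Neural Network Approach, Technical Reports, CS-TR-3608, University of Maryland, 1996. It used principal component analysis, i.e. the Karhunen-Loève transform, to represent the characteristic information of a target object in the high-dimensional space in a low-dimensional feature space without loss. From the distribution of the image features in the high-dimensional space, an optimal basis depending on that distribution was computed, and the target features were extracted with it.

A multilayer neural network consists of discrimination elements that weight the input pattern by sum-of-products operations. It has three layers:
- an input layer that receives the pattern;
- a hidden layer of discrimination elements;
- an output layer that outputs the identification result.

The number of input terminals in the input layer equals the dimension of the input pattern. The hidden-layer elements take the input pattern from the input layer and pass their results to the output layer. The output layer has as many elements as there are classes to identify. When a pattern is presented at the input terminals, each output element outputs the degree to which the pattern belongs to its class. The element of the class that represents the pattern's attribute outputs the maximum value.

In supervised learning, the weights between layers are updated by error backpropagation. The input pattern and the expected output are presented to the input and output layers as the teacher signal, with targets of 0 and 1 for the smooth sigmoid function.

Kohonen proposed a single-layer neural network that stores reference feature patterns representing the attributes of targets and performs distance calculations. It discriminates by the Euclidean distance between a reference feature pattern and an arbitrary input feature pattern (Kohonen, "The self-organizing map," Proc. of the IEEE 78, 1465-1480, 1990). To further improve recognition performance, distance-calculation networks used the Mahalanobis distance, which accounts for the statistical distribution. This distance determines the degree of membership (likelihood) of a feature pattern in a target class.

In image recognition over image communication, compressed images such as MPEG or JPEG were transmitted. The receiver restored them before extracting and discriminating the image features.

The problems with conventional methods of identifying a target object from images of the object in three-dimensional space are as follows.
- In images of objects in three-dimensional space the target has many degrees of freedom, so identification performance depended on rotation, position changes, and lighting changes.
- Representing a three-dimensional target by two-dimensional images required, at the least, a large memory.
- In the principal component analysis and orthogonal transforms used to reduce the amount of information, the basis that expresses the features of the targets depended on the data distribution, and computing that basis required an enormous amount of computation.
- Multilayer neural networks have complex configurations, so training took a long time. For example, the method of Lawrence et al. took four hours to learn the features of five face images per person, from a database of ten face images of each of 40 people. Efficiently training a complex network from a small number of training images was therefore a problem.
- When identification is based on a finite, relatively small database, the statistical distribution of the target features deviates greatly from a normal distribution. When only a few training samples are available, identifying image features with an estimate of the statistical variance could give a lower recognition rate than ignoring the variance. Estimating the variance was particularly difficult.

Disclosure of the invention: To solve the above problems, the present invention provides the following image recognition method.

The first means quantizes the luminance and color signals. It uses the frequency distribution of the luminance signal over all pixels of a two-dimensional image of a target with a given attribute, together with its cumulative luminance distribution. From these, a quantization table is built that takes a luminance as input and outputs a quantized luminance, such that every quantized gray level contains the same number of pixels. The luminance signals of all pixels of the two-dimensional image are then quantized through this table.

The second means extracts features with a rectangular region as the basic unit. The two-dimensional image obtained by the first means is divided into rectangular regions. For each region, the number of pixels in each gray level of the quantized luminance and color signals is counted. The resulting three-dimensional feature array is written h(x, y, z), where x and y are the rectangular-region coordinates and z is the quantized luminance level.

The third means reduces the dimension of the feature pattern. The feature pattern is obtained by transforming the three-dimensional feature array h(x, y, z), computed from a two-dimensional image of an arbitrary target by the first and second means. For each gray level, the pixel counts of the rectangular regions are taken as a two-dimensional array along the horizontal and vertical region coordinates (x, y). A two-dimensional discrete cosine (or sine) transform is applied to this array to obtain frequency components. The number of frequency components is chosen to maximize the recognition rate.

The fourth means also reduces the feature dimension. The three-dimensional feature pattern h_p(x, y, z) obtained by the first and second means is transformed by a three-dimensional discrete cosine (or sine) transform. The dimension is reduced by keeping the low-frequency components, chosen so that the recognition rate is maximized.

The fifth means identifies features by a correlation distance. A feature pattern is represented by the pixel counts or frequency components of elements arranged in a three-dimensional matrix. This matrix is computed from an image database acquired by scanning targets with the same attribute, using the first, second and third means or the first, second and fourth means. The distance between a reference vector and a feature pattern is defined as follows:
- The coordinates (P1, P2, P3) stand for the pixel-count components (x, y, z), the two-dimensional DCT frequency components (u, v, z), or the three-dimensional DCT frequency components (u, v, w).
- The distance is twice the negative logarithm of the probability function under an assumed normal distribution.

From a database of two-dimensional images of targets with the same attribute q, the center μ_qp(x, y, z) and variance σ_qp(x, y, z) are computed for each element h_qp(x, y, z) of the three-dimensional arrays obtained by the first and second means. Using these as the reference, the distance is computed between two vectors:
- the feature vector h_p(x, y, z) of an arbitrary target, obtained by the first and second means;
- the reference vector μ_qp(x, y, z).

The feature vector h_p(x, y, z) of an arbitrary two-dimensional image is judged to belong to type m. Type m is the type whose reference vector is at minimum distance, among the reference vectors obtained from the two-dimensional images of targets with Mq types of attribute.

The sixth means identifies features by a distance learned with a neural network. The number of discrimination elements equals the number of attributes to be identified. Each element has as many input terminals as there are components in the three-dimensional array, all connected to one output element. When a feature input vector from a two-dimensional image of an arbitrary target is presented, the element that stores the reference vector representing that target's attribute outputs the minimum value. Each element stores, as learning coefficients, a reference vector and a variance vector. These correspond to the center and variance of the statistical distribution of each component of the three-dimensional array. The element outputs, at its output terminal, the distance between the feature input vector and its reference vector. The reference and variance vectors, corresponding to the center and variance of the feature-pattern distribution of pixel counts or frequency components in the three-dimensional matrix, are obtained by neural network learning.

The seventh means performs image recognition with a mixture model of the distribution of the pixel-count or frequency-component feature patterns in the three-dimensional matrix. The three-dimensional matrix is obtained as in the fifth means. For the correlation-distance calculation, a mixed class made up of a set of classes is constructed. Feature patterns with different attributes are added to the class of feature patterns of an object with a given attribute, and the optimal number of classes in the mixed class is selected. The components of the maximum-likelihood variance vector are computed for each component of the three-dimensional matrix, from the mixture distribution of pixel-count or frequency components. They are found by minimizing a mutual-information criterion between the variance-vector components of a given input pattern and those of targets with different attributes. The information criterion is defined using two logarithms:
- the logarithm of the variance of the feature input vectors obtained from two-dimensional images of a given target;
- the logarithm of the variance of the features obtained from two-dimensional images of a different target.

The eighth means creates the feature vectors that are input to the neural network to obtain the reference and variance vectors. For the input feature vector of a given target, the feature vector of a different target's input image that is closest to it in feature space, under the distance of the fifth means, is selected. Feature input vectors obtained by mixing these feature patterns are input to the neural network. The mixing ratio of the added features of the other attribute is reduced as the number of training iterations grows.

The ninth means works on a data set of two-dimensional images of three-dimensional targets with arbitrary attributes. The three-dimensional-array feature patterns (pixel counts or frequency components) obtained from it by the first means are encoded and received from a remote location over a wired or wireless medium. The attribute of the target of the transmitted image is then determined.

The tenth means is a method of constructing the image database used by the first and second means. The target's image database consists of two-dimensional images taken while rotating the three-dimensional target by a fixed angle step, or taken of the three-dimensional target at fixed time intervals.

The present invention solves the problems described above and has the following effects.
- The neural network used in the image recognition method extracts features that are invariant to changes in rotation, position, and lighting.
- Histogram-based quantization yields features that are invariant to lighting changes, so identification is robust against changes in external illumination.
- In the feature space of the invention, changes in the luminance of a three-dimensional object are stored in a distributed manner in the quantized luminance space.
- The statistical variation of the image features is optimized.

Brief description of the drawings:
- FIG. 1 shows an embodiment of the object identification method of the invention in image communication.
- FIG. 2 shows an embodiment of feature extraction by block division and sampling of a two-dimensional image.
- FIG. 3 is a block diagram of the image-recognition neural network of the invention.
- FIG. 4 shows a hardware embodiment of the neural-network discrimination device.
- FIG. 5 shows the relation among the number of DCT coefficients, the misrecognition rate, and the mean absolute error.
- FIG. 6 compares learning performance with and without histogram normalization.
- FIG. 7 shows an embodiment of image synthesis for neural-network learning.
- FIG. 8 shows recognition performance with training data produced by the image synthesis of the invention.
- FIG. 9 shows the structure of the mixed class given by the maximum-likelihood function, as a function of the mixed class.
- FIG. 10 shows the relation among the mean variance, the misrecognition rate, and the number of samples in the mixed class.
- FIG. 11 illustrates the rotation-aware object recognition method of the invention.
- FIG. 12 shows recognition performance for rotated objects with rotation angles varying from 0 to 60 degrees.
- FIG. 13 shows system performance obtained by various methods.

Best mode for carrying out the invention

(Example 1) This is an embodiment of the image recognition method of the invention. FIG. 1 illustrates the object identification method of the invention in image communication. The numbered blocks are:
- 1: luminance-signal input of the two-dimensional image;
- 2: quantization of the luminance signal;
- 3: feature extraction per rectangular region;
- 4: information compression of the features;
- 5: feature encoding;
- 6: feature encryption;
- 7: feature decryption;
- 8: feature decoding;
- 9: identification by the neural network.

The processing steps are as follows.
(1) The original image is captured by a CCD (solid-state image sensor).
(2) The luminance and color levels of the original image are converted to quantized levels by a nonlinear table.
(3) The quantized image is divided into rectangular regions. For each region, a sampler counts the pixels whose signal falls in each quantized level.
(4) The pixel counts, arranged in the three-dimensional matrix obtained by the sampling in step (3), are transformed by a two- or three-dimensional DCT by the methods of claims 1 to 3. The low-frequency DCT coefficients are selected as three-dimensional features for object identification.
(5) The three-dimensional feature pattern extracted from the low-frequency DCT components is encoded, encrypted, and transmitted over a wireless or wired line.
(6) On the receiving side, the transmitted three-dimensional feature pattern is decrypted and decoded.
(7) The decoded low-frequency DCT coefficients are input to the neural network.

The transmitter provides feature extraction and compression, so recognition can be performed remotely at the receiver without transmitting the original image. The feature extraction uses a DCT compatible with the JPEG and MPEG standards. The wired or wireless transmitter encodes and sends the feature input vector of the two-dimensional image, represented by the pixel counts of the three-dimensional array. The receiver receives the encoded feature signal and uses the neural network to determine the attribute of the target it represents. In the invention, the features are compressed by an orthogonal transform with fixed coefficients, such as the discrete cosine transform. Encoding and transmitting the compressed features lets the receiver's neural network identify the target from less information than sending the compressed image itself and extracting and identifying features at the receiver.

(Example 2) This example shows the basic feature-extraction and identification operations for identifying objects (targets) with an image database. The basic image database of the object recognition method consists of Np two-dimensional images for each of Mq types of target (Mq = 40, Np = 10). The basic operations are:
- (M1) quantization of all color and luminance signals of the two-dimensional image by a nonlinear transform;
- (M2) representation of the features by a three-dimensional memory structure, obtained by dividing the quantized two-dimensional image into rectangular regions and sampling;
- (M3) discrete cosine transform of the three-dimensional feature pattern;
- (M4) identification of the feature pattern using the distance criterion of the invention.

In operation M1, the color or luminance signal j of the original image is transformed using its cumulative luminance distribution G_q(j). [Formula 1] f(j) is the 256-level frequency distribution obtained from the signals of all pixels of the two-dimensional image. The 256-level signal j is converted to a signal y with Nz levels. The signals of all pixels are input to the quantization table and converted to a signal y_q with at most Nz levels. The resulting image is called the quantized two-dimensional image. [Formula 2]

FIG. 2 shows feature extraction by two-dimensional block division and sampling, in which a three-dimensional memory structure represents the target's features. 10 is a rectangular region of Nx × Ny pixels, 11 is the division of the quantized two-dimensional image into rectangular regions, and 12 is the extracted three-dimensional feature array. In operation M2, the quantized two-dimensional image 11 is divided into rectangular regions 10 of size Nx × Ny, and the number of luminance or color pixels in each level interval is counted. The per-region frequency distribution over the quantized levels is represented by the three-dimensional array h_p(x, y, z). Here x, y, and z are the horizontal coordinate, vertical coordinate, and quantized luminance level, with x = 1 to Nx, y = 1 to Ny, and z = 1 to Nz. When an Nz-level quantized image is divided into Nx × Ny rectangular regions, the histogram counts form an Nx × Ny × Nz array.

To reduce the dimension of the information, in operation M3 the histogram counts are transformed by a two- or three-dimensional discrete cosine transform. Applying the two-dimensional DCT gives [Formula 3]. The normalization factors are c(u) = 1/(2√2) for u = 0 and c(u) = 1/2 for u ≠ 0, likewise for c(v), with u = 0 to Nx − 1 and v = 0 to Ny − 1. Using the low-frequency components of the two-dimensional DCT reduces the dimension of the feature input vector. h^2dct_p(u, v, z) is the two-dimensional DCT of h_p(x, y, z), and its low-frequency coefficients are used as the feature pattern for identification. For example, using the coefficients u = 0 to N_dctx − 1, v = 0 to N_dcty − 1, and z = 0 to Nz − 1 gives N_dctx × N_dcty × Nz three-dimensional coefficients. The optimal values of N_dctx and N_dcty were chosen to maximize the recognition rate by the method described in Example 4.

In the fourth operation (M4), feature input vectors are classified with a discriminant function. This function is derived from the negative log-likelihood of the normal probability function P(h(u, v, z) | μ_q(u, v, z), σ_q(u, v, z)). [Formula 4] Here μ^2dct_q(u, v, z) and σ^2dct_q(u, v, z) are the components of the reference and variance vectors, obtained from the class mean and variance under a normal-distribution assumption. An input is judged to belong to the type m whose reference vector, among those of the Mq types, is at minimum distance. [Formula 5] [Formula 6] (Np is the number of training samples.) To classify the Mq targets, the invention uses a set of Mq discriminant functions. It identifies the maximum-likelihood class m* as the class whose discriminant function gives the minimum output for the input feature pattern. [Formula 7] Instead of the two-dimensional DCT of Formula 3, a three-dimensional DCT may be used to reduce the dimension of the histogram density. [Formula 8]

(Example 3) In this example, feature input vectors computed from two-dimensional images of targets with arbitrary attributes are input to a neural network for image recognition. FIG. 3 illustrates the structure of the image-recognition neural network: 13 is an input terminal, 14 a discrimination element, and 15 an output terminal. Each discrimination element has Nx × Ny × Nz input terminals and one output element. To identify Mq types of target, the network consists of Mq discrimination elements, each with the discriminant function defined by Formula 4. For a feature input vector from a two-dimensional image of an arbitrary target, the element that stores the reference vector representing that target outputs the minimum value.

FIG. 4 shows a hardware embodiment of the neural-network discrimination device. Its parts are:
- 16: register for the input vector from the input terminals;
- 17: difference unit;
- 18: multiplier;
- 19: squaring accumulator;
- 20: feature reference-vector register;
- 21: feature variance-vector register.

Each element holds in registers 20 and 21 the reference and variance vectors derived from the normal distribution of the input feature patterns. It determines the relative distance between the three-dimensional input feature pattern in input register 16 and the reference vector, as follows:
- The difference unit 17 computes the difference between the reference-vector register and the feature input-vector register fed by the Nx × Ny × Nz input terminals.
- The multiplier 18 multiplies the result by the reciprocal of the feature variance-vector register.
- The squaring accumulator 19 sums the squares of all vector components, giving the distance between the reference vector and the feature input vector.

The reference and variance vectors are obtained by neural network learning. They correspond to the mean and variance of the distribution of pixel counts or frequency components for each component of the three-dimensional structure. An example of learning with the two-dimensional DCT follows. When the input vector h_p(u, v, z) of class p is presented, every element outputs the distance between the input vector and its stored reference vector. The reference vector of the class whose output element gives the minimum distance is updated by the following formulas. When the input pattern belongs to class k: [Formula 9]. When the input pattern does not belong to class l: [Formula 10]. (μ^(t)_l(u, v, z) and μ^(t)_k(u, v, z) are the reference-vector components of classes l and k at learning iteration t.) β1 and β2 are update coefficients, and t is the number of updates. [Formula 11] [Formula 12] (σ^(t)_l(u, v, z) and σ^(t)_k(u, v, z) are the variance-vector components of classes l and k at learning iteration t.) ζ(λ) is the update function. [Formula 13] For λ > 0, ζ(λ) = ζ0. [Formula 14] The initial reference and variance vectors for any class P are computed from the mean and variance of each DCT coefficient, assuming a normal distribution. [Formula 15] [Formula 16]

(Example 4) The invention provides a way to choose the optimal number of coefficients when representing the target's features by the DCT. Here h^2dct(u, v, z) is obtained by applying a two-dimensional DCT to the histogram counts h_P(x, y, z) described in operation M3 of Example 2. FIG. 5 plots the recognition error rate and the mean absolute error against the number of DCT coefficients. Curve 22 is for the two-dimensional DCT h^(a)2dct(u, v, z) and curve 23 for h^(b)2dct(u, v, z). [Formula 17] [Formula 18] [Formula 19] Here N^(a)dct and N^(b)dct define the low-frequency components (u, v) of the features. In the DCT approximation, error and complexity trade off against each other. Increasing the number of DCT coefficients reduces the approximation error but increases the complexity of the learning neural network. The misrecognition rate of the learning neural network was minimized at a coefficient dimension of 36 (N^(a)dct = 5, N^(b)dct = 7). According to the invention, the DCT coefficient dimension is chosen to minimize the recognition error, which is set by the trade-off between model complexity and mean absolute error.

(Example 5) The neural network was built with the 400 images of the ORL (Olivetti Research Laboratory) database, which contains 10 different poses of each of 40 people. Five poses per person were used for training and the rest for testing. The 114 × 88 input images were divided into 8 × 8 × 8 rectangular blocks. FIG. 6 compares learning performance with (25) and without (24) the nonlinear table for histogram normalization. Test images for evaluating robustness to luminance changes were generated with the gamma transform g(y) = 255·(y/255)^(1/γ). The results were:
- At γ = 1, quantization without histogram normalization has a slightly lower error rate than the invention.
- Without normalization, the error rate becomes very large as the luminance change grows, exceeding 0.1 for γ ≤ 0.925 or γ ≥ 1.2.
- Quantization with histogram normalization keeps a relatively low error rate, between 0.04 and 0.09, for γ from 0.4 to 3.0.

To show the effectiveness of the image synthesis method of the invention, FIG. 7 compares face recognition results with 2D and 3D DCT using image synthesis against other methods on the same data. The dimension of the block-histogram input was reduced by the 2D or 3D DCT to 4 × 4 × 8 or 4 × 4 × 4, respectively. FIG. 7 illustrates image synthesis for neural-network training. It shows composite images of two different faces A and B at mixing ratios of 1/0, 3/1, 3/2, 2/3, 1/3, and 0/1. The two images to be combined are denoted q and m(q). For each image q in the database, the image m(q) whose reference pattern is at minimum distance, as defined by Formula (4), is selected. [Formula 21] [Formula 22] [Formula 23] The mixing ratio is varied with the number of learning iterations t. [Formula 24]

Recognition performance was evaluated by a face-identification test on the ORL database of 400 face images. The database consists of face images of 40 people, each with 10 variations such as eyes open or closed, smiling or not, and with or without glasses. FIG. 8 shows recognition performance with training data produced by the image synthesis of the invention. On this database the number of training samples per person was varied from 1 to 5, and the remaining 9 to 5 samples per person were used as test images.
- Curve 27: similar input feature vectors of q and m(q), computed with the 2D DCT, were mixed at ratio A/B (α = 0.5). The mixing ratio was decreased with time constant η = 200 as the learning iteration t progressed.
- Curve 26: feature vectors without image synthesis were input to the neural network.

With five training samples per person, the misrecognition rate of the invention (2D DCT, mixing ratio 0.5) was 2.4%, much lower than the 13.6% obtained without synthesis. The 2D DCT misrecognition rate is therefore improved almost six-fold. This result demonstrates the effectiveness of face recognition with the 2D DCT using the synthesis of the invention.

FIG. 13 compares the face recognition system of the invention with other methods: the self-organizing map (SOM), the convolutional network (CN), the pseudo-2D hidden Markov model (HMM), and eigenfaces. The SOM, HMM, and eigenface results were taken from S. Lawrence et al., Technical Report CS-TR-3608, University of Maryland, 1996. The error rate of 2.29% was better than the 3.8%, 5.3%, 5%, and 10.5% of SOM + CN, KLT + convolutional network, pseudo-2D HMM, and eigenfaces, respectively.

(Example 6) Formula (4) is a discriminant equation using the reference and variance vectors obtained by assuming a normal distribution of the input feature patterns. It evaluates the correlation distance between the reference pattern that represents the target's features and the input feature pattern. When the number of samples is small, the feature-pattern distribution deviates from the assumed normal distribution, and recognition performance degrades. To recognize targets correctly even with few training samples, the inventors devised a mixture model of the local feature distributions. Each class is assigned a mixed class whose number of constituent classes is variable. The number of classes to mix and the local distance parameters (reference and variance vectors) are chosen to minimize the cross-entropy between the model distribution and the true distribution.

FIG. 9 illustrates the mixed-class structure given by the maximum-likelihood function l^k_min(u, v, z), with the number of mixed classes as a parameter. The mixed class M^k(m)(u, v, z) is built from the set of classes k(0), k(1), …, k(m−1), where k(0) = k. For each base class k, the m-th class k(m) is assigned by the criterion of the maximum-likelihood function l^k_min(u, v, z). [Formula 25] The procedure is as follows:
- For a target of type k, compute the sum of the logarithms of the variance of its feature vectors and the variance of another target's feature vectors.
- Find the target type that minimizes this sum, computing the sums successively. The type with the m-th smallest value is denoted k(m).
- For each class, the minimum size of the mixture model is determined uniquely by minimizing l^k_min(u, v, z). [Formula 26]

The optimal variance is obtained from Formula (26). [Formula 27]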
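k(q) has the variance of a mixture distribution combining two parts: the image variance of class k, and the features of the class whose images have the q-th smallest distance in terms of the variance sum.

FIG. 10 shows the relation among the mean variance, the misrecognition rate, and the number of samples in the mixed class. Curves 29 and 31 show the performance of the invention. Curves 28 and 30 show results using the same number of components for every component. With class counts of 3 and 4, minimum error rates of 7.65% and 8.24% and variances of 618 and 960 were obtained. With the invention, increasing the number of mixed classes lowers the error rate and the mean variance, which saturate at 6.28% and 518. A key point of the invention is that the number of mixed classes is varied as a parameter. Using the optimal number of classes in the mixed class, defined by the entropy distance, lowered the error rate markedly from 7.78% to 6.28% when the number of classes is variable.

(Example 7) The seventh example shows how to build a neural network that identifies rotation-invariant image features, using multiple images of a rotating object as training data. FIG. 11 illustrates the rotation-aware object identification method of the invention. The rotating object 34 is rotated by 0, 60, 120, 180, 240, and 300 degrees about the intersection of the horizontal and vertical center lines 32 and 33. The procedure is:
- The luminance signals of all pixels of the two-dimensional images, obtained by scanning the rotating object in three-dimensional space, are input to the nonlinear table and converted to quantized values.
- The quantized two-dimensional image is divided into rectangular regions. For each region, the number of pixels in each level of the quantized luminance distribution is counted.
- Feature patterns with the three-dimensional structure are computed from six rotated images per person and input to the neural network.

The method of the invention uses a three-dimensional representation of the rotating object. Pixels with different luminance at the same rectangular-region position of the rotating object are sampled, and the resulting feature patterns are assigned to different gray levels. This lets the neural network store the feature patterns in a distributed way, without interference between the signals of the rotated images.

FIG. 12 shows recognition performance for objects whose rotation angle varies from 0 to 60 degrees. The network was built from input patterns computed from face images of 40 people rotated by 0, 60, 120, 180, and 240 degrees (i.e. θ = 0, 60, 120, 180, 240 degrees), with image synthesis of the training samples. Test face images were obtained by further rotating these images by Δθ, from 1 to 60 degrees, so the test rotation angle is θ + Δθ. The recognition error rate of the invention is lower than that of the method without image synthesis. This example illustrates the effect of the invention in recognizing rotating objects. If the signals of the images at the six rotation angles were simply superimposed in luminance space, an image represented by the pixel luminance of each rectangular region would interfere with the other images.

In summary, the invention has the following advantages.
- The neural network is built from image features that are invariant to changes in rotation, lighting, position, and so on, so it achieves excellent recognition rates under such changes.
- Conventional distance computations were hard to optimize. The invention instead optimizes the estimate of the statistical distribution of the feature-vector variance, so learning is efficient even with few training samples.
- The mixing ratio is varied over time so that the features of similar images are superimposed as noise. This improves discrimination between similar images.
- The features are compressed with fixed coefficients such as the discrete cosine transform, so the method is easy to incorporate into standard compressed communication systems.

DETAILED DESCRIPTION OF THE INVENTION Image recognition method

Technical field: The present invention relates to an image recognition method that identifies an object in three-dimensional space by the features of a two-dimensional image.

Background art: In conventional methods for recognizing an object in three-dimensional space, the representative features of a three-dimensional object are configured as a set of two-dimensional images. The type of object is identified from features represented by that set of two-dimensional image data. When an image of a three-dimensional object is composed of the luminance signals of a two-dimensional image, the dimension of the information represented by luminance becomes very large. It therefore had to be expressed in a low-dimensional space.

The conventional neural network was proposed in Face Recognition: Hybrid Neural Network Approach, Technical Reports, CS-TR-3608, University of Maryland, 1996. It used principal component analysis, i.e. the Karhunen-Loève transform, to represent the characteristic information of a target object in high-dimensional space in a low-dimensional feature space without loss. An optimal basis depending on the distribution of image features in the high-dimensional space was computed from that distribution, and the features of the object were extracted.

A multilayer neural network uses discrimination elements that weight the input pattern by sum-of-products operations. It consists of three layers:
- an input layer for inputting patterns;
- a hidden layer of discrimination elements;
- an output layer for outputting identification results.

The number of input terminals in the input layer equals the dimension of the input pattern. The hidden-layer elements receive the input pattern of the input layer and output their results to the output layer. The number of output-layer elements equals the number of classes to be identified. When an input pattern is presented at the input terminals, each output element outputs the degree of membership in its class. The output element that identifies the class representing the attribute of the input pattern outputs the maximum value. In supervised learning, the weighting coefficients between layers are updated by error backpropagation.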
In backpropagation, the input pattern and the expected output are presented to the input and output layers as the teacher signal. The targets are 0 and 1 for the smooth sigmoid function.

Kohonen proposed a single-layer neural network that stores reference feature patterns representing the target attributes and performs distance calculations. It discriminates by the Euclidean distance between a reference feature pattern and an arbitrary input feature pattern (Kohonen, "The self-organizing map," Proc. of the IEEE 78, 1465-1480, 1990). To further improve recognition performance, distance-calculation networks used the Mahalanobis distance, which takes the statistical distribution into account. This distance determines the degree of membership (likelihood) of a feature pattern in a target class.

In image recognition over image communication, compressed images such as MPEG or JPEG were transmitted. The receiver restored them before extracting and discriminating the image features.

The problems with conventional methods of identifying a target object from images of the object in three-dimensional space are as follows.
- In images of objects in three-dimensional space, the target has many degrees of freedom. Identification performance therefore depended on rotation, position changes, and lighting changes.
- Representing a three-dimensional object by two-dimensional images required, at the least, a large memory.
- In the principal component analysis and orthogonal transforms used to reduce the amount of information, the basis that expresses the features of the objects to be identified depended on the data distribution. Computing that basis also required an enormous amount of computation.
- Multilayer neural networks have complex configurations, so training took a long time.
For example, the method of Lawrence et al. took four hours to learn the features of five face images per person, from a database of ten face images of each of 40 people. Efficiently training a complex network from a small number of training images was therefore a problem.

Furthermore, when objects are identified from a finite, relatively small database, the statistical distribution of the target features deviates greatly from a normal distribution. When the number of training samples is limited, identifying image features with an estimate of the statistical variance could give a lower recognition rate than ignoring the variance. Estimating the variance was particularly difficult.

Disclosure of the invention: To solve the above problems, the present invention provides the following image recognition method.

The first means quantizes the luminance and color signals. It uses the frequency distribution of the luminance signal over all pixels of a two-dimensional image of a target with a given attribute, together with its cumulative luminance distribution. From these, a quantization table is built that takes a luminance as input and outputs a quantized luminance, such that every quantized gray level contains the same number of pixels. The luminance signals of all pixels of the two-dimensional image are input to the quantization table and quantized.

The second means extracts features using a rectangular region as the basic unit. The two-dimensional image obtained by the first means is divided into rectangular regions. For each region, the number of pixels belonging to each gray level of the quantized luminance and color signals is counted. The resulting three-dimensional feature array is written h(x, y, z), where x and y are the rectangular-region coordinates and z is the quantized luminance level.
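The second means can be sketched in a few lines. This sketch assumes the first means has already been applied, so every pixel holds an integer gray level in 0..nz−1. It also assumes nx and ny are the numbers of regions across and down, which matches the 8 × 8 × 8 blocks used in Example 5. The name `block_histogram` is hypothetical, and indices are 0-based rather than the 1-based x, y, z of the text.

```python
def block_histogram(quantized, nx, ny, nz):
    """Count pixels of each quantized gray level in each of nx-by-ny rectangular regions.

    quantized: list of rows, each a list of ints in range(nz) (output of the first means).
    Returns h with h[x][y][z] = number of pixels of level z in region (x, y), 0-based.
    """
    height, width = len(quantized), len(quantized[0])
    h = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for row, line in enumerate(quantized):
        y = row * ny // height          # vertical region index, 0..ny-1
        for col, level in enumerate(line):
            x = col * nx // width       # horizontal region index, 0..nx-1
            h[x][y][level] += 1
    return h
```

Mapping pixel coordinates with `col * nx // width` spreads any leftover pixels evenly across regions. This avoids assuming that the image size is an exact multiple of the region grid.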
The third means reduces the dimension of the feature pattern. The feature pattern is obtained by transforming the three-dimensional feature array h(x, y, z), computed from a two-dimensional image of an arbitrary target with the first and second means. For each gray level, the pixel counts of the rectangular regions are taken as a two-dimensional array along the horizontal and vertical region coordinates (x, y). A two-dimensional discrete cosine (or sine) transform is applied to this array to obtain frequency components. The number of frequency components is selected to maximize the recognition rate.

The fourth means reduces the dimension of the features. The three-dimensional feature pattern h_p(x, y, z), computed from a two-dimensional image of a target with the first and second means, is transformed by a three-dimensional discrete cosine (or sine) transform. The dimension is reduced by keeping the low-frequency components, chosen so that the recognition rate is maximized.

The fifth means identifies features by a correlation distance. A feature pattern is represented by the pixel counts or frequency components of elements arranged in a three-dimensional matrix. This matrix is computed from an image database acquired by scanning targets with the same attribute, using the first, second and third means or the first, second and fourth means. The distance between the reference vector and the feature pattern is defined as follows:
- The coordinates (P1, P2, P3) stand for the pixel-count components (x, y, z), the two-dimensional DCT frequency components (u, v, z), or the three-dimensional DCT frequency components (u, v, w).
- The distance is twice the negative logarithm of the probability function under an assumed normal distribution.
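The distance above is stated only in words. The following reconstruction assumes that each component P = (P1, P2, P3) is modeled by an independent normal distribution with center μ_q(P) and variance σ_q²(P). This is the usual diagonal-Gaussian reading, not the original formula, which is not shown here. Under that assumption, twice the negative log-likelihood and the resulting decision rule are:

```latex
d_q(h) = -2 \ln \prod_{P} \mathcal{N}\!\left(h(P) \mid \mu_q(P), \sigma_q^2(P)\right)
       = \sum_{P} \left[ \frac{\left(h(P) - \mu_q(P)\right)^2}{\sigma_q^2(P)} + \ln \sigma_q^2(P) + \ln 2\pi \right],
\qquad
m^{*} = \arg\min_{q} \, d_q(h)
```

The constant ln 2π is the same for every class, so it does not change which class attains the minimum and can be dropped in practice.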
From the database of two-dimensional images of targets with the same attribute q, two statistics are computed for each element h_qp(x, y, z) of the three-dimensional arrays obtained by the first and second means: the center μ_qp(x, y, z) and the variance σ_qp(x, y, z) of its statistical distribution. Using these as the reference, the distance is computed between two vectors:
- the three-dimensional feature vector h_p(x, y, z) of an arbitrary target, obtained by the first and second means;
- the reference vector μ_qp(x, y, z).

The feature vector h_p(x, y, z) of an arbitrary two-dimensional image is judged to belong to type m. Type m is the type whose reference vector is at minimum distance, among the reference vectors obtained from the two-dimensional images of targets with Mq types of attribute.

The sixth means identifies features by a distance learned with a neural network. The number of discrimination elements in the neural network equals the number of attributes to be identified. Each element has as many input terminals as there are components in the three-dimensional array, all connected to one output element. When a feature input vector from a two-dimensional image of an arbitrary target is presented, the element that stores the reference vector representing that target's attribute outputs the minimum value. Each element stores, as learning coefficients, a reference vector and a variance vector. These correspond to the center and variance of the statistical distribution of each component of the three-dimensional array. The element outputs, from its output terminal, the distance between the feature input vector and its reference vector.
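The fifth and sixth means can be sketched as a nearest-class rule with a per-component Gaussian distance. The sketch assumes the following:
- Feature arrays are flattened to lists of floats.
- Variances are population variances.
- A small `var_floor` guards against zero variance when there are few samples. This floor is an assumption; the text only notes that variance estimation is hard with small samples.

The helpers `fit_class`, `distance`, and `classify` are hypothetical names, and `distance` omits the constant ln 2π term.

```python
import math


def fit_class(samples, var_floor=1e-6):
    """Per-component mean and variance of one class's flattened feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    var = [max(sum((s[i] - mean[i]) ** 2 for s in samples) / n, var_floor)
           for i in range(dim)]
    return mean, var


def distance(h, mean, var):
    """Twice the negative Gaussian log-likelihood, without the constant ln(2*pi) terms."""
    return sum((hi - mi) ** 2 / vi + math.log(vi) for hi, mi, vi in zip(h, mean, var))


def classify(h, models):
    """Return the label whose (mean, var) model gives the smallest distance to h."""
    return min(models, key=lambda label: distance(h, *models[label]))
```

For example, with `models = {"A": fit_class([[0, 0], [2, 2]]), "B": fit_class([[10, 10], [12, 12]])}`, the vector `[1, 1]` is assigned to `"A"`. In hardware terms, each discrimination element evaluates one `distance` call in parallel, and the minimum is taken over their outputs.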
The reference and variance vectors are obtained by neural network learning. They correspond to the center and variance of the distribution of the pixel-count or frequency-component feature patterns in the three-dimensional matrix.

The seventh means performs image recognition using a mixture model of the distribution of the pixel-count or frequency-component feature patterns in the three-dimensional matrix. This matrix is computed from an image database acquired by scanning targets with the same attribute, using the first, second and third means or the first, second and fourth means. For the correlation-distance calculation, a mixed class consisting of a set of classes is constructed. Feature patterns with different attributes are added to the class of feature patterns of an object with a given attribute, and the optimal number of classes in the mixed class is selected. The components of the maximum-likelihood variance vector are computed for each component of the three-dimensional matrix, from the mixture distribution of pixel-count or frequency components. They are found by minimizing a mutual-information criterion between the variance-vector components of a given feature input pattern and those of targets with different attributes. The information criterion is defined using two logarithms:
- the logarithm of the variance of the feature input vectors obtained from two-dimensional images of a given target;
- the logarithm of the variance of the features obtained from two-dimensional images of a different target.

The eighth means creates the feature vectors that are input to the neural network. That network obtains the reference and variance vectors corresponding to the center and variance of the feature-pattern distribution of pixel counts or frequency components in the three-dimensional matrix. For the input feature vector of a given target, the feature vector of a different target's input image that is closest to it in feature space, under the distance of the fifth means, is selected. Feature input vectors obtained by mixing these feature patterns are input to the neural network. The mixing ratio of the added features of the other attribute is reduced as the number of training iterations grows.

The ninth means works on a data set of two-dimensional images of a three-dimensional target with an arbitrary attribute. The three-dimensional feature patterns (pixel counts or frequency components) obtained from it by the first means are encoded and received from a remote location over a wired or wireless medium. The ninth means then determines the attribute of the target of the transmitted image.

The tenth means is a method of constructing the image database used by the first and second means. The image database of a target consists of two-dimensional images taken while rotating the three-dimensional target by a fixed angle step, or taken of the three-dimensional target at fixed time intervals.

The present invention solves the problems described above and has the following effects.
- The neural network used in the image recognition method extracts features that are invariant to changes in rotation, position, and lighting.
- Histogram-based quantization yields features that are invariant to lighting changes, so identification is robust against changes in external illumination.
- In the feature space of the invention, changes in the luminance of a three-dimensional object are stored in a distributed manner in the quantized luminance space.
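The eighth means can be sketched as follows. Each training vector is paired with its nearest neighbour from a different class, and the two are blended with a weight that shrinks as training proceeds. The blending formulas themselves are not given in this text, so the sketch makes two assumptions:
- a linear blend `(1 - a) * own + a * partner`;
- an exponential decay `a = alpha * exp(-t / eta)`.

This is one plausible reading of "decreased with time constant η". The defaults α = 0.5 and η = 200 are the values reported in Example 5. The function names are hypothetical. The partner is chosen with a pluggable distance function; the text uses the fifth-means distance, and plain squared Euclidean distance is included only for illustration.

```python
import math


def mixing_ratio(t, alpha=0.5, eta=200.0):
    """Weight given to the partner pattern at training iteration t (assumed exponential decay)."""
    return alpha * math.exp(-t / eta)


def nearest_other(index, features, labels, dist):
    """Index of the closest feature vector whose label differs from features[index]."""
    candidates = [j for j in range(len(features)) if labels[j] != labels[index]]
    return min(candidates, key=lambda j: dist(features[index], features[j]))


def mixed_input(index, features, labels, t, dist, alpha=0.5, eta=200.0):
    """Blend a pattern with its nearest different-class neighbour; the blend fades as t grows."""
    j = nearest_other(index, features, labels, dist)
    a = mixing_ratio(t, alpha, eta)
    return [(1.0 - a) * p + a * q for p, q in zip(features[index], features[j])]


def sq_euclidean(u, v):
    """Illustrative stand-in for the fifth-means distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))
```

Early in training, the mixed inputs push each class's reference vector to account for its most confusable neighbour. As t grows, the inputs converge to the unmixed patterns.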
Image features It has the effect of optimizing the statistical variability of the amount of images. BRIEF DESCRIPTION OF THE FIGURES FIG. 1 shows an embodiment of the object identification method of the present invention in image communication. FIG. 2 shows an embodiment of the feature extraction by block division and sampling of a two-dimensional image. You. FIG. 3 is a configuration diagram of the image recognition neural network of the present invention. FIG. 4 shows an embodiment of the hardware of the neural network identification device. FIG. 5 shows the relationship between the number of DCT coefficients, the erroneous recognition rate, and the average absolute error. FIG. 6 is a comparison of learning performance with and without histogram normalization. FIG. 7 shows an embodiment of image synthesis for neural network learning according to the present invention. FIG. 8 shows recognition performance using learning data obtained by image synthesis according to the present invention. FIG. 9 shows the structure of the maximum likelihood mixture class as a function of the mixture class. FIG. 10 shows the relationship among the average variance, the false recognition rate, and the number of samples in the mixed class. FIG. 11 is an explanatory diagram of the object recognition method in consideration of rotation according to the present invention. FIG. 12 shows the recognition performance of a rotating object having a rotation angle of 0 to 60 degrees. FIG. 13 shows system performance obtained by various methods. BEST MODE FOR CARRYING OUT THE INVENTION (Example 1) 5 is an embodiment of an image recognition method according to the present invention. FIG. 1 is an explanatory diagram of the object identification method of the present invention in image communication. 1 is a luminance signal input of a two-dimensional image, 2 is quantization of the luminance signal, and 3 is a rectangular signal of the two-dimensional image. 
Feature extraction in unit of shape area, 4 is information compression of feature, 5 is coding of feature, 6 is feature Encryption of quantity, 7 is decryption of feature quantity, 8 is decryption of feature quantity, 9 is neural network It is identification by work. (1) The original image is captured by a CCD (solid-state imaging device). (2) The luminance of the original image and the gradation of the color signal are quantized by a non-linear table. It is converted to gradation. (3) The quantized image is divided into a plurality of rectangular areas. Sampler The number of pixels having a video signal belonging to a quantized gradation for each rectangular area is obtained. (4) A pixel represented by a three-dimensional matrix structure obtained by sampling in operation 3 The number is converted to a two-dimensional or three-dimensional DCT by the method of claims 1 to 3. Is converted. The low-frequency component of the DCT coefficient is used as a three-dimensional feature for object identification. Is selected. (5) The three-dimensional feature pattern extracted from the low frequency components of the DCT is wireless, Encoded and encrypted from the line transfer line and transferred. (6) On the receiving side, the transferred three-dimensional feature pattern is decrypted and decoded. (7) The low frequency component of the decoded DCT coefficient is input to the neural network. Is done. The transmitter realizes remote recognition on the receiver side without transmitting the original image. Has extraction and compression capabilities. The feature extraction used in the present invention is performed by DCT compatible with JPEG and MPEG standards. Used. Input of feature values of two-dimensional images represented by the number of pixels in a three-dimensional array on the wired or wireless transmission side The vector is encoded and transmitted, and the signal of the feature amount of the two-dimensional image encoded on the receiving side is obtained. 
A neural network on the receiving side then determines the object represented by the encoded features. In the present invention, the features are compressed by an orthogonal transform with fixed coefficients. Because the compressed features are encoded and transferred, the neural network on the receiving side can identify the object from less information than when a compressed image is transmitted directly and feature extraction and identification are performed on the receiving side.
(Example 2) This embodiment describes the basic operations of feature extraction and identification for identifying an object (target) using an image database. The basic image database of the object recognition method of the present invention consists of Np two-dimensional images for each of Mq types of objects (Mq = 40, Np = 10). The basic operations of the image recognition method are:
(M1) quantization of all color and luminance signals of a two-dimensional image by a nonlinear transformation;
(M2) representation of the feature amounts by a three-dimensional memory structure obtained from rectangular area division and sampling of the quantized two-dimensional image;
(M3) discrete cosine transform of the three-dimensional feature pattern; and
(M4) feature pattern identification using the distance criterion of the present invention.
In operation M1, the color or luminance signal j of the original image is transformed using the cumulative luminance distribution $G_q(j)$. [Formula 1] Here $f(j)$ is the 256-gradation frequency distribution obtained from the image signals of all pixels of the two-dimensional image. The 256-gradation image signal j is converted into a signal y with $N_z$ gradations: the signals of all pixels are input to a quantization table and converted into a signal $y_q$ with at most $N_z$ gradations.
The resulting image is referred to as the quantized two-dimensional image. [Formula 2]
FIG. 2 shows feature extraction by two-dimensional block division and sampling; the feature amount of the object is represented with a three-dimensional memory structure. Reference numeral 10 denotes one of the $N_x \times N_y$ rectangular areas, 11 the quantized two-dimensional image divided into rectangular areas for feature extraction, and 12 the three-dimensional array from which the feature amount is extracted. In operation M2, the quantized two-dimensional image 11 is divided into $N_x \times N_y$ rectangular areas 10, and the number of pixels of the luminance and color signals belonging to each gradation section is counted. The frequency distribution of the quantized gray levels sampled in each rectangular area is represented by the three-dimensional array $h_p(x, y, z)$, where x, y, and z are the horizontal coordinate, the vertical coordinate, and the quantized gradation ($x = 1, \dots, N_x$; $y = 1, \dots, N_y$; $z = 1, \dots, N_z$). Thus, when an $N_z$-gradation quantized image is divided into $N_x \times N_y$ rectangular areas, the histogram frequencies form an $N_x \times N_y \times N_z$ three-dimensional array.
To reduce the dimension of this information, in operation M3 the histogram frequencies are converted by a two-dimensional or three-dimensional discrete cosine transform. Applying the two-dimensional discrete cosine transform gives [Equation 3] with $c(u) = 1/(2\sqrt{2})$ for $u = 0$ and $c(u) = 1/2$ for $u \neq 0$ (and likewise for $c(v)$), where $u = 0, \dots, N_x - 1$ and $v = 0, \dots, N_y - 1$; these are the constants of the orthonormal 8-point DCT. Using the low-frequency components of this two-dimensional discrete cosine transform reduces the dimension of the feature input vector. $h^{2dct}_p(u, v, z)$ denotes the two-dimensional discrete cosine transform of $h_p(x, y, z)$.
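Operations M1 and M2 can be sketched in NumPy as follows. Formula 1 and Formula 2 are not reproduced in this text, so the quantization table is an assumption: it uses ordinary histogram equalization, mapping gray level j to floor(N_z · G(j)), where G(j) is the fraction of pixels darker than j. This yields the roughly equal pixel count per gradation that claim 1 calls for. The function names are hypothetical, and x is taken as the horizontal (column) block index and y as the vertical (row) block index.

```python
import numpy as np

def equalizing_table(image: np.ndarray, n_levels: int) -> np.ndarray:
    """Operation M1 (assumed form): 256-entry nonlinear table from the cumulative histogram."""
    f = np.bincount(image.ravel(), minlength=256)        # frequency distribution f(j)
    below = (np.cumsum(f) - f) / image.size              # fraction of pixels with level < j
    table = np.floor(below * n_levels).astype(int)
    return np.clip(table, 0, n_levels - 1)

def block_histogram(q: np.ndarray, nx: int, ny: int, n_levels: int) -> np.ndarray:
    """Operation M2: h[x, y, z] = number of pixels of quantized level z in block (x, y)."""
    rows, cols = q.shape
    col_edges = np.linspace(0, cols, nx + 1).astype(int)   # horizontal block boundaries
    row_edges = np.linspace(0, rows, ny + 1).astype(int)   # vertical block boundaries
    h = np.zeros((nx, ny, n_levels), dtype=int)
    for x in range(nx):
        for y in range(ny):
            block = q[row_edges[y]:row_edges[y + 1], col_edges[x]:col_edges[x + 1]]
            h[x, y] = np.bincount(block.ravel(), minlength=n_levels)
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(112, 92)).astype(np.uint8)   # synthetic 8-bit image
    table = equalizing_table(img, n_levels=8)
    q = table[img]                                      # quantized two-dimensional image
    h = block_histogram(q, nx=8, ny=8, n_levels=8)
    print(h.shape, h.sum())                             # (8, 8, 8) 10304
```

Because the block edges cover every row and column exactly once, the counts in h always sum to the number of pixels in the image.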
The low-frequency components of the $N_x \times N_y$ DCT coefficients $h^{2dct}_p(u, v, z)$ are used as the feature pattern for pattern identification. For example, using the coefficients $u = 0, \dots, N_{dctx} - 1$, $v = 0, \dots, N_{dcty} - 1$, and $z = 0, \dots, N_z - 1$ yields $N_{dctx} \times N_{dcty} \times N_z$ features. In Example 4, the optimal values of $N_{dctx}$ and $N_{dcty}$ were selected to maximize the recognition rate by the method described there.
In operation M4, the feature input vector is classified with a discriminant function obtained from the negative log-likelihood of the probability function $P(h(u, v, z) \mid \mu_q(u, v, z), \sigma_q(u, v, z))$. [Equation 4] Here $\mu^{2dct}_q(u, v, z)$ and $\sigma^{2dct}_q(u, v, z)$ are the components of the reference vector and the variance vector, obtained from the mean and variance of the class under a normal-distribution assumption. Among the reference vectors of the Mq types of objects, the input is judged to belong to the type m represented by the reference vector at the shortest distance. [Equation 5] [Equation 6] (Np is the number of learning samples.) To classify the Mq objects, the present invention uses a set of Mq recognition discriminant functions, and the input feature pattern is identified as belonging to the maximum-likelihood class $m^*$, the class whose discriminant function gives the smallest output. [Equation 7] Instead of Equation 3, a three-dimensional DCT can be used to reduce the dimension of the histogram density. [Equation 8]
(Example 3) In this embodiment, a feature input vector obtained from a plurality of two-dimensional images of a target having an arbitrary attribute is input to a neural network to perform image recognition. FIG. 3 is an explanatory diagram of the configuration of the image recognition neural network of the present invention: 13 denotes an input terminal, 14 an identification element, and 15 an output terminal.
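Operation M3 can be sketched as below. As an assumption, the sketch uses the orthonormal DCT-II, whose scale factors for an 8-point transform are 1/(2√2) for index 0 and 1/2 otherwise. Equations 3 and 8 are not reproduced in this text, so the helper names and the way the three-dimensional variant also transforms the z axis with the same kind of matrix are illustrative only.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C with C[u, x] = c(u) * cos((2x + 1) * u * pi / (2n))."""
    x = np.arange(n)
    u = x[:, None]
    c = np.where(u == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))   # 1/(2*sqrt(2)) and 1/2 when n = 8
    return c * np.cos((2 * x[None, :] + 1) * u * np.pi / (2 * n))

def dct2_features(h: np.ndarray, n_dctx: int, n_dcty: int) -> np.ndarray:
    """2-D DCT over the block axes (x, y) for every gradation z; keep the low-frequency corner."""
    nx, ny, _ = h.shape
    # coeff[u, v, z] = sum over x, y of C_x[u, x] * C_y[v, y] * h[x, y, z]
    coeff = np.einsum("ux,vy,xyz->uvz", dct_matrix(nx), dct_matrix(ny), h.astype(float))
    return coeff[:n_dctx, :n_dcty, :]          # N_dctx x N_dcty x N_z feature pattern

def dct3_features(h: np.ndarray, n_dctx: int, n_dcty: int, n_dctz: int) -> np.ndarray:
    """3-D variant: additionally transform along the quantized-luminance axis z."""
    nz = h.shape[2]
    coeff2 = dct2_features(h, h.shape[0], h.shape[1])           # full 2-D coefficients
    coeff3 = np.einsum("wz,uvz->uvw", dct_matrix(nz), coeff2)
    return coeff3[:n_dctx, :n_dcty, :n_dctz]
```

Keeping only the low-frequency corner is what reduces the feature dimension; Example 4 chooses how many coefficients to keep.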
Each identification element of the neural network consists of $N_x \times N_y \times N_z$ input terminals and one output element. To identify Mq types of objects, the neural network consists of Mq identification elements, each having the discriminant function defined by Equation (4). For a feature input vector obtained from a two-dimensional image of a target, the identification element that stores the reference vector representing that target outputs the minimum value.
FIG. 4 shows an embodiment of the hardware of the neural network identification device. Reference numeral 16 denotes the register for the input vector from the input terminals, 17 a difference unit, 18 a multiplier, 19 a square accumulator, 20 the feature reference vector register, and 21 the feature variance vector register. Each identification element holds in registers 20 and 21 the reference vector and the variance vector obtained from the normal distribution of the feature input patterns, and determines the relative distance between the three-dimensional-array feature pattern held in input register 16 and the reference vector. The difference unit 17 computes the difference between the reference vector register and the feature input vector register of $N_x \times N_y \times N_z$ components from the input terminals. The multiplier 18 multiplies the result by the reciprocal of the feature variance vector register, and the square accumulator 19 sums the squared values of all vector components, giving the distance between the reference vector and the feature input vector. For each component of the three-dimensional structure, the reference vector and the variance vector, which correspond to the mean and variance of the distribution of pixel counts (frequency components), are obtained by learning of the neural network.
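The datapath of FIG. 4 (difference unit 17, multiplier 18, square accumulator 19) can be sketched as below. Equation 4 is not reproduced in this text, so the sketch assumes the negative log-likelihood of an independent normal model, Σ (h − μ)²/σ² + Σ log σ². The multiply-then-square step is modelled as scaling each difference by 1/σ before squaring, which is one reading of "multiplies by the reciprocal of the variance". The log term is optional, and the function names are hypothetical.

```python
import numpy as np

def element_output(h, mu, var, with_log_term=True):
    """One identification element: distance between input h and stored reference mu."""
    d = (h - mu) / np.sqrt(var)            # unit 17 (difference) and unit 18 (scaling)
    dist = float(np.sum(d * d))            # unit 19: accumulate squares over all components
    if with_log_term:
        dist += float(np.sum(np.log(var)))  # normalization term of the Gaussian NLL
    return dist

def identify(h, mus, variances):
    """Return the index of the element with the minimum output (maximum-likelihood class)."""
    outputs = [element_output(h, m, v) for m, v in zip(mus, variances)]
    return int(np.argmin(outputs)), outputs

if __name__ == "__main__":
    mus = [np.zeros(4), np.full(4, 10.0)]
    variances = [np.ones(4), np.ones(4)]
    print(identify(np.full(4, 9.0), mus, variances))
```

In hardware, the Mq elements would evaluate these distances in parallel; the loop over elements here is only a software stand-in.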
An example of neural network learning with the two-dimensional DCT is given here. When the feature input vector $h_p(u, v, z)$ of class p is input, each element outputs the distance between the input vector and its stored reference vector. The reference vector of the class whose output element gives the minimum distance is updated by the following formulas. When the input pattern belongs to class k: [Equation 9] When the input pattern does not belong to class l: [Equation 10] ($\mu^{(t)}_l(u, v, z)$ and $\mu^{(t)}_k(u, v, z)$ are the components of the reference vectors of classes l and k at learning iteration t.) $\beta_1$ and $\beta_2$ are update coefficients, and t is the number of updates. [Equation 11] [Equation 12] ($\sigma^{(t)}_l(u, v, z)$ and $\sigma^{(t)}_k(u, v, z)$ are the components of the variance vectors of classes l and k at learning iteration t.) $\zeta(\lambda)$ is an update function. [Equation 13] For $\lambda > 0$, $\zeta(\lambda) = \zeta_0$. [Equation 14] The initial reference and variance vectors of an arbitrary class p are calculated, for each DCT coefficient component, from the mean and variance under a normal-distribution assumption. [Equation 15] [Equation 16]
(Example 4) The present invention provides a means of determining the optimal number of coefficients when the features of an object are expressed by the DCT. In this example, $h^{2dct}(u, v, z)$ is obtained by the two-dimensional DCT of the histogram frequencies $h_p(x, y, z)$ described in operation M3 of Example 2. FIG. 5 shows the relationship of the number of DCT coefficients to the misrecognition rate and to the mean absolute error for the two-dimensional DCTs $h^{(a)2dct}(u, v, z)$ (22) and $h^{(b)2dct}(u, v, z)$ (23). [Equation 17] [Equation 18] [Equation 19] Here $N^{(a)}_{dct}$ and $N^{(b)}_{dct}$ define the low-frequency components (u, v) of the feature.
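Equations 9 to 16 are not reproduced in this text, so the following is only a sketch under stated assumptions. The initial reference and variance vectors are the per-class mean and variance of the training features, as described for Equations 15 and 16. The reference update is assumed to follow the familiar LVQ1 pattern: only the winning (minimum-distance) class is changed, pulled toward the input with coefficient β1 when it is the correct class k, and pushed away with β2 when it is a wrong class l. The variance update with ζ(λ) is not modelled, and the names and default coefficients are hypothetical.

```python
import numpy as np

def init_references(features, labels, n_classes, eps=1e-6):
    """Initial reference (mean) and variance vectors per class; features has shape (N, D)."""
    mus = np.stack([features[labels == k].mean(axis=0) for k in range(n_classes)])
    var = np.stack([features[labels == k].var(axis=0) for k in range(n_classes)]) + eps
    return mus, var

def nll_distances(h, mus, var):
    """Assumed Gaussian negative-log-likelihood distance from h to every class."""
    return np.sum((h - mus) ** 2 / var + np.log(var), axis=1)

def lvq_step(h, label, mus, var, beta1=0.05, beta2=0.05):
    """One assumed LVQ1-style update of the winning reference vector (in place)."""
    winner = int(np.argmin(nll_distances(h, mus, var)))
    if winner == label:
        mus[winner] += beta1 * (h - mus[winner])    # correct class k: move toward the input
    else:
        mus[winner] -= beta2 * (h - mus[winner])    # wrong class l: move away from the input
    return winner
```

Initializing from class statistics means learning only has to refine the references near the class boundaries rather than discover them from scratch.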
In the DCT approximation, approximation error and complexity are in conflict: increasing the number of DCT coefficients reduces the approximation error but increases the complexity of the learning neural network. The misrecognition rate of the learning neural network was minimal at a coefficient dimension of 36 ($N^{(a)}_{dct} = 5$, $N^{(b)}_{dct} = 7$). According to the invention, the number of DCT coefficients is chosen to minimize the recognition error, which is determined by the trade-off between model complexity and mean absolute error.
(Example 5) The neural network of the invention was evaluated on 400 images from the ORL (Olivetti Research Laboratory) database, which contains 10 different poses for each of 40 people. Images in five poses were used for learning and the remaining images for testing. The 114 × 88 input images were divided into 8 × 8 × 8 rectangular blocks. FIG. 6 compares the learning performance with (25) and without (24) the nonlinear histogram-normalization table. Test images for evaluating the response to luminance changes were generated with the gamma transform $g(y) = 255 \cdot (y/255)^{1/\gamma}$. When γ = 1, the error rate of quantization without histogram normalization is slightly lower than that of the present invention, but it became very large as the luminance change increased, exceeding 0.1 when γ was below 0.925 or above 1.2. By comparison, the error rate of the quantization method using histogram normalization stayed relatively low, between 0.04 and 0.09, for γ from 0.4 to 3.0. To show the effectiveness of the image synthesis method of the present invention, FIG. 13 compares face recognition performance with the 3D DCT against other methods on the same data.
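The luminance-robustness test of FIG. 6 can be illustrated in miniature. The sketch below (synthetic data, hypothetical helper names) applies the gamma transform g(y) = 255·(y/255)^(1/γ) from the text and measures how many pixels keep the same quantized level. The histogram-equalizing table used here is an assumed stand-in for Formula 1, rebuilt from each image's own cumulative histogram, so a monotone brightness change barely moves the levels, whereas a fixed uniform quantizer shifts most of them. This shows the mechanism only and does not reproduce the patent's error rates.

```python
import numpy as np

def gamma(image, g):
    """Gamma transform used to simulate luminance changes: 255 * (y / 255) ** (1 / g)."""
    return np.round(255.0 * (image / 255.0) ** (1.0 / g)).astype(np.uint8)

def equalized_levels(image, n_levels=8):
    """Quantize with a table built from the image's own cumulative histogram (assumed form)."""
    f = np.bincount(image.ravel(), minlength=256)
    below = (np.cumsum(f) - f) / image.size
    table = np.clip(np.floor(below * n_levels).astype(int), 0, n_levels - 1)
    return table[image]

def uniform_levels(image, n_levels=8):
    """Fixed quantizer without histogram normalization."""
    return image.astype(int) * n_levels // 256

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(112, 92)).astype(np.uint8)
    for g in (0.5, 2.0):
        bright = gamma(img, g)
        eq = np.mean(equalized_levels(img) == equalized_levels(bright))
        un = np.mean(uniform_levels(img) == uniform_levels(bright))
        print(f"gamma={g}: equalized agreement {eq:.3f}, uniform agreement {un:.3f}")
```

The equalized levels depend only on the rank order of gray levels, which a monotone gamma curve preserves apart from rounding merges.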
The dimension of the block-histogram input was reduced to 4 × 4 × 8 or 4 × 4 × 4 by the two- or three-dimensional DCT.
FIG. 7 shows image synthesis for neural network learning according to the present invention: composite images of two different faces A and B with mixing ratios A/B of 1/0, 3/1, 3/2, 2/3, 1/3, and 0/1. The two images to be combined are denoted q and m(q). For an arbitrary image q in the database, the image m(q) at the minimum reference-pattern distance defined by Equation (4) is selected. [Equation 21] [Equation 22] [Equation 23] The mixing ratio is changed according to the learning count t. [Formula 24]
Recognition performance was evaluated on face identification with the ORL database of 400 face images. The database contains face images of 40 people, each with 10 different variations such as eyes open or closed, smiling or not, and with or without glasses. FIG. 8 shows the recognition performance obtained with learning data produced by image synthesis. In this face database of 10 images for each of 40 people, the number of learning samples per person was varied from 1 to 5, and the remaining 9 to 5 samples were used as test images. Reference numeral 27 shows the result when the input feature vectors obtained by the two-dimensional DCT for similar images q and m(q) are mixed with mixing ratio A/B (α = 0.5), the mixing ratio decreasing with time constant η = 200 as the learning count t increases. Reference numeral 26 shows the identification result when input feature vectors without image synthesis are fed to the neural network. With five learning samples, the misrecognition rate of the present invention using the two-dimensional DCT with a mixing ratio of 0.5 was 2.4%, very low compared with 13.6% without synthesis.
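The learning-time image synthesis can be sketched as below. Equations 21 to 24 are not reproduced in this text, so three pieces are assumptions. First, the partner m(q) is taken as the nearest training sample of a different person under plain squared Euclidean distance, a simplification of the Equation 4 distance. Second, the mixed input is taken as (1 − α(t))·h_q + α(t)·h_m(q). Third, α(t) is assumed to decay exponentially, α(t) = α₀·exp(−t/η), using the α = 0.5 and η = 200 quoted in the text. All function names are hypothetical.

```python
import numpy as np

def partner_indices(features, labels):
    """For each sample q, index m(q) of the closest sample with a different label."""
    d = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=2)
    d[labels[:, None] == labels[None, :]] = np.inf    # exclude same-person samples (and q itself)
    return np.argmin(d, axis=1)

def mixing_ratio(t, alpha0=0.5, eta=200.0):
    """Assumed schedule: the partner's weight decays with time constant eta."""
    return alpha0 * np.exp(-t / eta)

def synthesized_input(features, partners, q, t):
    """Training input for sample q at learning step t."""
    a = mixing_ratio(t)
    return (1.0 - a) * features[q] + a * features[partners[q]]
```

Early in learning, each sample is blended heavily with its most confusable other-person sample; as t grows, the input approaches the original sample, so the synthetic mixture acts as a fading noise term.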
The misrecognition rate with the two-dimensional DCT is thus improved about sixfold, which demonstrates the effectiveness of face recognition by the two-dimensional DCT combined with image synthesis. FIG. 13 compares the performance of the face recognition system of the present invention with a self-organizing map (SOM) combined with a convolutional network (CN), a pseudo-two-dimensional HMM (hidden Markov model), and the eigenface method. The SOM, HMM, and eigenface results are taken from S. Lawrence et al., Technical Report CS-TR-3608, University of Maryland, 1996. The error rate of 2.29% was better than the 3.8%, 5.3%, 5%, and 10.5% of SOM + CN, KLT + convolutional network, the pseudo-two-dimensional HMM, and the eigenface method, respectively.
(Example 6) Equation (4) is a discriminant function that uses the reference vector and variance vector obtained by assuming a normal distribution of the input feature patterns; it evaluates the correlation distance between the reference pattern representing the target's features and the input feature pattern. When the number of samples is small, the feature-pattern distribution deviates from the assumed normal distribution and image recognition performance deteriorates. To preserve this premise even with few training samples, the inventors devised a mixture model of local feature distributions. In the present invention, each class is assigned to a mixed class whose number of constituent classes is variable, and the local distance parameters (reference vector and variance vector) are determined so as to minimize the cross-entropy between the model distribution and the true distribution. FIG. 9 illustrates the construction of mixed classes by the maximum-likelihood function $l^k_{min}(u, v, z)$, with the number of classes to be mixed as a parameter. The mixed class $M^{k(m)}(u, v, z)$ consists of the classes k(0), k(1), ..., k(m − 1), where k(0) = k.
For each reference class k, the class k(m) at rank m is assigned according to the maximum-likelihood criterion $l^k_{min}(u, v, z)$. [Equation 25] For the variance of the feature vectors of a target of kind k, the kind of other target whose feature-vector variance, summed with it, minimizes the log of the sum is sought; the kind with the m-th smallest value is denoted k(m). The log of the summed variances of kind k and each other target is obtained in turn. The minimum size of the mixture model for each class is uniquely determined by minimizing the maximum-likelihood function $l^k_{min}(u, v, z)$. [Equation 26] The optimum variance is obtained from Equation (26). [Equation 27] Here k(q) is the image class for which the sum of its variance with the variance of class k is the q-th smallest, and the mixed class has the variance of the mixture distribution of these classes' features.
FIG. 10 shows the relationship among the average variance, the misrecognition rate, and the number of samples in the mixed class. Reference numerals 29 and 31 denote the performance of the present invention, and 28 and 30 the results when the same number of components is used for all classes. When the number of mixed classes was three or four, minimum error rates of 7.65% and 8.24% and variances of 618 and 960 were obtained. With the present invention, increasing the number of mixed classes reduces the error rate and the average variance, which saturate at 6.28% and 518. An important point of the present invention is that the number of mixed classes is treated as a variable parameter: by using the optimal number of classes for the mixed class defined by the entropy distance, making the number of classes variable reduces the error rate from 7.78% to 6.28%.
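Equations 25 to 27 are not reproduced in this text, so the sketch below is only one possible reading of the mixed-class construction. For a class k, the other classes are ranked by the criterion S(k, j) = Σ log(σ_k² + σ_j²) over the feature components, matching the "log of the summed variances" wording. The mixed class is k together with its m − 1 best-ranked classes, and its variance is taken as the mean of the member variances. The criterion, the pooling rule, and all names are assumptions for illustration; choosing m per class, as in the variable-size scheme of FIG. 10, is left to the caller.

```python
import numpy as np

def mixture_order(variances, k):
    """Order classes for mixing with class k: k first, then the others by S(k, j) ascending."""
    scores = np.sum(np.log(variances[k] + variances), axis=1)   # S(k, j) for every class j
    others = [j for j in np.argsort(scores, kind="stable") if j != k]
    return [k] + [int(j) for j in others]

def mixed_variance(variances, k, m):
    """Pooled variance of the mixed class built from k(0) = k, ..., k(m - 1)."""
    members = mixture_order(variances, k)[:m]
    return variances[members].mean(axis=0), members
```

Here `variances` is a (classes × components) array of per-class variance vectors; larger m borrows more statistics from similar classes, which is the mechanism the text credits for more stable estimates with few samples.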
(Example 7) The seventh embodiment shows the method of the present invention in which a neural network that identifies image features invariant to rotation is built by using multiple images of a rotating object as learning data. FIG. 11 is an explanatory diagram of the object identification method of the present invention that takes rotation into account. The rotating object 34 is rotated by 0, 60, 120, 180, 240, and 300 degrees about the intersection of the horizontal center line 32 and the vertical center line 33. The luminance signals of all pixels of the two-dimensional image obtained by capturing the rotating object in three-dimensional space are input to a nonlinear table and converted to quantized values. The quantized two-dimensional image is divided into a plurality of rectangular areas, and in each rectangular area the number of pixels belonging to each gradation of the quantized luminance distribution is counted. The three-dimensional feature pattern is obtained from six rotated images per person and input to the neural network. Because the method of the invention uses a three-dimensional representation of the rotating object, pixels with different luminance signals sampled at the same position in a rectangular area of the rotating object are assigned to different gradations. As a result, the neural network can store the feature patterns in distributed form without interference between the signals of the rotated images.
FIG. 12 shows the recognition performance for objects whose rotation angle is varied from 0 to 60 degrees. The neural network was trained with input patterns obtained from face images of 40 people rotated by 0, 60, 120, 180, and 240 degrees, and image synthesis of these training samples (θ = 0, 60, 120, 180, 240 degrees) was also used.
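The rotated training views of FIG. 11 can be generated as sketched below. The patent does not specify the interpolation, so nearest-neighbour sampling about the image centre (the intersection of center lines 32 and 33) is an assumption, as are the function names; pixels rotated in from outside the frame are filled with 0. Each rotated view would then go through the same quantization, block-histogram, and DCT steps as an unrotated image.

```python
import numpy as np

def rotate_nearest(img, angle_deg, fill=0):
    """Rotate a 2-D image about its centre using nearest-neighbour inverse mapping."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # For each output pixel, locate its source pixel by applying the inverse rotation.
    xs = np.cos(t) * (xx - cx) + np.sin(t) * (yy - cy) + cx
    ys = -np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy
    xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.full_like(img, fill)
    out[valid] = img[yi[valid], xi[valid]]
    return out

def rotated_views(img, angles=(0, 60, 120, 180, 240, 300)):
    """The six rotated views per subject used as neural-network training input."""
    return [rotate_nearest(img, a) for a in angles]
```

Inverse mapping (looping over output pixels rather than input pixels) guarantees that every output pixel receives exactly one value, so the rotated views contain no holes.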
Test face images were obtained by rotating these images through a further angle Δθ of 1 to 60 degrees, so the rotation angle of a test image is θ + Δθ. The recognition error rate of the present invention is lower than that of the method without image synthesis. This embodiment illustrates the effect of the invention in recognizing rotating objects: if the signals of the images at six rotation angles were superimposed directly in luminance space, the image represented by the pixel luminance signals in each rectangular area would interfere with the other images.
As described above, according to the present invention, the neural network is constructed from image feature amounts that are invariant to rotation, illumination, changes in position, and the like, and therefore shows an excellent recognition rate. Whereas conventional distance calculations were difficult to optimize, the present invention optimizes the estimate of the statistical distribution of the variance of the feature vectors, so learning can be performed efficiently even with a small number of samples. By changing the mixing ratio over time, the present invention superimposes similar image feature amounts as noise, which improves the identification rate between similar images. Finally, because the present invention compresses the features with fixed coefficients such as those of the discrete cosine transform, it is easy to incorporate into standard compressed-communication systems.

Claims

[Claims]
1. An image recognition method comprising a first operation and a second operation, wherein the first operation obtains a quantized image by inputting the color and luminance signals of all pixels of a two-dimensional image, obtained by photographing a target in three-dimensional space, into a nonlinear table; the second operation divides the quantized image into a plurality of rectangular areas and obtains, for each divided rectangular area, the number of pixels belonging to each quantized gradation of the luminance signal; in the first operation, the nonlinear table is constructed so that, in the distribution of the input image signals of all pixels of the two-dimensional image, the total numbers of pixels belonging to the quantized gradations are equal; and, for each rectangular area obtained by the first and second operations, the intensity of the quantized signal is expressed by the number of pixels in a three-dimensional array (x, y, z) whose coordinates are the horizontal coordinate (x) and vertical coordinate (y) of the rectangular area and the quantized luminance coordinate (z), this array being used as the feature pattern of the two-dimensional image obtained by photographing the target in three-dimensional space.
2. The image recognition method according to claim 1, wherein, instead of obtaining, from a two-dimensional image of a target in three-dimensional space processed by the first and second operations of claim 1, the feature pattern of the target's attribute represented by the frequency components of the three-dimensional array of the horizontal and vertical coordinates of the rectangular elements and the quantized luminance coordinate, a two-dimensional array of pixel-count components is constructed for each gradation of the signal of the quantized two-dimensional image from the frequency components of the three-dimensional array of claim 1; the two-dimensional array of pixel-count components is transformed by a two-dimensional discrete cosine (or sine) transform to obtain frequency components in the horizontal, vertical, two-dimensional space; and the feature pattern is a three-dimensional array (xt, yt, z) formed, for each gradation of the signal of the quantized two-dimensional image, from low-frequency components selected from the two-dimensional frequency components so that the number of frequency components maximizes the recognition rate.
3. The image recognition method according to claim 1, wherein, instead of obtaining, from a two-dimensional image of a target in three-dimensional space processed by the first and second operations of claim 1, the feature pattern of the target's attribute represented by the frequency components of the three-dimensional array of the horizontal and vertical coordinates of the rectangular elements and the quantized luminance coordinate, the three-dimensional array of claim 1 is transformed by a three-dimensional discrete cosine (or sine) transform along the horizontal, vertical, and quantized luminance coordinate directions to obtain frequency components in three-dimensional space, and the feature pattern is a three-dimensional array (xt, yt, z) formed from low-frequency components selected from the three-dimensional frequency components so that the number of frequency components maximizes the recognition rate.
4. The image recognition method according to claim 1, wherein, for feature patterns expressed as three-dimensional arrays of pixel counts and frequency components obtained by the method of claim 1 from a plurality of two-dimensional images of targets in three-dimensional space belonging to arbitrary attributes, or from an image database, a reference vector and a variance vector represented by the mean and variance of the pixel-count distribution of each element of the three-dimensional array are obtained for each set of feature patterns of the same attribute; and, for a feature input vector represented by the pixel counts of the three-dimensional array obtained by the method of claim 1 from a two-dimensional image of an arbitrary target outside the image database, the attribute of the target of that two-dimensional image is determined by the correlation distance defined by the reference vector and the variance vector.
5. The image recognition method according to claim 4, wherein, in a set of a plurality of two-dimensional images of targets in three-dimensional space belonging to arbitrary attributes, that is, an image database, for each set of feature amounts of the same attribute of the target represented by the pixel counts and frequency components of the three-dimensional arrays obtained by the method of claim 1, the variance vector of the pixel-count distribution of each element of the three-dimensional array is calculated, and the reference vector corresponding to the mean of the pixel-count and frequency distribution of each element of the three-dimensional array of claim 4 is obtained by learning of a neural network; and the neural network stores, as learning coefficients, the reference and variance vectors representing a limited number of attributes of the same kind in the image database of claim 4, and determines the attribute of the target of a two-dimensional image outside the image database by computing, for the attribute feature input vector represented by the pixel counts of the three-dimensional array obtained from that image by the method of claim 1, the correlation distance defined by the reference vector and the variance vector.
6. The image recognition method according to claim 4, wherein, for feature patterns expressed as three-dimensional arrays of pixel counts and frequency components obtained by the method of claim 1 from a plurality of two-dimensional images, or an image database, of targets in three-dimensional space belonging to arbitrary attributes, the variance vector of the pixel-count distribution of each element of the three-dimensional array is calculated for each set of feature patterns of the same target attribute; for the components of the variance vector of each attribute, a mutual-information criterion value with the components of the variance vectors of different attributes is calculated; and the components of the variance vector of each attribute are determined, using the components of the variance vectors of a plurality of attributes different from that attribute, so as to minimize this criterion value.
7. The image recognition method according to claim 5, wherein, when the reference and variance vectors of the pixel counts of each element of the three-dimensional array are obtained by learning of the neural network, for the feature input of an attribute represented by the pixel counts and frequency components of a three-dimensional array obtained from a two-dimensional image in the plurality of images or image database of claim 5, the feature input pattern of a different attribute whose correlation distance in the feature space of claim 5 from the feature input of that attribute is smallest is found, a feature amount obtained by mixing the two feature input patterns is input to the neural network, and the mixing ratio is changed according to the number of learning iterations.
8. The image recognition method according to claim 5, wherein, for the feature patterns of target attributes represented by the pixel counts of the three-dimensional array of the horizontal and vertical coordinates of the rectangular elements and the quantized luminance coordinate obtained from two-dimensional images by the method of claim 1, the reference vector and variance vector obtained from the mean and variance of the distribution of the feature patterns of each component of the three-dimensional array are stored on the wired or wireless receiving side; on the transmitting side, the feature input pattern of a two-dimensional image represented by the pixel counts of the three-dimensional array of claim 1 is encoded, encrypted, and transmitted; and on the receiving side, the received feature amount of the two-dimensional image is decrypted and decoded, and the neural network of claim 5 determines the attribute of the target represented by the feature amount of the two-dimensional image.
9. The image recognition method according to claim 1, wherein, instead of a plurality of two-dimensional images of targets in three-dimensional space belonging to arbitrary attributes, or an image database, the two-dimensional images are images captured while rotating a three-dimensional object by fixed angles, and the pixel counts and frequency components of the three-dimensional structure of claim 1 are used.