JP4238537B2

JP4238537B2 - Image processing device

Info

Publication number: JP4238537B2
Application number: JP2002222712A
Authority: JP
Inventors: 典司加藤; 洋次鹿志村; 仁池田
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-07-31
Filing date: 2002-07-31
Publication date: 2009-03-18
Anticipated expiration: 2022-07-31
Also published as: JP2004062719A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像処理装置、特に画像データからのパターン検出が容易となるように行う前処理に関する。
【０００２】
【従来の技術】
計算機を用いて、画像データから、特定のパターンを識別するためには、パターンそのものの形状の検出を確実に行う必要と、含まれているパターンのサイズや回転角度等が様々であることに対応する必要とがある。
【０００３】
前者に対する従来例としては、線形部分空間法、サポートベクトルマシン（ＳＶＭ）法、カーネル非線形部分空間法などがある。線形部分空間法では、複数のカテゴリ毎に部分空間を定め、未知のパターンがどの部分空間に最も関連しているかを評価し、そのパターンの属するカテゴリを判定している。しかし、この方法においては、カテゴリが多く、パターンの次元が低い場合には、検出精度が低下してしまう。また、非線形性をもつパターン分布に対する識別精度も低いという問題がある。
【０００４】
ＳＶＭ法は、カーネル関数を媒介に定義した非線形変換により、低次元のパターンを高次元に写像することで、非線形性をもつパターン分布の識別を可能とする方法である。しかし、２つのカテゴリの分類しか行うことができない点や、必要な計算量が多い点に問題を抱える。
【０００５】
カーネル非線形部分空間法は、これらの問題を解決するパターン識別方法として考案され、特開２０００−９０２７４号公報に開示されている。この方法は、ＳＶＭ法と同様に、カーネル関数を用いて定義した非線形変換によりパターンを高次元空間に写像し、この高次元空間上で部分空間法を実施している。
【０００６】
後者、つまり、様々なサイズや回転角度をもったパターンに対しては、従来は、非常に多くの学習サンプルを用いることで対応してきた。すなわち、上で述べた各パターン識別法などは、一般に特徴的なパターンをもつ学習サンプルを用いて、その特徴を示すカテゴリの分布を定めていく学習をおこなう。そこで、この学習サンプルとして、サイズや角度が様々に変えられたパターンを用いるだけでなく、サイズと角度を組み合わせた変形がなされた非常に多くのパターンについても用いる必要があった。
【０００７】
【発明が解決しようとする課題】
しかしながら、前記カーネル非線形部分空間法では、部分空間を張る基底ベクトルが、全学習サンプルの非線形空間への写像に基づいて定義されるため、学習サンプルが多くなると、依然として多くの計算が必要となる問題があった。また、上で述べたように、従来は、汎用的なパターン識別を行うためには、非常に多くの学習サンプルをもちいなければならない問題があった。本発明の課題は、画像中で、サイズや回転角度などが様々であることが多い人間や動物の顔パターンを、高速かつ高精度に識別する手段を確立する点にある。
【０００８】
【課題を解決するための手段】
本発明の画像処理装置は、非線形変換を利用して、与えられた画像データ中の顔パターンに対してパターン識別手段が識別を行えるように、識別が実施可能となる条件下へと画像データを正規化変換する画像処理装置において、前記正規化変換のための個々の部分変換に対応してそれぞれ設けられた学習手段と、前記正規化変換のための個々の部分変換に対応してそれぞれ設けられた部分変換検出手段と、変換実施手段と、繰り返し制御手段と、を備え、前記各学習手段は、それぞれ、当該学習手段に対応する部分変換についての変換の大きさが異なる複数の学習サンプルを用いて学習を行うことにより、前記非線形変換によって定義された空間Ｆの部分空間であって前記複数の学習サンプルに対応する部分空間を構築するとともに、構築した当該部分空間の各基底ベクトルと、当該部分空間についての変換の大きさと、の対応関係を求め、前記各部分変換検出手段は、それぞれ、当該部分変換検出手段に対応する部分変換に対応して前記学習手段が構築した部分空間に対し対象画像データの前記空間Ｆへの写像ベクトルΦを射影した射影ベクトルΦ'が、当該部分空間の各基底ベクトルのどれに近いかを求め、求めた基底ベクトルと前記対応関係とに基づき、前記対象画像データに対応する変換の大きさを求める変換の大きさ評価手段と、前記写像ベクトルΦと前記射影ベクトルΦ'との距離又は当該距離の単調増加関数である推定誤差を求める推定誤差評価手段と、を備え、前記変換実施手段は、全ての前記部分変換検出手段の中で最も小さい推定誤差を与える部分変換検出手段を判定し、判定した部分変換検出手段に対応する部分変換を、当該部分変換検出手段の前記変換の大きさ評価手段が求めた変換の大きさで、前記対象画像データに施し、前記繰り返し制御手段は、前記与えられた画像データを前記対象画像データとして前記各部分変換検出手段及び前記変換実施手段を動作させ、この動作において当該変換実施手段が前記対象画像データに部分変換を施すことにより得られた更新された画像データを新たな前記対象画像データとして前記各部分変換検出手段及び前記変換実施手段を動作させるという処理を繰り返す、ことを特徴とする。
【０００９】
また、本発明の画像処理装置は、前記部分変換には、サイズ、回転、シフトの少なくともひとつが含まれることを特徴とする。
【００１０】
また、本発明の画像処理装置は、前記非線形変換はニューラルネットワークを用いた計算手段によって与えられることを特徴とする。
【００１１】
また、本発明の画像処理装置は、前記非線形変換はカーネル関数を用いた計算手段によって与えられることを特徴とする。
【００１２】
また、本発明の画像処理装置は、前記対象画像データに空間フィルタを施して粗視化する粗視化手段を有し、前記部分変換検出手段においては粗視化された前記対象画像データが用いられ、前記変換実施手段は粗視化されない元の前記対象画像データに対し前記部分変換を施す、ことを特徴とする。
【００１３】
また、本発明の画像処理装置は、前記各部分変換検出手段の演算は装置内に設けられた並列演算装置により並列的に処理されることを特徴とする。
【００１４】
また、本発明の画像処理方法は、非線形変換を利用して、与えられた画像データ中の顔パターンに対してパターン識別方法により識別を行えるように、識別が実施可能となる条件下へと画像データを正規化変換する画像処理方法において、前記正規化変換のための個々の部分変換に対応してそれぞれ実行される学習工程と、前記正規化変換のための個々の部分変換に対応してそれぞれ実行される部分変換検出工程と、変換実施工程と、繰り返し制御工程と、を含み、前記各学習工程は、それぞれ、当該学習工程に対応する部分変換についての変換の大きさが異なる複数の学習サンプルを用いて学習を行うことにより、前記非線形変換によって定義された空間Ｆの部分空間であって前記複数の学習サンプルに対応する部分空間を構築するとともに、構築した当該部分空間の各基底ベクトルと、当該部分空間についての変換の大きさと、の対応関係を求め、前記各部分変換検出工程は、それぞれ、当該部分変換検出工程に対応する部分変換に対応して前記学習工程が構築した部分空間に対し対象画像データの前記空間Ｆへの写像ベクトルΦを射影した射影ベクトルΦ'が、当該部分空間の各基底ベクトルのどれに近いかを求め、求めた基底ベクトルと前記対応関係とに基づき、前記対象画像データに対応する変換の大きさを求める変換の大きさ評価工程と、前記写像ベクトルΦと前記射影ベクトルΦ'との距離又は当該距離の単調増加関数である推定誤差を求める推定誤差評価工程と、を含み、前記変換実施工程は、全ての前記部分変換検出工程の中で最も小さい推定誤差を与える部分変換検出工程を判定し、判定した部分変換検出工程に対応する部分変換を、当該部分変換検出工程中の前記変換の大きさ評価工程が求めた変換の大きさで、前記対象画像データに施し、前記繰り返し制御工程は、前記与えられた画像データを前記対象画像データとして前記各部分変換検出工程及び前記変換実施工程を実行し、この実行において当該変換実施工程が前記対象画像データに部分変換を施すことにより得られた更新された画像データを新たな前記対象画像データとして前記各部分変換検出工程及び前記変換実施工程を実行するという処理を繰り返す、
ことを特徴とする。
【００１５】
また、本発明の画像処理プログラムは、コンピュータに、非線形変換を利用して、与えられた画像データ中の顔パターンに対してパターン識別手順が識別を行えるように、識別が実施可能となる条件下へと画像データを正規化変換させる画像処理プログラムにおいて、前記コンピュータに、前記正規化変換のための個々の部分変換に対応してそれぞれ実行される学習手順と、前記正規化変換のための個々の部分変換に対応してそれぞれ実行される部分変換検出手順と、変換実施手順と、繰り返し制御手順と、を実行させる画像処理プログラムであって、前記各学習手順は、それぞれ、当該学習手順に対応する部分変換についての変換の大きさが異なる複数の学習サンプルを用いて学習を行うことにより、前記非線形変換によって定義された空間Ｆの部分空間であって前記複数の学習サンプルに対応する部分空間を構築するとともに、構築した当該部分空間の各基底ベクトルと、当該部分空間についての変換の大きさと、の対応関係を求め、前記各部分変換検出手順は、それぞれ、当該部分変換検出手順に対応する部分変換に対応して前記学習手順が構築した部分空間に対し対象画像データの前記空間Ｆへの写像ベクトルΦを射影した射影ベクトルΦ'が、当該部分空間の各基底ベクトルのどれに近いかを求め、求めた基底ベクトルと前記対応関係とに基づき、前記対象画像データに対応する変換の大きさを求める変換の大きさ評価手順と、前記写像ベクトルΦと前記射影ベクトルΦ'との距離又は当該距離の単調増加関数である推定誤差を求める推定誤差評価手順と、を含み、前記変換実施手順は、全ての前記部分変換検出手順の中で最も小さい推定誤差を与える部分変換検出手順を判定し、判定した部分変換検出手順に対応する部分変換を、当該部分変換検出手順中の前記変換の大きさ評価手順が求めた変換の大きさで、前記対象画像データに施し、前記繰り返し制御手順は、前記与えられた画像データを前記対象画像データとして前記各部分変換検出手順及び前記変換実施手順を実行し、この実行において当該変換実施手順が前記対象画像データに部分変換を施すことにより得られた更新された画像データを新たな前記対象画像データとして前記各部分変換検出手順及び前記変換実施手順を実行するという処理を繰り返す、ことを特徴とする。
【００１６】
【発明の実施の形態】
以下に、本発明の好適な実施形態を図面を用いて説明する。図中、同一構成となるものについては説明を省略する。
【００１７】
図１のブロック図は、本発明の実施の形態に係る装置の構成を示している。装置は、演算を行うＣＰＵ２をはじめ、記憶部４、利用者の指示入力部６、表示部８、データ入力部１０、データ出力部１２、およびアプリケーションソフトウエア入力部１４を含む構成となっており、これらはデータを通信する通信網によって結ばれている。すなわち、この装置は、一般的なコンピュータ上で、本発明のアルゴリズムを記載したアプリケーションソフトウエアを実行することで実現される。利用者は、ＣＤ−ＲＯＭ等の記憶媒体や、ネットワークを介して頒布されたアプリケーションソフトウエアを、そのアプリケーションソフトウエア入力部１４を用いてコンピュータに入力し、キーボード等の指示入力部６を使ってＣＰＵ２に実行させる。ＣＰＵ２の動作は、オペレーティングシステム（ＯＳ）と呼ばれるソフトウエアの管理下にあり、利用者ならびにアプリケーションソフトウエアの指示は、このＯＳを通じてＣＰＵ２に伝えられる。本実施形態のアプリケーションソフトウエアやＯＳを始めとする演算実行上必要な情報は、メモリやハードディスク等からなる記憶部４によって一時的または恒久的に保持される。また、実行にあたって必要となる画像データは、ＣＣＤカメラ、スキャナ、ＣＤ−ＲＯＭ等の記憶媒体、あるいはネットワークによるデータ取得等のデータ入力部１０を通して得られる。そして、必要な演算がＣＰＵ２によって成されると、処理された画像データは、ＭＯ等の記憶媒体、ネットワークによるデータ転送、プリンタ等のデータ出力部１２を通じて出力される。また、利用者は、ディスプレイなどの表示部８によって、処理前後の画像データ等を見ることができる。
【００１８】
図２は、ＣＰＵ２によって行われる画像処理演算の概略を示すブロック図である。データ入力部１０から入力された画像データは、画像正規化部２０によって正規化変換を受け、さらにパターン識別部１００によって詳細なパターンの識別をされる。なお、ここで言う正規化とは、顔パターンの大きさ、回転角度、位置、明るさなどの条件を、パターン識別部１００の想定する状態（これを正規形と呼ぶことにする）へと変換することである。
【００１９】
画像正規化部２０で行われる正規化のための変換は基本的な部分変換からなる要素に分割されており、各部分変換をどのように行えばよいかは、それぞれの部分変換に対応した部分変換検出部２６が算出する。図示した例では、画像データのサイズ（拡大と縮小）に関係したサイズ部分変換検出部２６ａ、画像データの回転角度に関係した回転部分変換検出部２６ｂ、画像データのシフト（平行移動）に関係したシフト部分変換検出部２６ｃの３つの部分変換検出部２６を備える。これらの部分変換検出部２６は、後で詳しく述べるように、画像データに粗視化のための空間フィルタを施して得た粗視データに対して部分変換の状態検出を行い、その結果を正規化処理部２８に渡す。そして、正規化処理部２８は、変換にともなう誤差が最も小さいと判定された変換を画像データに施す。この一連の過程は、通常何度か繰り返され、最終的には、サイズ、回転、シフトの全てについて正規化が行われることになる。もちろん、顔パターンの状況によっては、繰り返しを行わないことも可能である。
【００２０】
パターン識別部１００は、画像正規化部２０によって正規化が行われた画像データに対し、空間フィルタを用いて様々な解像度の粗視データへと粗視化する処理を行い、さらにこの粗視データに対し、顔パターンの識別を行う顔識別部１０２を実行する。図示した例においては、主成分分析のモードを適当な次元だけ足し合わせる粗視化がなされており、２５次元の粗視データに対する顔識別部１０２ａと、１００次元の粗視データに対する顔識別部１０２ｂをはじめ、その間の解像度にも複数の顔識別部１０２が設けられている。また、最も高い次元である１００次元の粗視データに対し、顔パターン以外が含まれることを判定する反例検出部１０４が設けられている。後で詳細に記すように、顔パターンの識別は解像度が一番低い２５次元の顔識別部１０２ａから行われ、顔パターンがある可能性が高いと判定された場合には、次に低い解像度の顔識別部１０２が判定に用いられる。そして、最も解像度が高い１００次元においても、顔パターンのある可能性が高いと判定された場合には、最後に反例検出が行われる。
[0021]
The image normalization unit 20 and the pattern identification unit 100 are described in detail below.
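[0022]
The block diagram of FIG. 3 outlines the configuration of the image normalization unit 20. Input image data is held in an image holding unit 30 provided in the storage unit 4. A normalization coarse-graining unit 32 then coarse-grains this image data, lowering its spatial resolution to extract its rough features, and obtains coarse-grained data. The spatial filter used for this coarse-graining is not particularly limited; for example, one may compute the sum of a predetermined number of mode components with the largest contribution ratios among the mode components obtained by principal component analysis of suitable image data, or perform a Fourier decomposition and extract the mode components at or above a predetermined resolution. The reason for coarse-graining is to reduce the amount of data so that the normalization described next can be executed at high speed.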
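The principal-component variant of the coarse-graining step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `coarse_grain`, the flattened-list image representation, and the assumption that the mean image and orthonormal modes have already been computed from suitable training images are all hypothetical. It returns the coefficients on the leading modes, which carry the same information as the sum of those mode components.

```python
def coarse_grain(pixels, mean, modes):
    """Return the coefficients of `pixels` on each principal-component mode.

    pixels, mean: flattened images as lists of floats of equal length.
    modes: orthonormal mode vectors, ordered by contribution ratio
           (assumed to come from a PCA computed beforehand).
    """
    centered = [p - m for p, m in zip(pixels, mean)]
    return [sum(c * v for c, v in zip(centered, mode)) for mode in modes]


# Toy example: a 4-pixel "image" reduced to 2 dimensions.
mean = [1.0, 1.0, 1.0, 1.0]
modes = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
coarse = coarse_grain([3.0, 1.0, 4.0, 1.0], mean, modes)
```

In the embodiment the same idea would be applied with about 25 modes rather than 2.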
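[0023]
The coarse-grained data is then sent to a plurality of partial conversion detection units 26 arranged in parallel. As shown schematically in FIG. 4, each partial conversion detection unit 26 regards the coarse-grained data as a vector x in an image space G and maps it into a space F created by a nonlinear transformation for pattern identification. The vector mapped into the space F is called the mapping vector and written Φ(x). When size, rotation, and shift are to be detected, for example, the partial conversion detection units 26 consist of a size partial conversion detection unit 26a, a rotation partial conversion detection unit 26b, and a shift partial conversion detection unit 26c. Each partial conversion detection unit 26 includes a normalization subspace learning unit 34, a normalization subspace projection unit 36, and a partial conversion evaluation unit 37, and the partial conversion evaluation unit 37 in turn includes a conversion magnitude evaluation unit 38 and an estimation error evaluation unit 40. In the space F, the normalization subspace learning unit 34 has constructed in advance, using learning samples, a subspace Ω representing the category characteristic of those learning samples, and the normalization subspace projection unit 36 projects the mapping vector Φ(x) onto this subspace Ω. The projected vector is called the projection vector and written Φ'(x). The conversion magnitude evaluation unit 38 evaluates which of the basis vectors spanning the subspace Ω the projection vector Φ'(x) is closest to, and calculates the magnitude of the conversion required. In the size partial conversion detection unit 26a, for example, a lookup table is created during learning that records correspondences such as: basis vector Φ₁ lies near learning samples about 1.5 times as large, and another basis vector Φ₂ lies near learning samples about 2 times as large. By referring to this lookup table, the conversion magnitude evaluation unit 38 can calculate by what factor the current face pattern must be scaled to convert it to the normal form. The estimation error evaluation unit 40 calculates an estimation error based on the distance E between the mapping vector Φ(x) and the projection vector Φ'(x). That is, the projection is judged to contain little error when the distance E is small and a large error when the distance E is large.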
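The lookup step performed by the conversion magnitude evaluation unit 38 can be sketched as below. This is only an illustration under stated assumptions: the name `estimate_magnitude` is hypothetical, "closest basis vector" is approximated by the projection coefficient with the largest absolute value, and the table is assumed to store the relative size of the learning samples near each basis vector, so the correcting scale factor is its reciprocal.

```python
def estimate_magnitude(alpha, lookup_table):
    """Pick the dominant basis vector and look up its conversion magnitude.

    alpha: projection coefficients, one per basis vector of the subspace.
    lookup_table: magnitude associated with each basis vector during learning
                  (here: size of nearby learning samples relative to normal form).
    """
    nearest = max(range(len(alpha)), key=lambda i: abs(alpha[i]))
    return nearest, lookup_table[nearest]


# Hypothetical table: basis 0 ~ 1.5x samples, basis 1 ~ 2x samples, basis 2 ~ 0.5x samples.
size_table = [1.5, 2.0, 0.5]
index, observed_size = estimate_magnitude([0.1, 0.8, 0.05], size_table)
correction = 1.0 / observed_size  # scale factor that would bring the pattern to normal form
```

A finer-grained table or an interpolating function could replace the single nearest entry; the text leaves that choice open.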
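[0024]
These results are passed to the normalization processing unit 28, which includes a conversion determination unit 42 and a conversion execution unit 44. The conversion determination unit 42 determines which partial conversion detection unit 26 gives the smallest estimation error. For example, when the estimation error of the rotation partial conversion detection unit 26b is the smallest, the conversion execution unit 44 rotates the original image data by the corresponding conversion magnitude (that is, the rotation angle) and updates the image held by the image holding unit 30. The updated image data is subjected to the same normalization several more times as necessary. Various criteria for repetition are conceivable; for example, a predetermined number of repetitions may be set in advance, or a correlation computed in real space against suitable reference data, or the value obtained by the conversion magnitude evaluation means, may be compared with a predetermined threshold.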
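[0025]
Next, the means of constructing the space F prepared in the partial conversion detection unit 26 by using a kernel function is described in detail, with mathematical expressions. What is characteristic of the kernel-function approach is that the way the mapping vector Φ(x) described above is constructed is never given explicitly.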
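[0026]
The nonlinear mapping of equation (1), which maps a d-dimensional vector x in the image space G representing the coarse-grained data into the d_F-dimensional space F, is determined by choosing a suitable kernel function k(x, y) so that the relation of equation (2) is satisfied.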
[0027]
[Equation 1]
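The equation image for (1) and (2) did not survive extraction. A plausible reconstruction, assuming the standard eigenfunction (Mercer) expansion of a kernel, which matches the description of φ_i and λ_i in the text, is sketched here. The exact notation of the original figure, including the upper summation limit, is an assumption.

```latex
% Assumed reconstruction of equations (1) and (2); not copied from the original figure.
\begin{align}
x = (x_1, \dots, x_d) \;\mapsto\; \Phi(x)
  &= \bigl(\sqrt{\lambda_1}\,\phi_1(x), \dots, \sqrt{\lambda_{d_F}}\,\phi_{d_F}(x)\bigr) \tag{1} \\
k(x, y) &= \sum_{i=1}^{d_F} \lambda_i\,\phi_i(x)\,\phi_i(y) \;=\; \Phi(x)\cdot\Phi(y) \tag{2}
\end{align}
```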

Here, φ_i(x) is an eigenfunction of the chosen kernel function and λ_i is the corresponding eigenvalue (i = 1, ..., n).
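[0028]
Next, the method of spanning in the space F an m-dimensional subspace Ω that classifies the category of the coarse-grained data, and the method of learning it, are described. First, as initial values of the basis vectors of the subspace Ω, vectors Φ₁, ..., Φ_m on the subspace Ω corresponding to m vectors x₁, ..., x_m in the image space G (hereinafter called pre-images) are chosen arbitrarily; specifically, for example, they are assigned at random using uniform random numbers. Now consider training the pre-images so as to correct this subspace Ω using a d-dimensional vector x representing a learning sample in the image space. The vector Φ'(x), obtained by projecting the mapping Φ(x) of the learning-sample vector x into the space F onto the subspace Ω, is expressed as a linear combination of the basis vectors. Letting the combination coefficients be α_i, the distance E between this projection and the original mapping vector Φ(x) is given by equations (3) to (5).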
[0029]
[Equation 2]
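The equation image for (3) to (6) is also missing. The following is a reconstruction derived from the surrounding text, assuming that the basis vectors are the images Φ(x_i) of the pre-images and that E denotes the squared distance. Equation (6) follows from setting ∂E/∂α_i = 0 in (5). The layout of the original figure is an assumption.

```latex
% Assumed reconstruction of equations (3)-(6); K_{ij} = k(x_i, x_j).
\begin{align}
E &= \Bigl\lVert \Phi(x) - \sum_{i=1}^{m} \alpha_i\,\Phi(x_i) \Bigr\rVert^{2} \tag{3} \\
  &= \Phi(x)\cdot\Phi(x) - 2\sum_{i=1}^{m} \alpha_i\,\Phi(x)\cdot\Phi(x_i)
     + \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j\,\Phi(x_i)\cdot\Phi(x_j) \tag{4} \\
  &= k(x, x) - 2\sum_{i=1}^{m} \alpha_i\,k(x, x_i)
     + \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j\,k(x_i, x_j) \tag{5} \\
\alpha_i &= \sum_{j=1}^{m} \bigl(K^{-1}\bigr)_{ij}\,k(x, x_j) \tag{6}
\end{align}
```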

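Here, the definition of the kernel function, equation (2), is used in the transformation to equation (5). The coefficients α_i are given by equation (6) so that, in accordance with the definition of projection, E takes its minimum value. The matrix K is the matrix whose (i, j) component is k(x_i, x_j).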
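The projection coefficients and the distance E can be computed from kernel evaluations alone, without ever forming Φ(x). The sketch below assumes a Gaussian kernel, squared distance for E, and basis vectors given by the images of the pre-images. The names `gaussian_kernel`, `solve`, and `project` are illustrative, and the small Gaussian-elimination solver stands in for whatever linear solver an implementation would actually use.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed width parameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def project(x, preimages, kernel=gaussian_kernel):
    """Return (alpha, E): coefficients minimizing E, and the squared distance E."""
    m = len(preimages)
    K = [[kernel(a, b) for b in preimages] for a in preimages]
    kx = [kernel(x, p) for p in preimages]
    alpha = solve(K, kx)                       # K alpha = k(x, .)
    cross = sum(a * k for a, k in zip(alpha, kx))
    quad = sum(alpha[i] * alpha[j] * K[i][j] for i in range(m) for j in range(m))
    return alpha, kernel(x, x) - 2.0 * cross + quad


preimages = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alpha_on, e_on = project([1.0, 0.0], preimages)    # x coincides with a pre-image
alpha_off, e_off = project([5.0, 5.0], preimages)  # x far from every pre-image
```

A sample that coincides with a pre-image projects onto it with essentially zero distance, while a sample far from all pre-images keeps a distance close to k(x, x) = 1. That behaviour is what makes E usable as an estimation error.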
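[0030]
In learning the pre-images, each pre-image is moved by Δx_i in the direction that most reduces the distance between the subspace Ω and the learning sample x. This Δx_i is given by equation (7), using the steepest descent method.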
[0031]
[Equation 3]
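The equation image for (7) and (8) is not available. One natural form, assuming steepest descent with respect to the metric induced by the kernel and a gradient taken from the kernel expression for E with the coefficients α held at their minimizing values, is sketched below. The form of (8) is the usual kernel-induced metric. The exact form and constant factors of the original (7) are assumptions.

```latex
% Assumed reconstruction of equations (7) and (8).
% g^{ab} is the matrix inverse of g_{ab}; \partial_b differentiates the first
% kernel argument with respect to its b-th component (also in the term j = i).
\begin{align}
\Delta x_i^{a} &= -\eta \sum_{b=1}^{d} g^{ab}(x_i)\,\frac{\partial E}{\partial x_i^{b}},
\qquad
\frac{\partial E}{\partial x_i^{b}}
  = -2\alpha_i \Bigl[\, \partial_b k(x_i, x) - \sum_{j=1}^{m} \alpha_j\,\partial_b k(x_i, x_j) \Bigr] \tag{7} \\
g_{ab}(x) &= \left.\frac{\partial^{2} k(x, x')}{\partial x^{a}\,\partial x'^{b}}\right|_{x' = x} \tag{8}
\end{align}
```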

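Here, η is the learning coefficient, a positive constant. The matrix g_ab(x) is the metric tensor of the manifold embedded in the space F by the nonlinear mapping, and is given in terms of the kernel function by equation (8). Because this learning is a linear optimization problem in a high-dimensional space, it converges better than a nonlinear optimization problem and finishes in a short time.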
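[0032]
Next, the method of learning the kernel function is described. Initially, a known function such as a Gaussian kernel or a polynomial kernel is given as the kernel function. During learning, the kernel function is deformed by the conformal mapping of equation (9).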
[0033]
[Equation 4]
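The equation image for (9) is missing. The usual conformal deformation of a kernel by a positive scalar function C(x), which is consistent with how C(x) is used in the text, has the following form. Whether the original figure used exactly this notation is an assumption.

```latex
% Assumed reconstruction of equation (9): conformal deformation of the kernel.
\begin{equation}
\tilde{k}(x, y) = C(x)\,C(y)\,k(x, y) \tag{9}
\end{equation}
```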

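The learning rule gives C(x) such that the variation of the coefficient α_i over the learning samples becomes uniform for every coefficient α_i. Specifically, when the variation of a coefficient α_i is large relative to a predetermined value, the value of C(x) is increased in the neighborhood of the pre-image x_i of the subspace basis vector corresponding to that coefficient. As a result, the neighborhood of x_i is expanded in the space F as in equation (10).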
[0034]
[Equation 5]
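The equation image for (10) is missing. Assuming the deformed kernel has the conformal form k̃(x, y) = C(x) C(y) k(x, y), so that the deformed mapping is Φ̃(x) = C(x) Φ(x), the induced metric can be derived as follows. Where C(x) is nearly flat, the leading term is C(x)² g_ab(x), which is the sense in which the neighborhood of x_i is enlarged. The original figure may show only a simplified form, for instance the Gaussian-kernel case in the last line.

```latex
% Derivation under the assumption \tilde{\Phi}(x) = C(x)\,\Phi(x).
\begin{align}
\tilde{g}_{ab}(x)
  &= \partial_a \tilde{\Phi}(x)\cdot\partial_b \tilde{\Phi}(x) \notag \\
  &= C(x)^{2}\,g_{ab}(x)
   + \partial_a C(x)\,\partial_b C(x)\,k(x, x) \notag \\
  &\quad + C(x)\Bigl[\partial_a C(x)\,\left.\tfrac{\partial k(x, x')}{\partial x'^{b}}\right|_{x'=x}
   + \partial_b C(x)\,\left.\tfrac{\partial k(x, x')}{\partial x'^{a}}\right|_{x'=x}\Bigr] \notag \\
  &= C(x)^{2}\,g_{ab}(x) + \partial_a C(x)\,\partial_b C(x)
  \qquad \text{(Gaussian kernel: } k(x,x) = 1,\ \partial k/\partial x'^{b}\big|_{x'=x} = 0\text{)} \tag{10}
\end{align}
```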

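Consequently, the number of learning samples for which the coefficient α_i takes a large value decreases relatively, and the variation of α_i over the learning samples decreases. Conversely, when the variation of a coefficient α_i is small relative to the predetermined value, the value of C(x) is decreased in the neighborhood of the pre-image x_i of the corresponding basis vector. With the method described here, C(x) can be applied only at the pre-images of the basis of the subspace Ω; in the neighborhood of a pre-image, however, it can be changed by extrapolating the value of C(x) at the pre-image as in equation (11).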
[0035]
[Equation 6]
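The effect of the conformal factor can be checked numerically. Equation (11) is not reproduced in this text, so the extrapolation used below (a Gaussian bump around each pre-image that decays to 1 far away) is purely an assumed stand-in, as are the names `conformal_factor`, `conformal_kernel`, and `feature_distance_sq` and the `gains` parameter.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def conformal_factor(x, preimages, gains, width=1.0):
    """Assumed C(x): close to gains[i] at pre-image i, close to 1 far from all pre-images."""
    return 1.0 + sum((g - 1.0) * gaussian_kernel(x, p, width)
                     for p, g in zip(preimages, gains))


def conformal_kernel(x, y, preimages, gains):
    """Deformed kernel C(x) * C(y) * k(x, y)."""
    return (conformal_factor(x, preimages, gains)
            * conformal_factor(y, preimages, gains)
            * gaussian_kernel(x, y))


def feature_distance_sq(kernel, x, y):
    """Squared distance between the images of x and y in the feature space."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


preimages = [[0.0, 0.0]]
gains = [2.0]  # C is raised to about 2 near this pre-image


def deformed(a, b):
    return conformal_kernel(a, b, preimages, gains)


near_before = feature_distance_sq(gaussian_kernel, [0.0, 0.0], [0.1, 0.0])
near_after = feature_distance_sq(deformed, [0.0, 0.0], [0.1, 0.0])
far_before = feature_distance_sq(gaussian_kernel, [10.0, 0.0], [10.1, 0.0])
far_after = feature_distance_sq(deformed, [10.0, 0.0], [10.1, 0.0])
```

Near the pre-image the squared feature-space distance grows by roughly C², while far from it the geometry is left essentially unchanged. This is the local expansion the learning rule relies on.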

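Here, how the learning samples used for learning are provided is described. For normalization with respect to rotation, for example, a plurality of images are prepared in which the face pattern to be normalized stands upright (head at the top, chin at the bottom) at the center of the image, and these are rotated by angles drawn from uniform random numbers in the range from -180 degrees to 180 degrees, or by equally spaced angles. For shift, a plurality of images with the face pattern upright at the center of the image are likewise prepared and shifted vertically and horizontally according to, for example, Gaussian-distributed random numbers whose half-width is a suitable number of pixels. Instead of using random numbers, the shifts may be assigned regularly so that the probability density is uniform. For size, similarly, images with the face pattern upright at the center may be enlarged and reduced. Learning in this way reveals the relationship between the conversion magnitude of a learning sample (the angle, in the case of rotation) and the projection of that sample onto the subspace. Specifically, a relationship is derived such as: if the coefficient α₁ is large, the sample is rotated by about 90 degrees. By examining this in detail and creating a lookup table or a suitable function, the evaluation means of the conversion magnitude evaluation unit 38 is established.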
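Generating the conversion magnitudes for the learning samples might look like the sketch below. It covers only the choice of angles and offsets, not the image warping itself. The function names are hypothetical, and interpreting the "half-width" as a full width at half maximum (FWHM) is an assumption.

```python
import math
import random


def rotation_angles(count, rng=None):
    """Angles in degrees: equally spaced if rng is None, else uniform random in [-180, 180]."""
    if rng is None:
        step = 360.0 / count
        return [-180.0 + step * i for i in range(count)]
    return [rng.uniform(-180.0, 180.0) for _ in range(count)]


def shift_offsets(count, fwhm_pixels, rng):
    """(dx, dy) offsets drawn from a Gaussian whose FWHM is fwhm_pixels (assumed reading)."""
    sigma = fwhm_pixels / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(count)]


rng = random.Random(0)                    # fixed seed so the sample set is reproducible
regular = rotation_angles(100)            # 3.6-degree spacing
randomized = rotation_angles(100, rng)
offsets = shift_offsets(100, fwhm_pixels=8.0, rng=rng)
```

Each magnitude would be stored alongside the transformed image so that the correspondence between projection coefficients and magnitudes can be tabulated after learning.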
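[0036]
Through the above learning procedure, a subspace Ω that categorizes the coarse-grained data is spanned in the space F reached by the nonlinear transformation. During learning it is desirable to alternate the learning of the pre-images and the learning of the kernel function several times, but when the learning samples are not very complex, simplifications such as performing only one of the two methods once are also possible.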
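[0037]
Finally, the main part of the procedure executed by the image normalization unit 20, at the stage where learning is complete and normalization is performed, is described using the flowchart shown in FIG. 5. When image data is input (S1), the normalization coarse-graining unit 32 creates coarse-grained data using a spatial filter (S2). The coarse-grained data is sent to the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. The normalization subspace projection unit 36 obtains the coefficients α_i of the linear combination of the projection defined by equation (6) (S3). The α_i need not be obtained according to the definition of equation (6); they may instead be obtained with a suitable iterative method so that E of equation (5) is minimized. Next, the conversion magnitude evaluation unit 38 obtains the conversion magnitude, for example by comparing the α_i thus obtained with the lookup table (S4), and the estimation error evaluation unit 40 calculates the magnitude of E, or the value of a monotonically increasing function of E, as the estimation error (S5). The conversion determination unit 42 in the normalization processing unit 28 determines the partial conversion detection unit 26 with the smallest estimation error (S6), and the conversion is performed on the original image data with the corresponding conversion magnitude. Whether the image data thus obtained is converted again is decided according to a suitable criterion (S8). As noted earlier, the nonlinear transformation defined by equation (1) is never used directly in this series of operations, so there is no need to know its form.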
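The control flow of this procedure can be sketched as a loop that, at each pass, applies only the partial conversion whose detector reports the smallest estimation error. Everything concrete here is a toy assumption: the "image" is a dictionary holding its pose, the detectors are hand-written stand-ins rather than learned subspaces, their error model is artificial, and stopping when the chosen magnitude falls below a threshold is just one of the criteria the text allows.

```python
def normalize(image, detectors, coarse_grain, max_iterations=10, min_magnitude=1e-6):
    """Repeatedly apply the partial conversion with the smallest estimation error."""
    applied = []
    for _ in range(max_iterations):
        coarse = coarse_grain(image)                          # S2: coarse-grain
        candidates = []
        for name, estimate, apply in detectors:
            magnitude, error = estimate(coarse)               # S3-S5: magnitude and error
            candidates.append((error, magnitude, name, apply))
        error, magnitude, name, apply = min(candidates, key=lambda c: c[0])  # S6
        if abs(magnitude) < min_magnitude:                    # S8: nothing left to correct
            break
        image = apply(image, magnitude)                       # convert the original image
        applied.append(name)
    return image, applied


# Toy stand-ins (assumptions): a detector is more confident the larger the deviation.
def estimate_rotation(pose):
    return -pose["angle"], 1.0 / (1.0 + abs(pose["angle"]))


def estimate_shift(pose):
    return -pose["shift"], 1.0 / (1.0 + abs(pose["shift"]))


def apply_rotation(pose, angle):
    return {**pose, "angle": pose["angle"] + angle}


def apply_shift(pose, offset):
    return {**pose, "shift": pose["shift"] + offset}


detectors = [
    ("rotation", estimate_rotation, apply_rotation),
    ("shift", estimate_shift, apply_shift),
]
result, applied = normalize({"angle": 90.0, "shift": 5.0}, detectors, coarse_grain=dict)
```

In this toy run the rotation is corrected first and the shift second, after which the loop stops. A real implementation would estimate on the coarse-grained data while applying each conversion to the full-resolution image held in the image holding unit 30.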
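[0038]
FIG. 6 shows the result of a normalization comprising the elements of size, rotation, and shift. As shown by the photographs on the right side of the figure, this experiment traces, conversion by conversion, how two photographs of the region around an eye are normalized. In the upper-right series of photographs, the initial photograph (upper left), which is upside down, is rotated about 90 degrees counterclockwise in the first step, shifted slightly to the left in the next step, and so on, until it is finally normalized to an upright image of the desired size. The three-dimensional graph on the left traces the size (magnification), angle (degrees), and distance (pixels) step by step during this normalization. The black circle at the upper left indicates that the initial photograph has undergone a 180-degree rotation, an enlargement of about 1.3 times, and a slight shift. With each conversion the point moves along one of the three coordinate axes, finally reaching the normalized position on the right. The lower-right series of photographs and the corresponding white circles in the left graph show the same flow; in this case the normalization consists mainly of enlargement. Although the region is limited here to the vicinity of the eye rather than the whole face pattern, the basic effect is exactly the same for the whole face pattern. Needless to say, when the whole face pattern is used, the learning samples must differ from those of the illustrated example.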
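[0039]
The resolution of the spatial filter used in the normalization coarse-graining unit 32 is arbitrary; in the example shown here, coarse-graining to about 25 dimensions is performed by principal component analysis. The dimension of the subspace Ω spanned in the space F can also take various values; here it is set to 25. The number of learning samples depends on the accuracy required for detection, but, for example, the face patterns of about 100 people may each be varied in about 100 ways in each partial conversion detection unit 26. As a result, when three partial conversion detection units 26 are used, the total number of learning samples is about 30,000. By contrast, giving the same degrees of freedom without using this embodiment would bring the total number of learning samples to about 1,000,000. Using this embodiment therefore reduces the number of learning samples markedly. The partial conversions detected by the partial conversion detection units 26 are here size, rotation, and shift. These elements are not particularly limited, but choosing simple conversions makes the conversion easy to carry out. That is, for size and rotation it is good to use a form that can be described by a linear transformation, and for shift a uniform translation without shear. Of course, more complex conversions can be assigned depending on the characteristics of the patterns handled, and a conversion of the brightness of the image data, for example, can also be assigned.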
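[0040]
The nonlinear transformation described above was defined using a kernel function. However, the way a nonlinear transformation is constructed is arbitrary. Here, a method of performing the nonlinear transformation with an autoencoder that follows a neural network algorithm is described.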
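[0041]
FIG. 7 outlines the autoencoder. An autoencoder is a kind of multilayer perceptron in which the input layer 60 and the output layer 62 have the same number of neurons and the intermediate layer 64 has fewer.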
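[0042]
The autoencoder is used as a partial conversion detection unit 26 as follows. First, learning samples created in the same way as for the kernel-function case are input to the input layer 60, and the same values are given to the output layer 62 as the teacher signal, so that the weight of each synapse is trained to realize the identity mapping. This training can be performed with the ordinary backpropagation method.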
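[0043]
The output of the output layer 62 of the autoencoder trained in this way can be regarded as representing the space F created by the mapping of the nonlinear transformation. The outputs of the neurons in the intermediate layer 64 correspond to the projection onto the subspace Ω, spanned in the space F, that classifies the category. Accordingly, the normalization subspace projection unit 36 can be realized by inputting the coarse-grained data to the input layer 60 and obtaining the output of the intermediate layer 64. The conversion magnitude evaluation unit 38 can be implemented by examining, during learning, the relationship between the features of the learning samples and the output of the intermediate layer 64 and creating a lookup table. Furthermore, the estimation error evaluated by the estimation error evaluation unit 40 can be calculated from the distance between the vector of the input layer 60 and the vector of the output layer 62, or from a monotonically increasing function of that distance. That this distance corresponds to the accuracy of the conversion is clear from the fact that the shorter the distance, the more accurately the mapping into the space F approximates the input.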
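[0044]
The above has described how the image normalization unit 20 normalizes image data. From here on, the pattern identification unit 100, which identifies a face pattern in the image data output by the image normalization unit 20, is described.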
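[0045]
FIG. 8 is a block diagram showing the configuration of the pattern identification unit 100. The pattern identification unit 100 consists of identification coarse-graining units 106 having a plurality of spatial resolutions, a face identification unit 102 connected to each identification coarse-graining unit 106, and the counterexample detection unit 104. Like the normalization coarse-graining unit 32 in the image normalization unit 20, each identification coarse-graining unit 106 applies a spatial filter to the image data and outputs coarse-grained data. The resolution can be set freely; here the lowest is 25 dimensions and the highest is 100 dimensions, with further identification coarse-graining units 106 provided in between. A face identification unit 102 is provided for each identification coarse-graining unit 106 and identifies the face pattern.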
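[0046]
The input image data is first input to the identification coarse-graining unit 106a, which has the lowest spatial resolution, and in the illustrated example is converted into 25-dimensional coarse-grained data. The coarse-grained data is then input to the face identification unit 102a. The face identification unit 102a includes an identification subspace learning unit 108a, an identification subspace projection unit 110a, and an identification determination unit 112a, and operates much like the partial conversion detection unit 26 described for the image normalization unit 20. That is, the input coarse-grained data is mapped into the space F as in equation (1) by the nonlinear transformation defined by the kernel function. In this space F, learning has been performed in advance by the identification subspace learning unit 108a, and a subspace Ω characterizing the face patterns of the learning samples is spanned. The identification subspace projection unit 110a projects the mapping vector mapped into the space F onto this subspace Ω. This determines the coefficients α_i of the linear combination of the projection vector, and the length E of the perpendicular of the projection is obtained from equation (5). The identification determination unit 112a evaluates the positional relationship between the two vectors, that is, the magnitude of E, against a suitable threshold or the like, and determines whether to include this data in the category of the subspace Ω. The threshold may be determined, for example, from the rate of correct answers on suitable sample data. When the determination finds it likely that a face pattern is included, the identification coarse-graining unit 106 with the next lowest resolution and the corresponding face identification unit 102 are executed.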
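[0047]
The counterexample detection unit 104 has the same configuration as the face identification unit 102 and includes a counterexample subspace learning unit 114, a counterexample subspace projection unit 116, and a counterexample determination unit 118. The difference from the face identification unit 102 is that the counterexample subspace learning unit 114 learns patterns other than faces. That is, using learning samples containing non-face patterns, a subspace Ω characterizing the presence of a non-face pattern is formed. As in the face identification unit 102, the counterexample subspace projection unit 116 projects the mapping of the nonlinear transformation onto this subspace Ω, and the counterexample determination unit 118 classifies based on the positional relationship between the mapping vector and the projection vector.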
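The coarse-to-fine decision with a final counterexample check can be sketched as a simple cascade. The function `contains_face`, the tuple layout of a stage, and the stub stages in the example are all hypothetical. In particular, each `distance` callable stands in for the subspace distance E computed by a trained identification unit, and thresholds would be tuned on sample data.

```python
def contains_face(image, face_stages, counterexample_stage=None):
    """Cascade from the lowest to the highest resolution, then the counterexample check.

    face_stages: list of (coarse_grain, distance, threshold), lowest resolution first.
                 A stage passes when the distance to the face subspace is small.
    counterexample_stage: optional (coarse_grain, distance, threshold); the input is
                 rejected when it lies close to the non-face subspace.
    """
    for coarse_grain, distance, threshold in face_stages:
        if distance(coarse_grain(image)) > threshold:
            return False            # rejected early; higher resolutions are never run
    if counterexample_stage is not None:
        coarse_grain, distance, threshold = counterexample_stage
        if distance(coarse_grain(image)) <= threshold:
            return False            # resembles a learned non-face pattern
    return True


# Stub stages: the "image" is a list of numbers and the face template is all ones.
calls = []


def make_stage(dims, threshold):
    def coarse(image):
        return image[:dims]

    def distance(vec):
        calls.append(dims)          # record which resolutions were actually evaluated
        return sum((v - 1.0) ** 2 for v in vec)

    return coarse, distance, threshold


stages = [make_stage(2, 0.5), make_stage(4, 0.5)]
is_face = contains_face([1.0, 1.1, 0.9, 1.0], stages)
not_face = contains_face([5.0, 1.0, 1.0, 1.0], stages)
```

The second input is rejected at the lowest resolution, so the more expensive stage is skipped. That early exit is where the speed advantage of the layered design comes from.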
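[0048]
The face identification unit 102 and the counterexample detection unit 104 are trained in the same way as described for the image normalization unit. That is, in the face identification unit 102, a plurality of learning samples of the face pattern to be identified are prepared, and on that basis the pre-images corresponding to the basis vectors of the subspace are updated and the kernel is deformed. The pattern identification unit 100 normally performs pattern recognition on image data normalized by the image normalization unit 20. The face pattern can therefore be expected to be normalized, so only learning samples normalized with respect to size, rotation angle, shift, and so on need be used. Learning samples for the counterexample detection unit 104 may be anything other than face patterns. However, the unit is generally most effective when trained on things that the face identification unit 102 finds hard to discriminate, so it is advisable to train it mainly on confusing patterns that resemble normalized face patterns.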
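[0049]
FIG. 10 shows the results of a trial of the identification described here. On the left, detection is performed, without using the present invention, only on data coarse-grained to 50 dimensions. The right shows the case using this example, in which detection is performed hierarchically with face identification units 102 at three resolutions of 25, 50, and 100 dimensions. The counterexample detection unit 104 is not included. Each item of image data used contains a plurality of faces, from which face patterns were detected. In both cases a face can be detected with a probability of 90%. The horizontal axis is the number of non-face patterns mistakenly found in one item of image data, and the vertical axis is the proportion of images with that number. With the conventional method, the proportion with no mistakes was 63 percent and the proportion with exactly one mistake was 22 percent. With the present invention, these values are 82 percent and 14 percent, respectively. As a result, the number of false detections per image improves from 0.40 to 0.24. Detection at the high resolution of 100 dimensions naturally requires much computation time, but in this embodiment, when the 25-dimensional resolution judges a face pattern unlikely to be present, detection at higher resolutions is not performed, so no computation time is wasted and efficient, highly accurate detection is achieved. Although not illustrated, when a counterexample detection unit 104, which detects that no face pattern is included, was additionally included at each resolution in this experiment, the false detection rate became almost zero, confirming its effectiveness.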
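[0050]
Finally, the characteristic points of this embodiment are listed. With the image normalization unit 20 of this embodiment, normalization of the face pattern in input image data can be realized after learning from only a very small number of learning samples. Because normalization is divided into rotation, enlargement and reduction, translation, and so on, each need be trained only with its own learning samples, which makes learning very efficient. Because normalization is performed using a neural network, it can easily be applied even to pattern distributions that are nonlinear. Because normalization uses a nonlinear transformation defined by a kernel function, it can be performed accurately even for pattern distributions that are nonlinear. And if a parallel computer is used for normalization, normalization can be executed quickly.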
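[0051]
With the pattern identification unit 100 of this embodiment, the features of face patterns, which are inherently nonlinear, can be identified with high accuracy using a nonlinear transformation. Because the determination is layered from low resolution to high resolution, identification is fast, with no time spent on inputs that are easily judged to contain no face pattern. Using a means of detecting counterexamples in combination improves the accuracy of the determination. Because pattern identification uses a nonlinear transformation defined by a kernel function, highly reliable identification is possible. The subspace representing a category can be constructed very quickly. The basis vectors of the subspace can be efficiently re-spanned using learning samples. Because the kernel function can be easily deformed using learning samples, the identification of patterns can readily be improved. And by using a parallel computer, pattern identification at each resolution can be computed efficiently.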
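[0052]
By complementing each other, the image normalization unit 20 and the pattern identification unit 100 make very accurate, high-speed identification of face patterns possible.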
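[Brief description of the drawings]
FIG. 1 is a schematic diagram showing the configuration of the computer of this embodiment.
FIG. 2 is a block diagram outlining the image normalization unit and the pattern identification unit.
FIG. 3 is a block diagram showing details of the image normalization unit.
FIG. 4 is a schematic diagram showing the nonlinear transformation.
FIG. 5 is a flowchart showing the processing procedure of the image normalization unit.
FIG. 6 is a diagram showing test results of the image normalization unit.
FIG. 7 is a schematic diagram of the autoencoder used in the image normalization unit.
FIG. 8 is a block diagram outlining the pattern identification unit.
FIG. 9 is a flowchart showing the processing procedure of the pattern identification unit.
FIG. 10 is a diagram showing test results of the pattern identification unit.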
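[Explanation of symbols]
20 image normalization unit, 26 partial conversion detection unit, 28 normalization processing unit, 100 pattern identification unit, 102 face identification unit, 104 counterexample detection unit.
[0001]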
BACKGROUND OF THE INVENTION
The present invention relates to an image processing apparatus, and more particularly to preprocessing performed so as to facilitate pattern detection from image data.
[0002]
[Prior art]
To identify a specific pattern in image data using a computer, two things are necessary: the shape of the pattern itself must be detected reliably, and the method must cope with the fact that the patterns contained in the image vary in size, rotation angle, and so on.
[0003]
Conventional examples of the former include a linear subspace method, a support vector machine (SVM) method, and a kernel nonlinear subspace method. In the linear subspace method, a subspace is defined for each of a plurality of categories, the subspace to which the unknown pattern is most related is evaluated, and the category to which the pattern belongs is determined. However, in this method, when there are many categories and the pattern dimension is low, the detection accuracy is lowered. In addition, there is a problem that the discrimination accuracy for a pattern distribution having nonlinearity is low.
[0004]
The SVM method makes it possible to identify a pattern distribution having nonlinearity by mapping a low-dimensional pattern into a high-dimensional space with a nonlinear transformation defined through a kernel function. However, it can classify only two categories, and it requires a large amount of calculation.
[0005]
The kernel nonlinear subspace method was devised as a pattern identification method that solves these problems, and is disclosed in Japanese Patent Laid-Open No. 2000-90274. In this method, as in the SVM method, a pattern is mapped into a high-dimensional space by a nonlinear transformation defined using a kernel function, and the subspace method is carried out in this high-dimensional space.
[0006]
The latter requirement, that is, handling patterns of various sizes and rotation angles, has conventionally been met by using a very large number of learning samples. Each of the pattern identification methods described above generally learns by using learning samples that have a characteristic pattern to determine the distribution of the category representing that characteristic. It was therefore necessary to use as learning samples not only patterns whose size or angle had been varied, but also a very large number of patterns deformed by combinations of size and angle.
[0007]
[Problems to be solved by the invention]
However, in the kernel nonlinear subspace method described above, the basis vectors that span the subspace are defined based on the mapping of all learning samples into the nonlinear space, so a large amount of calculation is still required when the number of learning samples grows. In addition, as described above, general-purpose pattern identification has conventionally required a very large number of learning samples. An object of the present invention is to establish a means for identifying, at high speed and with high accuracy, human or animal face patterns, which often vary in size, rotation angle, and so on within an image.
[0008]
[Means for Solving the Problems]
The image processing apparatus of the present invention uses a nonlinear transformation to normalize image data into a condition under which identification can be performed, so that pattern identification means can identify a face pattern in given image data. The apparatus comprises learning means provided for each of the individual partial conversions that make up the normalization conversion, partial conversion detection means likewise provided for each of the individual partial conversions, conversion execution means, and repetition control means. Each learning means performs learning using a plurality of learning samples that differ in the conversion magnitude of the partial conversion corresponding to that learning means, thereby constructing a subspace of the space F defined by the nonlinear transformation, the subspace corresponding to the plurality of learning samples, and obtains the correspondence between each basis vector of the constructed subspace and a conversion magnitude for that subspace. Each partial conversion detection means comprises conversion magnitude evaluation means and estimation error evaluation means. The conversion magnitude evaluation means takes the subspace constructed by the learning means for the partial conversion corresponding to that detection means, determines which basis vector of the subspace is closest to the projection vector Φ' obtained by projecting onto the subspace the mapping vector Φ of the target image data into the space F, and obtains the conversion magnitude corresponding to the target image data from the basis vector so determined and the correspondence. The estimation error evaluation means obtains an estimation error, which is the distance between the mapping vector Φ and the projection vector Φ' or a monotonically increasing function of that distance. The conversion execution means determines which of all the partial conversion detection means gives the smallest estimation error, and applies to the target image data the partial conversion corresponding to that detection means, with the conversion magnitude obtained by its conversion magnitude evaluation means. The repetition control means operates the partial conversion detection means and the conversion execution means with the given image data as the target image data, and then repeats the process of operating them again with the updated image data, obtained when the conversion execution means applied the partial conversion to the target image data, as the new target image data.
[0009]
  In the image processing apparatus of the present invention, the partial conversions include at least one of size, rotation, and shift.
[0010]
  In the image processing apparatus of the present invention, the nonlinear transformation is given by calculation means using a neural network.
[0011]
  In the image processing apparatus of the present invention, the nonlinear transformation is given by calculation means using a kernel function.
[0012]
  The image processing apparatus of the present invention further has coarse-graining means that coarse-grains the target image data by applying a spatial filter; the partial conversion detection means use the coarse-grained target image data, and the conversion execution means applies the partial conversion to the original target image data that has not been coarse-grained.
[0013]
  In the image processing apparatus of the present invention, the calculations of the partial conversion detection means are processed in parallel by a parallel calculation device provided in the apparatus.
[0014]
  The image processing method of the present invention uses a nonlinear transformation to normalize image data into a condition under which identification can be performed, so that a face pattern in given image data can be identified by a pattern identification method. The method includes a learning step executed for each of the individual partial conversions that make up the normalization conversion, a partial conversion detection step likewise executed for each of the individual partial conversions, a conversion execution step, and a repetition control step. Each learning step performs learning using a plurality of learning samples that differ in the conversion magnitude of the partial conversion corresponding to that learning step, thereby constructing a subspace of the space F defined by the nonlinear transformation, the subspace corresponding to the plurality of learning samples, and obtains the correspondence between each basis vector of the constructed subspace and a conversion magnitude for that subspace. Each partial conversion detection step includes a conversion magnitude evaluation step and an estimation error evaluation step. The conversion magnitude evaluation step takes the subspace constructed by the learning step for the partial conversion corresponding to that detection step, determines which basis vector of the subspace is closest to the projection vector Φ' obtained by projecting onto the subspace the mapping vector Φ of the target image data into the space F, and obtains the conversion magnitude corresponding to the target image data from the basis vector so determined and the correspondence. The estimation error evaluation step obtains an estimation error, which is the distance between the mapping vector Φ and the projection vector Φ' or a monotonically increasing function of that distance. The conversion execution step determines which of all the partial conversion detection steps gives the smallest estimation error, and applies to the target image data the partial conversion corresponding to that detection step, with the conversion magnitude obtained by its conversion magnitude evaluation step. The repetition control step executes the partial conversion detection steps and the conversion execution step with the given image data as the target image data, and then repeats the process of executing them again with the updated image data, obtained when the conversion execution step applied the partial conversion to the target image data, as the new target image data.
[0015]
  The image processing program of the present invention causes a computer to use a nonlinear transformation to normalize image data into a condition under which identification can be performed, so that a pattern identification procedure can identify a face pattern in given image data. The program causes the computer to execute a learning procedure for each of the individual partial conversions that make up the normalization conversion, a partial conversion detection procedure likewise executed for each of the individual partial conversions, a conversion execution procedure, and a repetition control procedure. Each learning procedure performs learning using a plurality of learning samples that differ in the conversion magnitude of the partial conversion corresponding to that learning procedure, thereby constructing a subspace of the space F defined by the nonlinear transformation, the subspace corresponding to the plurality of learning samples, and obtains the correspondence between each basis vector of the constructed subspace and a conversion magnitude for that subspace. Each partial conversion detection procedure includes a conversion magnitude evaluation procedure and an estimation error evaluation procedure. The conversion magnitude evaluation procedure takes the subspace constructed by the learning procedure for the partial conversion corresponding to that detection procedure, determines which basis vector of the subspace is closest to the projection vector Φ' obtained by projecting onto the subspace the mapping vector Φ of the target image data into the space F, and obtains the conversion magnitude corresponding to the target image data from the basis vector so determined and the correspondence. The estimation error evaluation procedure obtains an estimation error, which is the distance between the mapping vector Φ and the projection vector Φ' or a monotonically increasing function of that distance. The conversion execution procedure determines which of all the partial conversion detection procedures gives the smallest estimation error, and applies to the target image data the partial conversion corresponding to that detection procedure, with the conversion magnitude obtained by its conversion magnitude evaluation procedure. The repetition control procedure executes the partial conversion detection procedures and the conversion execution procedure with the given image data as the target image data, and then repeats the process of executing them again with the updated image data, obtained when the conversion execution procedure applied the partial conversion to the target image data, as the new target image data.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
Preferred embodiments of the present invention will be described below with reference to the drawings. In the figure, description of components having the same configuration is omitted.
[0017]
The block diagram of FIG. 1 shows the configuration of an apparatus according to an embodiment of the present invention. The apparatus includes a CPU 2 that performs calculations, a storage unit 4, a user instruction input unit 6, a display unit 8, a data input unit 10, a data output unit 12, and an application software input unit 14, connected by a communication network that carries data. That is, the apparatus is realized by executing application software describing the algorithm of the present invention on a general-purpose computer. The user loads the application software, distributed on a storage medium such as a CD-ROM or via a network, into the computer through the application software input unit 14 and has the CPU 2 execute it using the instruction input unit 6, such as a keyboard. The operation of the CPU 2 is managed by software called an operating system (OS), and instructions from the user and the application software are conveyed to the CPU 2 through the OS. Information needed for execution, including the application software of this embodiment and the OS, is held temporarily or permanently in the storage unit 4, which consists of memory, a hard disk, and the like. The image data needed for execution is obtained through the data input unit 10, such as a CCD camera, a scanner, a storage medium such as a CD-ROM, or data acquisition over a network. Once the CPU 2 has performed the necessary calculations, the processed image data is output through the data output unit 12, such as a storage medium such as an MO, data transfer over a network, or a printer. The user can view the image data before and after processing on the display unit 8, such as a display.
[0018]
FIG. 2 is a block diagram outlining the image processing calculation performed by the CPU 2. Image data input from the data input unit 10 undergoes normalization conversion in the image normalization unit 20, and its pattern is then identified in detail by the pattern identification unit 100. Normalization here means converting conditions such as the size, rotation angle, position, and brightness of the face pattern into the state assumed by the pattern identification unit 100 (referred to as the normal form).
[0019]
The conversion for normalization performed in the image normalization unit 20 is decomposed into basic partial conversions, and how each partial conversion should be performed is calculated by the partial conversion detection unit 26 corresponding to that partial conversion. In the illustrated example, three partial conversion detection units 26 are provided: the size partial conversion detection unit 26a for the size (enlargement and reduction) of the image data, the rotation partial conversion detection unit 26b for the rotation angle of the image data, and the shift partial conversion detection unit 26c for the shift (parallel movement) of the image data. As will be described in detail later, these partial conversion detection units 26 detect the state of each partial conversion on coarse-grained data obtained by applying a spatial filter for coarse-graining to the image data, and pass the results to the normalization processing unit 28. The normalization processing unit 28 then applies to the image data the conversion judged to have the smallest accompanying error. This series of processes is usually repeated several times, so that normalization is eventually performed for all of size, rotation, and shift. Depending on the state of the face pattern, no repetition may be needed.
[0020]
The pattern identification unit 100 coarse-grains the image data normalized by the image normalization unit 20 into coarse-grained data of several resolutions using a spatial filter, and runs on each a face identification unit 102 for identifying the face pattern. In the illustrated example, coarse-graining is performed by summing principal component analysis modes up to an appropriate number of dimensions, and a face identification unit 102a for 25-dimensional coarse-grained data and a face identification unit 102b for 100-dimensional coarse-grained data are provided, together with a plurality of face identification units 102 for the resolutions in between. Further, a counterexample detection unit 104 is provided that determines, for the highest-dimensional (100-dimensional) coarse-grained data, whether a pattern other than a face pattern is included. As will be described in detail later, identification starts with the 25-dimensional face identification unit 102a, which has the lowest resolution, and if it is determined that a face pattern is likely to be present, the face identification unit 102 of the next higher resolution is used for determination. If a face pattern is judged likely even at 100 dimensions, the highest resolution, counterexample detection is performed last.
[0021]
Hereinafter, the image normalization unit 20 and the pattern identification unit 100 will be described in detail.
[0022]
The block diagram of FIG. 3 shows an outline of the configuration of the image normalization unit 20. The input image data is held in an image holding unit 30 provided in the storage unit 4. The normalizing coarse-grain unit 32 then obtains coarse-grained data by reducing the spatial resolution of the image data so as to extract its rough features. The spatial filter used for the coarse-graining is not particularly limited. For example, one method computes, from the mode components obtained by principal component analysis of appropriate image data, the sum of a predetermined number of mode components having the largest contribution ratios; another performs Fourier decomposition and extracts the mode components having a predetermined resolution or higher. Coarse-graining reduces the amount of data so that the normalization described below can be performed at high speed.
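The principal-component variant of the coarse-graining filter can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function names `fit_pca_filter` and `coarse_grain`, the 16x16 image size, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np

def fit_pca_filter(images, n_modes):
    """Return the mean image and the n_modes principal axes with the largest contribution."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are principal axes, largest contribution first.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_modes]

def coarse_grain(image, mean, axes):
    """Project one image onto the retained modes, giving an n_modes-dimensional vector x."""
    return axes @ (image.ravel().astype(float) - mean)

rng = np.random.default_rng(0)
train = rng.random((200, 16, 16))            # hypothetical stand-in for face images
mean, axes = fit_pca_filter(train, n_modes=25)
x = coarse_grain(train[0], mean, axes)       # 25-dimensional coarse-grained data
```

The vector `x` plays the role of the coarse-grained data handed to the partial conversion detection units; the number of retained modes (25 here) follows the example given later in the description.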
[0023]
Subsequently, the coarse-grained data is sent to the partial conversion detection units 26 arranged in parallel. As shown schematically in FIG. 4, each partial conversion detection unit 26 regards the coarse-grained data as a vector x in the image space G and maps it to a space F created by nonlinear conversion for pattern identification. The vector mapped to the space F is called a mapping vector and is written as Φ(x). For example, when size, rotation, and shift are to be detected, the partial conversion detection units 26 consist of a size partial conversion detection unit 26a, a rotation partial conversion detection unit 26b, and a shift partial conversion detection unit 26c. Each partial conversion detection unit 26 includes a normalization subspace learning unit 34, a normalization subspace projection unit 36, and a partial conversion evaluation unit 37; the partial conversion evaluation unit 37 in turn includes a conversion magnitude evaluation unit 38 and an estimation error evaluation unit 40. In the space F, the normalization subspace learning unit 34 uses learning samples to construct in advance a subspace Ω representing the category that characterizes the learning samples, and the normalization subspace projection unit 36 projects the mapping vector Φ(x) onto this subspace Ω. The projected vector is called a projection vector and is written as Φ′(x). The conversion magnitude evaluation unit 38 then evaluates which basis vector of the subspace Ω the projection vector Φ′(x) is closest to, and calculates the magnitude required for the conversion. For example, in the size partial conversion detection unit 26a, a lookup table is created during learning that records correspondences such as: the basis vector Φ₁ lies in the vicinity of learning samples of about 1.5 times the size, and another basis vector Φ₂ lies in the vicinity of learning samples of about twice the size. By referring to this lookup table, the conversion magnitude evaluation unit 38a can calculate by what factor the current face pattern should be scaled to convert it into the normal form. In addition, the estimation error evaluation unit 40 calculates an estimation error based on the distance E between the mapping vector Φ(x) and the projection vector Φ′(x): if the distance E is small, the error included in the projection is judged to be small, and if the distance E is large, the projection result is judged to include a large error.
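The projection, the lookup, and the estimation error described above can be sketched with a kernel as follows. This is a hedged illustration only: the Gaussian kernel, the three two-dimensional pre-images, the table `size_lookup`, the choice of "closest basis vector" as the one with the largest coefficient, and the use of the squared distance for E are assumptions made for the example, not details confirmed by the description.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def project(x, preimages, kernel=gaussian_kernel):
    """Return (alpha, E): combination coefficients and squared distance to the subspace."""
    K = np.array([[kernel(a, b) for b in preimages] for a in preimages])
    kx = np.array([kernel(x, a) for a in preimages])
    alpha = np.linalg.solve(K, kx)                   # coefficients that minimise E
    E = kernel(x, x) - 2.0 * alpha @ kx + alpha @ K @ alpha
    return alpha, E

def conversion_magnitude(alpha, lookup):
    """Assumed rule: take the table entry of the basis vector with the largest coefficient."""
    return lookup[int(np.argmax(alpha))]

preimages = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hypothetical pre-images
size_lookup = [1.0, 1.5, 2.0]                                 # hypothetical size per basis vector

alpha, E = project(np.array([0.9, 0.1]), preimages)
scale = conversion_magnitude(alpha, size_lookup)

# A pre-image itself lies in the subspace, so its distance E is (numerically) zero.
alpha_on, E_on = project(preimages[2], preimages)
```

Note that only kernel values appear in the computation; the mapping Φ itself is never evaluated.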
[0024]
These results are passed to the normalization processing unit 28, which includes a conversion determination unit 42 and a conversion execution unit 44. The conversion determination unit 42 determines which partial conversion detection unit 26 has the smallest estimation error. For example, when the estimation error of the rotation partial conversion detection unit 26b is the smallest, the conversion execution unit 44 rotates the original image data by the corresponding conversion magnitude (that is, the rotation angle), and the image in the image holding unit 30 is updated. The updated image data is subjected to the same normalization as many times as necessary. Various criteria can be used to decide whether to repeat: for example, a predetermined number of repetitions may be set in advance, or a correlation calculated against appropriate reference data in real space, or the value obtained by the conversion magnitude evaluation means, may be compared with a predetermined threshold.
[0025]
Next, the means for constructing, by means of a kernel function, the space F used in the partial conversion detection unit 26 will be described in detail with mathematical expressions. A characteristic of the kernel-function approach is that the mapping vector Φ(x) described above never needs to be written out explicitly.
[0026]
The nonlinear mapping of Expression (1), which maps a d-dimensional vector x in the image space G representing coarse-grained data to the d_F-dimensional space F, is determined so as to satisfy the relationship of Expression (2) by selecting an appropriate kernel function k(x, y).
[0027]
[Expression 1]

Here, φ_i(x) is an eigenfunction of an appropriate kernel function, and the corresponding eigenvalue is denoted by λ_i (i = 1, ..., N).
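Expressions (1) and (2) are not reproduced in this text. One form consistent with the definitions given here, assuming the standard eigenfunction expansion of a kernel and taking the number of components to equal the dimension d_F of the space F (both assumptions), would be:

```latex
\Phi(x) = \left( \sqrt{\lambda_1}\,\varphi_1(x),\ \ldots,\ \sqrt{\lambda_{d_F}}\,\varphi_{d_F}(x) \right) \tag{1}

k(x, y) = \sum_{i=1}^{d_F} \lambda_i\, \varphi_i(x)\, \varphi_i(y) = \Phi(x) \cdot \Phi(y) \tag{2}
```

Under this form, every inner product in the space F can be evaluated through k(x, y) alone.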
[0028]
Next, a method of spanning the m-dimensional subspace Ω for classifying the category of the coarse-grained data in the space F, and its learning method, will be described. First, as initial values for the basis vectors Φ₁, ..., Φ_m of the subspace Ω, the corresponding m vectors x₁, ..., x_m in the image space G (hereinafter referred to as pre-images) are chosen appropriately; specifically, for example, they are assigned at random from uniform random numbers. Now consider learning the pre-images so as to correct this subspace Ω, using d-dimensional vectors x representing learning samples in the image space. The vector Φ′(x), obtained by projecting onto the subspace Ω the mapping Φ(x) of a learning sample vector x into the space F, is expressed as a linear combination of the basis vectors. Letting the combination coefficients be α_i, the distance E between this projection and the original mapping vector Φ(x) is expressed by Expressions (3) to (5).
[0029]
[Expression 2]

Here, the kernel function definition, Expression (2), is used for the transformation into Expression (5). The coefficients α_i are given by Expression (6) so that E takes its minimum value, in accordance with the definition of projection. The matrix K is the matrix whose (i, j) component is k(x_i, x_j).
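Expressions (3) to (6) are not reproduced in this text. A form consistent with the surrounding description, assuming that the basis vectors are the mappings Φ(x_i) of the pre-images and that E denotes the squared distance (both assumptions), would be:

```latex
E = \Bigl\| \Phi(x) - \sum_{i=1}^{m} \alpha_i\, \Phi(x_i) \Bigr\|^2 \tag{3}

  = \Phi(x)\cdot\Phi(x) - 2 \sum_{i=1}^{m} \alpha_i\, \Phi(x)\cdot\Phi(x_i)
    + \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j\, \Phi(x_i)\cdot\Phi(x_j) \tag{4}

  = k(x, x) - 2 \sum_{i=1}^{m} \alpha_i\, k(x, x_i)
    + \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j\, k(x_i, x_j) \tag{5}

\alpha_i = \sum_{j=1}^{m} \left( K^{-1} \right)_{ij} k(x, x_j) \tag{6}
```

Setting the derivative of (5) with respect to each α_i to zero gives the linear system K α = (k(x, x_1), ..., k(x, x_m)), whose solution is (6).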
[0030]
In pre-image learning, the pre-image x_i is moved by Δx_i in the direction that reduces the distance between the subspace Ω and the learning sample. This Δx_i is given by Expression (7) by the steepest descent method.
[0031]
[Equation 3]

Here, η is a learning coefficient and is a positive constant. The matrix g_ab(x) is the metric tensor of the manifold embedded in the space F by the nonlinear mapping, and is given by Expression (8) using the kernel function. Since this learning is a linear optimization problem in a high-dimensional space, it converges better than a nonlinear optimization problem and is completed in a short time.
[0032]
Next, the kernel function learning method is described. As a kernel function, a known function such as a Gaussian function kernel or a polynomial kernel is given initially. During learning, the kernel function is transformed by the conformal mapping of equation (9).
[0033]
[Expression 4]

The learning rule is to choose C(x) so that the variation of the coefficient α_i over the learning samples becomes uniform for every coefficient α_i. Specifically, if the variation of a coefficient α_i is large, the value of C(x) is increased in the neighborhood of the pre-image x_i of the subspace basis vector corresponding to α_i. As a result, the neighborhood of x_i is expanded in the space F as shown in Expression (10).
[0034]
[Equation 5]

Therefore, the number of learning samples with a large coefficient α_i decreases relatively, and the variation of the coefficient α_i over the learning samples is reduced. Conversely, if the variation of a coefficient α_i is small, the value of C(x) is decreased in the neighborhood of the pre-image x_i of the basis vector corresponding to α_i. In the method described here, C(x) is assigned only at the pre-images of the basis vectors of the subspace Ω, but in the vicinity of a pre-image its value can be obtained by extrapolating from the value of C(x) at the pre-image, as in Expression (11).
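The effect of deforming the kernel with a factor C(x) can be illustrated numerically. The sketch below assumes that the conformal mapping has the commonly used form k̃(x, y) = C(x) C(y) k(x, y); the Gaussian-bump shape of `make_factor`, its width `tau`, and the pre-image positions are hypothetical choices for the example and are not taken from Expressions (9) to (11).

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def make_factor(preimages, heights, tau=0.5):
    """Hypothetical C(x): 1 plus Gaussian bumps of the given heights centred on the pre-images."""
    def C(x):
        d2 = np.sum((preimages - x) ** 2, axis=1)
        return 1.0 + np.sum(heights * np.exp(-d2 / (2.0 * tau ** 2)))
    return C

def conformal_kernel(kernel, C):
    """Assumed form of the deformed kernel: C(x) * C(y) * k(x, y)."""
    return lambda x, y: C(x) * C(y) * kernel(x, y)

def dist2(k, a, b):
    """Squared distance between the mappings of a and b in the feature space of kernel k."""
    return k(a, a) - 2.0 * k(a, b) + k(b, b)

preimages = np.array([[0.0, 0.0], [2.0, 0.0]])
C = make_factor(preimages, heights=np.array([1.0, 0.0]))   # raise C only around pre-image 0
k_new = conformal_kernel(gaussian_kernel, C)

step = np.array([0.05, 0.0])
magnification = [
    dist2(k_new, p, p + step) / dist2(gaussian_kernel, p, p + step) for p in preimages
]

# The deformed kernel still yields a symmetric positive semidefinite Gram matrix.
points = np.vstack([preimages, preimages + step])
gram = np.array([[k_new(a, b) for b in points] for a in points])
min_eigenvalue = np.linalg.eigvalsh(gram).min()
```

In this toy setting, small displacements near the first pre-image are stretched in the feature space while those near the second are left almost unchanged, which is the kind of local expansion the learning rule relies on.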
[0035]
[Formula 6]

Here, how to prepare the learning samples used for learning will be described. For example, for normalization with respect to rotation, a plurality of pieces of image data are prepared in which the face pattern to be normalized stands upright at the center of the image (the head at the top and the chin at the bottom), and these are rotated by angles drawn from uniform random numbers in the range of -180 degrees to 180 degrees, or by angles taken at equal intervals. For the shift, a plurality of images in which the face pattern stands upright at the center of the image are prepared and shifted vertically and horizontally, for example according to Gaussian random numbers whose half width is an appropriate number of pixels. Instead of random numbers, the values may be assigned regularly so that the probability density is uniform. Similarly, for the size, images in which the face pattern stands upright at the center of the image may be enlarged and reduced. By learning in this way, the relationship between the magnitude of the conversion of a learning sample (for example, its angle in the case of rotation) and the projection of the learning sample onto the subspace becomes clear. Specifically, a relationship is derived such as: a large coefficient α₁ corresponds to a rotation of 90 degrees. By examining this in detail and creating a lookup table or an appropriate function, the evaluation means of the conversion magnitude evaluation unit 38 is established.
[0036]
Through the learning procedure described above, the subspace Ω for categorizing the coarse-grained data is created in the space F mapped by the nonlinear transformation. In the learning process, it is desirable to alternate pre-image learning and kernel function learning several times. However, if the learning samples are not very complicated, the process may be simplified by using only one of the two.
[0037]
Finally, the main part of the procedure executed by the image normalization unit 20 when normalization is performed after learning is complete will be described with reference to the flowchart of its processing procedure. When image data is input (S1), the normalizing coarse-grain unit 32 creates coarse-grained data using a spatial filter (S2). The coarse-grained data is sent to the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. In each of these, the normalization subspace projection unit 36 obtains the linear combination coefficients α_i of the projection defined by Expression (6) (S3). These α_i need not be computed strictly from Expression (6); an appropriate iterative method may be used instead so that E in Expression (5) is minimized. Next, the conversion magnitude evaluation unit 38 compares the α_i thus obtained with the lookup table to determine the magnitude of the conversion (S4), and the estimation error evaluation unit 40 calculates E, or the value of a monotonically increasing function of E, as the estimation error (S5). The conversion determination unit 42 in the normalization processing unit 28 determines the partial conversion detection unit 26 that minimizes the estimation error (S6), and the conversion execution unit 44 converts the original image data with the corresponding conversion magnitude. Whether the image data obtained in this way is to be converted again is decided according to an appropriate criterion (S8). As described above, the nonlinear transformation defined by Expression (1) is never used directly in this series of operations, so its explicit form does not need to be known.
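The control flow of this procedure (evaluate every partial conversion, apply only the one with the smallest estimation error, then repeat) can be sketched as below. Everything concrete here is a toy assumption: the image is replaced by a `Pose` record of its deviation from the normal form, and `make_detector` fabricates a magnitude and an error instead of projecting onto a learned subspace. Only the selection and repetition logic mirrors the description.

```python
from dataclasses import dataclass, replace

FIELDS = ("log_scale", "angle", "shift")

@dataclass(frozen=True)
class Pose:
    """Toy stand-in for image data: deviation of the face pattern from the normal form."""
    log_scale: float = 0.0
    angle: float = 0.0
    shift: float = 0.0

def make_detector(field):
    """Hypothetical partial conversion detector returning (magnitude, estimation error)."""
    def detect(pose):
        magnitude = getattr(pose, field)
        others = sum(abs(getattr(pose, f)) for f in FIELDS if f != field)
        error = others / (1.0 + abs(magnitude))   # toy error, not the distance E
        return magnitude, error
    return detect

def apply_conversion(pose, field, magnitude):
    """Undo the detected deviation of one component."""
    return replace(pose, **{field: getattr(pose, field) - magnitude})

def normalize(pose, detectors, tol=1e-3, max_iter=10):
    history = []
    for _ in range(max_iter):
        results = {name: detect(pose) for name, detect in detectors.items()}  # S3 to S5
        best = min(results, key=lambda name: results[name][1])                # S6
        magnitude = results[best][0]
        if abs(magnitude) < tol:                                              # S8: stop criterion
            break
        pose = apply_conversion(pose, best, magnitude)   # conversion of the held image data
        history.append(best)
    return pose, history

detectors = {f: make_detector(f) for f in FIELDS}
final_pose, history = normalize(Pose(log_scale=0.3, angle=1.0, shift=0.2), detectors)
```

The stop criterion used here, comparing the detected magnitude with a threshold, is one of the repetition criteria mentioned earlier; a fixed number of iterations would work equally well in this sketch.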
[0038]
FIG. 6 shows the result of normalization consisting of size, rotation, and shift elements. In this experiment, as shown in the photographs on the right side of the figure, the normalization of two photographs showing the vicinity of the eyes is tracked conversion by conversion. In the upper right series of photographs, the initial photograph (upper left), which is upside down, is rotated about 90 degrees counterclockwise in the first step, shifted slightly to the left in the next step, and so on, until it is normalized to the desired upright orientation and size. The three-dimensional graph on the left tracks the size (magnification), angle (degrees), and distance (pixels) through the normalization process. The black circle at the upper left indicates that the initial photograph has undergone a 180° rotation, an enlargement of about 1.3 times, and a slight shift. The point moves along one of the three coordinate axes at each conversion and finally reaches the normalized position on the right side. The series of photographs at the lower right and the corresponding white circles on the left graph show the same flow; in this case, normalization is performed mainly with respect to enlargement. In this experiment the face pattern is limited to the vicinity of the eyes rather than the entire face, but the basic effect is unchanged when the entire face pattern is used. When the entire face pattern is used, however, the learning samples must of course be changed from those of the illustrated example.
[0039]
Note that the resolution of the spatial filter used in the normalizing coarse-grain unit 32 is arbitrary; in the example shown here, coarse-graining to about 25 dimensions is performed by principal component analysis. The dimension of the subspace Ω spanned in the space F can also take various values; here it is 25. The number of learning samples depends on the accuracy required for detection; for example, the face patterns of about 100 people may each be varied in about 100 ways for each partial conversion detection unit 26. With three partial conversion detection units 26, the total number of learning samples is then about 30,000. If the same degrees of freedom were covered without using this embodiment, the total number of learning samples would be about one million. The number of learning samples can therefore be reduced significantly by using this embodiment. The partial conversions detected by the partial conversion detection units 26 are assumed here to be size, rotation, and shift. The choice of elements is not particularly limited, but simple conversions make the conversion easier. That is, for size and rotation it is preferable to use a form that can be described by a linear transformation, and for shift a uniform translation without shearing. Of course, more complex conversions can be assigned according to the characteristics of the pattern to be handled. A conversion related to the luminance of the image data can also be assigned.
[0040]
The nonlinear transformation described above was defined using a kernel function. However, the nonlinear transformation construction method is arbitrary. Here, a method for performing non-linear transformation using an auto encoder according to an algorithm of a neural network will be described.
[0041]
FIG. 7 shows an outline of the auto encoder. The auto encoder is a kind of multilayer perceptron. The number of neurons in the input layer 60 and the number of neurons in the output layer 62 are the same, and the number of neurons in the intermediate layer 64 is smaller.
[0042]
In order to use this auto encoder as the partial conversion detection unit 26, the following is performed. First, a learning sample created in the same manner as in the kernel-function case is input to the input layer 60, the same values are given to the output layer 62 as the teacher signal, and the weight of each synapse is learned so as to realize the identity mapping. This learning can be performed by the ordinary back propagation method.
[0043]
The output of the output layer 62 of the auto encoder learned in this way can be regarded as expressing the space F created by the nonlinear mapping. Further, the neuron outputs of the intermediate layer 64 of the auto encoder correspond to the projection onto the subspace Ω, spanned in the space F, that classifies the category. Therefore, the normalization subspace projection unit 36 can be realized by inputting coarse-grained data to the input layer 60 and reading the output of the intermediate layer 64. The conversion magnitude evaluation unit 38 can be implemented by examining, at the time of learning, the relationship between the characteristics of the learning samples and the output of the intermediate layer 64, and creating a lookup table. The estimation error evaluated by the estimation error evaluation unit 40 can be calculated as the distance between the vector of the input layer 60 and the vector of the output layer 62, or a monotonically increasing function of it. That this distance corresponds to the accuracy of the conversion is clear from the fact that the shorter the distance, the more accurately the mapping to the space F approximates the input.
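A small auto encoder of this kind can be sketched in NumPy as follows. The layer sizes (4 inputs, 2 intermediate neurons), the tanh activation, the learning rate, and the synthetic samples are all assumptions chosen to keep the example short; the description specifies only that the input and output layers have equal size, the intermediate layer is smaller, and training uses back propagation toward the identity mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))
X = rng.uniform(-0.5, 0.5, size=(256, 2)) @ A     # hypothetical samples: 4-D, 2 degrees of freedom

n_in, n_hid = 4, 2
W1 = rng.normal(scale=0.5, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_hid, n_in)); b2 = np.zeros(n_in)

def forward(x):
    h = np.tanh(x @ W1 + b1)     # intermediate layer 64 (fewer neurons)
    y = h @ W2 + b2              # output layer 62 (same size as the input layer 60)
    return h, y

def loss(x):
    return float(np.mean(np.sum((forward(x)[1] - x) ** 2, axis=1)))

initial_loss = loss(X)
lr = 0.02
for _ in range(3000):
    h, y = forward(X)
    dy = 2.0 * (y - X) / len(X)          # gradient of the mean squared error w.r.t. y
    dW2 = h.T @ dy; db2 = dy.sum(axis=0)
    dz = (dy @ W2.T) * (1.0 - h ** 2)    # back propagation through tanh
    dW1 = X.T @ dz; db1 = dz.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1       # in-place updates, so forward() sees the new weights
    W2 -= lr * dW2; b2 -= lr * db2
final_loss = loss(X)

def estimation_error(x):
    """Distance between the input-layer vector and the output-layer vector."""
    _, y = forward(x)
    return float(np.linalg.norm(y - x))

code, _ = forward(X[:1])                 # intermediate-layer output used for the lookup
```

Here `code` corresponds to the intermediate-layer output that would be compared with a lookup table, and `estimation_error` to the quantity evaluated by the estimation error evaluation unit 40.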
[0044]
In the foregoing, the manner in which the image normalization unit 20 normalizes the image data has been described. From here, the pattern identification unit 100 that identifies a face pattern from the image data output by the image normalization unit 20 will be described.
[0045]
FIG. 8 is a block diagram showing the configuration of the pattern identification unit 100. The pattern identification unit 100 includes identification coarse-graining units 106 having a plurality of spatial resolutions, a face identification unit 102 connected to each identification coarse-graining unit 106, and a counterexample detection unit 104. Like the normalizing coarse-grain unit 32 in the image normalization unit 20, each identification coarse-graining unit 106 applies a spatial filter to the image data and outputs coarse-grained data. The resolution can be set freely. Here, the lowest dimension is 25 and the highest is 100, and a plurality of identification coarse-graining units 106 are provided in between. A face identification unit 102 is provided for each identification coarse-graining unit 106 and identifies a face pattern.
[0046]
The input image data is first input to the identification coarse-graining unit 106a having the lowest spatial resolution, and in the illustrated example is converted into 25-dimensional coarse-grained data. The coarse-grained data is input to the face identification unit 102a. The face identification unit 102a includes an identification subspace learning unit 108a, an identification subspace projection unit 110a, and an identification determination unit 112a, and operates in much the same way as the partial conversion detection unit 26 described for the image normalization unit 20. That is, the input coarse-grained data is mapped to the space F as in Expression (1) by the nonlinear transformation defined by a kernel function. In this space F, the identification subspace learning unit 108a has performed learning in advance, so that a subspace Ω characterizing the face patterns of the learning samples is spanned. The identification subspace projection unit 110a projects the mapping vector in the space F onto the subspace Ω. As a result, the linear combination coefficients α_i of the projection vector are obtained, and the length E of the perpendicular from the mapping vector to the subspace is obtained from Expression (5). The identification determination unit 112a evaluates the positional relationship between the two vectors, that is, the magnitude of E, using an appropriate threshold or the like, and determines whether the data belongs to the category of the subspace Ω. The threshold may be determined from the correct answer rate on appropriate sample data. If it is determined that a face pattern is likely to be included, the identification coarse-graining unit 106 of the next lowest resolution and its corresponding face identification unit 102 are executed.
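The coarse-to-fine decision can be sketched as a chain of stages that stops at the first rejection. The stage below is a toy assumption: it "coarse-grains" by keeping the first n components and measures a plain Euclidean distance to a template, whereas the embodiment would use a spatial filter and the distance E to a learned subspace. The names `make_stage` and `looks_like_face` are hypothetical.

```python
import numpy as np

def make_stage(n_dims, template, threshold, log):
    """Hypothetical stage: keep n_dims components and threshold the distance to a template."""
    def stage(image):
        log.append(n_dims)                       # record which resolutions were evaluated
        distance = float(np.linalg.norm(image[:n_dims] - template[:n_dims]))
        return distance <= threshold
    return stage

def looks_like_face(image, stages):
    # all() stops at the first stage that rejects, so finer stages are never evaluated.
    return all(stage(image) for stage in stages)

template = np.ones(100)
log = []
stages = [make_stage(n, template, threshold=1.0, log=log) for n in (25, 50, 100)]

rng = np.random.default_rng(0)
candidate = template + 0.01 * rng.normal(size=100)   # close to the template
background = np.zeros(100)                           # far from the template

log.clear()
candidate_accepted = looks_like_face(candidate, stages)
stages_run_for_candidate = list(log)

log.clear()
background_accepted = looks_like_face(background, stages)
stages_run_for_background = list(log)
```

A candidate rejected at 25 dimensions never reaches the 50- and 100-dimensional stages, which is where the saving in calculation time comes from.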
[0047]
The counterexample detection unit 104 has the same configuration as the face identification unit 102, and includes a counterexample subspace learning unit 114, a counterexample subspace projection unit 116, and a counterexample determination unit 118. The difference from the face identification unit 102 is that the counterexample subspace learning unit 114 learns patterns other than faces. That is, a subspace Ω covering patterns other than faces is formed using learning samples that contain such patterns. As in the face identification unit 102, the counterexample subspace projection unit 116 projects the mapping by the nonlinear transformation onto the subspace Ω, and the counterexample determination unit 118 classifies the data based on the positional relationship between the mapping vector and the projection vector.
[0048]
The learning method of the face identification unit 102 and the counterexample detection unit 104 is the same as that described for the image normalization unit 20. That is, for the face identification unit 102, a plurality of learning samples of the face pattern to be identified are prepared, and on the basis of these samples the pre-images corresponding to the subspace basis vectors are updated and the kernel is modified. The pattern identification unit 100 normally performs pattern recognition on image data normalized by the image normalization unit 20. Since the face pattern can therefore be expected to be normalized, it suffices to use learning samples that are normalized with respect to size, rotation angle, shift, and the like. Patterns other than face patterns may be used as the learning samples of the counterexample detection unit 104. However, because it is generally effective to learn those that the face identification unit 102 finds difficult to distinguish, it is preferable to learn mainly from confusable patterns that resemble a normalized face pattern.
[0049]
FIG. 10 shows the result of an identification test. The left side shows detection performed only on data coarse-grained to 50 dimensions, without using the present invention. The right side shows the result of hierarchical detection according to this embodiment, using the face identification units 102 at three resolutions of 25, 50, and 100 dimensions; the counterexample detection unit 104 is not included. Each piece of image data used contains a plurality of faces, and face patterns are detected from it. Both methods detect faces with a probability of 90%. The horizontal axis is the number of non-face patterns erroneously detected in one piece of image data, and the vertical axis is the proportion of images. With the conventional method, 63% of images had no false detection and 22% had exactly one. With the present invention, these values are 82% and 14%, respectively. As a result, the false detection rate per image improves from 0.40 to 0.24. Detection at the high resolution of 100 dimensions requires a great deal of calculation time, but in this embodiment, when the possibility of a face pattern is judged to be low at the 25-dimensional resolution, detection at higher resolutions is not performed, so efficient and highly accurate detection is achieved without unnecessary calculation time. Although not shown in the figure, when the counterexample detection unit 104, which detects that a face pattern is not included, was added at each resolution in this experiment, the false detection rate became almost zero, confirming its effectiveness.
[0050]
Finally, characteristic points in the present embodiment are listed. By the image normalization unit 20 of the present embodiment, normalization of the face pattern in the input image data can be realized only by learning based on a very small number of learning samples. In addition, since normalization is performed by classifying into rotation, enlargement and reduction, parallel movement, etc., it is sufficient to learn using only the corresponding learning sample, and very efficient learning is possible. Moreover, since normalization is performed using a neural network, normalization can be easily performed even for pattern distributions having nonlinearity. Further, since normalization is performed using a nonlinear transformation defined by a kernel function, normalization can be performed with high accuracy even for a pattern distribution having nonlinearity. Moreover, if normalization is performed using a parallel computer, normalization can be performed quickly.
[0051]
The pattern identifying unit 100 according to the present embodiment can identify a feature of a face pattern having nonlinearity with high accuracy using nonlinear transformation. In addition, since the hierarchical determination from the low resolution to the high resolution is performed, what can be easily determined that the face pattern is not included can be quickly identified without taking time. Moreover, the accuracy of determination is improved by using a means for detecting a counterexample together. In addition, since the pattern is identified using a nonlinear transformation defined by the kernel function, it is possible to identify with high reliability. In addition, a subspace representing a category can be constructed very quickly. In addition, the basis vectors in the subspace can be efficiently re-established using the learning sample. In addition, since the kernel function can be easily deformed using the learning sample, it is possible to easily improve the pattern identification. In addition, by using a parallel computer, pattern identification at each resolution can be efficiently calculated.
[0052]
The image normalization unit 20 and the pattern identification unit 100 complement each other so that the facial pattern can be identified with very high accuracy and high speed.
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing a configuration of a computer according to the present embodiment.
FIG. 2 is a block diagram illustrating an outline of an image normalization unit and a pattern identification unit.
FIG. 3 is a block diagram illustrating details of an image normalization unit.
FIG. 4 is a schematic diagram showing a state of nonlinear conversion.
FIG. 5 is a flowchart illustrating a processing procedure of an image normalization unit.
FIG. 6 is a diagram illustrating test results of an image normalization unit.
FIG. 7 is a schematic diagram of an auto encoder used in an image normalization unit.
FIG. 8 is a block diagram illustrating an outline of a pattern identification unit.
FIG. 9 is a flowchart illustrating a processing procedure of a pattern identification unit.
FIG. 10 is a diagram illustrating a test result of a pattern identification unit.
[Explanation of symbols]
20 image normalization unit, 26 partial conversion detection unit, 28 normalization processing unit, 100 pattern identification unit, 102 face identification unit, 104 counterexample detection unit.

Claims

An image processing apparatus that, using a nonlinear transformation, normalizes given image data into a condition under which identification can be performed, so that pattern identification means can identify a face pattern in the image data, the apparatus comprising:
learning means provided for each of the individual partial conversions for the normalization conversion;
partial conversion detection means provided for each of the individual partial conversions for the normalization conversion;
conversion execution means; and
repetitive control means,
wherein
each of the learning means performs learning using a plurality of learning samples that differ in the magnitude of the partial conversion corresponding to that learning means, thereby constructing a subspace of the space F defined by the nonlinear transformation, the subspace corresponding to the plurality of learning samples, and obtains a correspondence relationship between each basis vector of the constructed subspace and the conversion magnitude for that subspace,
each of the partial conversion detection means comprises:
conversion magnitude evaluation means for determining which of the basis vectors of the subspace, constructed by the learning means for the partial conversion corresponding to that partial conversion detection means, is close to a projection vector Φ′ obtained by projecting onto that subspace the mapping vector Φ of target image data into the space F, and for obtaining, based on the determined basis vector and the correspondence relationship, the conversion magnitude corresponding to the target image data; and
estimation error evaluation means for obtaining an estimation error that is the distance between the mapping vector Φ and the projection vector Φ′, or a monotonically increasing function of that distance,
the conversion execution means determines which of all the partial conversion detection means gives the smallest estimation error, and applies to the target image data the partial conversion corresponding to the determined partial conversion detection means, with the conversion magnitude obtained by the conversion magnitude evaluation means of that partial conversion detection means, and
the repetitive control means operates each of the partial conversion detection means and the conversion execution means with the given image data as the target image data, and repeats a process of operating each of the partial conversion detection means and the conversion execution means with, as new target image data, the updated image data obtained when the conversion execution means applied the partial conversion to the target image data in that operation.
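
For illustration only, the control flow recited in this claim can be sketched as follows. The sketch assumes hypothetical stand-ins throughout: the "image" is a dict of pose parameters, each detector is a toy function returning a (conversion magnitude, estimation error) pair, and the stopping rule (`tol`, `max_iters`) is an assumption that the claim does not state. It shows only how the smallest estimation error selects which partial conversion to apply on each repetition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for image data: a dict of pose parameters.
Image = Dict[str, float]


@dataclass
class PartialConversion:
    name: str
    # Returns (conversion magnitude, estimation error) for the target image.
    detect: Callable[[Image], Tuple[float, float]]
    # Applies this partial conversion with the given magnitude.
    apply: Callable[[Image, float], Image]


def normalize(image: Image, conversions: List[PartialConversion],
              max_iters: int = 10, tol: float = 1e-6):
    """Repeat: run all detectors, apply the one with the smallest error."""
    history = []
    for _ in range(max_iters):
        # Operate every partial conversion detector on the current target.
        results = [(conv,) + conv.detect(image) for conv in conversions]
        # Pick the detector that gives the smallest estimation error.
        best, magnitude, _error = min(results, key=lambda r: r[2])
        # Assumed stopping rule (not in the claim): nothing left to correct.
        if abs(magnitude) < tol:
            break
        # The updated image becomes the new target image data.
        image = best.apply(image, magnitude)
        history.append((best.name, magnitude))
    return image, history


# Toy detectors (assumptions): each one estimates the correction for its own
# parameter, and its error grows with the distortion it was not trained on.
rotation = PartialConversion(
    name="rotation",
    detect=lambda img: (-img["angle"], abs(img["scale"])),
    apply=lambda img, m: {**img, "angle": img["angle"] + m},
)
size = PartialConversion(
    name="size",
    detect=lambda img: (-img["scale"], abs(img["angle"])),
    apply=lambda img, m: {**img, "scale": img["scale"] + m},
)

result, steps = normalize({"angle": 30.0, "scale": 0.25}, [rotation, size])
print(result, steps)
```

In this toy run the rotation is corrected first because its detector reports the smaller estimation error, and the size correction follows on the next repetition.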

The image processing apparatus according to claim 1, wherein the partial conversion includes at least one of size, rotation, and shift.
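
As a hedged illustration of the three kinds of partial conversion named here, the sketch below applies a size change, a rotation, and a shift to a single 2D coordinate. The function names and the choice of a conversion center are assumptions made for this example; the claim does not specify how the conversions are parameterized.

```python
import math


def scale_about(point, factor, center=(0.0, 0.0)):
    """Size conversion: scale a point by `factor` about `center`."""
    x, y = point
    cx, cy = center
    return (cx + factor * (x - cx), cy + factor * (y - cy))


def rotate_about(point, angle_deg, center=(0.0, 0.0)):
    """Rotation: rotate a point counterclockwise by `angle_deg` about `center`."""
    x, y = point
    cx, cy = center
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


def shift(point, dx, dy):
    """Shift: translate a point by (dx, dy)."""
    x, y = point
    return (x + dx, y + dy)


p = (1.0, 0.0)
print(scale_about(p, 2.0), rotate_about(p, 90.0), shift(p, 3.0, -1.0))
```

Applying one of these functions to every pixel coordinate (with resampling) would give the corresponding image-level conversion.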

The image processing apparatus according to claim 1, wherein the nonlinear transformation is given by calculation means using a neural network.

The image processing apparatus according to claim 1, wherein the nonlinear transformation is given by calculation means using a kernel function.
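
When the nonlinear transformation is given by a kernel function, the estimation error and the nearest basis vector can be evaluated without forming Φ explicitly. The following is a minimal sketch under simplifying assumptions that are not taken from the patent: the kernel is Gaussian, the subspace is spanned directly by the mapped learning samples (so each sample plays the role of a basis vector with a known conversion magnitude), and a small `ridge` term is added for numerical stability. Under these assumptions the squared distance is k(x, x) − kᵀK⁻¹k, and the basis vector nearest to Φ′ is the sample with the largest kernel value.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def detect(x, samples, magnitudes, sigma=1.0, ridge=1e-9):
    """Return (conversion magnitude, estimation error) for input x."""
    # Gram matrix of the learning samples (ridge on the diagonal is an assumption).
    K = [[gaussian_kernel(a, b, sigma) + (ridge if i == j else 0.0)
          for j, b in enumerate(samples)] for i, a in enumerate(samples)]
    k = [gaussian_kernel(x, s, sigma) for s in samples]
    coeffs = solve(K, k)  # coefficients of the projection vector
    sq = gaussian_kernel(x, x, sigma) - sum(c * v for c, v in zip(coeffs, k))
    error = math.sqrt(max(sq, 0.0))  # distance between mapping and projection
    # For a Gaussian kernel, the nearest basis vector maximizes k(x, x_i).
    nearest = max(range(len(samples)), key=lambda i: k[i])
    return magnitudes[nearest], error


# Hypothetical learning samples, labeled with hypothetical conversion magnitudes.
samples = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
magnitudes = [-10.0, 0.0, 10.0]
print(detect([1.0, 0.0], samples, magnitudes))
print(detect([2.0, 3.0], samples, magnitudes))
```

An input that coincides with a learning sample yields an estimation error near zero, while an input far from all samples yields an error near one, which is the largest value possible for this kernel.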

The image processing apparatus according to any one of claims 1 to 4, further comprising
coarse-graining means for coarse-graining the target image data by applying a spatial filter to it,
wherein the partial conversion detection means use the coarse-grained target image data, and
the conversion execution means performs the partial conversion on the original target image data that has not been coarse-grained.
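
One simple way to coarse-grain with a spatial filter is block averaging, sketched below. The choice of a box filter with non-overlapping blocks, and the `block` parameter, are assumptions for this example; the claim does not say which spatial filter is used. The coarse image would be passed to the detectors, while the conversion itself is applied to the full-resolution original.

```python
def coarse_grain(image, block=2):
    """Average non-overlapping block x block regions of a 2D list of pixels."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - block + 1, block):
        out_row = []
        for c in range(0, cols - block + 1, block):
            total = sum(image[r + i][c + j]
                        for i in range(block) for j in range(block))
            out_row.append(total / (block * block))
        out.append(out_row)
    return out


original = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
coarse = coarse_grain(original, block=2)  # used only for detection
print(coarse)
```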

The image processing apparatus according to any one of claims 1 to 5, wherein the operations of the respective partial conversion detection means are processed in parallel by a parallel operation device provided in the apparatus.
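
Because the detectors for the different partial conversions do not depend on one another within a single repetition, they can be evaluated concurrently. The sketch below uses a thread pool from the Python standard library purely as an illustration; the detector functions are hypothetical stand-ins that return fixed (conversion magnitude, estimation error) pairs, and a real parallel operation device could equally be multiple processes or dedicated hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_all_parallel(image, detectors):
    """Run every detector on the same image concurrently.

    Returns the index of the detector with the smallest estimation error
    and the list of (magnitude, error) results in detector order.
    """
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        # pool.map preserves the order of `detectors` in its results.
        results = list(pool.map(lambda d: d(image), detectors))
    best_index = min(range(len(results)), key=lambda i: results[i][1])
    return best_index, results


# Hypothetical detectors returning fixed (magnitude, error) pairs.
detectors = [
    lambda img: (5.0, 0.8),   # e.g. rotation
    lambda img: (-2.0, 0.1),  # e.g. size
    lambda img: (1.0, 0.4),   # e.g. shift
]
best, results = detect_all_parallel(None, detectors)
print(best, results)
```

For CPU-bound detectors in CPython, a process pool is usually the better fit than threads.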

An image processing method that, using a nonlinear transformation, normalizes given image data into a condition under which identification can be performed, so that a pattern identification method can identify a face pattern in the image data, the method comprising:
a learning step executed for each of the individual partial conversions for the normalization conversion;
a partial conversion detection step executed for each of the individual partial conversions for the normalization conversion;
a conversion execution step; and
a repetitive control step,
wherein
each learning step performs learning using a plurality of learning samples that differ in the magnitude of the partial conversion corresponding to that learning step, thereby constructing a subspace of the space F defined by the nonlinear transformation, the subspace corresponding to the plurality of learning samples, and obtains a correspondence relationship between each basis vector of the constructed subspace and the conversion magnitude for that subspace,
each of the partial conversion detection steps includes:
a conversion magnitude evaluation step of determining which of the basis vectors of the subspace, constructed by the learning step for the partial conversion corresponding to that partial conversion detection step, is close to a projection vector Φ′ obtained by projecting onto that subspace the mapping vector Φ of target image data into the space F, and of obtaining, based on the determined basis vector and the correspondence relationship, the conversion magnitude corresponding to the target image data; and
an estimation error evaluation step of obtaining an estimation error that is the distance between the mapping vector Φ and the projection vector Φ′, or a monotonically increasing function of that distance,
the conversion execution step determines which of all the partial conversion detection steps gives the smallest estimation error, and applies to the target image data the partial conversion corresponding to the determined partial conversion detection step, with the conversion magnitude obtained by the conversion magnitude evaluation step of that partial conversion detection step, and
the repetitive control step executes each of the partial conversion detection steps and the conversion execution step with the given image data as the target image data, and repeats a process of executing each of the partial conversion detection steps and the conversion execution step with, as new target image data, the updated image data obtained when the conversion execution step applied the partial conversion to the target image data in that execution.

An image processing program that, using a nonlinear transformation, normalizes given image data into a condition under which identification can be performed, so that a pattern identification procedure can identify a face pattern in the image data, the program causing a computer to execute:
a learning procedure executed for each of the individual partial conversions for the normalization conversion;
a partial conversion detection procedure executed for each of the individual partial conversions for the normalization conversion;
a conversion execution procedure; and
a repetitive control procedure,
wherein
each learning procedure performs learning using a plurality of learning samples that differ in the magnitude of the partial conversion corresponding to that learning procedure, thereby constructing a subspace of the space F defined by the nonlinear transformation, the subspace corresponding to the plurality of learning samples, and obtains a correspondence relationship between each basis vector of the constructed subspace and the conversion magnitude for that subspace,
each of the partial conversion detection procedures includes:
a conversion magnitude evaluation procedure of determining which of the basis vectors of the subspace, constructed by the learning procedure for the partial conversion corresponding to that partial conversion detection procedure, is close to a projection vector Φ′ obtained by projecting onto that subspace the mapping vector Φ of target image data into the space F, and of obtaining, based on the determined basis vector and the correspondence relationship, the conversion magnitude corresponding to the target image data; and
an estimation error evaluation procedure of obtaining an estimation error that is the distance between the mapping vector Φ and the projection vector Φ′, or a monotonically increasing function of that distance,
the conversion execution procedure determines which of all the partial conversion detection procedures gives the smallest estimation error, and applies to the target image data the partial conversion corresponding to the determined partial conversion detection procedure, with the conversion magnitude obtained by the conversion magnitude evaluation procedure of that partial conversion detection procedure, and
the repetitive control procedure executes each of the partial conversion detection procedures and the conversion execution procedure with the given image data as the target image data, and repeats a process of executing each of the partial conversion detection procedures and the conversion execution procedure with, as new target image data, the updated image data obtained when the conversion execution procedure applied the partial conversion to the target image data in that execution.