JP2004062719A

JP2004062719A - Image processor

Info

Publication number: JP2004062719A
Application number: JP2002222712A
Authority: JP
Inventors: Sukeji Kato; 加藤　典司; Hirotsugu Kashimura; 鹿志村　洋次; Hitoshi Ikeda; 池田　仁
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-07-31
Filing date: 2002-07-31
Publication date: 2004-02-26
Anticipated expiration: 2022-07-31
Also published as: JP4238537B2

Abstract

PROBLEM TO BE SOLVED: To establish a means for quickly and highly accurately identifying a face pattern of a person or an animal in an image.

SOLUTION: An image processor is provided with a plurality of partial transformation detecting parts 26 arranged in parallel. Taken together, the partial transformation detecting parts 26 detect transformations covering all the degrees of freedom necessary for normalization, while each individual part detects a transformation with only some of those degrees of freedom. Each part applies a nonlinear transformation to coarse-grained data corresponding to the image data to be normalized and projects the transformed data onto a previously spanned subspace representing the characteristics of learning samples, thereby calculating the magnitude of the transformation necessary for normalization and the estimated error involved in that transformation. A transformation executing part 44 executes the transformation corresponding to the partial transformation detecting part 26 that gives the smallest estimated error among all the partial transformation detecting parts 26.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、画像処理装置、特に画像データからのパターン検出が容易となるように行う前処理に関する。
【０００２】
【従来の技術】
計算機を用いて、画像データから、特定のパターンを識別するためには、パターンそのものの形状の検出を確実に行う必要と、含まれているパターンのサイズや回転角度等が様々であることに対応する必要とがある。
【０００３】
前者に対する従来例としては、線形部分空間法、サポートベクトルマシン（ＳＶＭ）法、カーネル非線形部分空間法などがある。線形部分空間法では、複数のカテゴリ毎に部分空間を定め、未知のパターンがどの部分空間に最も関連しているかを評価し、そのパターンの属するカテゴリを判定している。しかし、この方法においては、カテゴリが多く、パターンの次元が低い場合には、検出精度が低下してしまう。また、非線形性をもつパターン分布に対する識別精度も低いという問題がある。
【０００４】
ＳＶＭ法は、カーネル関数を媒介に定義した非線形変換により、低次元のパターンを高次元に写像することで、非線形性をもつパターン分布の識別を可能とする方法である。しかし、２つのカテゴリの分類しか行うことができない点や、必要な計算量が多い点に問題を抱える。
【０００５】
カーネル非線形部分空間法は、これらの問題を解決するパターン識別方法として考案され、特開２００２−９０２７４公報に開示されている。この方法は、ＳＶＭ法と同様に、カーネル関数を用いて定義した非線形変換によりパターンを高次元空間に写像し、この高次元空間上で部分空間法を実施している。
【０００６】
後者、つまり、様々なサイズや回転角度をもったパターンに対しては、従来は、非常に多くの学習サンプルを用いることで対応してきた。すなわち、上で述べた各パターン識別法などは、一般に特徴的なパターンをもつ学習サンプルを用いて、その特徴を示すカテゴリの分布を定めていく学習をおこなう。そこで、この学習サンプルとして、サイズや角度が様々に変えられたパターンを用いるだけでなく、サイズと角度を組み合わせた変形がなされた非常に多くのパターンについても用いる必要があった。
【０００７】
【発明が解決しようとする課題】
しかしながら、前記カーネル非線形部分空間法では、部分空間を張る基底ベクトルが、全学習サンプルの非線形空間への写像に基づいて定義されるため、学習サンプルが多くなると、依然として多くの計算が必要となる問題があった。また、上で述べたように、従来は、汎用的なパターン識別を行うためには、非常に多くの学習サンプルをもちいなければならない問題があった。本発明の課題は、画像中で、サイズや回転角度などが様々であることが多い人間や動物の顔パターンを、高速かつ高精度に識別する手段を確立する点にある。
【０００８】
【課題を解決するための手段】
本発明の画像処理装置は、与えられた画像データ中の顔パターンに対してパターン識別手段が識別を行えるように、識別が実施可能となる条件下へと画像データを正規化変換する画像処理装置において、正規化変換を分割した複数の部分変換に対しそれぞれ設けられ、並列配置された部分変換検出手段と、各部分変換検出手段に設けられた部分変換評価手段であって、学習サンプルを用いた学習結果と画像データとの比較に基づいて、画像データを正規化するために必要となる部分変換の大きさ、及び、その変換にともなう推定誤差を評価する部分変換評価手段と、全ての部分変換評価手段の中で最も小さい推定誤差を与える部分変換検出手段を判定し、それに対応する部分変換を画像データに施す変換実施手段と、を備え、前記部分変換評価手段における学習結果と画像データとの比較は、非線形変換によって定義された空間において行われ、前記変換実施手段は、画像データに対して少なくとも一回実行されることを特徴とする。
【０００９】
また、本発明の画像処理装置は、部分変換検出手段が検出する部分変換には、サイズ、回転、シフトの少なくともひとつが含まれることを特徴とする。
【００１０】
また、本発明の画像処理装置は、非線形変換はニューラルネットワークを用いた計算手段によって与えられることを特徴とする。
【００１１】
また、本発明の画像処理装置は、非線形変換はカーネル関数を用いた計算手段によって与えられることを特徴とする。
【００１２】
また、本発明の画像処理装置は、画像データに空間フィルタを施して粗視化する粗視化手段を有し、部分変換検出手段においては粗視化された画像データが用いられることを特徴とする。
【００１３】
また、本発明の画像処理装置は、各部分変換検出手段の演算は装置内に設けられた並列演算装置により並列的に処理されることを特徴とする。
【００１４】
また、本発明の画像処理方法は、与えられた画像データ中の顔パターンに対してパターン識別方法により識別を行えるように、識別が実施可能となる条件下へと画像データを正規化変換する画像処理方法において、正規化変換を分割した複数の部分変換に対しそれぞれ設けられ、並列配置された部分変換検出工程段と、各部分変換検出工程に設けられた部分変換評価工程であって、学習サンプルを用いた学習結果と画像データとの比較に基づいて、画像データを正規化するために必要となる部分変換の大きさ、及び、その変換にともなう推定誤差を評価する部分変換評価工程と、全ての部分変換評価工程の中で最も小さい推定誤差を与える部分変換検出工程を判定し、それに対応する部分変換を画像データに施す変換実施工程と、を備え、前記部分変換評価工程における学習結果と画像データとの比較は、非線形変換によって定義された空間において行われ、前記変換実施工程は、画像データに対して少なくとも一回実行されることを特徴とする。
【００１５】
また、本発明の画像処理プログラムは、コンピュータに、与えられた画像データ中の顔パターンに対してパターン識別手順が識別を行えるように、識別が実施可能となる条件下へと画像データを正規化変換させる画像処理プログラムにおいて、正規化変換を分割した複数の部分変換に対しそれぞれ設けられ、並列配置された部分変換検出手順と、各部分変換検出手順に設けられた部分変換評価手順であって、学習サンプルを用いた学習結果と画像データとの比較に基づいて、画像データを正規化するために必要となる部分変換の大きさ、及び、その変換にともなう推定誤差を評価する部分変換評価手順と、全ての部分変換評価手順の中で最も小さい推定誤差を与える部分変換検出手順を判定し、それに対応する部分変換を画像データに施す変換実施手順と、を含み、前記部分変換評価手順における学習結果と画像データとの比較は、非線形変換によって定義された空間において行われ、前記変換実施手順は、画像データに対して少なくとも一回実行されることを特徴とする。
【００１６】
【発明の実施の形態】
以下に、本発明の好適な実施形態を図面を用いて説明する。図中、同一構成となるものについては説明を省略する。
【００１７】
図１のブロック図は、本発明の実施の形態に係る装置の構成を示している。装置は、演算を行うＣＰＵ２をはじめ、記憶部４、利用者の指示入力部６、表示部８、データ入力部１０、データ出力部１２、およびアプリケーションソフトウエア入力部１４を含む構成となっており、これらはデータを通信する通信網によって結ばれている。すなわち、この装置は、一般的なコンピュータ上で、本発明のアルゴリズムを記載したアプリケーションソフトウエアを実行することで実現される。利用者は、ＣＤ−ＲＯＭ等の記憶媒体や、ネットワークを介して頒布されたアプリケーションソフトウエアを、そのアプリケーションソフトウエア入力部１４を用いてコンピュータに入力し、キーボード等の指示入力部６を使ってＣＰＵ２に実行させる。ＣＰＵ２の動作は、オペレーティングシステム（ＯＳ）と呼ばれるソフトウエアの管理下にあり、利用者ならびにアプリケーションソフトウエアの指示は、このＯＳを通じてＣＰＵ２に伝えられる。本実施形態のアプリケーションソフトウエアやＯＳを始めとする演算実行上必要な情報は、メモリやハードディスク等からなる記憶部４によって一時的または恒久的に保持される。また、実行にあたって必要となる画像データは、ＣＣＤカメラ、スキャナ、ＣＤ−ＲＯＭ等の記憶媒体、あるいはネットワークによるデータ取得等のデータ入力部１０を通して得られる。そして、必要な演算は、そして、必要な演算がＣＰＵ２によって成されると、処理された画像データは、ＭＯ等の記憶媒体、ネットワークによるデータ転送、プリンタ等のデータ出力部１２を通じて出力される。また、利用者は、ディスプレイなどの表示部８によって、処理前後の画像データ等を見ることができる。
【００１８】
図２は、ＣＰＵ２によって行われる画像処理演算の概略を示すブロック図である。データ入力部１０から入力された画像データは、画像正規化部２０によって正規化変換を受け、さらにパターン識別部１００によって詳細なパターンの識別をされる。なお、ここで言う正規化とは、顔パターンの大きさ、回転角度、位置、明るさなどの条件を、パターン識別部１００の想定する状態（これを正規形と呼ぶことにする）へと変換することである。
【００１９】
画像正規化部２０で行われる正規化のための変換は基本的な部分変換からなる要素に分割されており、各部分変換をどのように行えばよいかは、それぞれの部分変換に対応した部分変換検出部２６が算出する。図示した例では、画像データのサイズ（拡大と縮小）に関係したサイズ部分変換検出部２６ａ、画像データの回転角度に関係した回転部分変換検出部２６ｂ、画像データのシフト（平行移動）に関係したシフト部分変換検出部２６ｃの３つの部分変換検出部２６を備える。これらの部分変換検出部２６は、後で詳しく述べるように、画像データに粗視化のための空間フィルタを施して得た粗視データに対して部分変換の状態検出を行い、その結果を正規化処理部２８に渡す。そして、正規化処理部２８は、変換にともなう誤差が最も小さいと判定された変換を画像データに施す。この一連の過程は、通常何度か繰り返され、最終的には、サイズ、回転、シフトの全てについて正規化が行われることになる。もちろん、顔パターンの状況によっては、繰り返しを行わないことも可能である。
【００２０】
パターン識別部１００は、画像正規化部２０によって正規化が行われた画像データに対し、空間フィルタを用いて様々な解像度の粗視データへと粗視化する処理を行い、さらにこの粗視データに対し、顔パターンの識別を行う顔識別部１０２を実行する。図示した例においては、主成分分析のモードを適当な次元だけ足し合わせる粗視化がなされており、２５次元の粗視データに対する顔識別部１０２ａと、１００次元の粗視データに対する顔識別部１０２ｂをはじめ、その間の解像度にも複数の顔識別部１０２が設けられている。また、最も高い次元である１００次元の粗視データに対し、顔パターン以外が含まれることを判定する反例検出部１０４が設けられている。後で詳細に記すように、顔パターンの識別は解像度が一番低い２５次元の顔識別部１０２ａから行われ、顔パターンがある可能性が高いと判定された場合には、次に低い解像度の顔識別部１０２が判定に用いられる。そして、最も解像度が高い１００次元においても、顔パターンのある可能性が高いと判定された場合には、最後に反例検出が行われる。
【００２１】
以下では、画像正規化部２０とパターン識別部１００の詳細な説明を行う。
【００２２】
図３のブロック図は、画像正規化部２０の構成の概略を示している。入力された画像データは、記憶部４に設けられた画像保持部３０に保持される。そして、正規化用粗視化部３２において、この画像データに対し空間解像度を落として大まかな特徴を取り出す粗視化を行い粗視データを得る。この粗視化のために用いる空間フィルタ手段は特に限定されないが、例えば、適当な画像データに対する主成分分析で得たモード成分のうち寄与率の大きな所定次元数のモード成分の和を算出する方法や、フーリエ分解を行い所定の解像度以上のモード成分を取り出す方法などを用いる。粗視化を行う理由は、データ量を減少させ、次に述べる正規化が高速で実行可能になることにある。
【００２３】
続いて粗視データは、並列的に複数配置された部分変換検出部２６に送られる。各部分変換検出部２６では、図４に模式的に示したように、粗視データを画像空間Ｇ内のベクトルｘであるとみなし、パターン識別のための非線形変換によって作られる空間Ｆに写像する。この空間Ｆに写像されたベクトルを写像ベクトルと呼ぶことにし、Φ（ｘ）と書く。部分変換検出部２６は、例えば、サイズと、回転と、シフトについて検出する場合には、サイズ部分変換検出部２６ａ、回転部分変換検出部２６ｂ、シフト部分変換検出部２６ｃからなる。そして、各部分変換検出部２６には、正規化用部分空間学習部３４、正規化用部分空間射影部３６、部分変換評価部３７が含まれ、さらに部分変換評価部３７には変換の大きさ評価部３８と推定誤差評価部４０が含まれる。空間Ｆ内には、正規化用部分空間学習部３４が学習サンプルを用いて事前に学習サンプルに特徴的なカテゴリを表す部分空間Ωを構築しており、写像ベクトルΦ（ｘ）は、正規化用部分空間射影部３６によって、この部分空間Ωに射影される。この射影されたベクトルを射影ベクトルと呼びΦ’（ｘ）と表記する。そして、変換の大きさ評価部３８は、射影ベクトルΦ’（ｘ）が部分空間Ωを張る基底ベクトルのうちのどれに近いかを評価して、変換に必要な大きさを算出する。例えば、サイズ部分変換検出部２６ａにおいては、学習時に、基底ベクトルΦ_１は約１．５倍の大きさをもつ学習サンプルの近傍にあり、他の基底ベクトルΦ_２は約２倍の大きさをもつ学習サンプルの近傍にあるといった対応関係を示すルックアップテーブルを作成している。変換の大きさ検出部３８ａは、このルックアップテーブルを参照して、現在の顔パターンを正規形に変換するためには何倍に拡大すればよいのかを算出することができる。また、推定誤差評価部４０は、写像ベクトルΦ（ｘ）と射影ベクトルΦ’（ｘ）の距離Ｅを基にして推定誤差を算出する。これは、距離Ｅが近ければ射影に含まれる誤差は小さく、距離Ｅが大きければ射影結果は大きな誤差を含むであろうと判断されることを意味する。
【００２４】
これらの結果は、変換判定部４２と変換実施部４４とを含む正規化処理部２８に渡される。そして、変換判定部４２は、どの部分変換検出部２６の推定誤差が最小となるかを判定する。例えば、回転部分変換検出部２６ｂの推定誤差が一番小さいときには、変換実施部４４が、対応する変換の大きさ（すなわち回転させる角度）の分だけもとの画像データを回転させ、画像保持部３０のもつ画像を更新する。更新された画像データは、必要に応じて、さらに複数回、同様の正規化を施される。繰り返しの基準は様々に考えられるが、例えば、あらかじめ所定の回数を設定する方法や、実空間において適当な対比データとから算出した相関、あるいは前記変換の大きさ評価手段が求めた値を所定の閾値と比較する方法などを用いることも可能である。
【００２５】
次に、部分変換検出部２６において用意される空間Ｆをカーネル関数を用いて構築する手段について、数学的表現を交えて詳細に説明する。カーネル関数を用いる方法において特徴的なことは、上で述べた写像ベクトルΦ（ｘ）の作成方法が陽に示されないことである。
【００２６】
粗視データを表す画像空間Ｇ上のｄ次元ベクトルｘを、ｄ_Ｆ次元の空間Ｆに写像する式（１）の非線形写像は、適当なカーネル関数ｋ（ｘ，ｙ）を選ぶことで、式（２）の関係を満たすように決められる。
【００２７】
【数１】

ここで、φ_ｉ（ｘ）は適当なカーネル関数の固有関数であり、対応する固有値をλ_ｉである（ｉ＝１，．．．，ｎ）。
【００２８】
次に、粗視データのカテゴリを分類するｍ次元部分空間Ωを、空間Ｆに張る方法及びその学習方法を説明する。まず、部分空間Ωの基底ベクトルの初期値として、画像空間Ｇ上のｍ個のベクトルｘ_１，．．．，ｘ_ｍ（以下ではプレイメージと呼ぶ）に対応した部分空間Ω上のベクトルΦ_１，．．．，Φ_ｍを適当に決める。具体的には、例えば、一様乱数を発生させてランダムに与える。ここで、画像空間上の学習サンプルを示すｄ次元ベクトルｘを用いて、この部分空間Ωを修正するように、プレイメージを学習させることを考える。学習サンプルのベクトルｘの空間Ｆへの写像Φ（ｘ）を部分空間Ωに射影したベクトルΦ’（ｘ）は、基底ベクトルの一次結合で表現される。その結合係数をα_ｉとすると、この射影と、もとの写像ベクトルΦ（ｘ）との距離Ｅは式（３）−（５）で表される。
【００２９】
【数２】

ここで、式（５）への変形には、カーネル関数の定義式（２）を用いている。また、係数α_ｉは、射影の定義に従いＥが最小の値をとるように、式（６）で与えられる。行列Ｋは、ｋ（ｘ_ｉ，ｘ_ｊ）を（ｉ，ｊ）成分とする行列である。
【００３０】
プレイメージの学習では、部分空間Ωと学習サンプルｘ_ｉとの距離を最も減少させる方向にプレイメージをΔｘ_ｉ動かす。このΔｘ_ｉは最急降下法によって式（７）で与えられる。
【００３１】
【数３】

ここで、ηは学習係数であり、正の定数である。また、行列ｇ_ａｂ（ｘ）は、非線形写像によって空間Ｆに埋め込まれている多様体の計量テンソルであり、カーネル関数を用いて式（８）で与えられている。この学習は、高次元空間の線形最適化問題なので、非線形最適化問題に比べ収束性が良く、短時間で終了する。
【００３２】
次にカーネル関数の学習方法について記す。カーネル関数としては、初期には、ガウス関数カーネルや、多項式カーネルなどの既知の関数を与える。学習中には、カーネル関数を式（９）の等角写像によって変形する。
【００３３】
【数４】

その学習則は、学習サンプルに対する係数α_ｉのばらつきが、どの係数α_ｉに対しても均一になるようにＣ（ｘ）を与えるものとする。具体的には、係数α_ｉのばらつきが既定値に対して大きい場合は、係数α_ｉに対応する部分空間の基底ベクトルのプレイメージｘ_ｉ近傍に関して、Ｃ（ｘ）の値を大きくする。これにより、ｘ_ｉの近傍は空間Ｆにおいて、式（１０）のように拡大される。
【００３４】
【数５】

したがって、係数α_ｉを大きな値とする学習サンプルの数は相対的に減少し、係数α_ｉの学習サンプルに対するばらつきは減少する。逆に係数α_ｉのばらつきが既定値に対して小さい場合は、係数α_ｉに対応する基底ベクトルのプレイメージｘ_ｉ近傍に関してＣ（ｘ）の値を小さくする。なお、ここで述べた方法では、Ｃ（ｘ）は部分空間Ωの基底のプレイメージに対してしか適用できないが、プレイメージ近傍に関してはプレイメージにおけるＣ（ｘ）の値を式（１１）のように外挿することで変更が可能となる。
【００３５】
【数６】

ここで、学習に用いる学習サンプルの与え方について説明する。例えば、回転に関する正規化を行う場合には、画像中において正規化の対象となる顔パターンが画像の中心位置に正立（頭が上に、顎が下に配置される）する画像データを複数枚用意し、これらに対し−１８０度から１８０度までの範囲で一様乱数を用いて与えた角度、または等間隔に与えた角度に回転させる。また、シフトについては、同じく顔パターンが画像の中心位置に正立した画像を複数枚用意し、縦方向および横方向に、例えば半値幅が適当なピクセル数をもつガウス分布の乱数に従ってシフトさせる。乱数で与える代わりに確率密度が一様となるように規則的に与えても良い。サイズの場合にも、同様にして、顔パターンが画像の中心位置に正立した画像を拡大および縮小させれば良い。このようにして学習を行うことで、学習サンプルのもつ変換の大きさ（例えば回転の場合にはその角度）と、学習サンプルの部分空間への射影の関係が明らかになる。具体的には、例えば係数α_１が大きければ９０度程度回転したものであるといった関係が導かれる。これを詳細に調べ、ルックアップテーブルや、適切な関数を作成することで、変換の大きさ評価部３８の評価手段が確立する。
【００３６】
以上の学習手続きにより、非線形変換で写像される空間Ｆに、粗視データをカテゴリ分けする部分空間Ωが張られる。学習の過程においては、プレイメージの学習およびカーネル関数の学習を交互に複数回反復するのが望ましいが、学習サンプルがあまり複雑でない場合には、どちらかの方法を１回だけ行うなどの簡略化をすることも可能である。
【００３７】
最後に、学習が完了し正規化が行われる段階において、画像正規化部２０が実行される手順の主要部分を図５に示したフローチャートを用いて説明する。画像データが入力される（Ｓ１）と、正規化用粗視化部３２は空間フィルタを用いて粗視データを作成する（Ｓ２）。粗視データは、サイズ部分変換検出部２６ａ、回転部分変換検出部２６ｂ、シフト部分変換検出部２６ｃに送られる。正規化用部分空間射影部３６は、式（６）で定義される射影の一次結合の係数α_ｉを求める（Ｓ３）。このα_ｉの求め方は、必ずしも式（６）の定義に従う必要はなく、適当な反復法を用いて式（５）のＥが最小となるように求めても良い。次に、変換の大きさ評価部３８が、こうして得られたα_ｉをルックアップテーブルと比較する等して変換の大きさを求め（Ｓ４）、推定誤差評価部４０は、Ｅの大きさ、あるいはＥの単調増加関数値を推定誤差として算出する（Ｓ５）。正規化処理部２８における変換判定部４２は、推定誤差が最小となる部分変換検出部２６を判定し（Ｓ６）、もとの画像データに対して、対応する変換の大きさで、変換を行う。こうして得られた画像データは、適当な判断基準に従って、再変換されるか否かが決められる（Ｓ８）。なお、先にも述べたように、この一連の演算において、式（１）で定義される非線形変換は直接は用いられず、したがって、その形状を知る必要もない。
【００３８】
図６に、サイズ、回転、シフトの各要素からなる正規化をおこなった結果を示す。この実験は、図の右側の写真で示したように、目の近傍を写した２つの写真が正規化されていく様子を、一回の変換毎に追跡したものである。右上の一連の写真では、初期（左上）に反転している写真が、最初のステップで約９０度半時計回りに回転され、次のステップでやや左にシフトされ、といった変換を受け、最後には正立した所望の大きさに正規化されている。左側の３次元のグラフは、この正規化の過程における、サイズ（倍率）、角度（度）、距離（ピクセル）を逐次追跡したものである。左上の黒丸は、初期の写真が、１８０度の回転と、１．３倍程度の拡大と、若干のシフトを受けていることを示している。そして、一回の変換毎に３つの座標軸のいずれか一つに沿って移動し、最終的に右側の正規化された位置に移っている。右下の一連の写真、及び対応する左のグラフの白丸も同様の流れを示しており、この場合には、拡大を中心に正規化が行われている。なお、ここでは、顔パターン全体ではなく目の近傍に限定しているが、顔パターン全体とした場合にも基本的な効果は全くかわらない。ただし、顔パターン全体とした場合には、図示した例とは、学習サンプルを変えなければならないことは言うまでもない。
【００３９】
なお、正規化用粗視化部３２で用いる空間フィルタの解像度には任意性があるが、ここで示した例では、主成分分析の方法により２５次元程度の粗視化を行っている。また、空間Ｆに張る部分空間Ωの次元もいろいろな値を取ることが可能であるが、ここでは２５次元とした。学習サンプルの数は、検出に必要な精度にもよるが、例えば、１００人程度の顔パターンを、各部分変換検出部２６で、一人につき１００通り程度変化させればよい。この結果、部分変換検出部２６を３つ用いた場合には、全学習サンプル数は３万程度になる。一方、本実施の形態を用いずに同じ自由度を与えると、全学習サンプル数は１００万程度になってしまう。したがって、本実施形態を用いることで学習サンプル数を格段に軽減できることがわかる。また、部分変換検出部２６の検出する部分変換は、ここでは、サイズ、回転、シフトとした。これらの要素は、特に限定されないが、単純な変換をおこなうと変換が容易となる。すなわち、サイズおよび回転については、一次変換で記述できる形式を用い、シフトについては剪断性をもたない一様な平行移動を用いると良い。もちろん、扱うパターンの特性に応じて、これよりも複雑な変換を割り当てることもできる。また、画像データの輝度に関する変換等を割り当てることも可能である。
【００４０】
上に説明した非線形変換は、カーネル関数を用いて定義された。しかし、非線形変換の構築方法には任意性がある。ここではニューラルネットワークのアルゴリズムに従ったオートエンコーダを用いて非線形変換を行う方法について説明する。
【００４１】
図７に、オートエンコーダの概略を示す。オートエンコーダは、多層のパーセプトロンの一種であり、入力層６０のニューロン数と、出力層６２のニューロン数が同じで、中間層６４のニューロン数はこれよりも少なくなっている。
【００４２】
このオートエンコーダを部分変換検出部２６として用いるためには、次のようにする。まず、カーネル関数を用いる場合と同様にして作成した学習サンブルを入力層６０へ入力するとともに、同じ値を教師信号として出力層６２に与え、恒等写像を実現するように各シナプスの重みを学習させる。この学習は通常のバックプロパゲーション法で行うことができる。
【００４３】
こうして学習されたオートエンコーダの出力層６２の出力は、非線形変換による写像が作る空間Ｆを表現しているとみなすことができる。また、オートエンコーダの中間層６４のニューロンの出力は、空間Ｆ内に張られたカテゴリを分類する部分空間Ωへの射影に相当する。したがって、入力層６０に粗視データを入力し、中間層６４の出力を得ることで、正規化用部分空間射影部３６を実現することができる。また、学習時に、学習サンプルの特徴と中間層６４の出力との関係を調べ、ルックアップテーブルを作成することで、変換の大きさ評価部３８を実施することができる。さらに、推定誤差評価部４０が評価する推定誤差は、入力層６０のベクトルと出力層６２のベクトルとの距離、あるいはその単調増加の関数によって算出可能である。この距離が変換の精度に対応していることは、距離が短いほど空間Ｆへの写像が入力を精度よく近似できていることから明らかである。
【００４４】
以上に、画像正規化部２０によって、画像データを正規化する様子を説明した。ここからは、画像正規化部２０が出力した画像データから顔パターンを識別する、パターン識別部１００について説明する。
【００４５】
図８は、パターン識別部１００の構成を示すブロック図である。パターン識別部１００は、複数の空間解像度をもつ識別用粗視化部１０６と、各識別用粗視化部１０６に接続された顔識別部１０２、そして反例検出部１０４からなる。識別用粗視化部１０６は、画像正規化部２０における正規化用粗視化部３２と同様に、画像データに空間フィルタを施して粗視データを出力する役割を果たしている。その解像度は自由に設定でき、ここでは最低次元を２５次元、最高次元を１００次元とし、その間にも複数の識別用粗視化部１０６を設けている。顔識別部１０２は、各識別用粗視化部１０６に設けられており、顔パターンの識別を行う。
【００４６】
入力された画像データは、まず、空間解像度が最も低い識別用粗視化部１０６ａに入力され、図示した例においては、２５次元の粗視データに変換される。そして、粗視データは、顔識別部１０２ａに入力される。顔識別部１０２ａは、識別用部分空間学習部１０８ａ、識別用部分空間射影部１１０ａ、および識別用判定部１１２ａを含んでおり、画像正規化部２０で説明した部分変換検出部２６とよく似た動作を行う。すなわち、入力された粗視データは、カーネル関数で定義される非線形変換によって空間Ｆに式（１）のように写像される。この空間Ｆでは、識別用部分空間学習部１０８ａによって事前に学習が行われており、学習サンプルの顔パターンを特徴づける部分空間Ωが張られている。識別用部分空間射影部１１０ａは、空間Ｆに写像された写像ベクトルを、この部分空間Ωに射影する。これにより、射影された射影ベクトルの一次結合の係数α_ｉが決められ、射影の垂線の長さＥが式（５）から得られる。識別用判定部１１２ａは、両者の位置関係、すなわちＥの大きさを適当な閾値などで評価して、このデータを部分空間Ωのカテゴリに含めるか否かを判定する。閾値の決定は、適当なサンプルデータに対する正答率に基づくなどして決めればよい。判定の結果、顔パターンが含まれている可能性が高いと判断されると、次に解像度の低い識別用粗視化部１０６及び、対応する顔識別部１０２が実行される。
【００４７】
反例検出部１０４は、顔識別部１０２と同様の構成をしており、反例用部分空間学習部１１４、反例用部分空間射影部１１６、および反例用判定部１１８を含んでいる。顔識別部１０２との違いは、反例用部分空間学習部１１４によって、顔以外のパターンが学習される点である。すなわち、顔以外のパターンを含む学習サンプルを用いて、顔以外のパターンが含まれることを特徴とする部分空間Ωが形成される。反例用部分空間射影部１１６が非線形変換の写像をこの部分空間Ωに射影する点と、反例用判定部１１８が写像ベクトルと射影ベクトルの位置関係に基づいて分類を行う点は同じである。
【００４８】
顔識別部１０２と、反例検出部１０４の学習の方法も、画像正規化部において説明した方法と同様である。すなわち、顔識別部１０２においては、識別したい顔パターンの学習サンプルを複数用意し、それをもとに、部分空間の基底ベクトルに対応するプレイメージの更新と、カーネルの変形を行う。なお、このパターン識別部１００は、通常、画像正規化部２０によって正規化された画像データに対してパターン認識を行う。したがって、顔パターンは正規化されていることが期待できるので、学習サンプルはサイズ、回転角度、シフト等に関して正規化されたものだけを用いればよい。反例検出部１０４の学習サンプルとしては顔パターン以外のものを用いればよい。ただし、一般に顔識別部１０２によって識別しにくいものを学習させることで効果を発揮するので、正規化された顔パターンに類似した紛らわしいものを中心に学習させておくとよい。
【００４９】
図１０に、ここで述べた識別を試験的に実施した結果を示す。左側は、本発明を用いずに、５０次元に粗視化されたデータに対してのみ検出を実行している。一方、右側は、本実施例を用いた場合で、２５次元、５０次元、１００次元の３つの解像度に、顔識別部１０２を用いて階層的に検出を行った結果である。ただし、反例検出部１０４は含めていない。使用した画像データは、ひとつの画像データの中に複数の顔を含んでおり、その中から顔パターンを検出したものである。いずれも９０％の確率で顔を検出できる。横軸は、ひとつの画像データの中から顔以外のパターンを誤って見つけた個数であり、縦軸はその比率を示している。従来の方法では、間違いが無かった比率は６３パーセントで、間違いが１つだけ合った比率は２２パーセントであった。本発明では、この値はそれぞれ、８２パーセントと１４パーセントになっている。この結果、画像一つあたりの誤検出率は、０．４０個から、０．２４個に向上している。もちろん、１００次元の高解像度での検出には多くの計算時間を必要とするが、本実施形態では、２５次元の解像度において顔パターンが含まれる可能性が低いと判定した場合にはそれ以上の解像度での検出を行わないので、無駄な計算時間を必要とせず、効率的で高精度な検出が達成できている。なお、図示はしないが、この実験においてさらに、顔パターンが含まれないことを検出する反例検出部１０４を各解像度に含めた場合には、誤検出率はほぼ０になり、その有効性が確認できている。
【００５０】
最後に、本実施の形態における特徴的な点を列挙しておく。本実施の形態の画像正規化部２０により、入力された画像データにおける顔パターンの正規化を、非常に少ない学習サンプルをもとに学習しただけで、実現することができる。また、回転、拡大と縮小、平行移動などに分類して正規化を行うため、対応した学習サンプルだけを用いて学習させればよく、非常に効率的な学習が可能となる。また、正規化をニューラルネットワークを用いて行うため、非線形性をもつパターン分布に対しても容易に正規化を行うことができる。また、正規化をカーネル関数で定義された非線形変換を利用して行うため、非線形性をもつパターン分布に対しても精度よく正規化を行うことができる。また、並列計算機を用いて正規化を行えば、迅速な正規化の実行が可能となる。
【００５１】
本実施の形態のパターン識別部１００により、本質的に非線形性を有する顔パターンの特徴を、非線形変換を用いて高精度に識別できる。また、低分解能から高分解能へと階層化された判定を行うため、顔パターンが含まれないと容易に判定できるものに時間をかけることなく高速に識別できる。また、反例を検出する手段を併用することで、判定の精度が向上する。また、カーネル関数で定義される非線形変換を用いてパターンの識別が行われるので、信頼性の高い識別が可能となる。また、カテゴリを表す部分空間を、非常に高速に構築することができる。また、学習サンプルをもちいて部分空間における基底ベクトルを効率良く張り直すことができる。また、学習サンプルを用いてカーネル関数を容易に変形できるので、パターンの識別の向上を容易に図る事が可能となる。また、並列計算機を用いることで、各解像度におけるパターンの識別を効率良く計算することができる。
【００５２】
これら画像正規化部２０とパターン識別部１００は、お互いに補完しあうことで、非常に高精度で高速な顔パターンの識別が可能になる。
【図面の簡単な説明】
【図１】本実施形態の計算機の構成を示す概略図である。
【図２】画像正規化部およびパターン識別部の概略を示すブロック図である。
【図３】画像正規化部の詳細を示すブロック図である。
【図４】非線形変換の様子を表す模式図である。
【図５】画像正規化部の処理手順を示すフローチャートである。
【図６】画像正規化部の試験結果を示す図である。
【図７】画像正規化部に用いるオートエンコーダの概略図である。
【図８】パターン識別部の概略を示すブロック図である。
【図９】パターン識別部の処理手順を示すフローチャートである。
【図１０】パターン識別部の試験結果を示す図である。
【符号の説明】
２０　画像正規化部、２６　部分変換検出部、２８　正規化処理部、１００　パターン識別部、１０２　顔識別部、１０４　反例検出部。

[0001]
[Technical field of the invention]
The present invention relates to an image processing apparatus and, more particularly, to preprocessing performed to facilitate pattern detection from image data.
[0002]
[Prior art]
To identify a specific pattern in image data using a computer, two things are needed: the shape of the pattern itself must be detected reliably, and the method must cope with the fact that the patterns contained in the image vary in size, rotation angle, and so on.
[0003]
Conventional examples of the former include the linear subspace method, the support vector machine (SVM) method, and the kernel non-linear subspace method. In the linear subspace method, a subspace is defined for each of a plurality of categories, and the category to which an unknown pattern belongs is determined by evaluating which subspace the pattern is most closely related to. However, the detection accuracy of this method drops when there are many categories and the pattern dimension is low. Its identification accuracy is also low for pattern distributions that are non-linear.
[0004]
The SVM method is a method for mapping a low-dimensional pattern to a high dimension by a non-linear transformation defined through a kernel function, thereby enabling a pattern distribution having nonlinearity to be identified. However, there are problems in that only two categories can be classified and that a large amount of calculation is required.
[0005]
The kernel nonlinear subspace method has been devised as a pattern identification method for solving these problems, and is disclosed in Japanese Patent Application Laid-Open No. 2002-90274. In this method, similar to the SVM method, a pattern is mapped to a high-dimensional space by a non-linear transformation defined using a kernel function, and a subspace method is performed on the high-dimensional space.
[0006]
The latter, that is, patterns having various sizes and rotation angles, have conventionally been handled by using an extremely large number of learning samples. Each of the pattern identification methods described above generally learns from learning samples that have a characteristic pattern, and in doing so determines the distribution of the category representing that characteristic. It was therefore necessary to use as learning samples not only patterns whose size or angle had been varied individually, but also a very large number of patterns deformed by combinations of size and angle.
[0007]
[Problems to be solved by the invention]
However, in the kernel non-linear subspace method, the basis vectors spanning the subspace are defined based on the mapping of all learning samples into the non-linear space, so a large amount of computation is still required when the number of learning samples grows. Further, as described above, an extremely large number of learning samples has conventionally been needed to perform general-purpose pattern identification. An object of the present invention is to establish a means for identifying, at high speed and with high accuracy, face patterns of humans or animals, which often appear in images with various sizes and rotation angles.
[0008]
[Means for Solving the Problems]
The image processing apparatus according to the present invention is an image processing apparatus that normalizes image data into a condition under which identification can be performed, so that pattern identification means can identify a face pattern in given image data, the apparatus comprising: partial conversion detecting means provided respectively for a plurality of partial conversions into which the normalization conversion is divided, and arranged in parallel; partial conversion evaluation means provided in each partial conversion detecting means, which evaluates, based on a comparison between a learning result obtained using learning samples and the image data, the magnitude of the partial conversion required to normalize the image data and the estimation error accompanying that conversion; and conversion execution means that determines the partial conversion detecting means giving the smallest estimation error among all the partial conversion evaluation means and applies the corresponding partial conversion to the image data; wherein the comparison between the learning result and the image data in the partial conversion evaluation means is performed in a space defined by a non-linear transformation, and the conversion execution means is executed at least once on the image data.
[0009]
Further, the image processing apparatus according to the present invention is characterized in that the partial conversion detected by the partial conversion detecting means includes at least one of size, rotation, and shift.
[0010]
Further, the image processing apparatus according to the present invention is characterized in that the non-linear transformation is given by calculation means using a neural network.
[0011]
Further, the image processing apparatus according to the present invention is characterized in that the non-linear conversion is given by calculation means using a kernel function.
[0012]
Further, the image processing apparatus of the present invention is characterized by having coarse-graining means that applies a spatial filter to the image data to coarse-grain it, the coarse-grained image data being used in the partial conversion detecting means.
[0013]
Further, the image processing apparatus according to the present invention is characterized in that the operation of each partial conversion detecting means is processed in parallel by a parallel operation device provided in the apparatus.
[0014]
Further, the image processing method according to the present invention is an image processing method that normalizes image data into a condition under which identification can be performed, so that a face pattern in given image data can be identified by a pattern identification method, the method comprising: partial conversion detection steps provided respectively for a plurality of partial conversions into which the normalization conversion is divided, and arranged in parallel; a partial conversion evaluation step provided in each partial conversion detection step, which evaluates, based on a comparison between a learning result obtained using learning samples and the image data, the magnitude of the partial conversion required to normalize the image data and the estimation error accompanying that conversion; and a conversion execution step of determining the partial conversion detection step that gives the smallest estimation error among all the partial conversion evaluation steps and applying the corresponding partial conversion to the image data; wherein the comparison between the learning result and the image data in the partial conversion evaluation step is performed in a space defined by a non-linear transformation, and the conversion execution step is executed at least once on the image data.
[0015]
Further, the image processing program according to the present invention is an image processing program that causes a computer to normalize image data into a condition under which identification can be performed, so that a pattern identification procedure can identify a face pattern in given image data, the program including: partial conversion detection procedures provided respectively for a plurality of partial conversions into which the normalization conversion is divided, and arranged in parallel; a partial conversion evaluation procedure provided in each partial conversion detection procedure, which evaluates, based on a comparison between a learning result obtained using learning samples and the image data, the magnitude of the partial conversion required to normalize the image data and the estimation error accompanying that conversion; and a conversion execution procedure that determines the partial conversion detection procedure giving the smallest estimation error among all the partial conversion evaluation procedures and applies the corresponding partial conversion to the image data; wherein the comparison between the learning result and the image data in the partial conversion evaluation procedure is performed in a space defined by a non-linear transformation, and the conversion execution procedure is executed at least once on the image data.
[0016]
[Embodiments of the invention]
Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings. In the figures, repeated description of components having the same configuration is omitted.
[0017]
FIG. 1 is a block diagram showing the configuration of an apparatus according to an embodiment of the present invention. The apparatus includes a CPU 2 that performs calculations, a storage unit 4, a user instruction input unit 6, a display unit 8, a data input unit 10, a data output unit 12, and an application software input unit 14, which are connected by a communication network that carries data. That is, this apparatus is realized by executing application software that describes the algorithm of the present invention on a general-purpose computer. The user loads the application software, distributed on a storage medium such as a CD-ROM or via a network, into the computer through the application software input unit 14 and causes the CPU 2 to execute it using the instruction input unit 6, such as a keyboard. The operation of the CPU 2 is managed by software called an operating system (OS), and instructions from the user and the application software are conveyed to the CPU 2 through the OS. Information needed to execute calculations, including the application software of the present embodiment and the OS, is held temporarily or permanently in the storage unit 4, which consists of memory, a hard disk, and the like. The image data required for execution is obtained through the data input unit 10, for example a CCD camera, a scanner, a storage medium such as a CD-ROM, or data acquisition over a network. Once the CPU 2 has performed the necessary calculations, the processed image data is output through the data output unit 12, for example a storage medium such as an MO, data transfer over a network, or a printer. The user can also view the image data before and after processing on the display unit 8, such as a display.
[0018]
FIG. 2 is a block diagram outlining the image processing operations performed by the CPU 2. The image data input from the data input unit 10 undergoes normalization conversion in the image normalization unit 20 and is then subjected to detailed pattern identification in the pattern identification unit 100. Normalization here means converting conditions such as the size, rotation angle, position, and brightness of the face pattern into the state assumed by the pattern identification unit 100 (referred to here as the normal form).
[0019]
The conversion for normalization performed by the image normalization unit 20 is divided into elements consisting of basic partial conversions, and how each partial conversion should be carried out is calculated by the partial conversion detection unit 26 corresponding to that partial conversion. The illustrated example has three partial conversion detection units 26: a size partial conversion detection unit 26a related to the size (enlargement and reduction) of the image data, a rotation partial conversion detection unit 26b related to the rotation angle of the image data, and a shift partial conversion detection unit 26c related to the shift (translation) of the image data. As described in detail later, these partial conversion detection units 26 detect the state of their partial conversion on coarse-grained data obtained by applying a spatial filter for coarse-graining to the image data, and pass the result to the normalization processing unit 28. The normalization processing unit 28 then applies to the image data the conversion judged to have the smallest accompanying error. This series of steps is usually repeated several times, so that normalization is eventually performed for size, rotation, and shift alike. Of course, depending on the state of the face pattern, the repetition may be omitted.
[0020]
The pattern identification unit 100 uses spatial filters to coarse-grain the image data normalized by the image normalization unit 20 into coarse-grained data of various resolutions, and then runs a face identification unit 102 on that coarse-grained data to identify a face pattern. In the illustrated example, coarse-graining is performed by summing an appropriate number of principal component analysis modes; there are a face identification unit 102a for 25-dimensional coarse-grained data, a face identification unit 102b for 100-dimensional coarse-grained data, and further face identification units 102 for the resolutions in between. In addition, a counterexample detection unit 104 is provided for the 100-dimensional coarse-grained data, the highest dimension, to determine that something other than a face pattern is present. As described in detail later, identification of a face pattern starts with the 25-dimensional face identification unit 102a, which has the lowest resolution; if it judges that a face pattern is likely to be present, the face identification unit 102 with the next-lowest resolution is used for the determination. If a face pattern is still judged likely at 100 dimensions, the highest resolution, counterexample detection is performed last.
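The coarse-to-fine flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `cascade_contains_face`, the `Stage` type, and the callables passed in are all hypothetical stand-ins for the coarse-graining steps, the face identification units 102, and the counterexample detection unit 104.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# One stage = (coarse-graining function, face test). Both are hypothetical
# stand-ins for a coarse-graining step and a face identification unit 102.
Stage = Tuple[Callable[[Vector], Vector], Callable[[Vector], bool]]


def cascade_contains_face(
    image: Vector,
    stages: List[Stage],
    is_counterexample: Callable[[Vector], bool],
) -> bool:
    """Test stages from lowest to highest resolution, rejecting early."""
    coarse = image
    for coarsen, looks_like_face in stages:
        coarse = coarsen(image)
        if not looks_like_face(coarse):
            return False  # rejected at a cheap stage; finer stages never run
    # Candidates that pass every stage get the counterexample test, applied
    # to the coarse data of the last (highest-resolution) stage.
    return not is_counterexample(coarse)
```

Ordering `stages` from the lowest to the highest dimension is what keeps the expensive high-resolution tests away from inputs that a low-resolution test can already reject.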
[0021]
Hereinafter, the image normalization unit 20 and the pattern identification unit 100 will be described in detail.
[0022]
The block diagram of FIG. 3 outlines the configuration of the image normalization unit 20. The input image data is held in the image holding unit 30 provided in the storage unit 4. The normalization coarse-graining unit 32 then coarse-grains this image data, lowering its spatial resolution and extracting its rough features, to obtain coarse-grained data. The spatial filter used for this coarse-graining is not particularly limited; examples include computing the sum of a predetermined number of mode components with large contribution ratios, taken from the mode components obtained by principal component analysis of suitable image data, and performing a Fourier decomposition and extracting the mode components at or above a predetermined resolution. Coarse-graining is performed to reduce the amount of data so that the normalization described next can be executed at high speed.
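Since the text leaves the choice of spatial filter open, the idea of coarse-graining can be illustrated with a deliberately simpler filter than the two it names. The sketch below uses block averaging, which is a swapped-in technique and not the principal component or Fourier method of the description; the function name `block_average` is hypothetical.

```python
from typing import List


def block_average(image: List[List[float]], block: int) -> List[float]:
    """Coarse-grain a 2-D image by averaging non-overlapping block x block tiles.

    Block averaging is an assumption made for illustration; it stands in for
    the PCA-based or Fourier-based spatial filters mentioned in the text.
    Returns the tile means flattened into one vector in row-major order.
    """
    rows, cols = len(image), len(image[0])
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    coarse = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [image[r + i][c + j] for i in range(block) for j in range(block)]
            coarse.append(sum(tile) / len(tile))
    return coarse
```

With this filter, a 20 x 20 image and `block=4` would give a 25-dimensional vector, so the later stages operate on 25 numbers instead of 400.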
[0023]
Subsequently, the coarse-grained data is sent to the plurality of partial conversion detection units 26 arranged in parallel. As shown schematically in FIG. 4, each partial conversion detection unit 26 regards the coarse-grained data as a vector x in the image space G and maps it to a space F created by a non-linear transformation for pattern identification. The vector mapped into the space F is called the mapped vector and is written Φ(x). When size, rotation, and shift are to be detected, for example, the partial conversion detection units 26 consist of the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. Each partial conversion detection unit 26 includes a normalization subspace learning unit 34, a normalization subspace projection unit 36, and a partial conversion evaluation unit 37, and the partial conversion evaluation unit 37 in turn includes a conversion size evaluation unit 38 and an estimation error evaluation unit 40. In the space F, the normalization subspace learning unit 34 has used learning samples to construct in advance a subspace Ω representing the category characteristic of the learning samples, and the normalization subspace projection unit 36 projects the mapped vector Φ(x) onto this subspace Ω. The projected vector is called the projection vector and is written Φ'(x). The conversion size evaluation unit 38 then evaluates which of the basis vectors spanning the subspace Ω the projection vector Φ'(x) is closest to, and calculates the magnitude required for the conversion. For example, in the size partial conversion detection unit 26a, a lookup table is created during learning that records correspondences such as the basis vector Φ_1 lying near learning samples of about 1.5 times the size and another basis vector Φ_2 lying near learning samples of about 2 times the size. By referring to this lookup table, the conversion size detection unit 38a can calculate by what factor the current face pattern should be enlarged to convert it into the normal form. The estimation error evaluation unit 40 calculates the estimation error from the distance E between the mapped vector Φ(x) and the projection vector Φ'(x): a short distance E is taken to mean that the projection contains little error, and a large distance E that the projection result contains a large error.
[0024]
These results are passed to the normalization processing unit 28, which includes the conversion determination unit 42 and the conversion execution unit 44. The conversion determination unit 42 determines which partial conversion detection unit 26 has the minimum estimation error. For example, when the estimation error of the rotation partial conversion detection unit 26b is the smallest, the conversion execution unit 44 rotates the original image data by the corresponding conversion magnitude (that is, the rotation angle), and the image holding unit 30 is updated. The updated image data is subjected to the same normalization repeatedly as necessary. Various repetition criteria are conceivable. For example, the number of repetitions may be set in advance, or a correlation calculated from appropriate comparison data in real space, or a similar value, may be compared with a threshold or the like.
[0025]
Next, means for constructing the space F prepared by the partial conversion detection unit 26 using a kernel function will be described in detail using mathematical expressions. A characteristic feature of the method using the kernel function is that the method of creating the above-described mapping vector Φ (x) is not explicitly shown.
[0026]
The nonlinear mapping of Expression (1), which maps a d-dimensional vector x in the image space G representing the coarse-grained data to the d_F-dimensional space F, is determined so as to satisfy the relationship of Expression (2) by selecting an appropriate kernel function k(x, y).
[0027]
(Equation 1)

Here, φ_i(x) is an eigenfunction of the appropriate kernel function, and λ_i is the corresponding eigenvalue (i = 1, ..., N).
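Expressions (1) and (2) are not reproduced in this text. One form consistent with the surrounding description is the standard eigenfunction expansion of a kernel, given here as an assumption rather than as the original expressions. The upper index is written d_F to match the dimension of the space F; the text writes the index range as i = 1, ..., N.

```latex
\Phi(x) = \left( \sqrt{\lambda_1}\,\varphi_1(x),\ \ldots,\ \sqrt{\lambda_{d_F}}\,\varphi_{d_F}(x) \right) \tag{1}

k(x, y) = \sum_{i=1}^{d_F} \lambda_i\,\varphi_i(x)\,\varphi_i(y) = \Phi(x) \cdot \Phi(y) \tag{2}
```

Under this form, every inner product in the space F can be evaluated through k(x, y) alone, which is why the mapping Φ never has to be written out explicitly.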
[0028]
Next, a method of constructing in the space F the m-dimensional subspace Ω for classifying the categories of the coarse-grained data, and a method of learning it, will be described. First, m vectors x_1, ..., x_m in the image space G (hereinafter referred to as pre-images), corresponding to the basis vectors Φ_1, ..., Φ_m of the subspace Ω, are given suitable initial values. Specifically, for example, they are assigned at random using uniform random numbers. Now consider learning the pre-images so as to correct the subspace Ω using a d-dimensional vector x representing a learning sample in the image space. The vector Φ′(x), obtained by projecting onto the subspace Ω the mapping Φ(x) of the learning sample x into the space F, is expressed as a linear combination of the basis vectors. Letting the coupling coefficients be α_i, the distance E between this projection and the original mapped vector Φ(x) is expressed by Expressions (3)-(5).
[0029]
(Equation 2)

Here, the defining Expression (2) of the kernel function is used in the transformation into Expression (5). The coefficients α_i are given by Expression (6) so that E takes its minimum value, in accordance with the definition of a projection. The matrix K is the matrix whose (i, j) component is k(x_i, x_j).
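The projection described above can be sketched numerically as follows. This is an illustration under stated assumptions: a Gaussian kernel is chosen arbitrarily, E is taken to be the squared distance written purely in terms of kernel values, and the coefficients are obtained by solving K α = k(x), which is the minimizer of that E. The function names and sample points are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-|a - b|^2 / (2 sigma^2)); an assumed kernel choice."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def project(x, preimages, kernel=gaussian_kernel):
    """Return (alpha, E) for a sample x and the basis pre-images x_1..x_m."""
    K = np.array([[kernel(xi, xj) for xj in preimages] for xi in preimages])
    k_x = np.array([kernel(x, xi) for xi in preimages])
    alpha = np.linalg.solve(K, k_x)          # coefficients that minimise E
    # Squared distance between Phi(x) and its projection, in kernel form.
    E = kernel(x, x) - 2.0 * alpha @ k_x + alpha @ K @ alpha
    return alpha, E

preimages = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
alpha_on, E_on = project(preimages[1], preimages)            # sample equal to a pre-image
alpha_off, E_off = project(np.array([0.5, 0.4]), preimages)  # sample off the basis
```

A sample that coincides with a pre-image lies in the subspace, so its residual is zero and its coefficient vector picks out that basis vector; any other sample leaves a positive residual.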
[0030]
In pre-image learning, each pre-image x_i is moved by Δx_i in the direction that reduces the distance between the subspace Ω and the learning sample. This Δx_i is given by Expression (7) according to the steepest descent method.
[0031]
(Equation 3)

Here, η is a learning coefficient and is a positive constant. The matrix g_ab(x) is the metric tensor of the manifold embedded in the space F by the nonlinear mapping, and is given by Expression (8) using the kernel function. Since this learning is a linear optimization problem in a high-dimensional space, it converges better than a nonlinear optimization problem and completes in a short time.
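One steepest-descent update of the pre-images might look like the sketch below. Expressions (7) and (8) are not reproduced in this text, so the update form (the gradient of E with respect to the pre-images, rescaled by the inverse metric tensor) is an assumption. A Gaussian kernel is assumed, for which the metric tensor obtained from the mixed second derivative of the kernel is g_ab = δ_ab / σ². The gradient is taken by central differences to keep the example short; all names and values are hypothetical.

```python
import numpy as np

SIGMA = 1.0

def kernel(a, b):
    d = a - b
    return np.exp(-d @ d / (2.0 * SIGMA ** 2))

def residual(x, preimages):
    """Squared distance E between Phi(x) and its projection onto the subspace."""
    K = np.array([[kernel(p, q) for q in preimages] for p in preimages])
    k_x = np.array([kernel(x, p) for p in preimages])
    alpha = np.linalg.solve(K, k_x)
    return kernel(x, x) - alpha @ k_x   # equals k(x,x) - 2 a.k + a.K.a at the optimum

def grad_wrt_preimages(x, preimages, eps=1e-6):
    """Central-difference gradient of E with respect to every pre-image coordinate."""
    grad = np.zeros_like(preimages)
    for idx in np.ndindex(*preimages.shape):
        step = np.zeros_like(preimages)
        step[idx] = eps
        grad[idx] = (residual(x, preimages + step)
                     - residual(x, preimages - step)) / (2.0 * eps)
    return grad

def update_preimages(x, preimages, eta=0.05):
    # For the Gaussian kernel g_ab = delta_ab / sigma^2, so applying the
    # inverse metric simply rescales the gradient by sigma^2.
    return preimages - eta * SIGMA ** 2 * grad_wrt_preimages(x, preimages)

preimages = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
sample = np.array([0.5, 0.4])
E_before = residual(sample, preimages)
E_after = residual(sample, update_preimages(sample, preimages))
```

With a small learning coefficient, one step moves the pre-images so that the learning sample lies closer to the subspace they span.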
[0032]
Next, the kernel function learning method will be described. As the kernel function, a known function such as a Gaussian function kernel or a polynomial kernel is initially provided. During the learning, the kernel function is transformed by the conformal mapping of the equation (9).
[0033]
(Equation 4)

The learning rule gives C(x) so that the coefficients α_i obtained for the learning samples become uniform across all of the coefficients α_i. Specifically, when a coefficient α_i is larger than a default value, the value of C(x) is increased in the vicinity of the pre-image x_i of the subspace basis vector corresponding to that coefficient α_i. As a result, the neighborhood of x_i is expanded in the space F as shown in Expression (10).
[0034]
(Equation 5)

Consequently, the number of learning samples having a large value of the coefficient α_i decreases relatively, and the coefficient α_i for the learning samples decreases. Conversely, when a coefficient α_i is smaller than the default value, the value of C(x) is reduced in the vicinity of the pre-image x_i of the basis vector corresponding to that coefficient α_i. Note that in the method described here C(x) can be given only at the pre-images of the basis of the subspace Ω; in the vicinity of a pre-image, however, it can be varied by extrapolating the value of C(x) at the pre-image as in Expression (11).
[0035]
(Equation 6)
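The kernel deformation can be illustrated with the sketch below. Expressions (9) and (11) are not reproduced in this text, so two assumptions are made: the conformal transformation is taken in the common form k~(x, y) = C(x) C(y) k(x, y), and C(x) away from the pre-images is obtained by a kernel-weighted average of the values set at the pre-images, which is only a stand-in for the extrapolation of Expression (11). All names and numbers are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    d = a - b
    return np.exp(-d @ d / (2.0 * sigma ** 2))

def conformal_factor(x, preimages, c_values, width=1.0):
    """Hypothetical smooth C(x): weighted average of the C values set at the pre-images."""
    w = np.array([gaussian_kernel(x, p, width) for p in preimages])
    return float(w @ c_values / w.sum())

def conformal_kernel(a, b, preimages, c_values):
    """Assumed conformal form: k~(a, b) = C(a) * C(b) * k(a, b)."""
    return (conformal_factor(a, preimages, c_values)
            * conformal_factor(b, preimages, c_values)
            * gaussian_kernel(a, b))

preimages = np.array([[0.0, 0.0], [3.0, 0.0]])
c_values = np.array([1.5, 1.0])   # C raised at the first pre-image (its coefficient assumed large)
near = np.array([0.1, 0.0])       # close to the first pre-image
far = np.array([2.9, 0.0])        # close to the second pre-image
k_near = conformal_kernel(near, near, preimages, c_values)
k_far = conformal_kernel(far, far, preimages, c_values)
```

Raising C near one pre-image lengthens the mapped vectors in that neighborhood, which corresponds to the local expansion in the space F described above.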

Here, the way in which the learning samples used for learning are provided will be described. For example, for normalization with respect to rotation, a plurality of images are prepared in which the face pattern to be normalized is upright at the center of the image (the head at the top and the chin at the bottom), and each is rotated by an angle given by a uniform random number in the range of -180 degrees to 180 degrees, or by angles given at equal intervals. For the shift, a plurality of images with the face pattern upright at the center of the image are prepared and shifted in the vertical and horizontal directions, for example according to Gaussian random numbers whose half-width is an appropriate number of pixels. Instead of random numbers, the shifts may be given regularly so that the probability density becomes uniform. Similarly, for size, images in which the face pattern is upright at the center of the image may be enlarged and reduced. Learning performed in this manner reveals the relationship between the magnitude of the transformation applied to a learning sample (for example, the angle in the case of rotation) and the projection of the learning sample onto the subspace. Specifically, for example, a relationship such as a large coefficient α_1 corresponding to a rotation of 90 degrees is derived. By examining this in detail and creating a look-up table or an appropriate function, the evaluation means of the conversion magnitude evaluation unit 38 is established.
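Generating labelled learning samples for rotation and shift could be sketched as follows. The helper names, the nearest-neighbour rotation, the wrap-around shift, and the use of the Gaussian standard deviation as the width are all assumptions made for the example; the random arrays stand in for upright face images.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2-D array about its centre using nearest-neighbour sampling."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source pixel for every output pixel.
    src_x = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    src_y = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(image)
    out[inside] = image[sy[inside], sx[inside]]
    return out

def make_rotation_samples(upright_faces, per_face, rng):
    """Pair each rotated image with the angle that produced it."""
    samples = []
    for face in upright_faces:
        for angle in rng.uniform(-180.0, 180.0, size=per_face):
            samples.append((rotate_nearest(face, angle), float(angle)))
    return samples

def make_shift_samples(upright_faces, per_face, rng, width_px=3.0):
    """Shift each image by Gaussian (dy, dx) offsets rounded to pixels (wraps at the edges)."""
    samples = []
    for face in upright_faces:
        offsets = np.rint(rng.normal(0.0, width_px, size=(per_face, 2))).astype(int)
        for dy, dx in offsets:
            samples.append((np.roll(face, (dy, dx), axis=(0, 1)), (int(dy), int(dx))))
    return samples

rng = np.random.default_rng(0)
faces = [rng.random((16, 16)) for _ in range(3)]   # stand-ins for upright face images
rotation_samples = make_rotation_samples(faces, per_face=5, rng=rng)
shift_samples = make_shift_samples(faces, per_face=5, rng=rng)
```

Because every sample carries the transformation that produced it, the relationship between the projection coefficients and the transformation magnitude can be tabulated after learning.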
[0036]
By the learning procedure described above, the subspace Ω for classifying the coarse-grained data is set in the space F mapped by the non-linear transformation. In the learning process, it is desirable to alternate the pre-image learning and the kernel function learning multiple times; however, if the learning samples are not very complicated, the procedure may be simplified, for example by performing either of them only once.
[0037]
Finally, the main part of the procedure executed by the image normalization unit 20 once learning is complete and normalization is to be performed will be described with reference to the flowchart shown in FIG. 5. When image data is input (S1), the normalization coarse-graining unit 32 creates coarse-grained data using a spatial filter (S2). The coarse-grained data is sent to the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. The normalization subspace projection unit 36 calculates the coefficients α_i of the linear combination of the projection defined by Expression (6) (S3). These α_i need not follow the definition of Expression (6); they may instead be obtained by an appropriate iterative method so that E in Expression (5) is minimized. Next, the conversion magnitude evaluation unit 38 compares the α_i with the look-up table to determine the magnitude of the conversion (S4), and the estimation error evaluation unit 40 calculates E, or the value of a monotonically increasing function of E, as the estimation error (S5). The conversion determination unit 42 in the normalization processing unit 28 determines the partial conversion detection unit 26 that gives the minimum estimation error (S6), and the conversion execution unit 44 converts the original image data using the corresponding conversion magnitude (S7). Whether the image data thus obtained is to be converted again is then determined according to an appropriate criterion (S8). As described above, the non-linear transformation defined by Expression (1) is never used directly in this series of operations, and therefore its form need not be known.
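The control flow of this procedure can be sketched as a loop that, on each pass, applies only the partial conversion whose detector reports the smallest estimation error. The `PartialDetector` class, the `done` predicate, and the toy detectors below are hypothetical stand-ins; in particular the "image" in the usage example is reduced to a pose vector so that the loop can be followed by hand.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
import numpy as np

@dataclass
class PartialDetector:
    name: str
    detect: Callable[[np.ndarray], Tuple[float, float]]   # coarse data -> (magnitude, estimation error)
    apply: Callable[[np.ndarray, float], np.ndarray]       # executes the partial conversion

def normalize(image, detectors, coarse_grain, done, max_iters=10):
    history = []
    for _ in range(max_iters):
        if done(image):                                      # repetition criterion
            break
        coarse = coarse_grain(image)                         # coarse-graining
        results = [d.detect(coarse) for d in detectors]      # magnitude and error per detector
        best = min(range(len(detectors)), key=lambda i: results[i][1])   # smallest error wins
        magnitude = results[best][0]
        image = detectors[best].apply(image, magnitude)      # execute that one conversion
        history.append((detectors[best].name, float(magnitude)))
    return image, history

# Toy stand-in: the "image" is just its pose (angle in degrees, shift in pixels).
rotation = PartialDetector(
    "rotation",
    detect=lambda s: (-s[0], 0.1 if s[0] != 0 else 1.0),
    apply=lambda s, m: s + np.array([m, 0.0]),
)
shift = PartialDetector(
    "shift",
    detect=lambda s: (-s[1], 0.2 if s[1] != 0 else 1.0),
    apply=lambda s, m: s + np.array([0.0, m]),
)
final, history = normalize(
    np.array([90.0, -3.0]), [rotation, shift],
    coarse_grain=lambda s: s,
    done=lambda s: bool(np.all(s == 0)),
)
```

In the toy run the rotation is corrected first because its detector reports the smaller error, and the shift is corrected on the next pass.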
[0038]
FIG. 6 shows a result of normalization involving size, rotation, and shift together. In this experiment, as shown in the photographs on the right side of the figure, the normalization of two photographs of the vicinity of the eyes was tracked conversion by conversion. In the series of photographs at the upper right, the initial (upper left) inverted photograph is rotated about 90 degrees counterclockwise in the first step, shifted slightly to the left in the next step, and is finally normalized to the desired upright orientation and size. The three-dimensional graph on the left tracks the size (magnification), angle (degrees), and distance (pixels) in sequence through the normalization process. The black circle at the upper left indicates that the initial photograph is rotated 180 degrees, magnified about 1.3 times, and slightly shifted. The point then moves along one of the three coordinate axes at each conversion and finally reaches the normalized position on the right. The series of photographs at the lower right and the corresponding white circles in the left graph show a similar progression, in this case with normalization dominated by enlargement. Although the pattern here is limited to the vicinity of the eyes rather than the whole face, the basic effect is unchanged when the whole face pattern is used. Needless to say, when the whole face pattern is used, the learning samples must be changed from those of the illustrated example.
[0039]
Although the resolution of the spatial filter used in the normalization coarse-graining unit 32 is arbitrary, in the example shown here coarse-graining to about 25 dimensions is performed by principal component analysis. The dimension of the subspace Ω constructed in the space F can likewise take various values, but is set to 25 here. The number of learning samples depends on the accuracy required for detection; for example, the face patterns of about 100 people may each be varied in about 100 ways per person for each partial conversion detection unit 26. In that case, when three partial conversion detection units 26 are used, the total number of learning samples is about 30,000. By contrast, if the same degrees of freedom were covered without using this embodiment, the total number of learning samples would be about one million. It can therefore be seen that the present embodiment significantly reduces the number of learning samples. Here, the partial conversions detected by the partial conversion detection units 26 are size, rotation, and shift. These elements are not particularly limited, but simple conversions make the processing easier. That is, for size and rotation it is preferable to use a form that can be described by a linear transformation, and for shift a uniform translation without shear. Of course, more complicated conversions can be assigned according to the characteristics of the pattern to be handled. It is also possible to assign, for example, a conversion relating to the brightness of the image data.
[0040]
The non-linear transformation described above was defined using a kernel function. However, the method of constructing the non-linear transformation is arbitrary. Here, a method of performing a non-linear conversion using an auto-encoder according to a neural network algorithm will be described.
[0041]
FIG. 7 shows an outline of the auto encoder. The auto encoder is a kind of a multi-layer perceptron. The number of neurons in the input layer 60 and the number of neurons in the output layer 62 are the same, and the number of neurons in the intermediate layer 64 is smaller.
[0042]
To use this auto encoder as a partial conversion detection unit 26, the following is done. First, a learning sample created in the same manner as in the case of the kernel function is input to the input layer 60, the same values are given to the output layer 62 as the teacher signal, and the weight of each synapse is learned so as to realize an identity mapping. This learning can be performed by the ordinary back propagation method.
[0043]
The output of the output layer 62 of the auto encoder that has been learned in this way can be regarded as expressing a space F created by a mapping by non-linear transformation. Further, the output of the neuron of the intermediate layer 64 of the auto encoder corresponds to the projection onto the subspace Ω for classifying the category set in the space F. Therefore, by inputting coarse-grained data to the input layer 60 and obtaining the output of the intermediate layer 64, the normalization subspace projection unit 36 can be realized. In addition, at the time of learning, the relationship between the characteristics of the learning sample and the output of the intermediate layer 64 is examined, and a look-up table is created, whereby the conversion magnitude evaluation unit 38 can be implemented. Further, the estimation error evaluated by the estimation error evaluator 40 can be calculated by the distance between the vector of the input layer 60 and the vector of the output layer 62, or a monotonically increasing function thereof. The fact that this distance corresponds to the accuracy of the conversion is apparent from the fact that the shorter the distance, the more accurately the mapping to the space F can approximate the input.
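A small auto encoder of this kind can be sketched in NumPy as follows. The layer sizes, the tanh activation, the learning rate, and the synthetic data are assumptions chosen to keep the example short; the layer numbers in the comments refer to FIG. 7.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8-dimensional "coarse-grained" vectors that really vary along 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = np.tanh(latent @ mixing) * 0.5

n_in, n_hidden = X.shape[1], 2            # fewer neurons in the intermediate layer
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b2 = np.zeros(n_in)

def forward(x):
    hidden = np.tanh(x @ W1 + b1)         # intermediate layer 64
    output = hidden @ W2 + b2             # output layer 62 (linear)
    return hidden, output

def mean_error(x):
    _, out = forward(x)
    return float(np.mean(np.sum((out - x) ** 2, axis=1)))

initial_error = mean_error(X)
lr = 0.05
for _ in range(500):                      # full-batch back propagation, target = input
    hidden, out = forward(X)
    d_out = 2.0 * (out - X) / len(X)
    grad_W2 = hidden.T @ d_out
    grad_b2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1.0 - hidden ** 2)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
final_error = mean_error(X)

def estimation_error(x):
    """Distance between the input-layer vector and the output-layer vector."""
    _, out = forward(x)
    return float(np.linalg.norm(out - x))
```

After training, the intermediate-layer output of `forward` plays the role of the projection coefficients, and `estimation_error` plays the role of the estimation error for one sample.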
[0044]
The manner in which the image data is normalized by the image normalization unit 20 has been described above. Hereinafter, the pattern identification unit 100 that identifies a face pattern from the image data output by the image normalization unit 20 will be described.
[0045]
FIG. 8 is a block diagram illustrating a configuration of the pattern identification unit 100. The pattern identification unit 100 includes an identification coarse-graining unit 106 having a plurality of spatial resolutions, a face identification unit 102 connected to each identification coarse-graining unit 106, and a counterexample detection unit 104. The coarse-graining unit 106 for identification plays a role of applying a spatial filter to the image data and outputting coarse-grained data, similarly to the coarse-graining unit 32 for normalization in the image normalizing unit 20. The resolution can be freely set. Here, the lowest dimension is set to 25 dimensions, the highest dimension is set to 100 dimensions, and a plurality of identification coarse-graining units 106 are provided between them. The face identification unit 102 is provided in each identification coarse-graining unit 106 and identifies a face pattern.
[0046]
The input image data is first input to the identification coarse-graining unit 106a, which has the lowest spatial resolution, and is converted into 25-dimensional coarse-grained data in the illustrated example. The coarse-grained data is then input to the face identification unit 102a. The face identification unit 102a includes an identification subspace learning unit 108a, an identification subspace projection unit 110a, and an identification determination unit 112a, and operates in much the same way as the partial conversion detection unit 26 described for the image normalization unit 20. That is, the input coarse-grained data is mapped to the space F by the non-linear transformation defined by the kernel function as in Expression (1). In the space F, the identification subspace learning unit 108a has performed learning in advance, so that a subspace Ω characterizing the face patterns of the learning samples is provided. The identification subspace projection unit 110a projects the mapped vector in the space F onto the subspace Ω. The coefficients α_i of the linear combination of the projection vector are thereby determined, and the length E of the perpendicular of the projection is obtained from Expression (5). The identification determination unit 112a evaluates the positional relationship between the two vectors, that is, the magnitude of E, using an appropriate threshold or the like, and determines whether the data belongs to the category of the subspace Ω. The threshold may be determined based on the correct answer rate for appropriate sample data. When the determination finds a high possibility that a face pattern is included, the identification coarse-graining unit 106 with the next-higher resolution and the corresponding face identification unit 102 are executed.
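The coarse-to-fine decision can be sketched as a cascade with early rejection. The stage structure, thresholds, and the linear template distance used as the residual are hypothetical stand-ins; in the embodiment the residual would be the value E computed in the space F at each resolution.

```python
from typing import Callable, Sequence, Tuple
import numpy as np

Stage = Tuple[Callable[[np.ndarray], np.ndarray],   # coarse-graining at one resolution
              Callable[[np.ndarray], float],         # residual E for that resolution
              float]                                 # acceptance threshold for E

def contains_face(image: np.ndarray, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (decision, number of stages evaluated)."""
    for n, (coarse_grain, residual, threshold) in enumerate(stages, start=1):
        if residual(coarse_grain(image)) > threshold:
            return False, n           # rejected early; finer stages are skipped
    return True, len(stages)

# Toy stand-in: "coarse-graining" keeps the first `dim` values, and the
# residual is the distance to a fixed template at that resolution.
template = np.linspace(0.0, 1.0, 100)

def make_stage(dim: int, threshold: float) -> Stage:
    return (lambda img: img[:dim],
            lambda v: float(np.linalg.norm(v - template[:dim])),
            threshold)

stages = [make_stage(25, 0.5), make_stage(50, 0.5), make_stage(100, 0.5)]
face_like = template + 0.01
clutter = np.zeros(100)
face_result = contains_face(face_like, stages)
clutter_result = contains_face(clutter, stages)
```

Inputs that fail the cheapest stage never reach the expensive high-resolution stages, which is where the saving in calculation time comes from.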
[0047]
The counter example detection unit 104 has the same configuration as the face identification unit 102, and includes a counter example subspace learning unit 114, a counter example subspace projection unit 116, and a counter example determination unit 118. The difference from the face identification unit 102 is that a pattern other than a face is learned by the counterexample subspace learning unit 114. That is, a subspace Ω characterized by including a pattern other than a face is formed using a learning sample including a pattern other than a face. The counter-example subspace projection unit 116 projects the non-linear transformation mapping onto this subspace Ω, and the counter-example determination unit 118 performs classification based on the positional relationship between the mapping vector and the projection vector.
[0048]
The learning method of the face identification unit 102 and the counter example detection unit 104 is the same as the method described in the image normalization unit. That is, the face identification unit 102 prepares a plurality of learning samples of the face pattern to be identified, and updates the pre-image corresponding to the base vector of the subspace and deforms the kernel based on the learning samples. The pattern identification unit 100 normally performs pattern recognition on the image data normalized by the image normalization unit 20. Therefore, since the face pattern can be expected to be normalized, it is sufficient to use only the learning sample normalized with respect to the size, the rotation angle, the shift, and the like. A sample other than the face pattern may be used as the learning sample of the counter example detecting unit 104. However, in general, the effect is exhibited by learning the ones that are difficult to be identified by the face identification unit 102. Therefore, it is preferable to mainly learn the confusing ones similar to the normalized face pattern.
[0049]
FIG. 10 shows the results of a trial run of the identification described above. The left side shows detection performed only on data coarse-grained to 50 dimensions, without using the present invention. The right side shows the result of hierarchical detection using the face identification units 102 at three resolutions of 25, 50, and 100 dimensions according to this embodiment; the counterexample detection unit 104 is not included. Each piece of image data used contains a plurality of faces, and face patterns are detected from among them. In both cases, a face can be detected with a probability of 90%. The horizontal axis represents the number of erroneous non-face patterns found in one piece of image data, and the vertical axis represents the proportion of images. With the conventional method, 63% of images had no errors and 22% had exactly one error. With the present invention, these values are 82% and 14%, respectively. As a result, the erroneous detection rate per image improves from 0.40 to 0.24. Detection at the high resolution of 100 dimensions naturally requires considerable calculation time. In this embodiment, however, if the possibility that a face pattern is included is judged to be low at the 25-dimensional resolution, detection at the higher resolutions is not performed, so efficient and highly accurate detection is achieved without wasting calculation time. Although not shown, when a counterexample detection unit 104 for detecting that no face pattern is included was added at each resolution in this experiment, the false detection rate became almost 0, confirming its effectiveness.
[0050]
Lastly, characteristic points in the present embodiment are listed. With the image normalization unit 20 of the present embodiment, normalization of a face pattern in input image data can be realized only by learning based on a very small number of learning samples. In addition, since normalization is performed by classifying into rotation, enlargement / reduction, parallel movement, and the like, learning only needs to be performed using corresponding learning samples, and highly efficient learning can be performed. Further, since the normalization is performed using the neural network, the normalization can be easily performed on the pattern distribution having the non-linearity. In addition, since the normalization is performed by using the non-linear transformation defined by the kernel function, the normalization can be accurately performed even on a pattern distribution having non-linearity. If normalization is performed using a parallel computer, rapid normalization can be performed.
[0051]
The pattern identification unit 100 according to the present embodiment can identify, with high accuracy, the features of face patterns, which are inherently nonlinear, by using a nonlinear conversion. In addition, since the determination is hierarchized from low resolution to high resolution, image data that can easily be judged to contain no face pattern is identified quickly, without spending time on it. In addition, using the counterexample detection means together improves the accuracy of the determination. In addition, since the pattern is identified using the non-linear transformation defined by the kernel function, highly reliable identification is possible. Further, a subspace representing a category can be constructed very quickly. In addition, the basis vectors of the subspace can be efficiently updated using the learning samples. Further, since the kernel function can be easily deformed using the learning samples, the pattern identification can be improved easily. Further, by using a parallel computer, the pattern identification at each resolution can be calculated efficiently.
[0052]
The image normalizing unit 20 and the pattern identifying unit 100 complement each other, thereby enabling very accurate and high-speed face pattern identification.
[Brief description of the drawings]
FIG. 1 is a schematic diagram illustrating a configuration of a computer according to an embodiment.
FIG. 2 is a block diagram schematically illustrating an image normalization unit and a pattern identification unit.
FIG. 3 is a block diagram illustrating details of an image normalization unit.
FIG. 4 is a schematic diagram illustrating a state of nonlinear conversion.
FIG. 5 is a flowchart illustrating a processing procedure of an image normalization unit.
FIG. 6 is a diagram illustrating test results of an image normalization unit.
FIG. 7 is a schematic diagram of an auto encoder used for an image normalization unit.
FIG. 8 is a block diagram schematically illustrating a pattern identification unit.
FIG. 9 is a flowchart illustrating a processing procedure of a pattern identification unit.
FIG. 10 is a diagram showing test results of a pattern identification unit.
[Explanation of symbols]
20 image normalization section, 26 partial conversion detection section, 28 normalization processing section, 100 pattern identification section, 102 face identification section, 104 counterexample detection section.

Claims

An image processing apparatus that applies a normalization conversion to image data to bring it into a condition in which identification is possible, so that pattern identification means can identify a face pattern in given image data, the apparatus comprising:
partial conversion detection means provided for each of a plurality of partial conversions obtained by dividing the normalization conversion, and arranged in parallel;
partial conversion evaluation means provided in each of the partial conversion detection means, for evaluating, based on a comparison between a learning result obtained using learning samples and the image data, the magnitude of the partial conversion required to normalize the image data and an estimation error accompanying the conversion; and
conversion execution means for determining the partial conversion detection means that gives the smallest estimation error among all the partial conversion evaluation means, and performing the corresponding partial conversion on the image data,
wherein the comparison between the learning result and the image data in the partial conversion evaluation means is performed in a space defined by a non-linear conversion, and
the conversion execution means is executed at least once on the image data.

The image processing apparatus according to claim 1,
wherein the partial conversion detected by the partial conversion detection means includes at least one of size, rotation, and shift.

The image processing apparatus according to claim 1,
wherein the non-linear conversion is given by calculation means using a neural network.

The image processing apparatus according to claim 1,
wherein the non-linear conversion is given by calculation means using a kernel function.

The image processing apparatus according to claim 1, further comprising
coarse-graining means for applying a spatial filter to the image data to coarse-grain it,
wherein the coarse-grained image data is used in the partial conversion detection means.

The image processing apparatus according to claim 1,
wherein the operations of the partial conversion detection means are processed in parallel by a parallel operation device provided in the apparatus.

An image processing method that applies a normalization conversion to image data to bring it into a condition in which identification is possible, so that a pattern identification method can identify a face pattern in given image data, the method comprising:
partial conversion detection steps provided for each of a plurality of partial conversions obtained by dividing the normalization conversion, and arranged in parallel;
a partial conversion evaluation step provided in each partial conversion detection step, of evaluating, based on a comparison between a learning result obtained using learning samples and the image data, the magnitude of the partial conversion required to normalize the image data and an estimation error accompanying the conversion; and
a conversion execution step of determining the partial conversion detection step that gives the smallest estimation error among all the partial conversion evaluation steps, and performing the corresponding partial conversion on the image data,
wherein the comparison between the learning result and the image data in the partial conversion evaluation step is performed in a space defined by a non-linear conversion, and
the conversion execution step is executed at least once on the image data.

An image processing program that applies a normalization conversion to image data to bring it into a condition in which identification is possible, so that a pattern identification procedure can identify a face pattern in given image data, the program causing a computer to execute:
partial conversion detection procedures provided for each of a plurality of partial conversions obtained by dividing the normalization conversion, and arranged in parallel;
a partial conversion evaluation procedure provided in each partial conversion detection procedure, of evaluating, based on a comparison between a learning result obtained using learning samples and the image data, the magnitude of the partial conversion required to normalize the image data and an estimation error accompanying the conversion; and
a conversion execution procedure of determining the partial conversion detection procedure that gives the smallest estimation error among all the partial conversion evaluation procedures, and performing the corresponding partial conversion on the image data,
wherein the comparison between the learning result and the image data in the partial conversion evaluation procedure is performed in a space defined by a non-linear conversion, and
the conversion execution procedure is executed at least once on the image data.