JP4426029B2

JP4426029B2 - Image collation method and apparatus

Info

Publication number: JP4426029B2
Application number: JP27196499A
Authority: JP
Inventors: 直毅指田; 茂美長田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-09-27
Filing date: 1999-09-27
Publication date: 2010-03-03
Anticipated expiration: 2019-09-27
Also published as: JP2001092963A

Description

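[0001]
[Technical Field of the Invention]
The present invention relates to an image collation method and apparatus that collate a recognition target in an input image against model images stored in an image database or the like, and thereby identify the recognition target in the input image. Depending on the recognition target, the invention can be used for person authentication, seal authentication, signature authentication, and the like. It can also handle arbitrary three-dimensional objects, such as automobiles and industrial products, as recognition targets, and can therefore be used in a wide variety of image collation and recognition systems.
[0002]
[Prior Art]
A growing number of applications on the market, such as card use at bank ATMs and online shopping over personal computer networks, require user authentication and personal identification that correctly determine who the user of a system is. Password entry has traditionally been the most common means of user authentication and personal identification. In recent years, however, face image collation and retrieval techniques have attracted attention: a person image is captured from an installed camera, and the person shown is identified from the face image. With this technology, face images could replace the PINs and passwords conventionally used, for example, for entry control in condominiums and office buildings and for logging in to personal computers or the Internet. In criminal investigations, the technology is also expected to help identify offenders by singling out fraudulent users from the images of surveillance cameras installed at ATMs.
[0003]
If a face image collation and retrieval system could be realized that performs stable, highly accurate collation on input images captured under a variety of shooting environments, it would find application in a wide range of fields: personal authentication in the security-related applications described above, person identification in criminal investigations, and user authentication in automatic reception terminals and customer management systems.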
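[0004]
One conventional technique for collating model images with input images in image collation systems is the eigenwindow method (also called the local eigenspace method). It extends the eigenspace method: the image information of a group of characteristic local regions extracted from each image is projected as a point set in a feature space, and the two images are collated there.
[0005]
The conventional eigenwindow collation procedure is outlined as follows. First, in both the input image and the model image used for collation, feature points are found based on changes in pixel values, and local region images called "windows" are selected. The selected window image data are subjected to KL expansion and mapped into a feature space spanned by the eigenvectors with relatively large eigenvalues. The two images are then compared in this feature space: for each window datum in the model image, the window datum in the input image closest to it is searched for. If both images show the same recognition target, every window has a close counterpart in the feature space; if they show different targets, some windows have no counterpart in the feature space. Mapping into the feature space thus allows the similarity or dissimilarity of the two images to be judged.
[0006]
[Problems to Be Solved by the Invention]
However, the conventional eigenwindow collation method described above suffers a drastic loss of identification accuracy when the pose (angle) or scale (size) of the face in the input image differs greatly from that in the model image. For example, suppose a model image of a person photographed from the front has been registered. If, for some reason, the face image input for authentication is captured with the face turned obliquely or tilted, the difference in face orientation between the input image and the model image causes a low similarity to be output even though both are face images of the same person, and the person is wrongly judged to be someone else.
[0007]
One conceivable countermeasure within the conventional eigenwindow method is to register, as model images, many patterns with varied face poses and scales, that is, to capture and register in advance a variety of patterns such as face images turned obliquely and face images taken farther from the camera. In principle, registering face images of every pose and scale in advance would allow stable face image collation. In practice, however, registering multiple model images for every face pose and scale is not feasible, and the amount of registered data becomes large. When the recognition target is a face, the pose has many degrees of freedom, so the variations in face pose and scale become enormous; given the storage capacity required for model images and the processing speed required for recognition, this approach has practical limits.
[0008]
The object of the present invention is to solve these problems and to provide a face image collation method and apparatus that maintain highly accurate collation even when the face pose or scale changes relative to the model image.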
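[0009]
[Means for Solving the Problems]
To solve the above problems, the image collation method of the present invention identifies a recognition target present in an input image by comparing and collating the input image, which contains the recognition target, against a group of model images containing previously registered targets. The method comprises: a process of cutting out a plurality of characteristic local regions from each of the input image and the model image; a process of projecting the image information of the selected local regions onto a point set in a feature space; a process of associating local regions with each other by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space; a geometric transformation parameter estimation process that estimates geometric transformation parameters between the input image and the model image based on the spatial arrangement of the associated local regions of the one image and the other image; and a process of applying a geometric transformation, using the estimated parameters, to either the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions within the image, and then evaluating the consistency between the associated local regions to perform collation.
[0010]
With this configuration, the displacement between the local regions of the model image and those of the input image is adjusted by a geometric transformation using the geometric transformation parameters, so that differences due to pose and camera distance are absorbed and collation accuracy can be improved.
[0011]
If the geometric transformation parameter estimation is repeated for multiple sets of corresponding local regions, each set differing from the others in at least one region, and the parameters are determined from the resulting group of estimates, a further improvement in collation accuracy can be expected. If, in addition, a reliability for the estimated parameters is calculated from the degree of scatter within that group, it can serve as a measure of the reliability of the collation. Furthermore, if the relative position within the image and the distance in the feature space are obtained for each pair of local regions, and votes weighted according to that distance are cast into a voting space, the degree of concentration of the peak in the voting result can serve as a measure of the similarity between the input image and the model image.
[0012]
In addition, when associating local regions, processing time can be expected to decrease if the relative position within the image is used as a cue and the candidates are narrowed down to local regions lying within a fixed range of that position.
[0013]
Next, the image collation apparatus of the present invention identifies a recognition target present in an input image by comparing and collating the input image, which contains the recognition target, against a group of model images containing previously registered targets. It comprises: an image input unit that receives the input image; a model image storage unit that registers and stores model images; a local region cutout unit that cuts out characteristic local regions from the input image and the model image; a local region association unit that projects the image information of the selected local regions onto a point set in a feature space and associates local regions with each other by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space; a geometric transformation parameter estimation unit, including a geometric transformation parameter calculation unit, that estimates geometric transformation parameters between the input image and the model image based on the spatial arrangement of the associated local regions of the one image and the other image; an image geometric transformation unit that uses the estimated parameters to apply a geometric transformation to either the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions within the image; and an image collation unit that performs collation by evaluating the mutual consistency of the local regions using the geometrically transformed local regions.
[0014]
This configuration yields an image collation apparatus that can adjust the displacement between the local regions of the model image and those of the input image by a geometric transformation using the geometric transformation parameters, absorbing differences due to pose and camera distance and achieving high collation accuracy.
[0015]
Further, by providing a computer-readable recording medium on which a processing program implementing the above image collation apparatus is recorded, the image collation apparatus of the present invention can be built on a computer such as a personal computer or a workstation.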
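[0016]
[Embodiments of the Invention]
(Embodiment 1)
The image collation method of the present invention has three functions: estimating geometric transformation parameters that represent the geometric relationship between the model image and the input image, using the in-image positions of corresponding windows in the two images; applying an appropriate geometric transformation to either the model image or the input image using the estimated parameters; and performing image collation on the corrected image obtained after the transformation. Image collation is thus executed after the geometric misalignment between the model image and the input image has been absorbed. There are several ways to apply the estimated parameters, depending on what the transformation is applied to: transforming the image information of the window images (local regions) of the model image or input image; transforming the image information of the entire model image or input image; or transforming the position information of the local regions within the image.
[0017]
The processing in the face image collation method of the present invention using the eigenwindow method is outlined below with reference to the flowchart of FIG. 1.
[0018]
This processing consists broadly of the eight steps S1 to S8 below. It comprises a "model registration phase" (steps S1 to S3 applied to the model images), which creates model window data for collation from the prepared model images, and a "recognition phase" (steps S1 to S3 applied to the input image, followed by steps S4 to S8), which calculates the similarity between the input image and each model image.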
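[0019]
(1) Target region cutout step (step S1)
The position of the face, which is the recognition target, is detected in the image, and the image of the face region is cut out. Strictly speaking, this step is not mandatory in the eigenwindow method, but it is effective for improving recognition performance (identification accuracy and processing speed), so it is preferably executed as preprocessing. Methods for determining and cutting out a person's face region include face region cutout by facial part detection, face region detection by skin color extraction, and face region detection by contour (edge) extraction, and any of these image processing techniques can be applied.
[0020]
(2) Window image cutout step (step S2)
In the image region cut out in step S1, feature points are found based on the pixel values, and the local region images around them are cut out. Each selected local region is called a "window." When this window cutout is applied to a face image, windows end up being cut out around facial parts such as the eyes, nose, and mouth.
[0021]
Specifically, windows (corners) containing the target are detected based on the edge strength at each pixel. The sum of the differences in pixel value between each pixel and its neighbors (the edge strength) is computed, and part of the image is cut out as a window image based on this edge strength. Window data are then created by attaching the cutout position (hereinafter, the in-image window position) and the image identifier to the vector whose elements are the pixel values in the window image (hereinafter, the window image vector). This is done for all image data to obtain a group of window data. In this example the edge strength used to decide the window cutout is computed as the sum of differences from neighboring pixels, but it can also be computed as the amount of high-frequency components in the spatial frequency domain.
[0022]
FIG. 2 shows windows cut out from an image. To secure a certain degree of robustness to changes in the shooting environment such as illumination, the brightness of each cut-out local region image is preferably normalized.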
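The window cutout of [0021] and the brightness normalization of [0022] can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation. It assumes a grayscale image stored as a list of rows, takes the edge strength as the sum of absolute differences to the four neighbouring pixels, picks the strongest pixels as window centres while skipping centres whose windows would overlap an already chosen one, and normalizes each window to zero mean and unit norm. The names `edge_strength`, `normalize`, and `cut_windows`, the window size, and the overlap rule are all hypothetical.

```python
import math


def edge_strength(img):
    """Sum of absolute differences between each pixel and its 4-neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    s += abs(img[y][x] - img[ny][nx])
            out[y][x] = s
    return out


def normalize(vec):
    """Brightness normalization (assumed form): zero mean, unit Euclidean norm."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0  # avoid division by zero
    return [v / norm for v in centered]


def cut_windows(img, image_id, half=2, k=3):
    """Return up to k window records centred on the strongest edge pixels.

    Each record holds the window image vector, the in-image window position
    (x, y) and the image identifier."""
    h, w = len(img), len(img[0])
    strength = edge_strength(img)
    # Only centres whose full (2*half+1)^2 window fits inside the image.
    candidates = [(strength[y][x], y, x)
                  for y in range(half, h - half)
                  for x in range(half, w - half)]
    candidates.sort(reverse=True)
    chosen = []
    for _, y, x in candidates:
        if len(chosen) == k:
            break
        # Hypothetical non-overlap rule: skip centres too close to chosen ones.
        if any(abs(y - cy) <= 2 * half and abs(x - cx) <= 2 * half
               for cy, cx in chosen):
            continue
        chosen.append((y, x))
    windows = []
    for y, x in chosen:
        patch = [img[yy][xx]
                 for yy in range(y - half, y + half + 1)
                 for xx in range(x - half, x + half + 1)]
        windows.append({"vector": normalize(patch), "pos": (x, y), "image": image_id})
    return windows
```

Swapping `edge_strength` for a high-pass filter response would give the spatial-frequency variant mentioned at the end of [0021].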
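[0023]
(3) Window image compression step (step S3)
The window images cut out in step S2 are subjected to an image transformation for dimensionality reduction. KL expansion (Karhunen-Loeve expansion, known as a form of principal component analysis; hereinafter abbreviated as KL expansion), wavelet transform, Fourier transform, DCT, and the like can be used.
[0024]
Taking KL expansion as an example, the window images extracted from the images are subjected to KL expansion and mapped into the feature space spanned by the eigenvectors with relatively large eigenvalues. Before this projection, the eigenspace is constructed: a matrix whose columns are vectors of face image pixel values is formed, and the eigenvector matrix is obtained from its covariance matrix. The space defined by this eigenvector matrix is used as the feature space. FIG. 3 shows an example of mapping window images into the feature space; it shows the projection of windows cut out from a model image and, for convenience, displays only three axes, X₁, X₂, and X_k. This projection applies a dimensionality-reducing transformation to the set of N×N-dimensional pixel value vectors of the N×N-pixel window images. Redundant dimensions can be discarded in view of the distribution of the windows, so the data are converted into a lower-dimensional feature space while preserving the distribution of the windows as far as possible. The compressed window images (hereinafter, window data) and the positions of the windows in the image (hereinafter, window positions) are used as the parameters for image collation.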
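For the KL-expansion option of step S3, a minimal NumPy sketch follows. The text describes the pixel vectors as matrix columns; here each flattened window is stored as a row, which is the same data transposed. The function names and the number of retained dimensions `dim` are assumptions.

```python
import numpy as np


def build_eigenspace(window_vectors, dim):
    """KL expansion (PCA): return the mean vector and the top-`dim` eigenvectors."""
    X = np.asarray(window_vectors, dtype=float)   # shape (num_windows, N*N)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)          # (N*N, N*N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]       # keep the largest eigenvalues
    return mean, eigvecs[:, order]


def project(window_vectors, mean, basis):
    """Map window vectors to low-dimensional window data in the feature space."""
    X = np.asarray(window_vectors, dtype=float)
    return (X - mean) @ basis
```

For large windows, taking the right singular vectors of the centred data matrix (`np.linalg.svd`) gives the same subspace without forming the N²×N² covariance matrix explicitly.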
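[0025]
Steps S1 to S3 are executed both for the registered model images used for collation and for the input image submitted for collation. For the registered model images, S1 to S3 can be executed in advance as preprocessing in the "model registration phase."
[0026]
(4) Window collation (corresponding window selection) step (step S4)
The processing in S4 to S8 below is executed in the "recognition phase," which calculates the similarity between the input image and each model image.
[0027]
In window collation step S4, the registered model window data and the input window data are compared in the feature space, and for each window datum in the model image, the window datum in the input image nearest to it in the feature space is searched for. FIG. 4 shows an example in which the window data of the input image are projected into the feature space; the projected points are marked "□". Corresponding window data are found from this projection and stored as corresponding window information. FIG. 5 shows an example of a table recording the pairs in the corresponding window information.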
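The exhaustive nearest-neighbour search of step S4 and the resulting correspondence table (FIG. 5) can be sketched as follows. The record layout, with feature-space coordinates under `feat` and the in-image position under `pos`, is a hypothetical choice, and Euclidean distance is assumed as the feature-space distance. The stored distance `d` is the kind of value the later selection step sorts on.

```python
import math


def match_windows(model_windows, input_windows):
    """For each model window, find the input window nearest in feature space.

    Each window is a dict with 'id', 'feat' (feature-space coordinates) and
    'pos' (in-image position). Returns the rows of a correspondence table."""
    table = []
    for m in model_windows:
        best = min(input_windows, key=lambda w: math.dist(m["feat"], w["feat"]))
        table.append({
            "model": m["id"], "input": best["id"],
            "model_pos": m["pos"], "input_pos": best["pos"],
            "d": math.dist(m["feat"], best["feat"]),
        })
    return table
```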
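[0028]
(5) Geometric transformation parameter estimation (step S5)
In step S5, the corresponding window information describing the pairs of model windows and input windows found in window collation (corresponding window search) step S4 is received, several pairs are selected from it, and geometric transformation parameters representing the transformation between the model image and the input image are estimated from their in-image positions.
[0029]
In this estimation, variations are possible both in the type of geometric transformation used and in the number of corresponding window pairs used to estimate its parameters.
[0030]
Examples of the type of geometric transformation include affine transformation, similarity transformation, and projective transformation.
[0031]
The affine transformation is the geometric transformation given by (Equation 1) below. It has six parameters, that is, six degrees of freedom, so at least three pairs of corresponding windows are needed to determine them.
[0032]
[Equation 1]

[0033]
The affine transformation performs the mapping shown in FIG. 6(a): it can turn a rectangle into an arbitrary parallelogram and, conversely, can correct an arbitrary parallelogram image back into a proper rectangular image. It is effective for correcting distortion introduced when the face image was captured.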
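The image of (Equation 1) did not survive in this text. A conventional way to write a two-dimensional affine transformation with the parameter names of [0042] (rotation parameters r[0] to r[3], translation (u, v)) is sketched below. Taking (x, y) as the model-image window position and (X, Y) as the input-image window position, and the placement of r[0] to r[3] inside R, are assumptions rather than the patent's confirmed notation.

```latex
\begin{pmatrix} X \\ Y \end{pmatrix}
= R \begin{pmatrix} x \\ y \end{pmatrix} + T,
\qquad
R = \begin{pmatrix} r[0] & r[1] \\ r[2] & r[3] \end{pmatrix},
\qquad
T = \begin{pmatrix} u \\ v \end{pmatrix}
```

Each corresponding window pair contributes two scalar equations, which is why three pairs are the minimum for the six unknowns.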
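[0034]
The similarity transformation is the geometric transformation given by (Equation 2) below. It has four parameters, that is, four degrees of freedom, so at least two pairs of corresponding windows are needed to determine them.
[0035]
[Equation 2]

[0036]
The similarity transformation performs the mapping shown in FIG. 6(b): it translates and rotates the whole image, possibly with a uniform change of scale, without changing the shape of the target. It is effective for correcting tilt or positional displacement that arose when the face image was captured.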
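(Equation 2) is likewise missing. Under the same assumed conventions as for the affine case, a similarity transformation with the scale s, rotation angle γ about the Z axis, and translation (u, v) named in [0060] can be written as follows; the sign convention of the rotation is an assumption.

```latex
R = s \begin{pmatrix} \cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma \end{pmatrix},
\qquad
T = \begin{pmatrix} u \\ v \end{pmatrix},
\qquad
\begin{pmatrix} X \\ Y \end{pmatrix} = R \begin{pmatrix} x \\ y \end{pmatrix} + T
```

With four unknowns and two equations per corresponding window pair, two pairs suffice.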
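[0037]
The projective transformation is the geometric transformation given by (Equation 3) below. It has eight parameters, that is, eight degrees of freedom, so at least four pairs of corresponding windows are needed to determine them.
[0038]
[Equation 3]

[0039]
The projective transformation performs the mapping shown in FIG. 6(c): it can turn a rectangle into an arbitrary quadrilateral and, conversely, can correct an arbitrary quadrilateral image back into a proper rectangular image. The transformation shown in FIG. 6(c) is effective for correcting the perspective-like skew that arises when the face image is captured from an oblique angle.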
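(Equation 3) is also missing. The usual eight-parameter form of a planar projective transformation (homography) is sketched below; the coefficient names h_1 to h_8 are hypothetical labels, since the text does not name them.

```latex
X = \frac{h_1 x + h_2 y + h_3}{h_7 x + h_8 y + 1},
\qquad
Y = \frac{h_4 x + h_5 y + h_6}{h_7 x + h_8 y + 1}
```

Multiplying out the denominators makes each pair's two equations linear in h_1 to h_8, so four pairs with no three model points collinear determine all eight.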
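[0040]
The idea of the present invention can, of course, also be applied to geometric transformations other than those above. In Embodiment 1 below, the estimation process is described in detail for the case where the affine transformation is used.
[0041]
As geometric transformation parameters that accommodate changes in the vertical, horizontal, and tilt directions relative to the frontal view and changes in size (changes in relative distance from the camera), the two-dimensional affine transformation parameters shown in (Equation 1), consisting of a rotation parameter R and a translation parameter T, are estimated. (The affine transformation defined here assumes that the facial parts on a person's face lie in the same plane and that the deformation between the two images is relatively small.)
[0042]
With the origin at the image center, the X axis (horizontal) and Y axis (vertical) in the image plane, and the Z axis perpendicular to the image plane, the affine transformation is the one given in (Equation 1). The affine parameters to be estimated are therefore six in total: four rotation parameters and two translation parameters (r[0], r[1], r[2], r[3], u, v).
[0043]
The minimum number of corresponding window pairs needed to estimate the parameters is determined by the degrees of freedom of the selected transformation. For the affine transformation, for example, there are six degrees of freedom, so one set of affine parameters can be estimated from three pairs of corresponding windows.
[0044]
First, three pairs of corresponding windows, as shown in FIG. 7, are selected from the corresponding window information obtained in window collation step S4 of FIG. 1. Various methods are conceivable for selecting three pairs from many: choosing at random from the table of FIG. 5, choosing three consecutive entries in table order, sorting by the distance d between corresponding windows and choosing the three with the smallest d, or conversely the three with the largest d. In Embodiment 1, the corresponding windows listed in the corresponding window information are taken out one at a time, and for each, the remaining two pairs are chosen at random without duplication.
[0045]
Next, the rotation parameters r[0], r[1], r[2], r[3] are calculated by solving the equations of (Equation 1) using the in-image positions (xi, yi)-(Xi, Yi) (i = 1 to 3) of the three selected pairs.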
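Solving (Equation 1) for one combination of three pairs amounts to two 3×3 linear systems. The NumPy sketch below assumes the convention X = R x + T with R = [[r0, r1], [r2, r3]] and T = (u, v), where (x, y) is the model-window position and (X, Y) the input-window position; the function name is hypothetical. It returns all six parameters, whereas the text keeps only r[0] to r[3] from this step and recomputes (u, v) afterwards.

```python
import numpy as np


def affine_from_three(model_pts, input_pts):
    """Solve X = R x + T for R (2x2) and T from exactly three correspondences.

    model_pts / input_pts: three (x, y) and (X, Y) window positions.
    Returns (r, (u, v)) with r = [r0, r1, r2, r3] meaning R = [[r0, r1], [r2, r3]].
    Raises numpy.linalg.LinAlgError if the three model points are collinear."""
    A = np.array([[x, y, 1.0] for x, y in model_pts])
    X = np.array([p[0] for p in input_pts], dtype=float)
    Y = np.array([p[1] for p in input_pts], dtype=float)
    r0, r1, u = np.linalg.solve(A, X)   # X_i = r0*x_i + r1*y_i + u
    r2, r3, v = np.linalg.solve(A, Y)   # Y_i = r2*x_i + r3*y_i + v
    return [r0, r1, r2, r3], (u, v)
```

A random draw that happens to pick collinear windows should simply be skipped, which the `LinAlgError` makes easy to detect.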
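[0046]
The above process yields one set of geometric transformation parameters from the three selected pairs. This single set may be used as the estimation result. Alternatively, many different combinations of three pairs may be selected, the estimation repeated for each, and the most plausible parameter values determined from the many resulting estimates, for example by averaging, histogram processing, or least-squares estimation. Embodiment 1 votes each estimated parameter value into a prepared one-dimensional voting space (in other words, builds a histogram with a suitable number of bins for each parameter). After the loop "select three pairs of corresponding windows" → "estimate the rotation parameters" → "cast a one-dimensional vote for each rotation parameter" has been repeated once per corresponding window, the peak with the most votes is found for each parameter, and the value at that peak is adopted as the estimate of that rotation parameter.
[0047]
The number of votes at the highest peak, normalized by the total number of votes, can be adopted as the reliability of the estimated geometric transformation parameters. Because inaccurate rotation parameters could adversely affect the final similarity calculation, when this reliability falls below a set threshold, either no image correction by the parameters is performed, or the parameters are set to the identity transformation I so that, in effect, no geometric transformation is applied.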
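The voting of [0046] and the reliability check and identity fallback of [0047] might look like the sketch below. Each three-pair combination is assumed to have already produced one estimate [r0, r1, r2, r3]; every parameter is voted into its own 1-D histogram, the peak bin gives the estimate, and peak votes divided by total votes gives the reliability. The bin width, the threshold value, and combining the four per-parameter reliabilities with `min` are assumptions not given in the text.

```python
from collections import Counter

IDENTITY = [1.0, 0.0, 0.0, 1.0]  # r[0..3] of the identity transformation I


def vote_parameter(estimates, bin_width):
    """Vote one parameter's estimates into a 1-D histogram.

    Returns (value at the highest peak, peak votes / total votes)."""
    bins = Counter(round(e / bin_width) for e in estimates)
    peak_bin, peak_votes = bins.most_common(1)[0]
    return peak_bin * bin_width, peak_votes / len(estimates)


def estimate_rotation(samples, bin_width=0.05, min_reliability=0.3):
    """samples: list of [r0, r1, r2, r3] estimates, one per 3-pair combination."""
    result, reliabilities = [], []
    for k in range(4):
        value, rel = vote_parameter([s[k] for s in samples], bin_width)
        result.append(value)
        reliabilities.append(rel)
    reliability = min(reliabilities)  # hypothetical way to combine the four
    if reliability < min_reliability:
        return IDENTITY, reliability   # effectively skip the correction
    return result, reliability
```

Mismatched windows produce scattered estimates that fall into sparsely populated bins, so they lower the reliability without moving the peak.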
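[0048]
Next, using the rotation parameters estimated above, the translation parameter T is calculated for all corresponding windows. Specifically, the translation parameters (u, v) are obtained from (Equation 4) below, a rearrangement of (Equation 1). This operation rotates each model-image window position by the estimated rotation parameters R and then takes its offset from the corresponding input-image window position.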
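(Equation 4) itself is missing from this text. Following the verbal description in [0048] (rotate each model-window position by the estimated R, then take its offset from the matching input-window position), a sketch under the assumed convention X = R x + T is below. The text does not say how the per-window translations are combined into a single (u, v); the component-wise median used here is a hypothetical choice, picked because a few mismatched windows cannot drag it far. Both function names are hypothetical.

```python
from statistics import median


def translations(r, pairs):
    """T_i = X_i - R x_i for every corresponding window.

    r: [r0, r1, r2, r3] with R = [[r0, r1], [r2, r3]].
    pairs: ((x, y) model position, (X, Y) input position) tuples."""
    r0, r1, r2, r3 = r
    out = []
    for (x, y), (X, Y) in pairs:
        out.append((X - (r0 * x + r1 * y), Y - (r2 * x + r3 * y)))
    return out


def aggregate(ts):
    """Hypothetical aggregation: component-wise median of the translations."""
    return median(t[0] for t in ts), median(t[1] for t in ts)
```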
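[0049]
[Equation 4]

[0050]
Determining the geometric transformation parameters by voting in this way is expected to reduce estimation errors caused by mismatched windows in the corresponding window information, fluctuations in in-image position information, the selection of unsuitable combinations of corresponding windows, and the like, enabling more accurate parameter estimation.
[0051]
(6) Image correction using the geometric transformation parameters (step S6)
As the first process of step S6, the input image is geometrically transformed with the estimated parameters to correct it. With an affine transformation, this correction enables transformations like that shown in FIG. 6(a); likewise, a similarity transformation enables those of FIG. 6(b), and a projective transformation those of FIG. 6(c).
[0052]
As its second process, step S6 again extracts window images from the corrected image, maps them into the feature space, and associates them. That is, the processing returns to window position selection step S2 in the flowchart of FIG. 1 and continues with window image compression step S3 and window collation step S4, mapping the window images into the feature space and associating them.
[0053]
FIG. 8 shows an example in which, after the geometric transformation, the window images are again mapped into the feature space and associated. In this example, the input-image windows, marked "□", are now accurately associated with the group of projected points of model image 1.
[0054]
The image transformed with the estimated parameters may be either the input image or the model image; it suffices to transform one of them so that it geometrically matches the other.
[0055]
(7) Relative position voting for corresponding windows (step S7)
Based on the corresponding window information obtained after the correction in step S6, the consistency of the spatial arrangement of the windows in the input image and the model image is evaluated. Specifically, for each pair of corresponding windows, the relative position vector (Δx, Δy) (x: horizontal, y: vertical), that is, the difference between the two window positions, is calculated, and a fixed number of votes is cast at the corresponding point of a two-dimensional (x, y) voting space prepared for each model. This voting is repeated for every corresponding window.
[0056]
This voting reveals the following. If the correction by the geometric transformation has made the outlines of the two images equal, the only remaining difference between them is an offset due to translation of the whole image (left/right, up/down, and so on). Because the vectors being voted are the differences between the window positions in the two images, in principle all corresponding windows share the same offset, that is, the same relative position vector. Consequently, the more the votes concentrate on a single value, the better the match between the two images; the more they scatter, the worse the match.
[0057]
(8) Peak position detection and similarity calculation (step S8)
After the voting of step S7, the highest peak in the two-dimensional voting space is searched for, and the similarity is defined from the number of votes cast at that peak position. As stated above, the more the votes concentrate on one point, the better the two images agree. FIG. 9 shows relative position vectors voted in the two-dimensional voting space: FIG. 9(a) shows a case where the votes are highly concentrated and the match is good, and FIG. 9(b) a case where the votes are relatively dispersed and the match is poor.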
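Steps S7 and S8 can be sketched as a 2-D histogram over relative position vectors. The text states only that one fixed vote is cast per corresponding window and that the similarity is based on the vote count at the highest peak; the grid spacing `cell`, the use of the raw peak count as the similarity (rather than, say, a count normalized by the number of windows), and the function name are assumptions. The returned offset is the whole-image shift that [0058] says can then be corrected.

```python
from collections import Counter


def position_vote_similarity(pairs, cell=2):
    """Vote relative position vectors into a 2-D grid; similarity = peak votes.

    pairs: (model_pos, input_pos) for each corresponding window after correction.
    cell: grid spacing of the voting space in pixels (assumed).
    Returns (peak vote count, offset (dx, dy) at the peak)."""
    votes = Counter()
    for (mx, my), (ix, iy) in pairs:
        dx, dy = ix - mx, iy - my
        votes[(round(dx / cell), round(dy / cell))] += 1  # one fixed vote per window
    (bx, by), peak = votes.most_common(1)[0]
    return peak, (bx * cell, by * cell)
```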
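[0058]
Once the relative position vector has been obtained by peak position detection, it can be used to correct the offset of the whole image (left/right, up/down, and so on).
[0059]
Through steps S1 to S8, the similarity between each registered model image and the input image is calculated, and by comparing these values or applying a threshold, the system recognizes which registered person the input shows, or whether it shows someone who is not registered.
[0060]
Although the above description adopts six-degree-of-freedom affine parameters as the geometric transformation parameters, the idea of the present invention applies equally to other geometric transformations. For example, if the motion of the recognition target relative to the camera is a similarity transformation consisting of translation in the X, Y, and Z directions and rotation about the Z axis, the parameters to be estimated are the four parameters translation (u, v), scale s, and rotation γ about the Z axis. The rotation parameter R and translation parameter T are then as given in (Equation 2) above.
[0061]
Therefore, by expressing the geometric transformation with these R and T, the method described above can be applied to obtain the parameters s, γ, and (u, v).
[0062]
As described above, the image collation method of Embodiment 1 estimates the geometric relationship between the model image and the input image, applies an appropriate geometric transformation to one of them based on the estimate, and performs collation after absorbing the geometric misalignment between the two. As a result, collation remains highly accurate even when the face pose or scale differs between the model image and the input image, a case that was difficult for conventional techniques.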
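[0063]
(Embodiment 2)
Embodiment 2 estimates geometric transformation parameters from the initially given input image, uses them to change the shooting conditions and environment, and recaptures the input image, thereby obtaining an input image better suited to collation.
[0064]
As an example of Embodiment 2, one variation in which the image collation method of the present invention is used in a bank automated teller machine (ATM) is described.
[0065]
Conventional ATMs authenticate users with a cash card and a four-digit PIN. Biometric authentication methods for ATMs are being studied, including authentication by face image and authentication by fingerprint. With face image authentication, a camera captures the ATM user's face, and the image is collated against the user's previously registered face image. The image collation method of Embodiment 2 performs the following processing.
[0066]
First, the ATM user's face image is captured by the camera (step S1001).
[0067]
Next, based on the captured image and the registered image used for authentication, geometric transformation parameters are estimated within the eigenwindow collation process described in Embodiment 1 (step S1002); for example, affine transformation parameters are estimated.
[0068]
Next, the shooting conditions and environment are adjusted using the estimated parameters (step S1003). Whereas Embodiment 1 corrects either the input image or the model image with the estimated parameters so that it matches the other, Embodiment 2 adjusts the shooting conditions and environment and recaptures the image. For example, the camera direction, angle, or focus is adjusted, or the distance between the user and the camera is corrected. Guidance can also be displayed on the user interface screen prompting the user to change where they stand or to change the direction of their face. A turntable can also be provided at the user's standing position to adjust the user's angle.
[0069]
After the shooting conditions and environment have been adjusted, the image is recaptured (step S1004).
[0070]
Based on the recaptured input image, processing then returns to window position selection step S2 in the flowchart of FIG. 1 and continues with window image compression step S3 and window collation step S4 to map the window images into the feature space and associate them; relative position voting step S7 and peak position detection and similarity calculation step S8 are then executed to perform image collation (step S1005).
[0071]
As described above, the image collation method of Embodiment 2 estimates geometric transformation parameters from the initially given input image, changes the shooting conditions and environment accordingly, and recaptures the input image, thereby obtaining an input image suited to collation and improving collation accuracy.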
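[0072]
(Embodiment 3)
The image collation method of Embodiment 3 improves the window collation step described in Embodiment 1 (step S4 in FIG. 1).
[0073]
In the window collation step, the registered model window data and the input window data are compared in the feature space; for each window datum in the model image, the nearest input window datum in the feature space is searched for, and the pairs of corresponding window data are stored as corresponding window information in a table or similar structure. In the example of Embodiment 1, the search for a window's counterpart considers every window of the other image and finds the nearest one in the feature space. In Embodiment 3, by contrast, the search is limited by in-image position: only windows within a fixed range of the given window's position are considered, and the nearest neighbor in the feature space is found among them.
[0074]
FIG. 11 shows the basic concept of the window collation process of Embodiment 3.
[0075]
When the recognition target is restricted, as with face images, and consistent shooting conditions can be expected, such as the subject standing upright and facing nearly forward, the windows cut out around the eyes, nose, eyebrows, ears, mouth, and so on can reasonably be expected to lie within a certain range in the image, despite individual differences. For example, when searching for the counterpart of one eye window in FIG. 11(a), the eye window can reasonably be expected within the range outlined by the thick frame in the other image, FIG. 11(b). Exploiting this property, a window's counterpart can be found by collating it only against the windows around the corresponding position in the other image, rather than against every window in the other image. Because the nearest-neighbor search range in the feature space is reduced in this way, window collation is faster. Except in special cases such as an image input upside down, the absence of a corresponding window around the corresponding position indicates that the two images do not show the same recognition target in the first place.
[0076]
For a human face image, for example, the nearest-neighbor search range can be set by taking into account the statistical variation in the position and size of the part each window covers, such as the eyes.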
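Embodiment 3's restricted search can be sketched by filtering candidates on in-image position before the feature-space nearest-neighbour step. As assumptions, the "fixed range" is modelled as a circle of `radius` pixels around the model window's position (FIG. 11 draws it as a frame), both images are taken to share one coordinate frame, and the function name and record layout are hypothetical. Returning `None` for a window with no candidate in range reflects [0075]'s remark that such an absence suggests the images show different targets.

```python
import math


def match_windows_local(model_windows, input_windows, radius):
    """Nearest feature-space neighbour among input windows whose in-image
    position lies within `radius` pixels of the model window's position.

    Windows are dicts with 'id', 'pos' and 'feat'. Returns {model id: input id
    or None when no input window lies within range}."""
    table = {}
    for m in model_windows:
        nearby = [w for w in input_windows if math.dist(m["pos"], w["pos"]) <= radius]
        if not nearby:
            table[m["id"]] = None
            continue
        best = min(nearby, key=lambda w: math.dist(m["feat"], w["feat"]))
        table[m["id"]] = best["id"]
    return table
```

Filtering first means feature-space distances are computed only for nearby windows, which is where the speed-up comes from; it also rules out far-away windows that merely look similar.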
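[0077]
As described above, the image collation method of Embodiment 3 reduces the nearest-neighbor search range in the feature space and thereby speeds up window collation. At the same time, it can be expected to reduce mismatches, in which windows that do not actually correspond are selected as corresponding windows, and thus to improve the accuracy of the estimated geometric transformation parameters.
[0078]
(Embodiment 4)
The image collation method of Embodiment 4 improves the relative position voting step described in Embodiment 1 (step S7 in FIG. 1).
[0079]
In the relative position voting step, after image correction by the geometric transformation parameters in step S6 of Embodiment 1, the consistency of the spatial arrangement of windows in the input image and the model image is evaluated from the corresponding window information: for each pair of corresponding windows, the relative position vector (the difference between the two window positions) is computed, and votes are cast at the corresponding point of the two-dimensional (x, y) voting space prepared for each model, once per corresponding window. In Embodiment 1, each corresponding window was assigned a fixed number of votes. In Embodiment 4, by contrast, the number of votes is determined by the distance between the corresponding windows in the feature space and cast at the point given by their in-image relative position. For example, to reflect in the similarity how closely the "appearance" of corresponding windows matches, the number of votes is set by a linear relationship such that the smaller the feature-space distance between the windows, the larger the number of votes.
[0080]
This assigns more votes the closer the corresponding windows are in the feature space, that is, the more alike their "appearance," so that appearance is reflected in the similarity calculation. For faces, this means that even if the positional relationships of the facial parts covered by the windows agree, a low similarity is output if the appearance of those parts differs. As a result, compared with the conventional technique, differences in facial features between the genuine person and others show up more readily as differences in similarity, which improves final identification accuracy; in particular, the false acceptance rate in personal authentication can be suppressed.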
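The distance-weighted voting of Embodiment 4 differs from step S7 only in the weight of each vote. The sketch below assumes a linear weight that is `w_max` at feature-space distance 0 and falls to 0 at a hypothetical cutoff `d_max`; the text specifies only that the relationship is linear and decreasing, so the cutoff, the clamping at 0, the grid size `cell`, and the function name are assumptions.

```python
from collections import Counter


def weighted_vote_similarity(pairs, d_max, w_max=1.0, cell=2):
    """Vote relative position vectors with weights that decrease linearly with
    feature-space distance d (w_max at d = 0, 0 at d >= d_max).

    pairs: (model_pos, input_pos, d) triples.
    Returns (total weight at the highest peak, offset (dx, dy) at the peak)."""
    votes = Counter()
    for (mx, my), (ix, iy), d in pairs:
        weight = w_max * max(0.0, 1.0 - d / d_max)
        key = (round((ix - mx) / cell), round((iy - my) / cell))
        votes[key] += weight
    (bx, by), peak = votes.most_common(1)[0]
    return peak, (bx * cell, by * cell)
```

Two models whose windows line up equally well now score differently if one of them also looks more alike, which is the effect [0080] describes.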
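[0081]
(Embodiment 5)
An embodiment of the image collation apparatus of the present invention is described below. The image collation apparatus of Embodiment 5 can carry out the image collation methods of Embodiments 1 to 4. Platforms to which it can be applied include personal computers and workstations as well as a wide range of devices such as bank ATMs and entry/exit control systems.
[0082]
First, the overall configuration of the image collation apparatus of Embodiment 5 and the overall processing flow of the system are outlined with reference to the drawings.
[0083]
FIG. 12 is a schematic configuration diagram of the image collation apparatus according to Embodiment 5.
[0084]
As shown in FIG. 12, the image collation apparatus of the present invention broadly comprises an image input unit 10, a registered image data storage unit 20, a geometric transformation parameter estimation unit 30, a geometric transformation unit 40, and an image collation unit 50. Although not shown, it also has the controller, memory, user interface, and so on needed to control the system as a whole.
[0085]
Image input unit 10 takes in the input image to be collated. Depending on the usage, it has a camera 11 for capturing image data and an image file input unit 12 for file input. It also has a shooting condition/environment adjustment unit 13 to support the method of Embodiment 2, in which the shooting conditions and environment are adjusted and the image is recaptured. Depending on the usage, unit 13 includes a movement controller for camera 11, an angle/position adjuster for the subject such as a turntable, and a shooting environment adjuster that changes lighting conditions and the like.
[0086]
Registered image data storage unit 20 registers and stores the model images used for collation and comprises, for example, a storage device such as a high-capacity hard disk and a control section. When window image information is extracted from the model images in advance as preprocessing, as described in Embodiment 1, that information is stored as part of the registered image data.
[0087]
Geometric transformation parameter estimation unit 30 estimates the geometric transformation parameters and comprises a geometric transformation parameter specification unit 31, a window image cutout unit 32, a window image compression unit 33, a window image association unit 34, a geometric transformation parameter calculation unit 35, and an estimated geometric transformation parameter evaluation unit 36.
[0088]
Geometric transformation parameter specification unit 31 specifies which geometric transformation to use according to the usage, such as affine, similarity, or projective transformation, as described in Embodiment 1. The choice may be fixed as a default, entered by the user, or selected automatically by the apparatus during collation.
[0089]
Window image cutout unit 32 performs the target region cutout and window position selection described in Embodiment 1 and, depending on the usage, provides edge detection, feature point detection, window image detection, normalization, and similar functions.
[0090]
Window image compression unit 33 performs the image transformation for dimensionality reduction on the window images, applying a suitable transform such as KL expansion, wavelet transform, Fourier transform, or DCT according to the usage to generate window data.
[0091]
Window image association unit 34 compares model window data and input window data in the feature space, searches, for each window datum in the model image, for the nearest window datum in the input image, and associates the window data to obtain the pairs as corresponding window information. It provides a feature space projection function that projects window data into the feature space, a feature distance calculation function that computes the feature-space distances between projected window data, a corresponding window determination function that pairs the window data that are mutually nearest, and a table for storing the corresponding window information.
[0092]
When the improved window association of Embodiment 3 is used, the feature distance calculation function additionally restricts the window data for which distances are computed from a given window datum to those whose positions in the image lie within a fixed distance.
[0093]
Geometric transformation parameter calculation unit 35 selects several pairs from the corresponding window information and calculates the geometric transformation parameters between the model image and the input image from their image positions. As described in Embodiment 1, it has a function of selecting from the corresponding window information table as many pairs as the degrees of freedom of the adopted transformation require, and a function of calculating the parameters from (Equation 1) to (Equation 4) and the like. For an affine transformation, it computes the rotation parameters r[0], r[1], r[2], r[3] and the translation parameters (u, v).
[0094]
When the estimated parameters are to be evaluated, as described in Embodiment 1, calculation unit 35 repeats the selection of different combinations of corresponding windows and the parameter calculation to obtain multiple parameter sets.
[0095]
Estimated geometric transformation parameter evaluation unit 36 determines more plausible parameter values from the multiple sets calculated by unit 35, by averaging, histogram processing, least-squares estimation, or the like. With histogram processing, for example, the parameter values are voted into a one-dimensional voting space, the peak with the most votes is found, and the value at that peak is taken as the estimate of the rotation parameter. The number of votes at the highest peak normalized by the total number of votes can be adopted as the reliability of the estimated parameters.
[0096]
Geometric transformation unit 40 performs the image correction using the geometric transformation parameters described in Embodiment 1: it applies an image geometric transformation based on the obtained parameters to correct the geometric difference between the model image and the input image. Either image may be transformed; for example, the input image is transformed.
[0097]
After the image geometric transformation by unit 40, window image cutout, window image compression, and window image association must be performed again. Window image cutout unit 32, window image compression unit 33, and window image association unit 34 of estimation unit 30 may be reused, or geometric transformation unit 40 may have equivalent components of its own. In Embodiment 5, to make the processing flow easier to follow, the block diagram shows geometric transformation unit 40 with its own window image cutout unit 42, window image compression unit 43, and window image association unit 44.
[0098]
Image collation unit 50 evaluates the consistency of the spatial arrangement of windows in the input image and the model image from the corresponding window information obtained after the geometric correction. It provides a function that computes the relative position vector, that is, the difference between the window positions in the two images; a voting function that casts a fixed number of votes at the corresponding point of a prepared two-dimensional (x, y) voting space; and a similarity calculation function that searches for the highest peak in the voting space and computes the similarity from the number of votes at that peak.
[0099]
The image collation apparatus of the present invention has the configuration described above. Its processing flow is the same as the flowchart of FIG. 1, so its description is largely omitted here.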
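[0100]
(Embodiment 6)
The image collation apparatus according to Embodiment 6 is described with reference to the drawings. Embodiment 6 is an example in which the image collation apparatus of Embodiment 5 is built as a client-server system.
[0101]
FIG. 13 shows the overall schematic configuration of a system in which the image collation apparatus is built as a client-server configuration. In FIG. 13, 100 is an image collation server, 101 an image collation client, and 102 a network. Image collation server 100 includes registered image storage unit 20, geometric transformation parameter estimation unit 30, geometric transformation unit 40, image collation unit 50, and a communication interface 70 for network connection; image collation client 101 includes image input unit 10. These are essentially the same as the identically named components described in Embodiment 5. Client 101 also includes a user interface unit 60 and a communication interface 70. Network 102 may be any network that can carry data, such as a local area network or the Internet, whether over dedicated or public lines, wired or wireless.
[0102]
The overall processing flow of the client-server image collation apparatus is as follows. First, the user inputs the face image of the person to be authenticated through the user interface of image collation client 101. Client 101 transmits the face image through its communication interface, and the image travels over network 102 and is received by image collation server 100 through its communication interface. Server 100 performs the same processing as described in Embodiment 1 (geometric transformation parameter estimation, image geometric transformation, and image collation) and returns to client 101 an answer on whether the person is authenticated. If the system is operated so as to request recapture of the face image, as described in Embodiment 2, client 101 adjusts the shooting and environmental conditions as needed, using the geometric transformation parameters as a cue, and recaptures the face image. Guidance may be presented through the user interface at that time.
[0103]
In the client-server configuration of Embodiment 6 as well, the model window data on image collation server 100 are preferably created in advance as preprocessing, before image collation.
[0104]
(Embodiment 7)
The image collation method and apparatus of the present invention can be built on various computers by recording a program describing the processing steps that realize the configurations above on a computer-readable recording medium and providing it. As in the examples of FIG. 14, the recording medium storing the program may be not only a portable recording medium 201 such as a CD-ROM 202 or a flexible disk 203, but also a recording medium 200 in a storage device on a network, or a recording medium 205 such as a computer's hard disk or RAM. When the program is executed, it is loaded onto computer 204 and executed in main memory.
[0105]
[Effects of the Invention]
As described above, the image collation method and apparatus of the present invention estimate the geometric relationship between the model image and the input image, apply an appropriate geometric transformation to one of them based on the estimate, and perform collation after absorbing the geometric misalignment between the two. As a result, collation remains highly accurate even when the face pose or scale differs between the model image and the input image, a case that was difficult for conventional techniques.
[0106]
In addition, the image collation method and apparatus of the present invention can estimate geometric transformation parameters from the initially given input image, use them to change the shooting conditions and environment, and recapture the input image, thereby obtaining an input image suited to collation and improving collation accuracy.
[0107]
Furthermore, narrowing the nearest-neighbor search range in the feature space speeds up window collation, and casting votes weighted according to feature-space distance in the relative position voting of corresponding windows improves collation accuracy.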
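[Brief Description of the Drawings]
FIG. 1: Flowchart outlining the processing of the image collation method of the present invention using the eigenwindow method
FIG. 2: Diagram showing window images cut out from an image
FIG. 3: Diagram showing an example of mapping window images into the feature space
FIG. 4: Diagram showing an example in which the window data of an input image are projected into the feature space
FIG. 5: Diagram showing an example of a table recording the pairs of corresponding window information
FIG. 6: Diagram showing examples of geometric transformations
FIG. 7: Diagram illustrating the selection of corresponding window pairs from the corresponding window information
FIG. 8: Diagram showing the window data of an input image projected into the feature space after geometric correction
FIG. 9: Diagram showing relative position vectors voted in the two-dimensional voting space
FIG. 10: Flowchart outlining the processing of the image collation method of Embodiment 2
FIG. 11: Diagram showing the basic concept of the window collation process of Embodiment 3
FIG. 12: Schematic configuration diagram of the image collation apparatus of Embodiment 5
FIG. 13: Schematic configuration diagram of the overall system of the image collation apparatus built as a client-server configuration in Embodiment 6
FIG. 14: Diagram showing examples of recording media storing the processing program of Embodiment 7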
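[Explanation of Reference Numerals]
10 Image input unit
11 Camera
12 Image file input unit
13 Shooting condition/environment adjustment unit
20 Registered image data storage unit
30 Geometric transformation parameter estimation unit
31 Geometric transformation parameter specification unit
32, 42 Window image cutout unit
33, 43 Window image compression unit
34, 44 Window image association unit
35 Geometric transformation parameter calculation unit
36 Estimated geometric transformation parameter evaluation unit
40 Geometric transformation unit
50 Image collation unit
100 Image collation server
101 Image collation client
102 Network
200 Recording medium such as a hard disk at the other end of a line
201 Portable recording medium such as a CD-ROM or flexible disk
202 CD-ROM
203 Flexible disk
204 Computer
205 Recording medium such as RAM or a hard disk on the computer

[0001]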
BACKGROUND OF THE INVENTION
The present invention relates to an image collation method and apparatus that collate a recognition target in an input image against model images stored in an image database or the like, and thereby identify the recognition target in the input image. Depending on the recognition target, the invention can be used for person authentication, seal authentication, signature authentication, and the like. It can also handle arbitrary three-dimensional objects, such as automobiles and industrial products, as recognition targets, and can therefore be used in a wide variety of image collation and recognition systems.
[0002]
[Prior art]
A growing number of applications on the market, such as card use at bank ATMs and online shopping over personal computer networks, require user authentication and personal identification that correctly determine who the user of a system is. Password entry has traditionally been the most common means of user authentication and personal identification. In recent years, however, face image collation and retrieval techniques have attracted attention: a person image is captured from an installed camera, and the person shown is identified from the face image. With this technology, face images could replace the PINs and passwords conventionally used, for example, for entry control in condominiums and office buildings and for logging in to personal computers or the Internet. In criminal investigations, the technology is also expected to help identify offenders by singling out fraudulent users from the images of surveillance cameras installed at ATMs.
[0003]
If a face image collation and retrieval system could be realized that performs stable, highly accurate collation on input images captured under a variety of shooting environments, it would find application in a wide range of fields: personal authentication in the security-related applications described above, person identification in criminal investigations, and user authentication in automatic reception terminals and customer management systems.
[0004]
One conventional technique for collating model images with input images in image collation systems is the eigenwindow method (also called the local eigenspace method). It extends the eigenspace method: the image information of a group of characteristic local regions extracted from each image is projected as a point set in a feature space, and the two images are collated there.
[0005]
The conventional eigenwindow collation procedure is outlined as follows. First, in both the input image and the model image used for collation, feature points are found based on changes in pixel values, and local region images called "windows" are selected. The selected window image data are subjected to KL expansion and mapped into a feature space spanned by the eigenvectors with relatively large eigenvalues. The two images are then compared in this feature space: for each window datum in the model image, the window datum in the input image closest to it is searched for. If both images show the same recognition target, every window has a close counterpart in the feature space; if they show different targets, some windows have no counterpart in the feature space. Mapping into the feature space thus allows the similarity or dissimilarity of the two images to be judged.
[0006]
[Problems to be solved by the invention]
However, the conventional eigenwindow collation method described above suffers a drastic loss of identification accuracy when the pose (angle) or scale (size) of the face in the input image differs greatly from that in the model image. For example, suppose a model image of a person photographed from the front has been registered. If, for some reason, the face image input for authentication is captured with the face turned obliquely or tilted, the difference in face orientation between the input image and the model image causes a low similarity to be output even though both are face images of the same person, and the person is wrongly judged to be someone else.
[0007]
One conceivable countermeasure within the conventional eigenwindow method is to register, as model images, many patterns with varied face poses and scales, that is, to capture and register in advance a variety of patterns such as face images turned obliquely and face images taken farther from the camera. In principle, registering face images of every pose and scale in advance would allow stable face image collation. In practice, however, registering multiple model images for every face pose and scale is not feasible, and the amount of registered data becomes large. When the recognition target is a face, the pose has many degrees of freedom, so the variations in face pose and scale become enormous; given the storage capacity required for model images and the processing speed required for recognition, this approach has practical limits.
[0008]
The object of the present invention is to solve these problems and to provide a face image collation method and apparatus that maintain highly accurate collation even when the face pose or scale changes relative to the model image.
[0009]
[Means for Solving the Problems]
To solve the above problems, the image collation method of the present invention identifies a recognition target present in an input image by comparing and collating the input image, which contains the recognition target, against a group of model images containing previously registered targets. The method comprises: a process of cutting out a plurality of characteristic local regions from each of the input image and the model image; a process of projecting the image information of the selected local regions onto a point set in a feature space; a process of associating local regions with each other by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space; a geometric transformation parameter estimation process that estimates geometric transformation parameters between the input image and the model image based on the spatial arrangement of the associated local regions of the one image and the other image; and a process of applying a geometric transformation, using the estimated parameters, to either the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions within the image, and then evaluating the consistency between the associated local regions to perform collation.
[0010]
With this configuration, the displacement between the local regions of the model image and those of the input image is adjusted by a geometric transformation using the geometric transformation parameters, so that differences due to pose and camera distance are absorbed and collation accuracy can be improved.
[0011]
If the geometric transformation parameter estimation is repeated for multiple sets of corresponding local regions, each set differing from the others in at least one region, and the parameters are determined from the resulting group of estimates, a further improvement in collation accuracy can be expected. If, in addition, a reliability for the estimated parameters is calculated from the degree of scatter within that group, it can serve as a measure of the reliability of the collation. Furthermore, if the relative position within the image and the distance in the feature space are obtained for each pair of local regions, and votes weighted according to that distance are cast into a voting space, the degree of concentration of the peak in the voting result can serve as a measure of the similarity between the input image and the model image.
[0012]
In addition, when associating local regions, processing time can be expected to decrease if the relative position within the image is used as a cue and the candidates are narrowed down to local regions lying within a fixed range of that position.
[0013]
Next, the image collation apparatus of the present invention identifies a recognition target present in an input image by comparing and collating the input image, which contains the recognition target, against a group of model images containing previously registered targets. It comprises: an image input unit that receives the input image; a model image storage unit that registers and stores model images; a local region cutout unit that cuts out characteristic local regions from the input image and the model image; a local region association unit that projects the image information of the selected local regions onto a point set in a feature space and associates local regions with each other by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space; a geometric transformation parameter estimation unit, including a geometric transformation parameter calculation unit, that estimates geometric transformation parameters between the input image and the model image based on the spatial arrangement of the associated local regions of the one image and the other image; an image geometric transformation unit that uses the estimated parameters to apply a geometric transformation to either the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions within the image; and an image collation unit that performs collation by evaluating the mutual consistency of the local regions using the geometrically transformed local regions.
[0014]
This configuration yields an image collation apparatus that can adjust the displacement between the local regions of the model image and those of the input image by a geometric transformation using the geometric transformation parameters, absorbing differences due to pose and camera distance and achieving high collation accuracy.
[0015]
Further, by providing a computer-readable recording medium on which a processing program implementing the above image collation apparatus is recorded, the image collation apparatus of the present invention can be built on a computer such as a personal computer or a workstation.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1)
The image matching method of the present invention provides three functions:

- It estimates geometric transformation parameters that describe the geometric relationship between the model image and the input image. The estimate uses the in-image positions of corresponding windows in the two images.
- It applies an appropriate geometric transformation to either the model image or the input image, using the estimated parameters.
- It matches the images using the corrected image after the geometric transformation.

Image matching is therefore executed only after the geometric deviation between the model image and the input image has been absorbed. The geometric transformation can be applied to three different objects:

- the image information of the window images (local regions) of the model image or input image;
- the image information of the entire model image or input image;
- the position information of the local regions in the image.
[0017]
An outline of the face image matching method of the present invention, which uses the Eigenwindow method, is given below with reference to the flowchart of FIG. 1.
[0018]
The face image matching method using the Eigenwindow method consists of eight steps, S1 to S8, grouped into two phases:

- The "model registration phase" (steps S1 to S3, applied to the model images) creates the model window data used for matching from the prepared model images.
- The "recognition phase" (steps S1 to S3 applied to the input image, followed by steps S4 to S8) calculates the similarity between the input image and each model image.
[0019]
(1) Step of cutting out target area (step S1)
The position of the face to be recognized is detected in the image, and the face region is cut out. Strictly speaking, this step is not an essential part of the Eigenwindow method. However, it effectively improves recognition performance (identification accuracy and processing speed), so it is preferably performed as preprocessing. Known methods for locating and extracting a human face region can be applied here, for example:

- face region extraction by detecting facial components;
- face region detection by skin color extraction;
- face region detection by contour (edge) extraction.
[0020]
(2) Window image cutout step (step S2)
Feature points are found from the pixel values in the image region cut out in target region cutout step S1, and the local region image around each feature point is cut out. The selected local region is called a “window”. When this step is applied to a face image, windows are cut out around facial parts such as the eyes, nose, and mouth.
[0021]
Specifically, windows (at corners) containing parts of the target are detected from the edge strength at each pixel in the image. At each pixel position, the sum of the differences in pixel value from the adjacent pixels (the edge strength) is calculated, and parts of the image are cut out as window images based on this edge strength. The pixel values in each window image form a vector (hereinafter, a window image vector). The cutout position (hereinafter, the in-image window position) and an image identifier are attached to this vector to create window data. This is done for all image data to obtain a window data group.

In this example, the edge strength used to select windows is the sum of pixel-value differences between adjacent pixels. Alternatively, it may be computed as the amount of high-frequency components in the spatial frequency domain.
[0022]
FIG. 2 shows window images being cut out from an image. To make matching reasonably robust to changes in the shooting environment, such as illumination, the brightness of each cut-out local region image is preferably normalized.
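The window cutout of step S2 and the brightness normalization above can be sketched as follows. This is a minimal illustration, not the patented implementation. The following are all assumptions:

- the function names;
- the window size `n` and the number of windows;
- the greedy selection with suppression of nearby centres;
- the zero-mean, unit-norm normalization.

Edge strength is taken here as the sum of absolute differences to the four adjacent pixels.

```python
import numpy as np

def edge_strength(img):
    """Sum of absolute differences to the 4 neighbours at each pixel (border replicated)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    c = p[1:-1, 1:-1]
    return (np.abs(c - p[:-2, 1:-1]) + np.abs(c - p[2:, 1:-1])
            + np.abs(c - p[1:-1, :-2]) + np.abs(c - p[1:-1, 2:]))

def cut_windows(img, image_id, n=8, num_windows=5):
    """Cut N x N windows around the strongest-edge pixels (hypothetical helper)."""
    img = img.astype(float)
    e = edge_strength(img)
    h, w = img.shape
    half = n // 2
    # Only centres whose N x N window fits inside the image are eligible.
    score = np.full((h, w), -np.inf)
    lo_y, hi_y = half, h - n + half + 1
    lo_x, hi_x = half, w - n + half + 1
    score[lo_y:hi_y, lo_x:hi_x] = e[lo_y:hi_y, lo_x:hi_x]
    windows = []
    for _ in range(num_windows):
        y, x = np.unravel_index(np.argmax(score), score.shape)
        if not np.isfinite(score[y, x]):
            break                                  # no eligible centre left
        patch = img[y - half:y - half + n, x - half:x - half + n]
        # Brightness normalisation (assumed form): zero mean, unit norm.
        v = patch.ravel() - patch.mean()
        norm = np.linalg.norm(v)
        v = v / norm if norm > 0 else v
        # Window data: window image vector + in-image window position + image identifier.
        windows.append({"vector": v, "pos": (int(x), int(y)), "image_id": image_id})
        # Suppress nearby centres so windows do not pile up on one feature.
        score[max(0, y - n):y + n + 1, max(0, x - n):x + n + 1] = -np.inf
    return windows
```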
[0023]
(3) Window image compression processing step (step S3)
Each window image cut out in window image cutout step S2 is transformed to reduce its dimensionality. Suitable transforms include KL expansion (Karhunen-Loeve expansion, a form of principal component analysis; hereinafter KL expansion), the wavelet transform, the Fourier transform, and the DCT.
[0024]
For example, with KL expansion, each window image extracted from the image is mapped into a feature space spanned by the eigenvectors with relatively large eigenvalues. To project onto this eigenspace, the eigenspace is first created. A matrix is formed whose column vectors contain the pixel values of the face images, and an eigenvector matrix is obtained from its covariance matrix. The space defined by this eigenvector matrix is the feature space.

FIG. 3 shows an example of mapping window images into the feature space. It shows the projection of window images cut out from a model image, with only three axes, X₁, X₂, and X_k, displayed.

Projecting into this feature space compresses each N × N-pixel window image, represented as an N²-dimensional pixel value vector, into a lower-dimensional space. Dimensions that contribute little to the distribution of the window group are discarded, while the distribution of the windows in the feature space is preserved as far as possible. The compressed window image (hereinafter, window data) and the position of the window in the image (hereinafter, window position) are used as the parameters for image matching.
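A minimal sketch of the KL expansion described above follows. It assumes NumPy and that each window image has already been flattened into an N²-dimensional vector. `build_feature_space`, `project`, and the choice of `k` are hypothetical names and parameters used only for illustration.

```python
import numpy as np

def build_feature_space(window_vectors, k):
    """KL expansion: eigenvectors of the covariance matrix with the k largest eigenvalues."""
    A = np.stack(window_vectors, axis=1)            # columns = window image vectors (N*N x M)
    mean = A.mean(axis=1, keepdims=True)
    centred = A - mean
    cov = centred @ centred.T / A.shape[1]          # N*N x N*N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: ascending eigenvalues, orthonormal vectors
    order = np.argsort(eigvals)[::-1][:k]           # keep the k largest
    return mean.ravel(), eigvecs[:, order]          # basis has shape (N*N, k)

def project(vector, mean, basis):
    """Map one window image vector to a k-dimensional point (its window data)."""
    return basis.T @ (np.asarray(vector, float) - mean)
```

In practice the basis would be built once from the training windows and reused for both model and input windows, so that both live in the same feature space.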
[0025]
Steps S1 to S3 are performed on both the registered model images used for matching and the input image to be matched. For the registered model images, steps S1 to S3 can be executed in advance as the “model registration phase”.
[0026]
(4) Window collation processing (corresponding window selection) step (step S4)
Steps S4 to S8 below are executed in the “recognition phase”, which calculates the similarity between the input image and each model image.
[0027]
In window matching step S4, the registered model window data and the input window data are compared in the feature space. For each window data item of the model image, the window data of the input image that is nearest in the feature space is searched for. FIG. 4 shows an example in which the window data of the input image is projected into the feature space; the projected points are marked “□”. Corresponding windows are found from this projection and stored as corresponding window information. FIG. 5 shows an example of a table recording the pairs of corresponding windows.
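The nearest-neighbour search of step S4 can be illustrated as follows. The sketch assumes the window data are already points in a k-dimensional feature space and that Euclidean distance is used; the text only says “nearest”. The returned rows mimic a corresponding-window table with a distance column d. `match_windows` is a hypothetical name.

```python
import numpy as np

def match_windows(model_points, input_points):
    """For every model window, find the input window nearest in feature space.

    model_points: (Nm, k) array; input_points: (Ni, k) array.
    Returns a list of (model_index, input_index, d) rows.
    """
    diff = model_points[:, None, :] - input_points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)             # (Nm, Ni) Euclidean distances
    nearest = dist.argmin(axis=1)                   # best input window per model window
    return [(i, int(j), float(dist[i, j])) for i, j in enumerate(nearest)]
```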
[0028]
(5) Geometric transformation parameter estimation processing (step S5)
Geometric transformation parameter estimation step S5 receives the corresponding window information, which lists the pairs of model windows and input windows obtained in window matching step S4. Several pairs are selected from this group of corresponding windows. The geometric transformation parameters between the model image and the input image are then estimated from the in-image positions of those pairs.
[0029]
Two choices can be varied when estimating the geometric transformation parameters: the type of geometric transformation, and the number of corresponding window pairs used for the estimation.
[0030]
First, the types of geometric transformation parameters to be used include affine transformation, similarity transformation, and projective transformation.
[0031]
The affine transformation is the geometric transformation expressed by (Expression 1) below. It has six parameters, that is, six degrees of freedom, and at least three pairs of corresponding windows are needed to obtain them.
[0032]
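[Expression 1]
$$\begin{pmatrix} X \\ Y \end{pmatrix} = R\begin{pmatrix} x \\ y \end{pmatrix} + T, \qquad R = \begin{pmatrix} r[0] & r[1] \\ r[2] & r[3] \end{pmatrix}, \qquad T = \begin{pmatrix} u \\ v \end{pmatrix}$$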
[0033]
The affine transformation, shown in FIG. 6(a), can transform a rectangle into an arbitrary parallelogram. Conversely, it can restore an image distorted into an arbitrary parallelogram to the correct rectangle. It is therefore effective for correcting distortion that arises when a face image is captured.
[0034]
The similarity transformation is the geometric transformation expressed by (Expression 2) below. It has four parameters, that is, four degrees of freedom, and at least two pairs of corresponding windows are needed to obtain them.
[0035]
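[Expression 2]
$$\begin{pmatrix} X \\ Y \end{pmatrix} = R\begin{pmatrix} x \\ y \end{pmatrix} + T, \qquad R = s\begin{pmatrix} \cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma \end{pmatrix}, \qquad T = \begin{pmatrix} u \\ v \end{pmatrix}$$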
[0036]
The similarity transformation, shown in FIG. 6(b), translates, rotates, and scales the whole image without changing the shape of the target. It is effective for correcting tilt or positional shift that arises when a face image is captured.
[0037]
The projective transformation is the geometric transformation expressed by (Equation 3) below. It has eight parameters, that is, eight degrees of freedom, and at least four pairs of corresponding windows are needed to obtain them.
[0038]
[Equation 3]

[0039]
The projective transformation, shown in FIG. 6(c), can transform a rectangle into an arbitrary quadrilateral. Conversely, it can restore an image distorted into an arbitrary quadrilateral to the correct rectangle. It is effective for correcting the perspective distortion that occurs when a face image is captured obliquely.
[0040]
The idea of the present invention also applies to geometric transformations other than those above. In the first embodiment, the estimation process is described in detail below, using the affine transformation as the example.
[0041]
When the face changes in the vertical, horizontal, and tilt directions relative to the frontal direction, and changes in size (that is, in its distance from the camera), the rotation parameter R and the translation parameter T shown in (Expression 1) are used as the geometric transformation parameters. The affine transformation defined here assumes that the facial parts lie on a single plane and that the deformation between the two images is relatively small.
[0042]
(Expression 1) above gives the affine transformation for the following coordinate system: the image center is the origin, the X axis (horizontal) and Y axis (vertical) lie in the image plane, and the Z axis is perpendicular to the image plane. In this case, six affine parameters are estimated: the four rotation parameters r[0], r[1], r[2], r[3] and the two translation parameters u, v.
[0043]
Next, consider how many pairs of corresponding windows are used to estimate the parameters. The minimum number is determined by the degrees of freedom of the chosen transformation. Each pair yields two equations, one for the horizontal coordinate and one for the vertical coordinate. The affine transformation has six degrees of freedom, so one set of affine parameters can be estimated from three pairs of corresponding windows.
[0044]
First, three pairs of corresponding windows, as shown in FIG. 7, are selected from the corresponding window information obtained by the window matching in step S4 of FIG. 1. Various selection methods are possible, such as choosing pairs arbitrarily from the table shown in FIG. 5, or choosing the three pairs with the smallest d in FIG. 5. In the first embodiment, each corresponding window listed in the corresponding window information is taken in turn, and the remaining two are chosen at random without duplication.
[0045]
Next, (Expression 1) is solved using the in-image positions (xi, yi)-(Xi, Yi) (i = 1, 2, 3) of the three selected pairs of corresponding windows, giving the rotation parameters r[0], r[1], r[2], r[3].
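Each corresponding pair contributes one equation for X and one for Y, so three pairs give a 6 × 6 linear system. The sketch below assumes (Expression 1) has the form X = r[0]x + r[1]y + u, Y = r[2]x + r[3]y + v. `affine_from_three` is a hypothetical helper. It solves for u and v at the same time as a convenience, although the text obtains the translation separately.

```python
import numpy as np

def affine_from_three(src, dst):
    """Solve X = r0*x + r1*y + u, Y = r2*x + r3*y + v from three point pairs.

    src: three (x, y) window positions; dst: the corresponding (X, Y) positions.
    Returns (R, T) with R = [[r0, r1], [r2, r3]] and T = (u, v).
    If the src points are collinear the system is singular (numpy may raise LinAlgError).
    """
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(X)   # equation for the X coordinate
        A.append([0, 0, x, y, 0, 1]); b.append(Y)   # equation for the Y coordinate
    r0, r1, r2, r3, u, v = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.array([[r0, r1], [r2, r3]]), np.array([u, v])
```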
[0046]
The above processing yields one set of geometric transformation parameters from three pairs of corresponding windows. This single set may be used as the estimate.

Alternatively, many different triples of corresponding window pairs can be selected and the estimation repeated, giving many parameter estimates. The most probable parameter values are then derived from these estimates, for example by averaging, histogram processing, or least-squares estimation.

The first embodiment votes each estimated parameter value into a prepared one-dimensional voting space; that is, it builds a histogram with a suitable number of bins for each parameter. The loop “select three corresponding windows” → “estimate rotation parameters” → “vote rotation parameters in 1-D” is repeated as many times as there are corresponding windows. The peak with the most votes is then found for each parameter, and the value at that peak is adopted as the estimate of the rotation parameter.
[0047]
The number of votes at the maximum peak, normalized by the total number of votes, can be used as the reliability of the estimated parameters. Inaccurate rotation parameters can degrade the final similarity calculation. Therefore, if the reliability falls below a preset reference value, one of two options is taken:

- no image correction is performed with these parameters; or
- the parameters are set to the identity transformation I, so that in effect no geometric transformation is applied.
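The voting loop and reliability check described above might look like the sketch below. Function names are hypothetical, and the following are assumptions rather than details from the text:

- the affine form X = r[0]x + r[1]y + u, Y = r[2]x + r[3]y + v;
- the histogram range [−2, 2] with bin width 0.05;
- using the weakest of the four parameters as the overall reliability, normalized by the number of voting rounds;
- the threshold 0.3.

```python
import numpy as np

def _rotation_from_triple(src3, dst3):
    """Solve the assumed affine form for r[0..3] (u, v solved too, then dropped)."""
    A, b = np.zeros((6, 6)), np.zeros(6)
    for k, ((x, y), (X, Y)) in enumerate(zip(src3, dst3)):
        A[2 * k] = [x, y, 0, 0, 1, 0]
        A[2 * k + 1] = [0, 0, x, y, 0, 1]
        b[2 * k], b[2 * k + 1] = X, Y
    return np.linalg.solve(A, b)[:4].reshape(2, 2)

def vote_rotation(src, dst, bins=None, min_reliability=0.3, seed=0):
    """Estimate R by 1-D voting, one histogram per parameter r[0..3].

    src, dst: (N, 2) arrays of corresponding window positions, N >= 3.
    Returns (R, reliability); R falls back to the identity when reliability is low.
    """
    if bins is None:
        bins = np.linspace(-2.0, 2.0, 81)           # assumed range, bin width 0.05
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    votes = np.zeros((4, len(bins) - 1))
    rounds = 0
    for i in range(n):
        # Window i in turn, plus two other distinct windows chosen at random.
        others = rng.choice([j for j in range(n) if j != i], size=2, replace=False)
        idx = [i, *others]
        try:
            R = _rotation_from_triple(src[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue                                  # degenerate triple (e.g. collinear)
        for p, value in enumerate(R.ravel()):
            votes[p] += np.histogram([value], bins=bins)[0]   # one vote per parameter
        rounds += 1
    if rounds == 0:
        return np.eye(2), 0.0
    centres = (bins[:-1] + bins[1:]) / 2
    peaks = votes.argmax(axis=1)
    # Reliability: peak votes / voting rounds, for the weakest of the four parameters.
    reliability = float(min(votes[p, peaks[p]] for p in range(4)) / rounds)
    if reliability < min_reliability:
        return np.eye(2), reliability                 # identity transformation I
    return centres[peaks].reshape(2, 2), reliability
```

Votes outside the histogram range are simply dropped, which lowers the reliability of wildly inconsistent estimates.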
[0048]
Next, the translation parameter T is calculated for all corresponding windows using the estimated rotation parameters. Specifically, (u, v) is obtained from (Expression 4) below, which is derived by rearranging (Expression 1). The window position in the model image is transformed with the estimated rotation parameter R, and its offset from the window position in the input image is computed.
[0049]
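[Expression 4]
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} X \\ Y \end{pmatrix} - R\begin{pmatrix} x \\ y \end{pmatrix}$$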
[0050]
Obtaining the geometric transformation parameters by voting is expected to reduce three kinds of estimation error:

- errors from incorrect pairs in the corresponding window information;
- errors from scatter in the in-image positions;
- errors from unsuitable combinations of corresponding windows.

This should make parameter estimation more accurate.
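The translation step can be sketched as follows. (Expression 4) gives one (u, v) candidate per corresponding window. Here the candidates are aggregated by 2-D voting with a fixed cell size, in the spirit of the voting used for the rotation parameters. The aggregation method, the cell size, and the function name are assumptions; the text does not specify how the per-window values are combined.

```python
import numpy as np

def estimate_translation(src, dst, R, cell=2.0):
    """T_i = X_i - R x_i per corresponding window, then the most-voted grid cell.

    src: (N, 2) model window positions; dst: (N, 2) input window positions.
    Returns (mean (u, v) of the winning cell, number of votes in that cell).
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    t = dst - src @ np.asarray(R, float).T          # one (u, v) candidate per window
    cells = np.floor(t / cell).astype(int)          # quantise into voting cells
    keys, counts = np.unique(cells, axis=0, return_counts=True)
    best = keys[counts.argmax()]
    inliers = np.all(cells == best, axis=1)
    return t[inliers].mean(axis=0), int(counts.max())
```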
[0051]
(6) Image correction processing using geometric transformation parameters (step S6)
In image correction step S6, the first process applies the estimated geometric transformation parameters to the input image to correct it. With the affine transformation, this correction performs a geometric transformation like that shown in FIG. 6(a). Likewise, the similarity transformation gives the transformation of FIG. 6(b), and the projective transformation gives that of FIG. 6(c).
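The sketch below shows the first process of step S6 for the affine case. It assumes that (Expression 1) maps model-image coordinates (x, y) to input-image coordinates (X, Y), both measured from the image centre. Under that assumption, sampling the input image at R(x, y)ᵀ + T for every output pixel resamples the input into the model's geometry. Nearest-neighbour sampling and zero fill are simplifications, and `correct_input_image` is a hypothetical name.

```python
import numpy as np

def correct_input_image(input_img, R, T, out_shape=None):
    """Resample the input image into the model image's geometry (nearest neighbour)."""
    h_in, w_in = input_img.shape
    h, w = out_shape or input_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Output (model-frame) pixel coordinates relative to the image centre.
    pts = np.stack([xs - (w - 1) / 2, ys - (h - 1) / 2]).reshape(2, -1)
    X, Y = np.asarray(R, float) @ pts + np.asarray(T, float).reshape(2, 1)
    cols = np.rint(X + (w_in - 1) / 2).astype(int)  # back to input pixel indices
    rows = np.rint(Y + (h_in - 1) / 2).astype(int)
    ok = (rows >= 0) & (rows < h_in) & (cols >= 0) & (cols < w_in)
    out = np.zeros(h * w, dtype=input_img.dtype)    # pixels falling outside stay 0
    out[ok] = input_img[rows[ok], cols[ok]]
    return out.reshape(h, w)
```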
[0052]
As the second process of step S6, window images are cut out again from the corrected image, mapped into the feature space, and associated. That is, processing returns to window cutout step S2 in the flowchart of FIG. 1, and window image compression step S3 and window matching step S4 are executed again.
[0053]
FIG. 8 shows an example in which the window images are mapped into the feature space and associated again after the geometric correction. In this example, the window images of the input image (marked “□”) are accurately associated with the projection points of model image 1 in the feature space.
[0054]
The image transformed with the estimated parameters may be either the input image or the model image. Either one is geometrically transformed to align with the other.
[0055]
(7) Relative position voting process of corresponding window (step S7)
Using the corresponding window information obtained after the geometric correction in step S6, the consistency of the arrangement of windows in the input image and the model image is evaluated. For each corresponding window, the relative position vector (Δx, Δy) is calculated. This vector is the difference between the window positions in the two images, where x is the horizontal direction and y the vertical direction. A fixed number of votes is then cast at the corresponding point of a two-dimensional (x, y) voting space prepared for each model. This voting is repeated for all corresponding windows.
[0056]
The voting works as follows. If the geometric correction has made the shapes of the two images match, the only remaining difference is a horizontal and vertical translation of the whole image. Each vote is the difference between window positions in the two images. In principle, therefore, all corresponding windows share the same displacement, that is, the same relative position vector. Hence, the more the votes concentrate on a single value, the better the two images match; the more they scatter, the worse the match.
[0057]
(8) Peak position detection / similarity calculation processing (step S8)
After the relative position vector voting in step S7, the maximum peak is searched for in the two-dimensional voting space. The similarity is defined from the number of votes at the peak position. As explained above, the more the votes concentrate on one point, the better the two images match.

FIG. 9 shows relative position vector voting in the two-dimensional voting space. FIG. 9(a) shows a case where the votes are highly concentrated and the match is good. FIG. 9(b) shows a case where the votes are relatively dispersed and the match is poor.
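Steps S7 and S8 can be sketched together as follows. The cell size of the 2-D voting space and the normalization of the peak count by the number of corresponding windows are assumptions. The text only states that the similarity is defined from the number of votes at the peak. `relative_position_similarity` is a hypothetical name.

```python
import numpy as np

def relative_position_similarity(model_pos, input_pos, cell=3.0):
    """Vote each window's offset (dx, dy) = input - model into a 2-D grid (step S7),
    then return (similarity, peak offset) from the most-voted cell (step S8)."""
    d = np.asarray(input_pos, float) - np.asarray(model_pos, float)
    cells = np.floor(d / cell).astype(int)          # one vote per corresponding window
    keys, counts = np.unique(cells, axis=0, return_counts=True)
    k = counts.argmax()                             # maximum peak in the voting space
    peak_offset = d[np.all(cells == keys[k], axis=1)].mean(axis=0)
    return counts[k] / len(d), peak_offset
```

The returned peak offset corresponds to the relative position vector that can be used to correct the overall horizontal and vertical shift.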
[0058]
Once the peak position gives the relative position vector, that vector can also be used to correct the horizontal and vertical shift of the whole image.
[0059]
Steps S1 to S8 calculate the similarity between the input image and each registered model image. By comparing these values and applying a threshold, the system recognizes which registered person appears in the input image, or determines that the person is not registered.
[0060]
The above describes the case of an affine transformation with six degrees of freedom, but the idea of the present invention applies equally to other geometric transformations. Consider, for example, motion of the recognition target relative to the camera that amounts to a similarity transformation, made up of translation in the X, Y, and Z directions and rotation about the Z axis. The parameters to be estimated are then the translation (u, v), the scale s, and the rotation angle γ about the Z axis. The rotation parameter R and translation parameter T take the form of (Expression 2) above.
[0061]
Therefore, by expressing the geometric transformation using R and T, it is possible to obtain the geometric transformation parameters s, γ, (u, v) by applying the method described above.
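For the similarity transformation, s and γ can be read back from an estimated R. The sketch assumes R = s[[cos γ, −sin γ], [sin γ, cos γ]], a common convention; the sign of the sin terms is an assumption. It averages the two redundant entries of R to tolerate small estimation noise. `similarity_params` is a hypothetical name.

```python
import math

def similarity_params(R, T):
    """Recover scale s, rotation angle gamma (radians) and translation (u, v)
    from R = s * [[cos g, -sin g], [sin g, cos g]] (assumed convention)."""
    (r0, r1), (r2, r3) = R
    a = (r0 + r3) / 2          # s * cos(gamma), averaged over its two occurrences
    b = (r2 - r1) / 2          # s * sin(gamma), averaged over its two occurrences
    return math.hypot(a, b), math.atan2(b, a), tuple(T)
```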
[0062]
As described above, the image matching method of the first embodiment estimates the geometric relationship between the model image and the input image. Based on the estimate, it applies an appropriate geometric transformation to the model image or the input image, and it executes image matching after the geometric deviation between the two has been absorbed. As a result, matching can be performed with high accuracy even when the face posture or scale differs between the model image and the input image, a situation the conventional method handles poorly.
[0063]
(Embodiment 2)
In the second embodiment, geometric transformation parameters are estimated from an initial input image. The shooting state and shooting environment are then adjusted according to those parameters, and the input image is captured again. This yields an input image better suited to matching.
[0064]
As an example of the second embodiment, one variation in the case where the image collating method of the present invention is used in a bank automatic teller machine (ATM) will be described.
[0065]
In the conventional ATM, as a means for personal authentication, a cash card and a 4-digit password are input. Biometric authentication methods have been studied as means for personal authentication in this ATM. Among them are personal authentication by face image input and personal authentication by fingerprint. In the case of personal authentication based on face image input, the ATM user's face image is captured by a camera and collated with a pre-registered person's face image. Here, the image collating method of the second embodiment performs the following processing.
[0066]
First, a face image of an ATM user is taken by a camera (step S1001).
[0067]
Next, based on the input photographed image and the registered image for authentication, a geometric transformation parameter is estimated in the matching process using the Eigenwindow method shown in the first embodiment of the present invention (step S1002). For example, a geometric transformation parameter of affine transformation is estimated.
[0068]
Next, the shooting state and shooting environment are adjusted using the estimated parameters (step S1003). The first embodiment corrects either the input image or the model image with the estimated parameters to align it with the other. The second embodiment instead adjusts the shooting state and environment and then captures the image again. Possible adjustments include:

- correcting the camera's orientation, angle, and focus, or the distance between the user and the camera;
- displaying guidance on the user interface screen that asks the user to change their standing position or the angle of their face;
- installing a turntable where the user stands, to adjust the user's angle.
[0069]
Next, after adjusting the shooting state and shooting environment, re-shooting is performed (step S1004).
[0070]
Next, based on the re-captured input image, the process returns again to the window position selection step S2 shown in the flowchart of FIG. 1, and subsequently, the window image compression processing step S3 and the window collation processing step S4 are executed, Mapping is performed on the feature space and associated with each other. Further, the relative position voting processing step S7 of the corresponding window and the peak position detection / similarity calculation processing step S8 are executed to execute image collation processing (step S1005).
[0071]
As described above, according to the image collating method of the second embodiment, the geometric transformation parameter is estimated using the initially given input image, the photographing state and the photographing environment parameters are changed using the geometric transformation parameter, and the input is performed. By capturing the image again, an input image suitable for collation can be obtained, and collation accuracy can be improved.
[0072]
(Embodiment 3)
The image collation method of the third embodiment is an improvement on the window collation processing step (step S4 in FIG. 1) described in the first embodiment.
[0073]
In the window matching step, the registered model window data and the input window data are compared in the feature space. For each window of the model image, the nearest window of the input image in the feature space is found, and the pairs are stored in a table or similar structure as corresponding window information. In the first embodiment, the nearest window is searched for among all windows of the other image. The third embodiment instead limits the search by in-image position: only windows within a certain range of the given window's position are considered, and the nearest one in the feature space is chosen among them.
[0074]
FIG. 11 shows the basic concept of the window matching process of the third embodiment.
[0075]
Suppose the recognition target is restricted, as with face images, and images are captured in a fixed posture, such as upright and nearly facing the front. Then the window images cut out around the eyes, nose, eyebrows, ears, and mouth can be expected to lie within a certain range in the image, despite individual differences. For example, when searching for the window image of one eye in FIG. 11(a), the eye's window image can reasonably be expected to lie within the range indicated by the thick frame in the other image, FIG. 11(b).

Exploiting this property, a window image need only be matched against the window images around the corresponding position in the other image, rather than against all window images in that image. This narrows the search for the nearest window in the feature space and so speeds up window matching. Except in special cases, such as an image input upside down, the absence of any corresponding window image around the corresponding position means that the two images do not show the same recognition target in the first place.
[0076]
The search range for the nearest window in the feature space can be set from the statistical variation in the position and size of the part each window targets. For a human face image, for example, the range for an eye window would be based on the variation in the position and size of the eyes.
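A sketch of the range-limited search of the third embodiment follows. The square neighbourhood of half-width `radius` and the representation of each window as a (position, feature vector) pair are assumptions, as is the helper name `match_windows_local`. The radius could be chosen, for instance, from the statistical variation of facial-part positions.

```python
import numpy as np

def match_windows_local(model_windows, input_windows, radius):
    """For each model window, consider only input windows within `radius` pixels
    (square neighbourhood) and pick the nearest in feature space.

    Each window is a ((x, y), feature_vector) pair. Returns (model_idx, input_idx, d) rows;
    model windows with no candidate in range get no correspondence.
    """
    table = []
    for i, (m_pos, m_vec) in enumerate(model_windows):
        best = None
        for j, (q_pos, q_vec) in enumerate(input_windows):
            if max(abs(q_pos[0] - m_pos[0]), abs(q_pos[1] - m_pos[1])) > radius:
                continue                             # outside the in-image search range
            d = float(np.linalg.norm(np.asarray(q_vec, float) - np.asarray(m_vec, float)))
            if best is None or d < best[2]:
                best = (i, j, d)
        if best is not None:
            table.append(best)
    return table
```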
[0077]
As described above, according to the image matching method of the third embodiment, the search range of the nearest window in the feature space can be reduced, so that the window matching processing speed can be improved. At the same time, it can be expected to reduce the occurrence of erroneous correspondence that selects windows that do not actually correspond as correspondence windows, so that the accuracy of the estimated geometric transformation parameter can be improved.
[0078]
(Embodiment 4)
The image matching method of the fourth embodiment improves the relative position voting step for corresponding windows (step S7 in FIG. 1) described in the first embodiment.
[0079]
The relative position voting step runs after image correction in step S6 of the first embodiment. It evaluates, from the corresponding window information, the consistency of the arrangement of windows in the input image and the model image. For each corresponding window, the relative position vector (the difference between the window positions in the two images) is calculated. Votes are then cast at the corresponding point of a two-dimensional (x, y) voting space prepared for each model, and this is repeated for all corresponding windows.

In the first embodiment, each corresponding window casts a fixed number of votes at its relative position. In the fourth embodiment, by contrast, the number of votes cast at the relative position depends on the distance between the corresponding windows in the feature space. For example, to reflect in the similarity how alike the corresponding windows “look”, a linear relationship is set so that a smaller feature-space distance gives more votes.
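The distance-weighted vote can be sketched as below. The linear rule votes = v_max·max(0, 1 − d/d_max), the constants `v_max` and `d_max`, and the cell size are illustrative assumptions. The text only requires that smaller feature-space distances receive more votes. `weighted_position_votes` is a hypothetical name.

```python
import numpy as np

def weighted_position_votes(model_pos, input_pos, dists, d_max, v_max=10.0, cell=3.0):
    """Each corresponding window votes for its offset cell with a weight that falls
    linearly with its feature-space distance d (v_max at d = 0, zero at d >= d_max).
    Returns (vote total at the peak, peak cell)."""
    d = np.asarray(input_pos, float) - np.asarray(model_pos, float)
    w = v_max * np.clip(1.0 - np.asarray(dists, float) / d_max, 0.0, None)
    votes = {}
    for key, weight in zip(map(tuple, np.floor(d / cell).astype(int)), w):
        votes[key] = votes.get(key, 0.0) + weight    # accumulate weighted votes per cell
    peak = max(votes, key=votes.get)
    return votes[peak], peak
```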
[0080]
With this operation, the smaller the feature-space distance between corresponding windows, that is, the closer their “appearance”, the more votes they cast, so the similarity in appearance of corresponding windows is reflected in the similarity calculation. In the case of faces, even if the positional relationship of the facial parts corresponding to the windows is the same, a lower similarity is output when the “appearance” of those parts differs. Compared with the conventional method, differences in facial features between the genuine person and another person therefore appear more readily as differences in similarity, which improves the final identification accuracy. In particular, the rate at which other persons are falsely accepted in personal authentication can be suppressed.
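The distance-weighted voting of the fourth embodiment can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the linear weight function `vote_weight` (maximum `w_max` votes at distance zero, falling to zero at a cutoff `d_max`), the quantization step `bin_size`, and all function names are hypothetical choices, since the text only requires that a smaller feature-space distance yield more votes.

```python
from collections import defaultdict


def vote_weight(feature_dist, d_max, w_max=10.0):
    """Linearly decreasing vote count: w_max at distance 0, zero at or beyond d_max."""
    if feature_dist >= d_max:
        return 0.0
    return w_max * (1.0 - feature_dist / d_max)


def weighted_relative_position_vote(pairs, d_max, bin_size=4, w_max=10.0):
    """pairs: ((xm, ym), (xi, yi), feature_dist) for each corresponding window pair.

    Casts a distance-weighted vote at the quantized relative position vector in a
    2-D voting space; returns the voting table, the peak cell and its vote total.
    """
    space = defaultdict(float)
    for (xm, ym), (xi, yi), dist in pairs:
        dx, dy = xi - xm, yi - ym                  # relative position vector
        cell = (dx // bin_size, dy // bin_size)    # voting-space cell
        space[cell] += vote_weight(dist, d_max, w_max)
    if not space:
        return space, None, 0.0
    peak = max(space, key=space.get)
    return space, peak, space[peak]


pairs = [((10, 10), (12, 11), 0.0),   # similar-looking, consistent position
         ((30, 20), (32, 21), 5.0),   # less similar, consistent position
         ((50, 40), (20, 5), 1.0)]    # similar-looking but inconsistent position
_, peak, score = weighted_relative_position_vote(pairs, d_max=10.0)
print(peak, score)  # (0, 0) 15.0
```

Here the two positionally consistent pairs land in the same cell and contribute 10 and 5 votes; with fixed voting they would contribute equally regardless of how alike the windows look.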
[0081]
(Embodiment 5)
Embodiments of the image collation apparatus of the present invention are described below. The image collation apparatus of the fifth embodiment can implement the image collation methods of the present invention described in the first to fourth embodiments. In addition to personal computers and workstations, it can be applied to various platforms such as bank ATMs and entry/exit management systems.
[0082]
First, the outline of the overall configuration of the image collating apparatus according to the fifth embodiment and the overall image of the processing flow of this system will be described with reference to the drawings.
[0083]
FIG. 12 is a schematic configuration diagram of an image collating apparatus according to the fifth embodiment.
[0084]
As shown in FIG. 12, the image collation apparatus of the present invention broadly comprises an image input unit 10, a registered image data storage unit 20, a geometric transformation parameter estimation unit 30, a geometric transformation unit 40, and an image collation unit 50. Although not shown, it also has a controller, memory, user interface, and other components needed to control the system as a whole.
[0085]
The image input unit 10 captures the input image to be collated. Depending on the usage form, it includes a camera 11 for capturing image data and an image file input unit 12 for file input. It also includes a shooting condition/environment adjustment unit 13, which supports the method described in the second embodiment of adjusting the shooting conditions and environment and re-shooting. Depending on the usage form, the adjustment unit 13 includes a drive controller for moving the camera 11, an angle/position adjuster for the subject such as a turntable, and an environment adjuster that changes the lighting conditions.
[0086]
The registered image data storage unit 20 registers and stores the model images used for collation, and comprises a storage device such as a large-capacity hard disk together with its controller. When window image information is extracted from the model images as preprocessing, as described in the first embodiment, that window image information is stored as part of the registered image data.
[0087]
The geometric transformation parameter estimation unit 30 estimates the geometric transformation parameter. It comprises a geometric transformation parameter designation unit 31, a window image cutout unit 32, a window image compression unit 33, a window image association unit 34, a geometric transformation parameter calculation unit 35, and an estimated geometric transformation parameter evaluation unit 36.
[0088]
The geometric transformation parameter designation unit 31 designates the type of geometric transformation to be adopted according to the usage form, such as an affine transformation, a similarity transformation, or a projective transformation, as described in the first embodiment. The transformation may be fixed as a default, designated by the user, or selected automatically by the image collation apparatus during the collation process.
[0089]
The window image cutout unit 32 executes the target region cutout process and the window position selection process described in the first embodiment. Depending on the usage form, it has an edge detection function, a feature point detection function, a window image detection function, and a normalization processing function.
[0090]
The window image compression unit 33 applies a dimensionality-reducing transformation to each window image. Depending on the usage form, it generates window data by applying a suitable transformation such as KL expansion, the wavelet transform, the Fourier transform, or the discrete cosine transform (DCT).
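KL expansion, one of the transforms named above and the one the Eigenwindow method relies on, can be sketched in pure Python as follows. The sketch assumes each window image has already been flattened into a list of pixel values; it estimates the mean and covariance of a set of training windows, finds the leading eigenvectors by power iteration with deflation (a simple stand-in for a full eigendecomposition, adequate when the leading eigenvalues are well separated), and projects a window onto them. Function names and the iteration count are illustrative assumptions.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def covariance(vectors, mu):
    """Population covariance matrix of the vectors around the mean mu."""
    d, n = len(mu), len(vectors)
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n
    return cov


def top_eigenvectors(cov, k, iters=200, seed=0):
    """Leading k (eigenvalue, unit eigenvector) pairs of a symmetric matrix,
    found by power iteration followed by deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is zero: no variance left
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(m[a][b] * v[b] for b in range(d)) for a in range(d))
        pairs.append((lam, v))
        # Deflation: subtract the found component so the next pass finds the next one.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return pairs


def kl_project(window, mu, basis):
    """Window data: coefficients of the mean-centred window on each eigenvector."""
    diff = [x - c for x, c in zip(window, mu)]
    return [sum(e * x for e, x in zip(vec, diff)) for _, vec in basis]


# Toy 2x2 windows flattened to 4 values; all variation lies along (1, 1, 0, 0).
windows = [[t, t, 0, 0] for t in range(5)]
mu = mean_vector(windows)
basis = top_eigenvectors(covariance(windows, mu), k=1)
print(round(basis[0][0], 6), [round(c, 4) for c in kl_project([4, 4, 0, 0], mu, basis)])
# prints: 4.0 [2.8284]
```

In this toy data a single eigenvector with eigenvalue 4 captures all the variation, so each four-pixel window compresses to one coefficient in the feature space.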
[0091]
The window image association unit 34 compares the model window data and the input window data in the feature space and, for each window datum of the model image, searches for the window datum of the input image that is closest to it in the feature space; each matched pair is recorded as corresponding window information. To do so, it has a feature space projection function that projects window data into the feature space, a feature distance calculation function that computes feature-space distances between the projected window data, a corresponding window determination function that pairs the window data closest to each other, and a corresponding window information storage table.
[0092]
When the improved window matching process of the third embodiment is used, the feature distance calculation function additionally restricts the window data for which a feature distance is computed from a given window to those whose positions in the image lie within a certain distance of that window.
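The association performed by unit 34, together with the optional position restriction of the third embodiment, can be sketched as a brute-force nearest-neighbour search. This is an illustrative sketch only: the tuple layout `(x, y, features)`, the Euclidean feature distance, and the circular search radius are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def associate_windows(model_windows, input_windows, radius=None):
    """For each model window, find the input window nearest in feature space.

    Each window is a tuple (x, y, features). If radius is given, only input windows
    whose image position lies within radius of the model window's position are
    considered (the third embodiment's restriction, assuming both images share a
    coordinate frame). Returns (model_index, input_index, feature_distance) rows,
    i.e. a corresponding window information table.
    """
    table = []
    for mi, (mx, my, mf) in enumerate(model_windows):
        best = None
        for ii, (ix, iy, f) in enumerate(input_windows):
            if radius is not None and math.hypot(ix - mx, iy - my) > radius:
                continue  # outside the search range: no feature distance computed
            d = math.dist(mf, f)
            if best is None or d < best[1]:
                best = (ii, d)
        if best is not None:
            table.append((mi, best[0], best[1]))
    return table


model = [(10, 10, [0.0, 0.0]), (50, 50, [1.0, 1.0])]
inputs = [(12, 11, [0.9, 0.9]), (52, 49, [0.1, 0.0])]
print([(m, i) for m, i, _ in associate_windows(model, inputs)])            # [(0, 1), (1, 0)]
print([(m, i) for m, i, _ in associate_windows(model, inputs, radius=5)])  # [(0, 0), (1, 1)]
```

Without the restriction, each model window is paired with a distant input window whose features happen to be closer; with it, only nearby input windows are candidates, and fewer feature distances are computed.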
[0093]
The geometric transformation parameter calculation unit 35 selects several pairs from the corresponding window information and, based on their image positions, calculates a geometric transformation parameter representing the geometric transformation between the model image and the input image. As described in the first embodiment, it has a function that selects from the corresponding window information storage table as many pairs as the parameter calculation requires, which depends on the degrees of freedom of the adopted transformation, and a function that calculates the geometric transformation parameter using (Equation 1) to (Equation 4) described in the first embodiment. For an affine transformation, it calculates the rotation parameters r[0], r[1], r[2], r[3] and the translation parameters (u, v).
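For the affine case, three non-collinear corresponding window positions give six linear equations in the six unknowns. The sketch below assumes the convention x' = r[0]x + r[1]y + u, y' = r[2]x + r[3]y + v for mapping a model position (x, y) to an input position (x', y'); the exact form of (Equation 1) to (Equation 4) of the first embodiment is not shown here, so treat this form as an assumption. It solves the two 3x3 systems with Cramer's rule to stay dependency-free; the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a x = b for a 3x3 system by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("window positions are collinear; choose another set")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def affine_from_three_pairs(pairs):
    """pairs: three ((x, y), (x_in, y_in)) corresponding window positions,
    model image -> input image. Returns ([r0, r1, r2, r3], (u, v))."""
    a = [[x, y, 1.0] for (x, y), _ in pairs]
    r0, r1, u = solve3(a, [p_in[0] for _, p_in in pairs])
    r2, r3, v = solve3(a, [p_in[1] for _, p_in in pairs])
    return [r0, r1, r2, r3], (u, v)


# Model window positions and their matches in an input image rotated by 90 degrees
# and shifted by (5, 3).
pairs = [((0, 0), (5, 3)), ((1, 0), (5, 4)), ((0, 1), (4, 3))]
r, t = affine_from_three_pairs(pairs)
print(r, t)  # expected: r = [0, -1, 1, 0], t = (5, 3), as floats
```

A similarity transformation has fewer degrees of freedom and would need only two pairs; that is why the number of selected pairs depends on the adopted transformation.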
[0094]
When the estimated geometric transformation parameter is to be evaluated, as described in the first embodiment, the geometric transformation parameter calculation unit 35 repeatedly selects different combinations of corresponding window pairs and calculates a parameter for each, obtaining a plurality of geometric transformation parameters.
[0095]
The estimated geometric transformation parameter evaluation unit 36 determines a more probable value of the geometric transformation parameter from the plurality of parameters calculated by the geometric transformation parameter calculation unit 35. Possible evaluation methods include averaging the parameters, histogram processing, and least-squares estimation. In histogram processing, for example, the parameter values are voted into a one-dimensional voting space, the peak with the most votes is found, and the value at that peak is taken as the estimated value of the parameter (for example, a rotation parameter). The number of votes at the maximum peak, normalized by the total number of votes, can also be used as the reliability of the estimated parameter.
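The histogram evaluation can be sketched as follows for a single scalar parameter, for example a rotation angle estimated from many different window-pair combinations. The bin width, the choice of reporting the bin centre, and the function name are illustrative assumptions; the reliability is the peak vote count normalized by the total number of votes, as the text describes.

```python
import math
from collections import Counter


def histogram_estimate(values, bin_width):
    """Vote scalar parameter estimates into a 1-D histogram.

    Returns (estimate, reliability): the centre of the bin with the most votes,
    and that bin's vote count divided by the total number of votes.
    """
    if not values:
        raise ValueError("no parameter estimates to evaluate")
    bins = Counter(math.floor(v / bin_width) for v in values)
    peak_bin, peak_votes = bins.most_common(1)[0]
    estimate = (peak_bin + 0.5) * bin_width
    reliability = peak_votes / len(values)
    return estimate, reliability


# Rotation angles (degrees) estimated from six different window-pair combinations;
# two come from erroneous correspondences.
angles = [29.0, 30.5, 31.2, 30.1, 75.0, -12.0]
print(histogram_estimate(angles, bin_width=5.0))  # (32.5, 0.5)
```

The outliers land in their own bins and do not shift the estimate, unlike a plain average. The coarse bin centre could be refined, for example by averaging the values inside the peak bin, but that refinement is not described in the text.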
[0096]
The geometric transformation unit 40 executes the image correction process using the geometric transformation parameter described in the first embodiment: it geometrically transforms an image based on the obtained parameter, correcting the geometric difference between the model image and the input image. Either the input image or the model image may be transformed; here, for example, the input image is transformed.
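When the estimated parameters map model positions to input positions, correcting the input side means applying the inverse transformation. The sketch below does this for window positions only (claim 1 also allows transforming the position information of the local regions rather than the pixels). It assumes the convention x' = r[0]x + r[1]y + u, y' = r[2]x + r[3]y + v, and the function names are hypothetical.

```python
def invert_affine(r, t):
    """Inverse of x' = r0*x + r1*y + u, y' = r2*x + r3*y + v."""
    r0, r1, r2, r3 = r
    det = r0 * r3 - r1 * r2
    if abs(det) < 1e-12:
        raise ValueError("transformation is not invertible")
    i0, i1, i2, i3 = r3 / det, -r1 / det, -r2 / det, r0 / det
    u, v = t
    return [i0, i1, i2, i3], (-(i0 * u + i1 * v), -(i2 * u + i3 * v))


def transform_positions(points, r, t):
    """Apply the affine map (r, t) to a list of (x, y) window positions."""
    r0, r1, r2, r3 = r
    u, v = t
    return [(r0 * x + r1 * y + u, r2 * x + r3 * y + v) for x, y in points]


# Parameters estimated as model -> input: 90-degree rotation plus shift (5, 3).
r_inv, t_inv = invert_affine([0, -1, 1, 0], (5, 3))
corrected = transform_positions([(5, 3), (5, 4), (4, 3)], r_inv, t_inv)
print(corrected)  # [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```

Transforming positions is much cheaper than resampling the whole image, but it leaves the window contents themselves uncorrected; resampling the image is the alternative when window appearance must also be aligned.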
[0097]
After the image has been geometrically transformed by the geometric transformation unit 40, the window image cutout, window image compression, and window image association processes must be executed again. For this, the window image cutout unit 32, window image compression unit 33, and window image association unit 34 of the geometric transformation parameter estimation unit 30 may be reused, or the geometric transformation unit 40 may be given its own equivalent components. In the fifth embodiment, to make the processing flow easy to follow in the block diagram, the geometric transformation unit 40 is shown with its own window image cutout unit 42, window image compression unit 43, and window image association unit 44.
[0098]
The image collation unit 50 evaluates, based on the corresponding window information obtained after geometric correction, the consistency of the positional relationship between the windows in the input image and those in the model image. It has a function that calculates, for each pair of corresponding windows, the relative position vector, i.e., the difference between their positions; a voting function that casts a fixed number of votes at the point indicated by each relative position vector in a prepared two-dimensional (x, y) voting space; and a similarity calculation function that searches for the peak in the voting space and calculates the similarity from the number of votes at that peak.
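The voting and similarity functions of the image collation unit 50 can be sketched as follows. The quantization step `bin_size`, the normalization of the peak count by the total number of votes, and the function name are assumptions made for illustration; the text only says the similarity is calculated from the number of votes at the peak.

```python
from collections import Counter


def relative_position_similarity(pairs, bin_size=4, votes_per_pair=1):
    """pairs: ((xm, ym), (xi, yi)) positions of corresponding windows after
    geometric correction. Each pair casts a fixed number of votes at its quantized
    relative position vector; the similarity is the share of votes at the peak."""
    if not pairs:
        return 0.0
    space = Counter()
    for (xm, ym), (xi, yi) in pairs:
        cell = ((xi - xm) // bin_size, (yi - ym) // bin_size)
        space[cell] += votes_per_pair
    peak_votes = max(space.values())
    return peak_votes / (votes_per_pair * len(pairs))


pairs = [((10, 10), (13, 12)), ((40, 20), (43, 22)),
         ((25, 50), (27, 51)), ((60, 60), (20, 80))]
print(relative_position_similarity(pairs))  # 0.75
```

Three pairs share almost the same offset and fall into one cell, while the inconsistent fourth pair votes elsewhere, giving a similarity of 3/4. For the same person, most corresponding windows should share one offset after correction, so the votes concentrate at a single peak.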
[0099]
The image collation apparatus of the present invention is configured as described above. Its processing flow is the same as the flowchart shown in FIG. 1, so its description is omitted here.
[0100]
(Embodiment 6)
An image collating apparatus according to Embodiment 6 will be described with reference to the drawings. The sixth embodiment is an example in which the image matching apparatus shown in the fifth embodiment is constructed with a client server configuration.
[0101]
FIG. 13 shows the overall configuration of a system in which the image collation apparatus is built in a client-server configuration. In FIG. 13, 100 is an image collation server, 101 is an image collation client, and 102 is a network. The image collation server 100 includes a registered image storage unit 20, a geometric transformation parameter estimation unit 30, a geometric transformation unit 40, an image collation unit 50, and a communication interface 70 for network connection; each of these is basically the same as the component of the same name described in the fifth embodiment. The image collation client 101 includes a user interface unit 60 and a communication interface 70. The network 102 may be any network capable of data communication, such as a local area network or the Internet, and may use dedicated or public lines, wired or wireless.
[0102]
The processing flow of the client-server image collation apparatus is outlined as follows. First, the user inputs, through the user interface of the image collation client 101, a face image of the person to be authenticated. The image collation client 101 sends the face image through its communication interface, and the image collation server 100 receives it over the network 102 through its own communication interface. The image collation server 100 performs the same processing as described in the first embodiment, namely geometric transformation parameter estimation, image geometric transformation, and image collation, and returns to the image collation client 101 a reply indicating whether the person is authenticated. In an operation that requires the face image to be re-shot, as described in the second embodiment, the image collation client 101 adjusts the shooting conditions and environmental conditions as necessary, using the geometric transformation parameter as a clue, and re-shoots the face image. Guidance may be presented through the user interface at that time.
[0103]
In the client-server configuration of the sixth embodiment as well, the model window data are preferably created in the image collation server 100 as preprocessing, before the image collation process.
[0104]
(Embodiment 7)
The image collation method and apparatus of the present invention can be built on various computers by recording a program that describes the processing steps realizing the configuration above on a computer-readable recording medium and providing it. As in the examples of FIG. 14, the recording medium storing the program that realizes the image collation apparatus may be a portable recording medium 201 such as a CD-ROM 202 or a flexible disk 203, a recording medium 200 in a storage device on a network, or a recording medium 205 such as the hard disk or RAM of a computer. When executed, the program is loaded onto the computer 204 and run in its main memory.
[0105]
[Effects of the Invention]
As described above, the image collation method and apparatus of the present invention estimate the geometric relationship between the model image and the input image, apply an appropriate geometric transformation to the model image or the input image based on that estimate, and execute the collation after the geometric deviation between the two images has been absorbed. As a result, collation can be performed with high accuracy, without loss of collation accuracy, even when the face posture or scale differs between the model image and the input image, a case that is difficult for the conventional method.
[0106]
In addition, the image collation method and apparatus of the present invention can estimate the geometric transformation parameter from an initially given input image, use it to change the shooting state and environment parameters, and capture the image again, thereby obtaining an input image better suited to collation and improving collation accuracy.
[0107]
Furthermore, window matching can be made faster by restricting the search range for the nearest window in the feature space. In the relative position voting of corresponding window images during collation, collation accuracy can also be improved by casting a number of votes determined by the feature-space distance.
[Brief description of the drawings]
FIG. 1 is a flowchart outlining the processing of the image collation method using the Eigenwindow method according to the present invention.
FIG. 2 is a diagram showing how window images are cut out from an image according to the present invention.
FIG. 3 is a diagram illustrating an example of mapping between a feature space and a window image according to the present invention.
FIG. 4 is a diagram showing an example in which window data of an input image according to the present invention is projected onto a feature space.
FIG. 5 is a diagram showing an example of a table in which combinations of corresponding window information according to the present invention are recorded.
FIG. 6 is a diagram showing an example of geometric transformation.
FIG. 7 is a diagram explaining how pairs of corresponding windows are selected from the corresponding window information.
FIG. 8 is a diagram showing the window data of an input image according to the present invention projected onto the feature space after geometric transformation correction of the image.
FIG. 9 is a diagram showing a voting state of a relative position vector in a two-dimensional voting space.
FIG. 10 is a flowchart outlining the processing of the image collation method according to the second embodiment of the present invention.
FIG. 11 is a diagram showing a basic concept of window matching processing according to the third embodiment of the present invention.
FIG. 12 is a schematic configuration diagram of an image matching device according to a fifth embodiment of the present invention.
FIG. 13 is a schematic configuration diagram of the entire system of an image collation apparatus constructed with a client-server configuration according to a sixth embodiment of the present invention.
FIG. 14 is a diagram showing an example of a recording medium on which a processing program according to the seventh embodiment of the present invention is recorded.
[Explanation of symbols]
10 Image input unit
11 Camera
12 Image file input unit
13 Shooting condition/environment adjustment unit
20 Registered image data storage unit
30 Geometric transformation parameter estimation unit
31 Geometric transformation parameter designation unit
32, 42 Window image cutout unit
33, 43 Window image compression unit
34, 44 Window image association unit
35 Geometric transformation parameter calculation unit
36 Estimated geometric transformation parameter evaluation unit
40 Geometric transformation unit
50 Image collation unit
100 Image collation server
101 Image collation client
102 Network
200 Recording medium such as a hard disk at a network destination
201 Portable recording medium such as a CD-ROM or flexible disk
202 CD-ROM
203 Flexible disk
204 Computer
205 Recording medium such as RAM or a hard disk in a computer

Claims

An image collation processing method for identifying a recognition target present in an input image by comparing and collating the input image containing the recognition target with a group of model images containing targets registered in advance, the method comprising:
a process of cutting out a plurality of characteristic local regions from each of the input image and the model image;
a process of projecting the image information of the selected local regions onto a set of points in a feature space;
a process of associating local regions of the input image with local regions of the model image by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space;
a geometric transformation parameter estimation process of estimating a geometric transformation parameter between the input image and the model image based on the in-image positional relationships of the associated local regions of the one image and of the other image; and
a process of using the estimated geometric transformation parameter to geometrically transform the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions in the image, and evaluating and collating the consistency between the associated local regions.

The image collation processing method according to claim 1, wherein, in the geometric transformation parameter estimation process, the estimation of the geometric transformation parameter from the associated local regions is repeated while selecting a plurality of different local regions, thereby obtaining a group of estimated geometric transformation parameters, and the geometric transformation parameter is determined based on that group.

The image collation processing method according to claim 2, wherein, in determining the geometric transformation parameter based on the group of geometric transformation parameters, the reliability of the estimated geometric transformation parameter is calculated from the degree of variation within the group, and the geometric transformation parameter is determined based on that reliability.

The image collation processing method according to claim 1, wherein, in the process of associating the local regions, the local regions of the other image are narrowed down to those lying within a certain range of the position in the other image that corresponds to the relative position of the local region of the one image, and the search in the feature space is executed over the narrowed-down local regions to associate them.

The image collation processing method described above, wherein, in the process of evaluating and collating the consistency between the associated local regions, the relative position in the image and the distance in the feature space are obtained for each pair of associated local regions, a number of votes weighted according to the feature-space distance is cast at the point in a voting space indicated by the relative position in the image, and the similarity between the input image and the model image is calculated based on the degree of concentration of votes at the resulting voting peak.

An image collation processing apparatus for identifying a recognition target present in an input image by comparing and collating the input image containing the recognition target with a group of model images containing targets registered in advance, the apparatus comprising:
an image input unit for inputting the input image;
a model image storage unit for registering and storing the model images;
a local region cutout unit that cuts out characteristic local regions from the input image and the model image;
a geometric transformation parameter estimation unit comprising a local region association unit, which projects the image information of the selected local regions onto a set of points in a feature space and associates local regions of the input image with local regions of the model image by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space, and a geometric transformation parameter calculation unit, which estimates a geometric transformation parameter between the input image and the model image based on the in-image positional relationships of the associated local regions of the one image and of the other image;
an image geometric transformation unit that uses the estimated geometric transformation parameter to geometrically transform the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions in the image; and
an image collation unit that evaluates and collates the consistency between the associated local regions using the geometrically transformed local regions.

A computer-readable recording medium on which is recorded an image collation processing program that realizes an image collation apparatus for identifying a recognition target present in an input image by comparing and collating the input image containing the recognition target with a group of model images containing targets registered in advance, the program causing the image collation apparatus to execute:
a processing step of cutting out a plurality of characteristic local regions from each of the input image and the model image;
a processing step of projecting the image information of the selected local regions onto a set of points in a feature space;
a processing step of associating local regions of the input image with local regions of the model image by searching, for each local region of one image, for the local region of the other image projected at the nearest position in the feature space;
a geometric transformation parameter estimation step of estimating a geometric transformation parameter between the input image and the model image based on the in-image positional relationships of the associated local regions of the one image and of the other image; and
a processing step of using the estimated geometric transformation parameter to geometrically transform the image information of the local regions of the one image, the image information of the entire image, or the position information of the local regions in the image, and evaluating and collating the consistency between the associated local regions.