JP2004246478A

JP2004246478A - Image search device

Info

Publication number: JP2004246478A
Application number: JP2003033846A
Authority: JP
Inventors: Hirotsugu Kashimura; 洋次鹿志村; Sukeji Kato; 典司加藤; Hitoshi Ikeda; 仁池田
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-02-12
Filing date: 2003-02-12
Publication date: 2004-09-02
Anticipated expiration: 2023-02-12
Also published as: JP4241074B2

Abstract

PROBLEM TO BE SOLVED: To provide an image search device capable of reducing the processing load.
SOLUTION: In this image search device, which searches target image data for an image data portion containing the search target, a control part 11 specifies search area candidates in the target image data according to a predetermined rule and extracts a plurality of partial search area candidates from each search area candidate according to a predetermined partial search area extraction rule. It calculates an image characteristic quantity for each image data portion defined by a partial search area candidate, counts how many of these quantities satisfy a predetermined characteristic quantity condition, and, when that count satisfies a predetermined number condition, determines the corresponding search area candidate to be a search area. It then searches the determined search areas for the search target.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an image search device that searches for a specific image portion such as a face portion from image data such as a photograph.
[0002]
[Prior art]
In recent years, it has been proposed to identify a specific object contained in a photograph or the like, for example a person's face, and to perform predetermined processing based on the identified portion. Examples include detecting each person's face in a photograph and printing only the face portions, and detecting a person's face in video being captured and using it for face authentication.
[0003]
In a conventional apparatus for recognizing an object such as a face, the image data is divided into search areas according to a predetermined rule, and every search area thus obtained is examined to determine whether it contains the search target. Because the object may be hard to recognize depending on its imaging state (tilt, size, illumination, and so on), the apparatus also performs processing that adapts the imaging state to a predetermined imaging state (the reference state).
[0004]
Specifically, it has conventionally been proposed to train a neural network on learning image data captured while varying the imaging state, use the trained network to detect how far the imaging state of the photograph being processed deviates from the reference state, and apply image processing that corrects the deviation.
[0005]
As an example of a process of detecting a desired pattern from target image data, a method such as a kernel nonlinear subspace method disclosed in Patent Document 1 is known.
[0006]
[Patent Document 1]
JP 2001-90274 A
[0007]
[Problems to be solved by the invention]
However, both the processing that detects a desired search target in target image data and the processing that adapts the image to the reference state are generally computationally expensive, and because the conventional apparatus described above performs them for every search area, its processing load is high.
[0008]
The present invention has been made in view of the above circumstances, and has as its object to provide an image search device capable of reducing the processing load.
[0009]
[Means for Solving the Problems]
To solve the problems of the conventional example described above, the present invention provides an image search apparatus that searches target image data to be processed for an image data portion containing a search target, comprising: search area identification means for identifying search area candidates in the target image data according to a predetermined rule; partial search area extraction means for extracting a plurality of partial search area candidates from each search area candidate according to a predetermined partial search area extraction rule; and means for calculating an image feature amount from each image data portion defined by a partial search area candidate, counting the number of image feature amounts that satisfy a predetermined feature amount condition, and, when that number satisfies a predetermined number condition, determining the corresponding search area candidate as a search area. The apparatus performs processing that searches the determined search area for the search target.
[0010]
Here, the partial search area extraction means may extract the partial search area candidates from a search area candidate while allowing them to partially overlap. Further, the search processing means may include: conversion means that refers to a conversion database acquired in advance by learning, obtains, for each image data portion contained in the at least one determined search area, a conversion condition comprising the conversion method and the conversion amount to be applied, and applies a conversion based on that condition to the corresponding image data portion at least once; and search means that refers to a search database acquired by learning from image data of the search target in the reference state and determines whether at least one of the converted image data portions contains the search target.
[0011]
To solve the same problems, the present invention also provides an image search method for searching target image data to be processed for an image data portion containing a search target, comprising the steps of: identifying search area candidates in the target image data according to a predetermined rule; extracting a plurality of partial search area candidates from each search area candidate according to a predetermined partial search area extraction rule; and calculating an image feature amount from each image data portion defined by a partial search area candidate, counting the number of image feature amounts that satisfy a predetermined feature amount condition, and, when that number satisfies a predetermined number condition, determining the corresponding search area candidate as a search area. The method performs processing that searches the determined search area for the search target.
[0012]
To solve the same problems, the present invention further provides an image search program for searching target image data to be processed for an image data portion containing a search target. The program causes a computer to execute: a procedure of identifying search area candidates in the target image data according to a predetermined rule; a procedure of extracting a plurality of partial search area candidates from each search area candidate according to a predetermined partial search area extraction rule; a procedure of calculating an image feature amount from each image data portion defined by a partial search area candidate, counting the number of image feature amounts that satisfy a predetermined feature amount condition, and, when that number satisfies a predetermined number condition, determining the corresponding search area candidate as a search area; and a procedure of searching the determined search area for the search target.
[0013]
BEST MODE FOR CARRYING OUT THE INVENTION
[Basic configuration]
An embodiment of the present invention will be described with reference to the drawings. As shown in FIG. 1, the image search device according to the embodiment is implemented on a general-purpose computer comprising a control unit 11, a storage unit 12, a database unit 13, a display unit 14, an operation unit 15, and an external storage unit 16. This computer may be built into another product, such as a camera.
[0014]
The control unit 11 operates according to a program stored in the storage unit 12 and executes a search area defining process that defines at least one search area in the target image data being processed, a conversion process that converts each search area to the reference state, a search process that detects the search areas containing the search target, and predetermined processing that uses the search results. The specific processing performed by the control unit 11 is described in detail later.
[0015]
The storage unit 12 stores software executed by the control unit 11. The storage unit 12 also operates as a work memory for holding various data required by the control unit 11 during the processing. Specifically, the storage unit 12 can be realized as a storage medium such as a hard disk, a semiconductor memory, or a combination thereof.
[0016]
The database unit 13 is a database including a conversion database 13a used in the first conversion process of the control unit 11 and a search database 13b used in the search process, as described later. The database unit 13 is specifically a storage medium such as a hard disk, and the storage unit 12 may also serve as the database unit 13, but is separately illustrated here for the sake of explanation.
[0017]
The display unit 14 is, for example, a display device or a printer device, and displays information and the like in accordance with an instruction input from the control unit 11. The operation unit 15 is, for example, a keyboard, a mouse, or the like, and accepts a user operation and outputs the content of the operation to the control unit 11.
[0018]
The external storage unit 16 reads a program or data from a computer-readable removable medium (a type of storage medium) such as a CD-ROM or DVD-ROM and outputs it to the control unit 11, which stores it in the storage unit 12. The program according to the present embodiment can be distributed on a portable storage medium such as a CD-ROM and is copied to the storage unit 12 via the external storage unit 16 for use. The program may also be copied to the storage unit 12 from a server on a network via a communication unit (not shown) instead of from such a storage medium.
[0019]
[Process of Control Unit 11]
The processing performed by the control unit 11 is now described in detail. In the present embodiment, the image data to be processed (target image data) is input from outside via the external storage unit 16 or a communication unit (not shown) and stored in the storage unit 12. There may be one or more items of target image data. When the user operates the operation unit 15 to instruct the control unit 11 to search specific target image data for the search target (a processing start instruction), the control unit 11 starts the processing shown in FIG. 2.
[0020]
While successively reducing the target image data, the control unit 11 defines search areas in each reduced version of the target image data, applies the conversion to the reference state and the search process to each search area, and generates, for each reduced version, map data indicating which parts of that version contain the search target.
[0021]
Specifically, the control unit 11 first sets the reduction rate S to the minimum reduction rate (for example, 1x, i.e., no reduction) (S1) and reduces the target image data at rate S (S2). It then allocates in the storage unit 12 a map data area of the same size as the reduced target image data and initializes the map data by setting every value in that area to “false” (S3). For example, if the reduced target image data is 1000 × 1000 pixels, an area of 1000 × 1000 bits is allocated and every bit is initialized to “0”.
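A minimal sketch of steps S1 to S3 follows. It assumes (the text does not say so) that a reduction rate S ≥ 1 divides both image dimensions by S, and it uses a nested list of booleans in place of the bit array; `reduced_size` and `init_map` are hypothetical names.

```python
def reduced_size(width, height, s):
    """Size of the target image after reduction at rate s (assumed: divide by s)."""
    return max(1, int(width / s)), max(1, int(height / s))


def init_map(width, height, s):
    """Step S3: one flag per pixel of the reduced image, all initialised to False."""
    w, h = reduced_size(width, height, s)
    return [[False] * w for _ in range(h)]  # h rows of w flags


# S1: the minimum rate 1 means "not reduced", so the map matches the input size.
map_data = init_map(1000, 1000, 1.0)
print(len(map_data), len(map_data[0]))  # 1000 1000
```

A packed bit array (one bit per pixel) would match the 1000 × 1000-bit example more closely; the list of booleans just keeps the sketch readable.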
[0022]
Next, the control unit 11 performs a process of determining at least one search area for the reduced target image data (S4). This search area determination processing will be described later in detail. Then, one of the search areas is selected (S5), and a conversion process is performed on the search area (S6). This conversion processing will also be described later in detail.
[0023]
The control unit 11 executes a process (the search process) that determines whether the image data portion contained in the converted search area includes the search target (S7). If it does (Yes), the control unit 11 sets to “true” the values of the map data area corresponding to the converted search area (S8). It then checks whether any search area remains unselected (S9); if one does (Yes), the process returns to step S5, selects one of the unselected search areas, and continues (A).
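Steps S5 to S9 amount to a loop that marks hits in the map. The sketch below assumes search areas are axis-aligned rectangles `(x, y, w, h)` in reduced-image coordinates; `convert` and `contains_target` are hypothetical callbacks standing in for the conversion process (S6) and the search process (S7), which are described separately.

```python
def mark_search_areas(map_data, areas, convert, contains_target):
    """Steps S5-S9: convert each search area, test it, and mark hits as True."""
    rows, cols = len(map_data), len(map_data[0])
    for area in areas:                       # S5 (select) / S9 (any left?)
        converted = convert(area)            # S6: conversion to the reference state
        if not contains_target(converted):   # S7 "No": go straight to S9
            continue
        x, y, w, h = converted
        # S8: set the map cells covered by the converted area, clipped to the map
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                map_data[r][c] = True
    return map_data


grid = [[False] * 10 for _ in range(10)]
# Identity conversion; pretend only the area at the origin contains the target.
mark_search_areas(grid, [(0, 0, 3, 3), (5, 5, 3, 3)],
                  convert=lambda a: a,
                  contains_target=lambda a: a[:2] == (0, 0))
print(sum(map(sum, grid)))  # 9
```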
[0024]
On the other hand, if the search process in step S7 determines that the search target is not included (No), the process proceeds directly to step S9. If no unselected search area remains in step S9, that is, once the conversion and search processes have been completed for all search areas (No), the control unit 11 checks whether the currently set reduction rate S exceeds a predetermined maximum reduction rate (S10). If it does not (No), the control unit 11 increases the reduction rate S (S11) and returns to step S2 to continue (B). Step S11 may, for example, raise the reduction rate S by a predetermined ratio, or multiply the magnification S by a predetermined multiplier ΔS to obtain the new reduction rate S = S × ΔS.
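The multiplicative variant of step S11 produces a geometric series of reduction rates. The sketch below assumes S ≥ 1 acts as a divisor and ΔS > 1; `reduction_schedule` and its default values are illustrative only.

```python
def reduction_schedule(s_min=1.0, s_max=8.0, delta_s=2.0):
    """Reduction rates visited by steps S1, S10 and S11 of FIG. 2.

    Follows the flowchart literally: the check in S10 runs after a rate has
    been processed, so the first rate exceeding s_max is also processed.
    """
    if delta_s <= 1.0:
        raise ValueError("delta_s must be greater than 1 so that S increases")
    rates = []
    s = s_min                 # S1
    while True:
        rates.append(s)       # S2-S9 run at this rate
        if s > s_max:         # S10 "Yes": go on to S12
            return rates
        s *= delta_s          # S11: S = S x delta_S


print(reduction_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Moving the S10 test before the processing step would drop the last rate; the sketch keeps the order shown in the flowchart.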
[0025]
If the currently set reduction rate S exceeds the predetermined maximum reduction rate in step S10 (Yes), the control unit 11 uses the map data obtained for the target image data at each reduction rate to define the regions of the original (unreduced) target image data that contain the search target (S12), and the process ends.
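Step S12 is stated only in outline here. The sketch below assumes each reduced map is enlarged back to the original size by nearest-neighbour sampling and that the maps are then merged with either a logical OR (`mode="any"`) or a logical AND (`mode="all"`); both the merge rule and the function names are assumptions.

```python
def upscale_map(small, width, height):
    """Nearest-neighbour enlargement of a reduced map to width x height."""
    sh, sw = len(small), len(small[0])
    return [[small[y * sh // height][x * sw // width] for x in range(width)]
            for y in range(height)]


def combine_maps(maps, width, height, mode="any"):
    """Step S12 sketch: merge the per-rate maps in original-image coordinates."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    merge = any if mode == "any" else all
    big = [upscale_map(m, width, height) for m in maps]
    return [[merge(b[y][x] for b in big) for x in range(width)]
            for y in range(height)]


coarse = [[True, False], [False, False]]   # map at rate 2 (2 x 2)
fine = [[False] * 4 for _ in range(4)]     # map at rate 1 (4 x 4)
fine[3][3] = True
print(sum(map(sum, combine_maps([fine, coarse], 4, 4))))         # 5
print(sum(map(sum, combine_maps([fine, coarse], 4, 4, "all"))))  # 0
```

OR keeps anything detected at any scale; AND keeps only regions confirmed at every scale, trading recall for precision.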
[0026]
[Search area determination processing]
Next, the process by which the control unit 11 determines search areas (search area determination processing) is described. As shown in FIG. 3, the control unit 11 takes the upper-left corner of the reduced target image data (for example, X = 0, Y = 0) as the starting point and defines a rectangular area of a predetermined size L × L′ (for example, 27 × 27) as a search area candidate (S20). For the following explanation, a coordinate system on the search area candidate is defined as follows: the upper-left corner is η = 0, ξ = 0, the rightward direction is the positive η-axis, and the downward direction is the positive ξ-axis.
[0027]
The control unit 11 allocates a counter in the storage unit 12 and resets it to “0” (S21). It also initializes the coordinates (η, ξ) of the extraction start point of the partial search area to (0, 0). Next, it extracts from the search area candidate an n × n′ partial search area candidate (n < L, n′ < L′; for example, 9 × 9) whose upper-left corner is at (η, ξ) (S22), and, for the image data portion defined by this partial search area candidate, calculates a feature amount predetermined according to the properties of the search target, for example the entropy e (S23).
[0028]
The control unit 11 then determines whether the calculated entropy e is greater than a predetermined threshold eth1 (S24). If it is (Yes), the control unit 11 increments the counter (S25) and moves the partial search area candidate in the X-axis direction by a predetermined amount d (for example, 3 pixels, so that partial search area candidates may partially overlap), i.e., η = η + d (S26). The control unit 11 checks whether the new partial search area candidate extends past the right edge of the search area candidate, that is, whether η + n > L (S27). If it does (Yes), the control unit 11 sets η to “0” and ξ = ξ + d, so that the next partial search area candidate has its upper-left corner at the new (η, ξ) (S28). It then checks whether this partial search area candidate extends past the bottom edge of the search area candidate, that is, whether ξ + n′ > L′ (S29). If it does (Yes), the control unit 11 checks whether the counter value is equal to or greater than a predetermined threshold TH (S30). If so (Yes), it determines the search area candidate being processed to be a search area (S31) and checks whether the next search area candidate should be defined, that is, whether any search area candidate that can still be defined remains in the target image data (S32). If one remains (Yes), the process returns to step S20: the start point is moved by ΔX in the X direction (when it reaches the end in the X direction, X is reset to “0” and the start point is moved by ΔY in the Y direction), the next L × L′ search area candidate is defined, and processing continues.
[0029]
If, in step S32, no search area candidate that can be defined remains (No), the process ends.
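Steps S20 and S32 walk the L × L′ candidate over the reduced image in raster order. The sketch below assumes candidates must lie fully inside the image (edge handling is not specified) and uses ΔX = ΔY = 6 purely as an illustrative default; `candidate_origins` is a hypothetical name.

```python
def candidate_origins(img_w, img_h, L=27, L_prime=27, dx=6, dy=6):
    """Upper-left corners (X, Y) of the L x L' search area candidates (S20/S32).

    Raster order: X advances by dx; at the right end X is reset to 0 and Y
    advances by dy. Candidates that would overhang the image are not generated.
    """
    return [(x, y)
            for y in range(0, img_h - L_prime + 1, dy)
            for x in range(0, img_w - L + 1, dx)]


print(candidate_origins(33, 27))         # [(0, 0), (6, 0)]
print(len(candidate_origins(100, 100)))  # 169
```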
[0030]
If the partial search area candidate has not extended past the edge in step S27 or S29 (No in either step), the process returns to step S22 and continues. If the counter value is less than the threshold TH in step S30 (No), the search area candidate being processed is not made a search area, and the process proceeds directly to step S32. If the entropy e is not greater than the threshold eth1 in step S24 (No), the process proceeds to step S26.
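The loop S21 to S31 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the candidate is a list of L′ rows of L luminance values, the feature is plain Shannon entropy of the window's luminance values (a stand-in for the interpolated-histogram entropy of step S23), n = n′ = 9 and d = 3 follow the examples in the text, and eth1 = 1.0 is an arbitrary assumption.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Stand-in feature for S23: entropy (bits) of the luminance values."""
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in Counter(values).values())


def count_qualifying_windows(candidate, n=9, n_prime=9, d=3, eth1=1.0,
                             feature=shannon_entropy):
    """S21-S29: slide an n x n' window by d and count windows with e > eth1."""
    L_prime, L = len(candidate), len(candidate[0])   # rows along xi, columns along eta
    counter = 0                                      # S21
    for xi in range(0, L_prime - n_prime + 1, d):    # S28/S29: next row of windows
        for eta in range(0, L - n + 1, d):           # S26/S27: step right by d
            window = [candidate[xi + r][eta + c]     # S22: partial search area
                      for r in range(n_prime) for c in range(n)]
            if feature(window) > eth1:               # S23/S24
                counter += 1                         # S25
    return counter


def is_search_area(candidate, th, **kwargs):
    """S30/S31: accept the candidate if at least TH windows qualified."""
    return count_qualifying_windows(candidate, **kwargs) >= th


flat = [[128] * 27 for _ in range(27)]
textured = [[(r * 27 + c) % 256 for c in range(27)] for r in range(27)]
print(count_qualifying_windows(flat), count_qualifying_windows(textured))  # 0 49
```

The flowchart's edge tests map onto the two `range` bounds: a window is processed only while `eta + n <= L` and `xi + n_prime <= L_prime`.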
[0031]
Here, the feature amount condition is whether the entropy e is greater than the predetermined threshold value eth1. Alternatively, the feature amount condition may be whether the entropy e lies within the range bounded by two predetermined threshold values eth1 and eth2.
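The counting loop of steps S22 to S31 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Images are plain lists of luminance rows, and the helper names (`crop`, `discrete_entropy`, `is_search_area`) are hypothetical. A plain Shannon entropy of the pixel values stands in for the interpolated-histogram entropy of step S23. Passing `eth2` switches from the single-threshold condition to the range condition eth1 < e < eth2.

```python
import math
from collections import Counter
from typing import Callable, List, Optional

Image = List[List[int]]  # rows of luminance values


def crop(img: Image, x: int, y: int, w: int, h: int) -> Image:
    """Return the w x h sub-image whose upper-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def discrete_entropy(patch: Image) -> float:
    """Shannon entropy (bits) of the pixel values; a stand-in feature amount."""
    values = [v for row in patch for v in row]
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def is_search_area(candidate: Image, n: int, n_prime: int, d: int,
                   feature: Callable[[Image], float], eth1: float, th: int,
                   eth2: Optional[float] = None) -> bool:
    """Slide an n x n' partial window over an L x L' candidate in steps of d,
    count windows whose feature amount meets the condition, compare with TH."""
    height = len(candidate)          # L'
    width = len(candidate[0])        # L
    count = 0
    xi = 0
    while xi + n_prime <= height:    # stop once the window passes the lower end (S29)
        eta = 0
        while eta + n <= width:      # stop once the window passes the right end (S27)
            e = feature(crop(candidate, eta, xi, n, n_prime))      # S23
            ok = e > eth1 if eth2 is None else eth1 < e < eth2     # S24
            if ok:
                count += 1           # S25
            eta += d                 # S26: eta = eta + d
        xi += d                      # S28: eta back to 0, xi = xi + d
    return count >= th               # S30: True means S31 (adopt as search area)
```

For example, a 8 × 4 candidate whose left half is flat and whose right half is textured yields one qualifying 4 × 4 window when d = 4, and two when d = 2. The overlapping window that straddles both halves also passes in the d = 2 case.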
[0032]
Next, the specific content of step S23, in which the control unit 11 calculates the entropy e, will be described. The control unit 11 generates a histogram of the luminance of the pixels in the image data portion defined by the partial search area candidate being processed. It interpolates this histogram, for example by linear approximation, and calculates the entropy of the interpolated histogram as the entropy e. The histogram is interpolated because it is computed as a discrete quantity, whereas entropy is inherently a continuous quantity. The calculated luminance histogram is therefore approximated by a continuous function. The integral of this approximating function is used in place of a simple sum over the histogram bins.
[0033]
Instead of linear approximation, the histogram may be interpolated by a quadratic or higher-order approximation, using a predetermined interpolation method.
[0034]
A mere summation does not take into account the relative positions of the histogram peaks. Integrating the approximating function, by contrast, lets the relative distance between the peak positions contribute to the result. This makes the entropy profile more accurate.
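The effect described in the three preceding paragraphs can be checked numerically with the sketch below. Several details are assumptions, since the source does not specify them. The normalized histogram is interpolated linearly with bin centres one unit apart. The quantity −f ln f is integrated with the trapezoid rule. The result is not renormalized after interpolation. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def luminance_histogram(pixels: Sequence[int], bins: int = 256) -> List[int]:
    """Count 8-bit luminance values into `bins` equal-width bins."""
    hist = [0] * bins
    for v in pixels:
        hist[min(v * bins // 256, bins - 1)] += 1
    return hist


def _neg_f_log_f(f: float) -> float:
    return 0.0 if f <= 0.0 else -f * math.log(f)


def summed_entropy(hist: Sequence[float]) -> float:
    """Plain discrete entropy: a sum over bins, blind to where the peaks sit."""
    total = float(sum(hist))
    return sum(_neg_f_log_f(h / total) for h in hist)


def interpolated_entropy(hist: Sequence[float], steps: int = 64) -> float:
    """Integrate -f ln f over a piecewise-linear interpolation of the
    normalized histogram, using the trapezoid rule on each unit segment."""
    total = float(sum(hist))
    p = [h / total for h in hist]
    e = 0.0
    for a, b in zip(p, p[1:]):               # one segment per adjacent bin pair
        prev = _neg_f_log_f(a)
        for k in range(1, steps + 1):
            cur = _neg_f_log_f(a + (b - a) * k / steps)
            e += 0.5 * (prev + cur) / steps  # trapezoid of width 1/steps
            prev = cur
    return e
```

Two histograms with the same two peak heights give identical summed entropies. The interpolated version, however, assigns a larger value when the peaks are farther apart:

```python
near = [0, 0, 5, 5, 0, 0, 0, 0]
far = [0, 0, 5, 0, 0, 5, 0, 0]
summed_entropy(near) == summed_entropy(far)           # True
interpolated_entropy(far) > interpolated_entropy(near)  # True
```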
[0035]
Here, the start point is moved by ΔX in the width direction and by ΔY in the height direction. These are the movement amounts used at a ratio of 1. They may, however, be changed according to the reduction ratio S of the target image data, for example to ΔX/S and ΔY/S.
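The raster scan of search-area start points, with the step scaled to ΔX/S and ΔY/S, can be sketched as follows. The rounding to whole pixels and the function name `search_area_starts` are assumptions made for illustration.

```python
from typing import List, Tuple


def search_area_starts(width: int, height: int, area_w: int, area_h: int,
                       dx: int, dy: int, s: float = 1.0) -> List[Tuple[int, int]]:
    """Upper-left corners of area_w x area_h search areas in raster order,
    with steps dX/S and dY/S rounded to whole pixels (assumed rounding)."""
    step_x = max(1, round(dx / s))
    step_y = max(1, round(dy / s))
    starts = []
    y = 0
    while y + area_h <= height:
        x = 0                      # X restarts at 0 on each new row
        while x + area_w <= width:
            starts.append((x, y))
            x += step_x            # move the start point along X
        y += step_y                # then move it along Y
    return starts
```

For a 20 × 10 image with 8 × 8 areas and ΔX = ΔY = 6, the scan visits (0, 0), (6, 0) and (12, 0) at S = 1. At S = 0.5 it visits only (0, 0) and (12, 0).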
[0036]
In the present embodiment, a feature amount such as entropy is calculated for each search area candidate, from the image data portion defined by each of its partial search area candidates. Whether the search area candidate is determined as a search area depends on how many of these feature amounts satisfy a predetermined condition.
[0037]
For example, in a face image the entropy is relatively high around the eyes, nose, contour, and so on. It is relatively low on flat parts of the face such as the forehead and cheeks. If the feature amount were evaluated over the whole search area candidate, averaging would lower the entropy and the face part would likely be difficult to detect. In other words, a region may contain both parts with a high feature amount and parts without one. Averaging over the entire region then makes identification by the feature amount difficult. Counting partial search area candidates prevents this.
[0038]
As described above, the present embodiment determines accurately, before the conversion processing and the search processing, whether the search target may be included. Neither conversion nor search is performed on parts that cannot contain the search target, so the processing load is reduced.
[0039]
[Conversion processing]
Next, the conversion process will be described. One of the characteristic features of the present embodiment is that this conversion process is performed in stages, and in each stage, a conversion corresponding to one conversion degree of freedom is performed. The control unit 11 according to the present embodiment calculates predetermined feature amount vector information based on the image data portion included in the search area. Here, the feature amount vector information is a vector amount including a plurality of feature amount elements selected according to the property of the search target.
[0040]
In the present embodiment, the control unit 11 is described as determining the conversion by the kernel nonlinear subspace method. It uses the calculated feature amount vector information together with the feature amount vector information stored in the conversion database 13a.
[0041]
[Contents of Conversion Database 13a]
The kernel nonlinear subspace method is widely known as a method of classifying data into categories, so a detailed description is omitted. In outline, each of a plurality of subspaces Ω in a space F is regarded as a category into which data are classified. Feature vector information in the space F (for example, Φ) is created from the data to be classified and projected onto each subspace Ω, with the projection result denoted, for example, φ. The method detects the subspace Ω for which the distance E between Φ before projection and φ after projection is smallest (tentatively called the tangent subspace). The data to be classified are then judged to belong to the category represented by that subspace Ω.
[0042]
Accordingly, in the learning stage, at least one of two things is adjusted. The first is the nonlinear mapping into the space F, that is, the parameters included in the kernel function. The second is the hyperplanes separating the subspaces Ω corresponding to the respective categories.
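One common way to realize the classification described above is sketched below. Each category's subspace Ω is spanned by the top r eigenvectors of the uncentred Gaussian kernel matrix of that category's samples (kernel PCA). A query is assigned to the category with the smallest squared residual E². This is an illustrative sketch, not the patent's learning procedure. The kernel choice, `gamma`, `r`, the class names, and the toy 2-D "feature vectors" are all assumptions, and no hyperplane adjustment is performed. The category labels mimic the conversion-amount categories of the conversion database 13a.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """k(x, y) = exp(-gamma * ||x - y||^2) for every row pair of a (m, d) and b (k, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)


class KernelSubspace:
    """One category's subspace in the feature space F, spanned by the
    top-r eigenvectors of the (uncentred) kernel matrix of its samples."""

    def __init__(self, samples, r: int, gamma: float):
        self.x = np.asarray(samples, dtype=float)
        self.gamma = gamma
        lam, vec = np.linalg.eigh(gaussian_kernel(self.x, self.x, gamma))
        top = np.argsort(lam)[::-1][:r]
        # Scale so each basis vector sum_i alpha_i * Phi(x_i) has unit norm in F.
        self.alpha = vec[:, top] / np.sqrt(lam[top])

    def error(self, q) -> float:
        """Squared distance E^2 between Phi(q) and its projection onto the subspace."""
        q2 = np.atleast_2d(np.asarray(q, dtype=float))
        k = gaussian_kernel(self.x, q2, self.gamma)[:, 0]
        coeff = self.alpha.T @ k
        return 1.0 - float(coeff @ coeff)  # k(q, q) = 1 for the Gaussian kernel


def classify(q, subspaces: dict):
    """Return the category whose subspace is nearest to Phi(q), with its error."""
    errors = {label: s.error(q) for label, s in subspaces.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]
```

With two toy categories learned from well-separated clusters, a query near the second cluster is assigned to it with a small error:

```python
subspaces = {
    "rotate -10": KernelSubspace([[0, 0], [0.1, 0], [0, 0.1]], r=2, gamma=1.0),
    "rotate +10": KernelSubspace([[5, 5], [5.1, 5], [5, 5.1]], r=2, gamma=1.0),
}
classify([5.05, 5.05], subspaces)  # ("rotate +10", small error)
```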
[0043]
In the present embodiment, the conversion database 13a is formed to determine how the search target is to be converted into the reference state, that is, the type and amount of conversion. The database is acquired by learning so that it yields the type and amount (category) of conversion to apply to image data that may or may not be in the reference state. A conversion database 13a is created for each type of conversion (degree of freedom) applied to the image, such as rotation, translation, and size change. The conversion database 13a for each degree of freedom is obtained by learning the conversion amounts of the corresponding conversion as categories.
[0044]
For this learning, the learning samples for the conversion database 13a of the present embodiment are generated as follows. A plurality of examples of image data of the search target in a predetermined reference state are prepared. Each example is converted by different conversion amounts for each degree of freedom, generating a plurality of converted image data. The converted image data generated for each degree of freedom are used as the learning samples for that degree of freedom.

Specifically, when a face is the search target, a plurality of image data of a face in a predetermined reference state (predetermined shooting conditions and posture) are prepared. Learning samples are then generated for each degree of freedom of conversion, such as rotation, translation, and size change. For rotation, converted image data rotated in 5-degree steps over the range from -180 to 180 degrees are used as the learning samples for the rotation degree of freedom. For translation, a plurality of converted image data shifted vertically and horizontally in 5-pixel steps are used as the learning samples for the translation degree of freedom. These translation samples are generated from image data covering an area wider than the reference-state area, by shifting the reference-state area 5 pixels at a time.
[0045]
In this manner, a plurality of image data items are generated for each image data example, converted in several ways for each degree of freedom. Each item is associated with information indicating the conversion applied to it, such as the type of conversion and the conversion amount.
[0046]
Here, image data converted by mutually different amounts are obtained by changing the conversion amount in predetermined steps (for example, 5 degrees for rotation), and each result is included in the learning samples. Instead of predetermined steps, the conversion amount may be determined by random numbers, with the resulting converted data included in the learning samples.
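The sample-generation scheme of the preceding paragraphs can be sketched for the rotation and translation degrees of freedom as follows. The function names, the labelling of each translation crop by its applied shift (dx, dy), and the seeded random generator are illustrative assumptions. The correction the database should later propose is the inverse of the applied shift. Rotation is reduced to listing its category angles, so the sketch needs no imaging library.

```python
import random
from typing import List, Tuple

Image = List[List[int]]


def rotation_amounts(step: int = 5, lo: int = -180, hi: int = 180) -> List[int]:
    """Rotation angles (degrees) used as categories for the rotation degree of freedom."""
    return list(range(lo, hi + 1, step))


def translation_samples(wide: Image, ref_x: int, ref_y: int, w: int, h: int,
                        max_shift: int, step: int = 5) -> List[Tuple[Tuple[int, int], Image]]:
    """Crop w x h windows from a wider image, offset from the reference-state
    window at (ref_x, ref_y) by multiples of `step`, each tagged with its (dx, dy)."""
    samples = []
    for dy in range(-max_shift, max_shift + 1, step):
        for dx in range(-max_shift, max_shift + 1, step):
            x, y = ref_x + dx, ref_y + dy
            samples.append(((dx, dy), [row[x:x + w] for row in wide[y:y + h]]))
    return samples


def random_amounts(n: int, lo: float, hi: float, seed: int = 0) -> List[float]:
    """Alternative to fixed steps: draw conversion amounts at random."""
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for _ in range(n)]
```

With 5-degree steps, the rotation degree of freedom has 73 category values from -180 to 180 inclusive. Note that the two endpoints describe the same orientation. A ±5-pixel translation range gives nine crops per example.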
[0047]
Next, the conversion database 13a corresponding to each degree of freedom is trained using a learning sample for each degree of freedom.
[0048]
[Operation of conversion processing]
The control unit 11 uses the conversion databases 13a learned in this manner to perform the following conversion process on each search area defined by the search area definition processing. The image data portion in the search area being processed can be regarded as a vector whose elements are the pixel values. For each conversion database 13a, this vector is mapped to feature amount vector information in the space F, that is, to the set of feature amounts defined for each degree of freedom of conversion. The mapped vector is then projected onto each subspace Ω. Next, the conversion amount that minimizes the distance E between the feature amount vector information before and after projection is determined. The control unit 11 also calculates the square L of the distance E and stores it in the storage unit 12 as an error.
[0049]
A conversion amount is thus determined for each degree of freedom from the corresponding conversion database 13a. The control unit 11 selects one of these conversion amounts under a predetermined condition, for example the one with the smallest associated error L. It then applies the conversion of the selected degree of freedom by the selected amount.
[0050]
For example, suppose the information learned in the conversion database 13a for each degree of freedom gives the following results for the image data portion in the search area. For the rotation degree of freedom, it approaches the reference state by a 10-degree rotation, with error Lr. For the translation degree of freedom, it approaches the reference state by a 5-pixel leftward translation, with error Lp. The conversion of the degree of freedom with the smaller error is chosen. In this example, if Lr < Lp, a 10-degree rotation is applied to the search area to define a new search area. The image data portion in the new search area is then mapped to feature amount vector information in the space F and projected onto each subspace Ω. The process is repeated from the determination of the conversion amount that minimizes the distance E between the feature amount vector information before and after projection.
[0051]
When the conversion amounts for all degrees of freedom are "0" (that is, no conversion is needed), the conversion process for that search area ends. The conversion process is then performed on one of the remaining unprocessed search areas.
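The stage-by-stage selection of the preceding paragraphs can be sketched as the loop below. Each estimator stands for one conversion database 13a and returns a proposed conversion amount together with its error L. The estimator functions, the `apply` callback, and the `max_stages` safety cap are hypothetical. One further choice is an assumption rather than something the source states: zero-amount proposals are skipped when choosing, so a "no conversion" answer never stalls the loop.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def staged_conversion(data: T,
                      estimators: Dict[str, Callable[[T], Tuple[float, float]]],
                      apply: Callable[[T, str, float], T],
                      max_stages: int = 20) -> Tuple[T, List[Tuple[str, float]]]:
    """Repeatedly apply the single-degree-of-freedom conversion with the
    smallest error L until every estimator proposes a zero amount."""
    history: List[Tuple[str, float]] = []
    for _ in range(max_stages):
        proposals = {dof: est(data) for dof, est in estimators.items()}
        nonzero = {dof: p for dof, p in proposals.items() if p[0] != 0}
        if not nonzero:                               # all amounts are 0: stop
            break
        dof = min(nonzero, key=lambda k: nonzero[k][1])  # smallest error wins
        amount = nonzero[dof][0]
        data = apply(data, dof, amount)
        history.append((dof, amount))
    return data, history
```

In a toy run, the state is simply (angle, shift). The rotation estimator proposes ±10 degrees with error |angle|/100, and the translation estimator proposes ±5 pixels with error |shift|/10. Starting from (20, -5), the loop applies two rotations, then one translation, and stops at (0, 0).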
[0052]
Here, the image data portion in the defined search area of the target image data is used as it is. Alternatively, its resolution may be reduced to obtain coarse-grained data, and the conversion process may be performed on that data. In this case, each conversion database 13a is trained with learning samples corresponding to coarse-grained data.
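The source does not say how the resolution is reduced. Block averaging is one simple option, sketched below under that assumption. The function name `coarse_grain` is hypothetical.

```python
from typing import List


def coarse_grain(img: List[List[float]], k: int) -> List[List[float]]:
    """Average non-overlapping k x k blocks; trailing rows or columns that do
    not fill a whole block are dropped."""
    h, w = len(img) // k, len(img[0]) // k
    return [[sum(img[y * k + dy][x * k + dx] for dy in range(k) for dx in range(k)) / (k * k)
             for x in range(w)]
            for y in range(h)]
```

For example, `coarse_grain([[1, 3, 5, 7], [1, 3, 5, 7], [2, 2, 8, 8], [2, 2, 8, 8]], 2)` returns `[[2.0, 6.0], [2.0, 8.0]]`.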
[0053]
In addition, the control unit 11 may perform processing such as calculation of feature amount vector information, mapping to a subspace, evaluation of distance, and evaluation of error in parallel, instead of sequentially for each degree of freedom.
[0054]
The kernel nonlinear subspace method has been described as an example. Other methods, such as an autoencoder, may also be used, as long as they support data classification and evaluation of the error at the time of classification.
[0055]
[Search process]
Next, the search processing will be described. In this processing, the search database 13b is used to determine whether the search target is contained in the image data portion of each search area whose conversion processing has been completed. One specific example of the search processing is the method disclosed in JP-A-2002-329188, outlined below.
[0056]
[Learning process of search database]
The search database 13b is formed by training a neural network, using examples of image data of the search target in the reference state as learning samples. The control unit 11 receives a plurality of learning samples. For each sample, it calculates a set of feature amounts (a feature amount vector) selected in advance according to the properties of the search target, generating learning data. Next, using the learning data, a grid space map is formed by a SOM (self-organizing map) on the M × M′ grid space stored in the storage unit 12. The control unit 11 calculates the distance between the input feature amount vector and the weight vector assigned to each grid point, using a predetermined measure such as the Euclidean measure. It then detects the grid point c at which this distance is minimum (the best matching node). The weight vectors of the grid points near the best matching node are updated using the input feature amount vector. Repeating this process forms the grid space map in the storage unit 12, in which the best matching nodes for mutually similar feature amount vectors form continuous areas. In other words, the grid space forms a nonlinear, topology-preserving projection from the feature vector, a multidimensional input signal, onto a two-dimensional map. Updating the weights self-organizes the characteristic features of the data, so that after learning, grid points responding to similar data lie close to one another.
[0057]
When learning with all the learning data is complete, the control unit 11 classifies the grid points of the grid space map into categories. This classification can be based, for example, on the distances between grid points, that is, the distances between their associated weight vectors. The grid points are divided into a category of grid points that respond to image data similar to the search target (search target category) and the other categories (non-search target categories).
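A compact SOM training loop in the spirit of the learning process above is sketched below. It uses a Gaussian neighbourhood and learning rate and radius that decay linearly. These schedules, the initialization, and all function names are assumptions rather than details from the source. For node categorization, the sketch substitutes a simpler rule for the inter-node distance clustering mentioned above: each node takes the label of its nearest labelled training sample.

```python
import numpy as np


def best_matching_node(weights: np.ndarray, x) -> tuple:
    """Grid index (i, j) of the node whose weight vector is nearest to x (Euclidean)."""
    d2 = ((weights - np.asarray(x, dtype=float)) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(d2), d2.shape)


def train_som(data, m: int, m_prime: int, epochs: int = 50, lr0: float = 0.5,
              seed: int = 0) -> np.ndarray:
    """Train an m x m' self-organizing map; returns weights of shape (m, m', dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = data.mean(axis=0) + data.std() * rng.standard_normal((m, m_prime, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(m), np.arange(m_prime), indexing="ij"), axis=-1)
    sigma0 = max(m, m_prime) / 2.0
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / total
            lr, sigma = lr0 * (1.0 - frac), sigma0 * (1.0 - frac) + 0.5
            c = best_matching_node(weights, x)
            g2 = ((grid - np.array(c)) ** 2).sum(axis=-1)     # squared grid distance to c
            h = np.exp(-g2 / (2.0 * sigma ** 2))[..., None]   # neighbourhood weights
            weights += lr * h * (x - weights)                 # pull nearby nodes toward x
            t += 1
    return weights


def label_nodes(weights: np.ndarray, samples, labels) -> np.ndarray:
    """Assumed labelling rule: each node takes the label of its nearest labelled sample."""
    samples = np.asarray(samples, dtype=float)
    flat = weights.reshape(-1, weights.shape[-1])
    d2 = ((flat[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(labels)[d2.argmin(axis=1)].reshape(weights.shape[:2])
```

Trained on two well-separated toy clusters, the map's best matching node for a point in each cluster carries that cluster's label.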
[0058]
[Operation of search processing]
The control unit 11 secures an area for storing map data of the same size as the target image data in the storage unit 12, and initializes the value of the area to “false”.
[0059]
Using the search database 13b acquired by learning, the control unit 11 calculates a predetermined feature vector from the image data portion in a search area whose conversion processing has been completed. It then computes the distance between this feature vector and the weight vector associated with each grid point in the search database 13b. The grid point with the minimum distance (the best matching node) is identified. If it belongs to the search target category, the search area is determined to contain the search target. If it belongs to a non-search target category, the search area is determined not to contain it.
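The lookup step can be sketched with a small, hand-built map so that the example stands alone. The map data are initialized to False as in paragraph [0058], and an area is marked True when its best matching node falls in the search target category. The two-element feature (pixel mean and variance), the category names, and all function names are hypothetical. The source only says that a predetermined feature vector is used, and it does not state that the map data are filled in this exact way.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Node = Tuple[int, int]
Patch = List[List[float]]


def best_matching_node(weights: Dict[Node, Sequence[float]], v: Sequence[float]) -> Node:
    """Grid node whose weight vector is nearest to v (Euclidean)."""
    return min(weights, key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], v)))


def mean_variance(patch: Patch) -> Tuple[float, float]:
    """Hypothetical two-element feature vector: mean and variance of the pixels."""
    vals = [p for row in patch for p in row]
    m = sum(vals) / len(vals)
    return m, sum((p - m) ** 2 for p in vals) / len(vals)


def mark_search_results(height: int, width: int,
                        areas: List[Tuple[int, int, Patch]],
                        weights: Dict[Node, Sequence[float]],
                        categories: Dict[Node, str],
                        feature: Callable[[Patch], Sequence[float]] = mean_variance) -> List[List[bool]]:
    """Map data start as False; pixels of areas whose best matching node is in
    the search target category are set to True."""
    result = [[False] * width for _ in range(height)]
    for x, y, patch in areas:
        if categories[best_matching_node(weights, feature(patch))] == "target":
            for yy in range(y, y + len(patch)):
                for xx in range(x, x + len(patch[0])):
                    result[yy][xx] = True
    return result
```

With a two-node map in which only the high-variance node is in the target category, a flat patch stays False and a textured patch is marked True.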
[0060]
[Operation of Control Unit 11]
As described above, the control unit 11 according to the present embodiment reduces the target image data step by step and extracts areas for search processing from each reduced version. It converts the image data in each area into the reference state, then determines whether the search target is included in the converted image data of the search area. Because the control unit 11 performs this conversion to the reference state, small deviations cause no problems. The position of the search area before conversion may be slightly off, and the reduction ratio may differ slightly from that of the reference state.
[0061]
The related art generates reduced image data in multiple stages, each 0.8 times the previous, and extracts search areas while shifting them one pixel at a time. The present embodiment, by contrast, can reduce the image 0.5 times per stage. When defining search areas, ΔX and ΔY can also be set to about 6 pixels, even when areas satisfying a predetermined condition are extracted autonomously. This significantly reduces the number of patterns to be searched, and with it the processing load of searching a photograph or the like for the search target.
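The reduction in the number of search patterns can be made concrete with a short count of candidate windows over an image pyramid. The image size (640 × 480), the window size (32 pixels), and the function name are hypothetical values chosen only to illustrate the comparison in the preceding paragraph.

```python
def count_windows(width: int, height: int, win: int, scale_step: float, shift: int) -> int:
    """Total number of win x win windows over an image pyramid that shrinks by
    scale_step per level, with windows shifted by `shift` pixels."""
    total, s = 0, 1.0
    while True:
        w, h = int(width * s), int(height * s)
        if w < win or h < win:
            return total
        total += ((w - win) // shift + 1) * ((h - win) // shift + 1)
        s *= scale_step


related_art = count_windows(640, 480, 32, 0.8, 1)   # 0.8x per level, 1-pixel shift
embodiment = count_windows(640, 480, 32, 0.5, 6)    # 0.5x per level, 6-pixel shift
```

Under these assumed sizes, the second configuration examines well under a tenth as many windows as the first.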
[0062]
As search result information, the control unit 11 generates map data representing the areas determined to include the search target. One set of map data is generated for the target image data at each reduction ratio. The area including the search target is therefore determined by using the plurality of map data together, each having the size of the corresponding reduced target image data.
[0063]
For example, each set of map data may be enlarged to the size of the original target image data, by the enlargement ratio corresponding to its reduction ratio, and the maps compared. An area that is "true" (determined to include the search target) in all map data, regardless of reduction ratio, may then be determined to include the search target. Alternatively, an area that is "true" in any one set of map data may be determined to include the search target.
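Both combination rules can be sketched with nearest-neighbour enlargement of boolean masks. The enlargement method and the function names are assumptions. `mode="all"` keeps only areas that are true in every map, and `mode="any"` keeps areas true in at least one.

```python
from typing import List

Mask = List[List[bool]]


def enlarge(mask: Mask, width: int, height: int) -> Mask:
    """Nearest-neighbour enlargement of a reduced map to width x height."""
    mh, mw = len(mask), len(mask[0])
    return [[mask[y * mh // height][x * mw // width] for x in range(width)]
            for y in range(height)]


def combine(masks: List[Mask], width: int, height: int, mode: str = "all") -> Mask:
    """Combine per-reduction-ratio maps at full size with AND ("all") or OR ("any")."""
    big = [enlarge(m, width, height) for m in masks]
    op = all if mode == "all" else any
    return [[op(b[y][x] for b in big) for x in range(width)] for y in range(height)]
```

For a full-size map and a half-size map that agree on the top-left quarter, "all" keeps only that quarter. "Any" also keeps a pixel found only in the full-size map.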
[0064]
[Other Modifications]
In the description so far, the conversions performed in the conversion processing are two-dimensional rotation, translation, and scaling. In scaling, the search area is scaled and the image data portion inside it is resized to the size of the original, pre-scaling search area. For a human face, however, three-dimensional rotation caused by posture, such as facing downward or turning sideways, may also be included. In this case, an average three-dimensional model of the search target can be assumed. Three-dimensional rotation can then be realized by projecting the image data portion onto this average model.
[0065]
In the search processing, even when the search target is, for example, a human face, the search target category may be further subdivided by conditions such as age, gender, and whether the mouth is open.
[0066]
In the flowchart of FIG. 2, the control unit 11 performs the processing at each reduction ratio sequentially. However, since the processing at each reduction ratio is independent, the control unit 11 may perform it in parallel.
[Brief description of the drawings]
FIG. 1 is a configuration block diagram of an image search device according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an example of a process of a control unit 11;
FIG. 3 is a flowchart illustrating an example of a process of determining a search area by a control unit 11;
[Explanation of symbols]
11 control unit, 12 storage unit, 13 database unit, 14 display unit, 15 operation unit, 16 external storage unit.

Claims

1. An image search apparatus that searches target image data to be processed for an image data portion of a search target, comprising:
search area specifying means for specifying search area candidates from the target image data according to a predetermined rule;
partial search area extracting means for extracting a plurality of partial search area candidates from a search area candidate according to a predetermined partial search area extraction rule; and
means for calculating an image feature amount from each image data portion defined by the respective partial search area candidates, counting the number of image feature amounts that satisfy a predetermined feature amount condition, and determining the corresponding search area candidate as a search area when that number satisfies a predetermined number condition,
wherein the image search apparatus performs a process of searching the determined search area for the search target.

2. The image search apparatus according to claim 1, wherein the partial search area extracting unit extracts the partial search area candidate from the search area candidate while allowing partial overlap.

3. The image search apparatus according to claim 1 or 2, wherein the search processing means comprises:
conversion means for obtaining, with reference to a conversion database acquired in advance by learning, a conversion condition including a conversion method and a conversion amount to be applied to each image data portion included in the determined at least one search area, and performing conversion based on the obtained conversion condition at least once on the corresponding image data portion; and
search means for determining, with reference to a search database acquired by learning using image data of the search target in the reference state, whether the search target is included in at least one of the converted image data portions.

4. An image search method for searching target image data to be processed for an image data portion of a search target, comprising:
specifying search area candidates from the target image data according to a predetermined rule;
extracting a plurality of partial search area candidates from a search area candidate according to a predetermined partial search area extraction rule;
calculating an image feature amount from each image data portion defined by the respective partial search area candidates, counting the number of image feature amounts that satisfy a predetermined feature amount condition, and determining the corresponding search area candidate as a search area when that number satisfies a predetermined number condition; and
performing a process of searching the determined search area for the search target.

5. An image search program for searching target image data to be processed for an image data portion of a search target, the program causing a computer to execute:
a procedure of specifying search area candidates from the target image data according to a predetermined rule;
a procedure of extracting a plurality of partial search area candidates from a search area candidate according to a predetermined partial search area extraction rule;
a procedure of calculating an image feature amount from each image data portion defined by the respective partial search area candidates, counting the number of image feature amounts that satisfy a predetermined feature amount condition, and determining the corresponding search area candidate as a search area when that number satisfies a predetermined number condition; and
a procedure of performing a process of searching the determined search area for the search target.