JP4217954B2 - Image search device

Info

Publication number: JP4217954B2
Application number: JP2003033845A
Authority: JP
Inventors: 仁池田; 典司加藤; 洋次鹿志村
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-02-12
Filing date: 2003-02-12
Publication date: 2009-02-04
Anticipated expiration: 2023-02-12
Also published as: JP2004246477A

Description

【０００１】
【発明の属する技術分野】
本発明は、写真などの画像データから、顔の部分などといった特定の画像部分を探索する画像探索装置に関する。
【０００２】
【従来の技術】
近年、写真等に含まれる特定の対象体、例えば人の顔などの部分を特定し、当該特定した部分に基づいて所定の処理を行うことが考えられている。その一例としては撮影された写真から各人の顔の部分を検出し、当該顔の部分のみを焼き付けたり、または撮像中の映像から人の顔部分を検出して顔の認証処理に供したり、といったものが考えられる。
【０００３】
従来の顔画像等、対象体を認識する装置では、対象体の撮像状態（傾き、大きさ、照明状態など）によっては対象体の認識が困難になる場合に対応するため、撮像状態を所定の撮像状態（基準状態）に適合させる処理を行うものがある。
【０００４】
従来、この処理では、具体的には撮像状態を変化させながら撮影した学習用画像データを用いてニューラルネットワークを学習させ、当該学習させたニューラルネットワークを利用して処理の対象となった写真での撮像状態が基準状態からどの程度ずれているかを検出し、当該ずれを補正するよう画像処理を行うことが考えられてきた。
【０００５】
なお、対象となる画像データから所望のパターンを検出する処理の例としては、特許文献１に開示される、カーネル非線形部分空間法等の方法が知られている。
【０００６】
【特許文献１】
特開２００１−９０２７４号公報
【０００７】
【発明が解決しようとする課題】
しかしながら、例えば人物の顔部分で言えば、横向き加減や上向き加減、首のかしげ具合、照明の具合といった様々な変化があり、従来の基準状態からのずれを検出する処理を行おうとすると、ニューラルネットワークの学習に用いる学習用画像データが上記様々な変化に合わせて大量に必要となる。また、こうした大量の画像データによって学習された結果、ニューラルネットワークの規模も膨大なものとなって、当該処理を現実的な時間内に完了することは不可能であった。
【０００８】
また従来の対象体を認識するための装置では、対象体を探索する元となる写真等について、その全体を探索範囲として処理を行っている。このため、処理すべきデータ量も増大してしまい、処理負荷が多大であった。
【０００９】
本発明は、上記実情に鑑みて為されたもので、探索の対象体を写真などから探索する処理の負荷を軽減でき、探索の精度を向上できる画像探索装置を提供することをその目的の一つとする。
【００１０】
【課題を解決するための手段】
請求項１記載の発明は、処理の対象となった対象画像データ内から、探索の対象となる探索対象の画像データ部分を探索する画像探索装置であって、前記対象画像データ内に、探索領域を少なくとも一つ画定する手段と、変換方法ごとに予め学習獲得された複数の第１変換データベースを参照して、前記画定されたそれぞれの探索領域に含まれる各探索部分データについて適用すべきＮ１個の変換の方法及び変換の量を含んでなる変換条件を取得し、当該取得した変換条件に基づく変換を、各探索部分データに対して少なくとも一度行う第１変換手段と、前記第１変換手段によって変換された各探索部分データについて、さらに、前記第１変換手段における変換に比べ、変換の方法をＮ２個（Ｎ２はＮ１とは異なる）とし、変換の量を制限した制限変換条件を、変換方法ごとに予め学習獲得された複数の第２変換データベースを参照して取得し、当該取得した制限変換条件に基づく変換を前記変換された各探索部分データに対して少なくとも一度行い、前記各探索部分データを基準状態へ変換する第２変換手段と、を含み、予め、前記基準状態での画像データを用いて学習獲得された探索データベースを参照し、前記変換後の探索部分データの各々に探索対象が含まれているか否かを判断し、当該判断結果を出力することとしたものである。ここで第１、第２の各変換手段における変換条件の取得の際には、それぞれ探索部分データに基づき所定の特徴量のセットを含んだ特徴量ベクトル情報を演算し、当該特徴量ベクトル情報を用いてそれぞれ第１、第２変換データベースを参照することとなるが、第１変換手段での特徴量ベクトル情報に含まれる特徴量の数Ｎ１を、第２変換手段での特徴量ベクトル情報に含まれる特徴量の数Ｎ２より少なくして、その精度を粗くしておくこととしてもよい。これによると、第１変換手段における処理負荷が軽減される。
【００１１】
請求項２記載の発明は、請求項１に記載の画像探索装置において、前記第１変換手段によって変換された各探索部分データについて、探索対象が含まれているか否かを判断する予備探索手段をさらに有し、当該予備探索手段により、探索対象が含まれていると判断された探索部分データについてのみ、前記第２変換手段が変換を行うこととしたものである。
【００１２】
また請求項３記載の発明は、請求項２に記載の画像探索装置において、前記予備探索手段は、第１変換手段によって変換された後の探索部分データについて、探索対象の画像データ例を用いて学習獲得された探索データベースを参照し、前記変換後の探索部分データの各々に探索対象が含まれているか否かを判断することとしたものである。
【００１３】
また請求項４記載の発明は、請求項３記載の画像探索装置において、前記探索領域を画定する手段が、探索領域として画定しようとする領域の内部に含まれる画像データのエントロピー、階層エントロピー、色、及び輝度分散、の少なくとも一つを用い、探索領域として実際に画定するか否かを決定することとしたものである。また請求項５記載の発明は、請求項３に記載の画像探索装置において、前記探索領域を画定する手段が、探索領域として画定しようとする領域の内部に含まれる画像データのエントロピーが、所定のしきい値よりも大きい場合には当該探索領域として画定しようとしている領域を実際に探索領域として画定することとしたものである。
【００１４】
請求項６記載の発明は、処理の対象となった対象画像データ内から、探索の対象となる探索対象の画像データ部分を探索する画像探索プログラムであって、コンピュータを、前記対象画像データ内に、探索領域を少なくとも一つ画定する手段と、変換方法ごとに予め学習獲得された複数の第１変換データベースを参照して、前記画定されたそれぞれの探索領域に含まれる各探索部分データについて適用すべきＮ１個の変換の方法及び変換の量を含んでなる変換条件を取得し、当該取得した変換条件に基づく変換を、各探索部分データに対して少なくとも一度行う第１変換手段と、前記第１変換手段によって変換された各探索部分データについて、さらに、前記第１変換手段における変換に比べ、変換の方法をＮ２個（Ｎ２はＮ１とは異なる）とし、変換の量を制限した制限変換条件を、変換方法ごとに予め学習獲得された複数の第２変換データベースを参照して取得し、当該取得した制限変換条件に基づく変換を前記変換された各探索部分データに対して少なくとも一度行い、前記各探索部分データを基準状態へ変換する第２変換手段と、予め、前記基準状態での画像データを用いて学習獲得された探索データベースを参照し、前記変換後の探索部分データの各々に探索対象が含まれているか否かを判断し、当該判断結果を出力する手段と、として機能させることとしたものである。
【００１５】
請求項７記載の発明は、請求項６に記載の画像探索プログラムにおいて、前記第１変換手段としての機能によって変換された各探索部分データについて、探索対象が含まれているか否かを判断し、探索対象が含まれていると判断された探索部分データについてのみ、前記第２変換手段として機能させるよう、前記コンピュータを機能させる手段をさらに含むこととしたものである。
【００１６】
請求項８記載の発明は、請求項７に記載の画像探索プログラムにおいて、前記予備探索手段として機能させる際に、第１変換手段によって変換された後の探索部分データについて、探索対象の画像データ例を用いて学習獲得された探索データベースを参照し、前記変換後の探索部分データの各々に探索対象が含まれているか否かを判断させることとしたものである。
【００１７】
【発明の実施の形態】
［基本構成］
本発明の実施の形態について図面を参照しながら説明する。本発明の実施の形態に係る画像探索装置は、図１に示すように、制御部１１と、記憶部１２と、データベース部１３と、表示部１４と、操作部１５と、外部記憶部１６とを含んで構成された、一般的なコンピュータを用いて実現される。このコンピュータは、他の製品、例えばカメラなどに組み込まれたものであっても構わない。
【００１８】
制御部１１は、記憶部１２に格納されているプログラムに従って動作するものであり、処理の対象となった対象画像データのうち、探索領域を少なくとも一つ画定する探索領域画定処理と、基準状態に変換する変換処理と、探索対象が含まれている探索領域を検出する探索処理と、探索結果を用いた所定の処理とを実行する。これらの制御部１１の具体的処理内容については、後に詳しく述べる。
【００１９】
記憶部１２は、制御部１１が実行するソフトウエアを格納している。また、この記憶部１２は、制御部１１がその処理の過程で必要とする種々のデータを保持するワークメモリとしても動作する。具体的にこの記憶部１２は、ハードディスクなどの記憶媒体、あるいは半導体メモリ、ないしこれらの組み合わせとして実現できる。
【００２０】
データベース部１３は、後に説明するように、制御部１１の第１変換処理において用いられる第１変換データベース１３ａ、第２変換処理において用いられる第２変換データベース１３ｂ、並びに探索処理において用いられる探索データベース１３ｃを含んだデータベースである。このデータベース部１３は、具体的にはハードディスクなどの記憶媒体であり、記憶部１２がこのデータベース部１３を兼ねてもよいが、ここでは説明のため、特に分けて示している。
【００２１】
表示部１４は、例えばディスプレイ装置やプリンタ装置などであり、制御部１１から入力される指示に従い、情報の表示などを行うものである。操作部１５は、例えばキーボードやマウスなどであり、ユーザの操作を受け入れて、当該操作の内容を制御部１１に出力する。
【００２２】
外部記憶部１６は、例えばＣＤ−ＲＯＭやＤＶＤ−ＲＯＭなど、コンピュータ可読なリムーバブルメディア（記憶媒体の一種）からプログラムやデータを読み出して制御部１１に出力し、制御部１１の処理によって記憶部１２に格納させる処理を行うものである。本実施の形態に係るプログラムは、例えばＣＤ−ＲＯＭなどの可搬的な記憶媒体に格納されて頒布でき、この外部記憶部１６を用いて記憶部１２に複写されて利用される。なお、本実施の形態に係るプログラムは、こうした記憶媒体だけでなく、ネットワーク上のサーバなどから図示しない通信部を介して記憶部１２に複写されることとしてもよい。
【００２３】
［制御部１１の処理］
ここで、制御部１１の処理の内容について具体的に説明する。本実施の形態においては、各処理の対象となる画像データ（対象画像データ）は、外部記憶部１６や図示しない通信部を介して外部から入力され、記憶部１２に格納される。ここで対象画像データは、一つであっても複数であっても構わない。ユーザが操作部１５を操作して、制御部１１に対し、特定の対象画像データについて探索対象を探索する処理を行うべき旨の指示（処理開始の指示）を行うと、制御部１１は、図２に示す処理を開始する。
【００２４】
制御部１１は、対象画像データを順次縮小変換しながら、各縮小変換された対象画像データについて探索領域を画定し、各探索領域に含まれる画像データ部分を基準状態に近づける第１変換処理を行い、その後さらに当該各探索領域に含まれる画像データ部分を基準状態により近づける第２変換処理を実行する。そして制御部１１は、第２変換処理後の探索領域に含まれる画像データ部分に対して探索処理を実行する。
【００２５】
［第１変換処理］
ここでまず、第１変換処理の内容について説明する。本実施の形態において特徴的なことの一つは、この第１変換処理が段階的に、つまり一回ずつ、一つの変換自由度に対応する変換を逐次的に行うようになっていることである。
【００２６】
各段階で行うべき変換の内容と変換の量を決定するため、制御部１１は、現在選択している探索領域に含まれている画像データ部分に基づいてＮ1次元の所定の特徴量ベクトル情報を演算する。ここで特徴量ベクトル情報は、探索対象の性状に合わせて選択された、Ｎ1個の特徴量要素を含んでなるベクトル量である。
【００２７】
本実施の形態では、制御部１１は、この特徴量ベクトル情報と、変換データベース１３ａに格納されている特徴量ベクトル情報とを用いた、カーネル非線形部分空間法によって変換を特定することとして説明する。
【００２８】
［第１変換データベース１３ａの内容］
このカーネル非線形部分空間法は、データを何らかのカテゴリに分類する方法として広く知られているので、詳しい説明を省略するが、その概要を述べれば、Ｎ1個の特徴量要素を基底として張られる空間Ｆ1において、当該空間Ｆ1に含まれる複数の部分空間Ωのそれぞれをデータの分類先であるカテゴリとして認識し、分類しようとするデータに基づいて作成される空間Ｆ内の特徴量ベクトル情報（例えばΦとする）を各部分空間Ωに射影し（射影の結果を例えばφとする）、射影前の特徴量ベクトル情報Φと、射影後の特徴量ベクトル情報φとの距離Ｅが最も小さくなる部分空間Ω（仮に最近接部分空間と呼ぶ）を検出し、分類しようとするデータは、その部分空間Ωによって表されるカテゴリに属すると判断する方法である。
【００２９】
そこで学習段階では、同一のカテゴリに属するべき学習用の例示データ（学習サンプル）に対応するＮ1次元の特徴量ベクトル情報に基づく最近接部分空間Ωが同一となるよう、非線形写像（空間Ｆ1への写像、すなわちカーネル関数に含まれるパラメータ等）と、各カテゴリに対応する部分空間Ω間を隔てる超平面との少なくとも一方を調整することとなる。
【００３０】
本実施の形態においては、探索対象を基準状態に変換する方法（変換の種類及び量）を決定するために、この変換データベース１３ａが形成される。つまり、基準状態にあるか否かが不明な画像データに対して、行うべき変換の種類ごとに、変換の量（カテゴリ）を決定できるように変換データベース１３ａが学習獲得されている。本実施の形態では、画像の回転、平行移動、サイズ変更という、画像に対して行うべき変換の種類（自由度）ごとに、変換データベース１３ａを作成している。変換の各自由度に対応する変換データベース１３ａは、対応する変換の変換量をカテゴリとして学習獲得したものである。
【００３１】
この学習獲得のため、本実施の形態の変換データベース１３ａの学習過程では、学習サンプルを次のように生成する。すなわち、所定の基準状態での探索対象である画像データの例を複数用意し、各画像データの例について、変換の自由度ごとに、それぞれの自由度について、互いに異なる変換量での変換が行われた複数の変換画像データを生成する。こうして自由度ごとに生成された変換画像データを、各自由度ごとの学習サンプルとする。具体的に顔を探索対象とする場合、所定の基準状態（所定の撮影条件・姿勢）にある顔の画像データを例として複数用意し、各画像データについて、変換の自由度として、例えば回転・平行移動・サイズ変更等という各自由度ごとに、回転であれば−１８０度から１８０度までの範囲で５度ずつ等の角度で回転させた変換画像データを回転の自由度に対する学習サンプルとする。また、平行移動であれば、縦横にそれぞれ５ピクセルずつ移動させた複数の変換画像データを平行移動の自由度に対する学習サンプルとする。なお、これらの学習サンプルは、移動等の変換自由度を含むために、基準状態よりも広い領域の画像データのうちから基準状態の面積を５ピクセルずつ移動させながら取り出すことで生成する。
【００３２】
こうして複数の画像データ例のそれぞれについて、さらに自由度ごとにそれぞれ複数の変換が施された複数の画像データを生成し、各画像データにどのような変換を行ったかを表す情報（変換量の大きさ等）を関連づける。
【００３３】
なお、ここでは互いに異なる変換量の変換を施した画像データを得るために、変換量を所定のステップ（例えば回転で言えば５度）ずつ変化させながらそれぞれ変換を行った画像データを学習サンプルに含めるようにしたが、所定のステップずつ変化させながらでなくとも、変換量を乱数によって決定しながら変換を行って、それぞれを学習サンプルに含めるようにしてもよい。
【００３４】
次に、各自由度ごとの学習サンプルを用いて、各自由度に対応する変換データベース１３ａを学習させる。
【００３５】
［第１変換処理の動作］
制御部１１は、こうして学習された各変換データベース１３ａを用いて、探索領域画定処理によって画定された探索領域の各々について次のように変換処理を行う。すなわち、処理の対象となった探索領域に含まれている画像データ部分（例えば画素値の列としてベクトル値と同視し得る）を、空間Ｆ1内のＮ1次元特徴量ベクトル情報（各変換データベース１３ａごと、つまり変換の各自由度ごとに定義されている特徴量の組）に写像し、さらにその写像を各部分空間Ωに射影する。そして、射影前の特徴量ベクトル情報と、射影後の特徴量ベクトル情報との距離Ｅが最小となる変換量を決定する。また制御部１１は、距離Ｅの二乗値Ｌを演算し、これを誤差として記憶部１２に保持する。
【００３６】
ここで変換量は、各変換データベース１３ａに基づき自由度ごとに決定されるが、制御部１１は、各自由度に対応する変換量のうち一つを所定の条件（例えば各変換量に対応する距離Ｅが最小となるもの等の条件）に基づいて選択し、選択した自由度に対応する変換を、選択した変換量の分だけ変換する。
【００３７】
つまり、探索領域に含まれている画像データ部分からは、各自由度に対応する各変換データベース１３ａに学習獲得された情報によって、例えば回転の自由度に対しては１０度の回転変換により基準状態に近づき、その誤差がＬｒであり、平行移動の自由度に対しては左へ５ピクセルの変換で基準状態に近づき、その誤差がＬｐといった情報が得られるので、この中から、誤差が最小となる自由度の変換を選択する。例えば上述の例の場合、Ｌｒ＜Ｌｐならば１０度の回転変換を探索領域に施して、新たな探索領域を画定する。そして、この新たな探索領域に含まれる画像データ部分をさらに空間Ｆ1内の特徴量ベクトル情報に写像し、その写像をさらに各部分空間Ωに射影する。そして、射影前の特徴量ベクトル情報と、射影後の特徴量ベクトル情報との距離Ｅが最小となる変換量を決定する処理から繰り返す。
【００３８】
また、各自由度に対応する変換量がいずれも「０」（つまり無変換）を表すものとなっている場合は、その段階で処理を終了し、さらに未処理の探索領域があれば、当該未処理の探索領域のいずれかを処理の対象として変換処理を行う。
【００３９】
なお、ここでは対象画像データのうち、画定された探索領域に含まれる画像データ部分をそのまま用いているが、当該画像データ部分の解像度を低減する処理を行って、粗視データとし、当該粗視データを用いて変換処理を実行してもよい。この場合は、当該粗視データに対応する学習サンプルを用いて、各変換データベース１３ａを学習獲得させておく。
【００４０】
［第２変換処理］
ここでまず、第２変換処理の内容について説明する。本実施の形態において特徴的なことの一つは、この第２変換処理が段階的に、つまり一回ずつ、一つの変換自由度に対応する変換を逐次的に行うようになっていることである。
【００４１】
各段階で行うべき変換の内容と変換の量を決定するため、制御部１１は、現在選択している探索領域に含まれている画像データ部分に基づいてＮ2次元の所定の特徴量ベクトル情報を演算する。ここで特徴量ベクトル情報は、探索対象の性状に合わせて選択された、Ｎ2個の特徴量要素を含んでなるベクトル量である。本実施の形態においては、第１変換処理によってまず大まかに基準状態に近接させ、ついで第２変換処理を行ってより精密に基準状態に近づける。このため、第２変換処理における特徴ベクトル情報の次元数は、第１変換処理におけるものより大きくする（Ｎ2＞Ｎ1）。これにより、第２変換処理はより精度の高い変換とすることができる。
【００４２】
制御部１１は、第１変換処理と同様に、このＮ2次元の特徴量ベクトル情報と、第２変換データベース１３ｂに格納されている特徴量ベクトル情報とを用いた、カーネル非線形部分空間法によって変換を特定する。
【００４３】
［第２変換データベース１３ｂの内容］
第２変換データベース１３ｂは、第１変換データベース１３ａと同様の方法によって学習獲得されるものであるが、第１変換データベース１３ａの学習課程では、学習サンプルに基づき、所定Ｎ1次元の特徴量ベクトル情報に変換していたのに対し、第２変換データベース１３ｂの学習課程では学習サンプルから所定Ｎ2次元の特徴量ベクトル情報に変換して、これを利用した学習を行う点が異なる。
【００４４】
［第２変換処理の動作］
制御部１１は、各変換自由度ごとに学習獲得されている第２変換データベース１３ｂを用いて、第１変換処理によって変換された探索領域の各々についてさらに第２変換処理を施す。すなわち、処理の対象となった探索領域に含まれている画像データ部分（例えば画素値の列としてベクトル値と同視し得る）を、空間Ｆ2内のＮ2元特徴量ベクトル情報（各第２変換データベース１３ｂごと、つまり変換の各自由度ごとに定義されている特徴量の組）に写像し、さらにその写像を各部分空間Ωに射影する。そして、射影前の特徴量ベクトル情報と、射影後の特徴量ベクトル情報との距離Ｅが最小となる変換量を決定する。また制御部１１は、距離Ｅの二乗値Ｌを演算し、これを誤差として記憶部１２に保持する。
【００４５】
ここで変換量は、第１変換処理の場合と同じく、各第２変換データベース１３ｂに基づき自由度ごとに決定されるが、制御部１１は、各自由度に対応する変換量のうち一つを所定の条件（例えば各変換量に対応する距離Ｅが最小となるもの等の条件）に基づいて選択し、選択した自由度に対応する変換を、選択した変換量の分だけ変換する。
【００４６】
つまり、この場合も第１変換処理と同様に、探索領域に含まれている画像データ部分からは、各自由度に対応する各第２変換データベース１３ｂに学習獲得された情報によって、例えば回転の自由度に対しては１０度の回転変換により基準状態に近づき、その誤差がＬｒであり、平行移動の自由度に対しては左へ５ピクセルの変換で基準状態に近づき、その誤差がＬｐといった情報が得られるので、この中から、誤差が最小となる自由度の変換を選択する。例えば上述の例の場合、Ｌｒ＜Ｌｐならば１０度の回転変換を探索領域に施して、新たな探索領域を画定する。そして、この新たな探索領域に含まれる画像データ部分をさらに空間Ｆ2内の特徴量ベクトル情報に写像し、その写像をさらに各部分空間Ωに射影する。そして、射影前の特徴量ベクトル情報と、射影後の特徴量ベクトル情報との距離Ｅが最小となる変換量を決定する処理から繰り返す。
【００４７】
一方各自由度に対応する変換量がいずれも「０」（つまり無変換）を表すものとなっている場合は、その段階で処理を終了し、さらに未処理の探索領域があれば、当該未処理の探索領域のいずれかを処理の対象として変換処理を行う。
【００４８】
また、ここでも対象画像データのうち、画定された探索領域に含まれる画像データ部分をそのまま用いているが、当該画像データ部分の解像度を低減する処理を行って、粗視データとし、当該粗視データを用いて第２変換処理を実行してもよい。この場合は、当該粗視データに対応する学習サンプルを用いて、各第２変換データベース１３ｂを学習獲得させておく。
【００４９】
なお、これら第１・第２の変換処理において、制御部１１は、特徴量ベクトル情報の演算、部分空間への写像、距離の評価、誤差の評価といった処理を各自由度ごとに順次行うのではなく、並列して行ってもよい。
【００５０】
さらにここではカーネル非線形部分空間法を用いる場合を例として説明したが、データの分類と、分類時の誤差評価が可能であれば例えばオートエンコーダ等、他の方法を用いても構わない。
【００５１】
さらにここでは第２変換処理においても第１変換処理と同様に、各自由度の変換が行われるとして説明したが、第２変換処理においては、変換に係る自由度数を低減してもよい。例えば第１変換処理において最後に施した変換の自由度と同じ自由度の変換だけを行うようにしたり、第２変換処理において最初は変換の自由度を選択するが、第２回目以降の第２変換処理は当該最初の第２変換処理で選択した自由度だけに限って変換を繰り返すようにしてもよい。
【００５２】
また、第２変換処理において決定されるカテゴリとしての変換量の最大値を、第１変換処理において決定される変換量の最大値よりも小さく設定しておいてもよい。すなわち、第２変換処理において利用される第２変換データベース１３ｂの学習サンプルは、第１変換データベース１３ａの学習サンプルよりも変換量の範囲を狭めたものを用いておいてもよい。例えば、第１変換データベース１３ａ用の学習サンプルでは、回転角度−１８０度から１８０度までの範囲で１０度ずつ回転させた画像データを用い、第２変換データベース１３ｂ用の学習サンプルでは、その範囲ときざみ量とをそれぞれ小さくして、−３０度から３０度までの範囲で３度ずつ回転させた画像データを用いるといったようにする。
【００５３】
このように、第２変換処理における変換の条件（変換の自由度又は変換量の少なくとも一方）は第１変換処理に比べて制限されたものであってもよい。
【００５４】
［制御部１１の処理の流れ］
具体的に制御部１１は、まず縮小率Ｓを最小縮小率（例えば１倍、つまり縮小せず）に設定し（Ｓ１）、対象画像データを縮小率Ｓで縮小する（Ｓ２）。そして縮小後の対象画像データのサイズに等しいサイズのマップデータの領域を記憶部１２上に確保し、当該領域の値を「偽（false）」に設定して、マップデータの初期化を実行する（Ｓ３）。例えば縮小後の対象画像データが１０００×１０００ピクセルの画像データであれば、１０００×１０００ビット分の領域を確保し、各ビット値を「０」に初期設定する。
【００５５】
次に制御部１１は、縮小後の対象画像データについて、少なくとも一つの探索領域を画定する処理を行う（Ｓ４）。この探索領域の画定処理については後に詳しく述べる。そして探索領域の一つを選択し（Ｓ５）、当該探索領域について第１変換処理を実行する（Ｓ６）。
【００５６】
制御部１１は、続いて、第１変換処理の結果となった探索領域についてさらに第２変換処理を実行し（Ｓ７）、これら第１・第２変換処理後の探索領域に含まれる画像データ部分について、探索対象が含まれているか否かを判定する処理（探索処理）を実行する（Ｓ８）。ここで探索対象が含まれていると判断されるときには（Ｙｅｓのときには）、当該変換処理後の探索領域に相当する、マップデータ上の領域の値を「真（true）」に設定する（Ｓ９）。そしてさらに選択していない探索領域があるか否かを調べ（Ｓ１０）、選択していない探索領域があれば（Ｙｅｓであれば）、処理Ｓ５に戻り（Ａ）、当該選択していない探索領域の一つを選択して処理を続ける。
【００５７】
一方、処理Ｓ８における探索処理の結果、探索対象が含まれていないと判定されるときには（Ｎｏのときには）、そのまま処理Ｓ１０に移行する。また、処理Ｓ１０において、選択していない探索領域がなければ、つまり、すべての探索領域について変換処理と探索処理とを完了したならば（Ｎｏならば）、現在設定されている縮小率Ｓが事前に定められた最大縮小率を上回っているか否かを調べ（Ｓ１１）、上回っていなければ（Ｎｏならば）、縮小率Ｓを大きくするように調整して（Ｓ１２）、処理Ｓ２に戻って処理を続ける（Ｂ）。ここで、縮小率Ｓを大きく調整する処理Ｓ１２は、例えば縮小率Ｓを所定比で高めるような処理としてもよいし、縮小率Ｓである倍率に対し、所定乗率ΔＳを乗じて、Ｓ＝Ｓ×ΔＳとして新たな縮小率Ｓを定めてもよい。
【００５８】
また、処理Ｓ１１において現在設定されている縮小率Ｓが事前に定められた最大縮小率を上回っていれば（Ｙｅｓならば）、各縮小率での対象画像データに対応するマップデータに基づき、元の（縮小前の）対象画像データ内で、探索対象が含まれている領域を画定して（Ｓ１３）、処理を終了する。
【００５９】
なお、ここでは第１変換処理の後、直ちに第２変換処理を実行しているが、第１変換処理の後で予備的な探索処理（その内容については後に述べる）を行い、この予備的な探索処理の結果、第１変換処理後の探索領域に含まれる画像データ部分に探索対象が含まれる可能性があると判断されたときにのみ第２変換処理を行うようにし、予備的な探索処理の際に第１変換処理後の探索領域に含まれる画像データ部分に探索対象が含まれる可能性はないと判断されたときには、上記の処理Ｓ１０に移行して処理を続けるようにしてもよい。これによると探索対象が含まれない領域について第２変換処理を行うことがなくなるので、処理負荷をより軽減できる。
【００６０】
［探索領域画定処理］
ここで制御部１１が探索領域を画定する処理（探索領域画定処理）について説明する。探索領域画定処理は、ユーザから入力された開始点の情報を利用しても、また、所定の条件を満足する領域を自律的に画定することによっても行うことができる。
【００６１】
例えばユーザから入力される情報を利用する場合、制御部１１は、操作部１５などから入力された少なくとも一つの開始点座標の情報に基づき、各開始点座標を左上隅とする予め定められたサイズの矩形領域を探索領域としてそれぞれ画定する。
【００６２】
また所定の条件を満足する領域を自律的に検索して画定する場合、制御部１１は、（縮小後の）対象画像データの左上隅の座標（例えばＸ＝０，Ｙ＝０）を開始点として、予め定められたサイズの矩形領域について所定の画定条件を満足しているか否かを調べ、画定条件を満足しているときには、当該矩形領域を探索領域とするという処理を、開始点を幅方向に所定量ずつ移動しながら（Ｘ＝Ｘ＋ΔＸ）順次行い、開始点が対象画像データの幅を逸脱する（Ｘ＞対象画像データの幅）と、高さ方向に所定量だけ開始点を移動して（Ｘ＝０，Ｙ＝Ｙ＋ΔＹ）、幅方向の処理を繰り返す。こうして対象画像データ全体のうち、画定条件を満足する領域を探索領域として画定する。
【００６３】
なお、ここでは開始点の移動量を幅方向、高さ方向にそれぞれΔＸ，ΔＹとしているが、これら移動量は対象画像データの縮小率Ｓに応じて、１倍のときのΔＸ，ΔＹに対してΔＸ／Ｓ，ΔＹ／Ｓとしてもよい。
【００６４】
［探索処理］
次に処理Ｓ８で行われる探索処理について説明する。この探索処理では変換処理を完了した探索領域の各々にそれぞれ含まれる画像データ部分について、探索データベース１３ｃを用いて、探索対象が含まれているか否かを判定する。具体的な探索処理の例としては、特開２００２−３２９１８８号公報に開示された方法などがある。次にその概要を説明する。
【００６５】
［探索データベースの学習課程］
この探索データベース１３ｃは、基準状態にある探索対象の画像データの例を学習サンプルとして用い、ニューラルネットワークを学習させて形成する。すなわち、制御部１１は、複数の学習サンプルの入力を受けて、その各々について、探索対象の性状に合わせて予め選択された特徴量のセット（特徴量ベクトル）を演算し、学習用データを生成する。次に、この学習用データを用いて、記憶部１２に格納されたＭ×Ｍ′の格子空間上に、ＳＯＭ（自己組織化マップ）によって格子空間マップを形成する。つまり、制御部１１は、入力された学習用データである特徴量ベクトルと、各格子ごとに割り当てられた重みベクトルとの距離を所定の測度（例えばユークリッド測度）で演算し、この距離が最小となる格子（最整合ノード）ｃを検出する。そしてこの最整合ノード近傍の複数の格子について、その重みベクトルを当該入力された特徴量ベクトルを用いて更新する。この処理の繰り返しにより、記憶部１２上に格子空間マップが形成され、互いに類似する特徴量ベクトルに対する最整合ノードが連続的な領域を形成するようになる。つまり、この格子空間には、多次元の入力信号である特徴量ベクトルから２次元のマップへの非線形射影が位相を保持したまま形成され、重みの更新により、データの特徴部分が組織化され、その学習成果として類似のデータに反応する格子が近接して存在しているようになる。
【００６６】
各学習データに基づく学習が完了すると、次に制御部１１は、格子空間マップの各格子をカテゴリに分類する。この分類は、例えば各格子間の距離（各格子に関連づけられた重みベクトル間の距離）に基づいて行うことができ、探索対象に似た画像データに反応する格子群のカテゴリ（探索対象カテゴリ）と、そうでない格子群のカテゴリ（非探索対象カテゴリ）とに分類される。
【００６７】
［探索処理の動作］
制御部１１は、対象画像データと同じサイズのマップデータを記憶する領域を記憶部１２に確保し、当該領域の値を「偽（false）」に初期化する。
【００６８】
制御部１１は、学習獲得した探索データベース１３ｃを用い、変換処理を完了した探索領域の画像データ部分に基づいて所定の特徴量ベクトルを演算する。そして当該演算した特徴量ベクトルと探索データベース１３ｃ内の各格子に関連づけられた重みベクトルとの距離を求め、特徴量ベクトルとの距離が最小となる格子（最整合ノード）を特定し、特定した格子が探索対象カテゴリに属していれば、探索領域に探索対象が含まれていると判断し、特定した格子が非探索対象カテゴリに属していれば、探索領域には探索対象が含まれていないと判断する。
【００６９】
［予備的探索処理］
また、第２変換処理を行うか否かを決定するための予備的な探索処理について説明する。この予備的な探索処理は、上記処理Ｓ８で行われる探索処理と同様の処理を行ってもよいが、探索処理自体の処理負荷に配慮して、より簡便な処理としておくことも好ましい。
【００７０】
例えばこの予備的探索処理としては、第１変換処理後の探索領域に含まれる画像データ部分に基づき、所定の特徴量を演算し、その特徴量が所定の条件を満足しているか否かによって探索対象が含まれている可能性があるか否かを判断するようにしてもよい。一例として、人の顔を探索対象とする場合、人の顔の輪郭部分ではその他の部分に比べてエントロピーが高いので、特徴量としてこのエントロピーを演算し、所定のしきい値（例えば対象画像データ全体の平均的なエントロピー値に基づいて決定されるしきい値）に比べて当該演算したエントロピーの値が高い場合に探索対象が含まれている可能性があると判断する。
【００７１】
［制御部１１の動作］
本実施の形態の制御部１１は、以上のように、探索の対象となった対象画像データを順次縮小しながら、縮小後のそれぞれの対象画像データから探索処理を行う領域を取り出し、当該領域内の画像データ部分を基準状態に近づけるべく第１変換処理を実行し、ついで当該第１変換処理後の画像データ部分に探索対象が含まれているか否かを予備的に探索する。
【００７２】
そして、この予備的な探索の結果、探索対象が含まれている可能性があると判断されると、さらに基準状態に近づけるように第２変換処理を実行し、探索対象が当該変換後の探索領域内の画像データに含まれているか否かを判断する。すなわち、制御部１１は探索処理の対象となる画像データを基準状態とする処理をまずは粗く行い、ついで細かく行うことで、全体としての処理負荷を軽減する。
【００７３】
このように基準状態への変換処理が行われるので、本実施の形態においては、変換前の探索領域の位置が多少ずれていても構わない。また、縮小率が基準状態から多少ずれていたとしても問題とならない。従って従来であれば、０．８倍ずつ縮小した多段階の縮小画像データを生成し、しかも探索領域を１画素ずつずらしながら取り出すようにしていたのに対し、本実施の形態のものでは０．５倍ずつの縮小で構わないし、探索領域を画定する際に、所定の条件を満足する領域を自律的に取り出す場合であっても、ΔＸやΔＹを６画素等とすることができる。これにより、探索処理の対象となるパターン数を大幅に低減でき、探索の対象体を写真などから探索する処理の負荷を軽減できる。
【００７４】
制御部１１は、探索対象が含まれていると判断された領域を表すマップデータを探索結果情報として生成するが、このマップデータは各縮小率で縮小された後の対象画像データのそれぞれに対応して複数生成される。そこで、これら複数のマップデータ（それぞれ縮小後の対象画像データのサイズとなっている）を統合的に用いて探索対象が含まれている領域を決定する。
【００７５】
例えば、各マップデータを、それぞれの縮小率に応じた拡大率で拡大し、元の対象画像データのサイズに揃えて比較し、すべてのマップデータで共通して「真」となっている領域（どの縮小率の対象画像データに基づいても、探索対象が含まれていると判断された領域）に探索対象が含まれていると判断することとしてもよい。また、いずれか一つのマップデータで「真」となっている領域を探索対象が含まれている領域と判断するようにしてもよい。
【００７６】
［探索領域画定処理の変形例］
さらに探索領域画定処理において、制御部１１は探索領域として画定しようとする領域について、その内部に含まれる画像データのエントロピーや、階層エントロピー、色、輝度分散、及びこれらのうちの二以上の値の組み合わせを用い、探索領域として実際に画定するか否かを決定してもよい。例えば人物の顔部分を探索する場合、顔の周辺部（輪郭部分）ではエントロピーが高くなるので、エントロピーが所定のしきい値よりも大きい場合には当該探索領域として画定しようとしている領域を実際に探索領域として画定し、そうでない場合には、当該領域を探索領域とせずに、他の処理を続けるようにする。これによると、変換処理・探索処理の対象となる探索領域を合理的に減少させることができ、処理負荷の軽減が図られる。
【００７７】
［その他の変形例］
ここまでの説明では、変換処理において行われる変換は、探索領域に関する２次元的な回転、平行移動、拡大縮小（探索領域を拡大縮小し、その内部の画像データ部分を元の（拡大縮小前の）探索領域のサイズに変換して扱えばよい）であり、その結果として、当該探索領域に含まれる画像データ部分が変換されるものとして説明したが、これ以外にも例えば人の顔であれば、姿勢（うつむき加減や振り向き加減）に影響される３次元的な回転等、画像データ部分そのものに対しての変換を含んでもよい。具体的にこのような３次元的な回転などの場合、探索対象の平均的３次元モデルを想定し、当該平均的３次元モデルへ画像データ部分を投射したものを用いて実現することができる。
【００７８】
また、探索処理においては探索対象として、例えば人の顔であっても、さらに細かくカテゴリを分けて、年齢や性別、口を開けているか否かなどの条件を含めてもよい。
【００７９】
さらに、図２のフローチャート図においては各縮小率における処理を制御部１１が順次行うものとしていたが、各縮小率における処理は互いに独立しているので、制御部１１は、これらの各縮小率における処理を並列して行ってもよい。
【００８０】
［動作］
次に、本実施の形態の画像探索装置の動作について、対象画像データとして与えられた写真の画像データから人の顔部分を探索対象として探索する場合を例として説明する。なお、以下の例では簡単のため、変換の自由度は平行移動（ｘ，ｙ）と、回転（θ）のみであるとして説明する。
【００８１】
制御部１１は、図示しない外部インタフェースや通信部など、外部から入力される対象画像データを記憶部１２に格納し、既に説明した方法で探索領域を画定する。例えば制御部１１は、対象画像データを、所定サイズの画像ブロック（その一部が互いに重なり合ってもよい）に区切り、そのうち、例えばエントロピーが所定の値より高い画像ブロックを探索領域として画定する。そして画定された探索領域の一つを選択し、当該選択した探索領域について第１変換処理を実行する。この第１変換処理では、概念的には、基準状態Ｏから、図３（ａ）に示すような範囲Ｒ１にある画像データ部分が基準状態Ｏ近傍に近づけられる。
【００８２】
制御部１１は、さらに第１変換処理後の画像データ部分に探索対象である人の顔が含まれているか否かを予備的に探索する。この予備的な探索処理は、ニューラルネットワークを利用した本探索処理と同様のものであってもよいし、エントロピーなどの特徴量を用いた簡易なものであってもよい。制御部１１は、この予備的な探索処理の結果、第１変換処理後の画像データ部分に探索対象である人の顔が含まれていると判断される場合は、続いて当該画像データ部分に係る探索領域に第２変換処理を実行する。この第２変換処理では、概念的には、図３（ａ）に対応する図３（ｂ）に示すように、自由度が例えばθ方向（回転）のみで、その変換量の範囲も狭い変換処理（Ｒ２）が適用され、画像データ部分を基準状態Ｏにより近接させる微調整が行われる。
【００８３】
制御部１１は、この第２変換処理後の探索領域に含まれる画像データ部分に対して探索処理を実行し、探索対象が含まれているか否かを調べ、探索対象が含まれていれば、当該探索処理を行った探索領域に相当する対象画像データの部分を特定する情報を生成する。
【００８４】
このように、本実施の形態では、二段階の変換処理を行って、一つの変換データベースを用いるだけの変換処理の場合に比べ、処理負荷を軽減でき、より基準状態に近接させて探索を行わせるため、探索精度も向上できる。
【００８５】
なお、ここまでの説明では二段階の変換処理を行う場合を例として示しているが、３回以上の複数回であっても構わない。この場合、段階があがるにつれて変換処理における変換条件をより制限したり、特徴量ベクトル情報の次元数をより高めてもよい。
【００８６】
【実施例】
図４（ａ）は、一種類の変換データベースを用いた変換処理によって基準状態への変換を行った後の画像データが、実際の基準状態からどれだけの誤差を生じていたかを表す分布図であり、図４（ｂ）は、本実施の形態と同様に、二種類の変換データベースを用いて、まずは粗く、次に細かくと２回に分けて変換処理を行った場合に、その結果としての画像データが、実際の基準状態からどれだけの誤差を生じていたかを表す分布図である。各図を参照して理解されるように、本実施の形態の方法によれば、誤差の平均が小さくなり、分布状態からも、より基準状態に近くなるよう変換が為されている。
【００８７】
また、前記一種類の変換データベースを用いた変換処理の場合では、収束までに平均的に２０回前後の変換処理を要する（各自由度ごとの単位変換がおおよそ２０回行われる）のに対し、前記二種類の変換データベースを用いる場合、粗い変換処理に約１０回、細かい変換処理に約５回で済んでおり、全体的に変換処理自体の処理負荷も軽減されている。
【図面の簡単な説明】
【図１】本発明の実施の形態に係る画像探索装置の構成ブロック図である。
【図２】制御部１１の処理の一例を表すフローチャート図である。
【図３】本発明の実施の形態に係る画像探索装置が行う変換処理の概要を表す説明図である。
【図４】一種類の変換データベースを用いた変換処理の結果と二種類の変換データベースを用いた二段階の変換処理の結果とを比較した実験結果を表す説明図である。
【符号の説明】
１１制御部、１２記憶部、１３データベース部、１４表示部、１５操作部、１６外部記憶部。
[0001]
[Technical field to which the invention belongs]
The present invention relates to an image search apparatus for searching for a specific image portion such as a face portion from image data such as a photograph.
[0002]
[Prior art]
In recent years, it has been proposed to identify a specific object in a photograph or the like, for example a human face, and to perform predetermined processing based on the identified portion. Examples include detecting each person's face in a photograph and printing only the face portion, or detecting a person's face in video being captured and passing it to face authentication processing.
[0003]
Recognition of an object can become difficult depending on its imaging state (tilt, size, illumination, and so on). To cope with this, some conventional devices that recognize an object such as a face image perform processing that adapts the imaging state to a predetermined imaging state (the reference state).
[0004]
Conventionally, this processing has been approached as follows: a neural network is trained on learning image data captured while varying the imaging state; the trained network is used to detect how far the imaging state of the photograph being processed deviates from the reference state; and image processing is applied to correct that deviation.
[0005]
As an example of processing for detecting a desired pattern from target image data, a method such as a kernel nonlinear subspace method disclosed in Patent Document 1 is known.
[0006]
[Patent Document 1]
JP 2001-90274 A
[0007]
[Problems to be solved by the invention]
However, a person's face, for example, varies in many ways: how far it is turned sideways or tilted upward, how the head is inclined, how it is lit, and so on. Detecting the deviation from the reference state in the conventional manner therefore requires a large amount of learning image data covering these variations to train the neural network. Moreover, a neural network trained on such a large amount of image data itself becomes enormous, and it was impossible to complete the processing within a practical time.
[0008]
Furthermore, conventional devices for recognizing an object treat the whole of the photograph or other source image as the search range. The amount of data to be processed therefore grows, and the processing load is heavy.
[0009]
The present invention has been made in view of the above circumstances. One of its objects is to provide an image search apparatus that reduces the processing load of searching a photograph or the like for a search target and that improves the search accuracy.
[0010]
[Means for Solving the Problems]
The invention according to claim 1 is an image search device that searches target image data to be processed for the image data portion of a search target, the device comprising: means for defining at least one search area in the target image data; first conversion means that refers to a plurality of first conversion databases, learned and acquired in advance for each conversion method, to acquire, for each search partial data included in each defined search area, a conversion condition comprising N1 conversion methods and conversion amounts to be applied, and that performs a conversion based on the acquired conversion condition on each search partial data at least once; and second conversion means that, for each search partial data converted by the first conversion means, refers to a plurality of second conversion databases, learned and acquired in advance for each conversion method, to acquire a restricted conversion condition in which, compared with the conversion in the first conversion means, the number of conversion methods is N2 (N2 being different from N1) and the conversion amount is restricted, and that performs a conversion based on the acquired restricted conversion condition on each converted search partial data at least once, thereby converting each search partial data to a reference state. The device refers to a search database learned and acquired in advance using image data in the reference state, determines whether each converted search partial data contains the search target, and outputs the determination result.
When the first and second conversion means acquire their conversion conditions, each computes, from the search partial data, feature vector information containing a predetermined set of feature quantities, and uses that feature vector information to refer to the first or second conversion database, respectively. The number N1 of feature quantities in the feature vector information of the first conversion means may be made smaller than the number N2 of feature quantities in that of the second conversion means, so that the first conversion is coarser. This reduces the processing load of the first conversion means.
[0011]
The invention according to claim 2 is the image search device according to claim 1, further comprising preliminary search means for determining whether each search partial data converted by the first conversion means contains the search target, wherein the second conversion means converts only the search partial data that the preliminary search means has determined to contain the search target.
[0012]
The invention according to claim 3 is the image search device according to claim 2, wherein the preliminary search means refers, for the search partial data converted by the first conversion means, to a search database learned and acquired using example image data of the search target, and determines whether each converted search partial data contains the search target.
[0013]
The invention according to claim 4 is the image search device according to claim 3, wherein the means for defining the search area uses at least one of the entropy, hierarchical entropy, color, and luminance variance of the image data contained in an area that is a candidate search area to decide whether to actually define that area as a search area. The invention according to claim 5 is the image search device according to claim 3, wherein the means for defining the search area actually defines a candidate area as a search area when the entropy of the image data contained in that area is larger than a predetermined threshold.
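The entropy criterion of claim 5 can be illustrated with a minimal sketch. It assumes that the candidate area is given as a flat list of 8-bit grayscale values and that "entropy" means the Shannon entropy of the intensity histogram; the patent text does not fix either point. The names `shannon_entropy` and `accept_region` and the threshold value are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the intensity histogram of a flat pixel list."""
    if not pixels:
        return 0.0
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def accept_region(pixels, threshold):
    """Keep a candidate area as a search area only if its entropy exceeds the threshold."""
    return shannon_entropy(pixels) > threshold


flat = [128] * 64            # uniform patch: a single histogram bin
textured = list(range(64))   # 64 distinct values, each occurring once
```

With an assumed threshold of 3 bits, `accept_region(textured, 3.0)` keeps the textured patch and `accept_region(flat, 3.0)` rejects the uniform one, so later conversion and search steps run on fewer areas.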
[0014]
The invention according to claim 6 is an image search program that searches target image data to be processed for the image data portion of a search target, the program causing a computer to function as: means for defining at least one search area in the target image data; first conversion means that refers to a plurality of first conversion databases, learned and acquired in advance for each conversion method, to acquire, for each search partial data included in each defined search area, a conversion condition comprising N1 conversion methods and conversion amounts to be applied, and that performs a conversion based on the acquired conversion condition on each search partial data at least once; second conversion means that, for each search partial data converted by the first conversion means, refers to a plurality of second conversion databases, learned and acquired in advance for each conversion method, to acquire a restricted conversion condition in which, compared with the conversion in the first conversion means, the number of conversion methods is N2 (N2 being different from N1) and the conversion amount is restricted, and that performs a conversion based on the acquired restricted conversion condition on each converted search partial data at least once, thereby converting each search partial data to a reference state; and means that refers to a search database learned and acquired in advance using image data in the reference state, determines whether each converted search partial data contains the search target, and outputs the determination result.
[0015]
The invention according to claim 7 is the image search program according to claim 6, further including means for causing the computer to determine, for each search partial data converted by the function as the first conversion means, whether the search target is included, and to function as the second conversion means only for the search partial data determined to include the search target.
[0016]
The invention according to claim 8 is the image search program according to claim 7, wherein, when the computer is caused to function as the preliminary search means, it refers, for the search partial data converted by the first conversion means, to a search database learned and acquired using example image data of the search target, and determines whether each converted search partial data includes the search target.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
[Basic configuration]
Embodiments of the present invention will be described with reference to the drawings. As shown in FIG. 1, the image search apparatus according to the embodiment of the present invention is implemented using a general-purpose computer that includes a control unit 11, a storage unit 12, a database unit 13, a display unit 14, an operation unit 15, and an external storage unit 16. This computer may be one incorporated in another product, such as a camera.
[0018]
The control unit 11 operates according to a program stored in the storage unit 12 and executes: a search area definition process that defines at least one search area within the target image data to be processed; a conversion process that converts to the reference state; a search process that detects search areas containing the search target; and predetermined processing that uses the search result. The specific processing performed by the control unit 11 is described in detail later.
[0019]
The storage unit 12 stores software executed by the control unit 11. The storage unit 12 also operates as a work memory that holds various data required by the control unit 11 during the process. Specifically, the storage unit 12 can be realized as a storage medium such as a hard disk, a semiconductor memory, or a combination thereof.
[0020]
As described later, the database unit 13 is a database that contains a first conversion database 13a used in the first conversion process of the control unit 11, a second conversion database 13b used in the second conversion process, and a search database 13c used in the search process. The database unit 13 is specifically a storage medium such as a hard disk. The storage unit 12 may double as the database unit 13, but the two are shown separately here for the sake of explanation.
[0021]
The display unit 14 is, for example, a display device or a printer device, and displays information according to an instruction input from the control unit 11. The operation unit 15 is a keyboard or a mouse, for example, and accepts a user operation and outputs the content of the operation to the control unit 11.
[0022]
The external storage unit 16 reads programs and data from a computer-readable removable medium (a type of storage medium) such as a CD-ROM or DVD-ROM, outputs them to the control unit 11, and has them stored in the storage unit 12 through processing by the control unit 11. The program according to the present embodiment can be distributed on a portable storage medium such as a CD-ROM and is copied to the storage unit 12 via the external storage unit 16 for use. The program may also be copied to the storage unit 12 from a server on a network via a communication unit (not shown), rather than from such a storage medium.
[0023]
[Processing of control unit 11]
The processing of the control unit 11 is now described concretely. In the present embodiment, the image data to be processed (target image data) is input from outside via the external storage unit 16 or a communication unit (not shown) and stored in the storage unit 12. There may be one or several items of target image data. When the user operates the operation unit 15 to instruct the control unit 11 to search specific target image data for the search target (an instruction to start processing), the control unit 11 starts the processing shown in FIG. 2.
[0024]
The control unit 11 reduces the target image data, defines search areas in each item of reduced target image data, and performs a first conversion process that brings the image data portion included in each search area close to the reference state. Thereafter, it performs a second conversion process that brings the image data portion included in each search area still closer to the reference state. The control unit 11 then performs a search process on the image data portion included in the search area after the second conversion process.
[0025]
[First conversion process]
First, the contents of the first conversion process will be described. One of the characteristic features of this embodiment is that the first conversion process is performed step by step, that is, conversions each corresponding to one degree of freedom of conversion are performed one at a time in sequence.
[0026]
In order to determine the content and amount of the conversion to be performed at each stage, the control unit 11 calculates predetermined N1-dimensional feature vector information based on the image data portion included in the currently selected search area. Here, the feature vector information is a vector quantity made up of N1 feature elements selected in accordance with the properties of the search target.
[0027]
In the present embodiment, the control unit 11 is described as specifying the conversion by the kernel nonlinear subspace method, using this feature vector information and the information stored in the first conversion database 13a.
[0028]
[Contents of the first conversion database 13a]
Since the kernel nonlinear subspace method is widely known as a method for classifying data into categories, a detailed description is omitted. Briefly, a space F1 spanned by N1 feature elements is considered, and each of a plurality of subspaces Ω contained in the space F1 is regarded as a category into which data is classified. Feature vector information in the space F1 created from the data to be classified (denoted, for example, Φ) is projected onto each subspace Ω (the projection result being denoted, for example, φ). The subspace Ω that minimizes the distance E between the feature vector information Φ before projection and the feature vector information φ after projection (referred to as the nearest subspace) is detected, and the data to be classified is determined to belong to the category represented by that subspace Ω.
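The nearest-subspace decision described above can be illustrated with a minimal sketch. For simplicity it assumes an ordinary linear subspace given by an orthonormal basis rather than the kernel-induced space F1. The names `projection_error` and `nearest_subspace` and the toy categories are hypothetical and do not appear in the specification.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def projection_error(x, basis):
    """Squared distance L = E**2 between x and its projection onto span(basis).

    Assumption: the basis vectors are orthonormal, so the squared length of
    the projection is the sum of squared inner products.
    """
    projected_sq = sum(dot(x, b) ** 2 for b in basis)
    return max(dot(x, x) - projected_sq, 0.0)


def nearest_subspace(x, subspaces):
    """Return (category, L) for the subspace with the smallest error."""
    scored = ((cat, projection_error(x, basis)) for cat, basis in subspaces.items())
    return min(scored, key=lambda item: item[1])


# Toy example: two one-dimensional subspaces, each standing for one category
# (here, a hypothetical conversion amount).
subspaces = {
    "rotate_0": [[1.0, 0.0, 0.0]],
    "rotate_10": [[0.0, 1.0, 0.0]],
}
category, error = nearest_subspace([0.1, 2.0, 0.2], subspaces)
```

In this toy case the input lies almost entirely along the basis of the second subspace, so that category is chosen and the residual error is small.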
[0029]
Accordingly, in the learning stage, at least one of the nonlinear mapping (the mapping to the space F1, that is, the parameters included in the kernel function) and the hyperplanes separating the subspaces Ω corresponding to the respective categories is adjusted so that the N1-dimensional feature vector information corresponding to learning examples (learning samples) that should belong to the same category has the same nearest subspace Ω.
[0030]
In the present embodiment, this conversion database 13a is formed in order to determine a method (type and amount of conversion) for converting the search target into the reference state. That is, the conversion database 13a has been learned and acquired so that the amount (category) of conversion can be determined for each type of conversion to be performed on image data for which it is unknown whether or not it is in the reference state. In the present embodiment, a conversion database 13a is created for each type of conversion (degree of freedom) to be performed on an image, such as image rotation, parallel movement, and size change. The conversion database 13a corresponding to each degree of freedom of conversion is acquired by learning the conversion amount of the corresponding conversion as a category.
[0031]
To carry out this learning, learning samples are generated as follows in the learning process of the conversion database 13a of the present embodiment. A plurality of examples of image data of the search target in a predetermined reference state are prepared, and for each example a plurality of converted image data are generated by applying, for each degree of freedom, conversions with different conversion amounts. The converted image data generated for each degree of freedom are used as the learning samples for that degree of freedom. Specifically, when a face is the search target, a plurality of face image data in a predetermined reference state (predetermined shooting condition and posture) are prepared as examples. For each image data and for each degree of freedom such as rotation, translation, and size change, converted image data are generated: for rotation, for example, image data rotated in 5-degree steps over the range from -180 degrees to 180 degrees are used as learning samples for the rotation degree of freedom. For translation, a plurality of converted image data shifted in 5-pixel steps vertically and horizontally are used as learning samples for the translation degree of freedom. Because these learning samples include degrees of freedom such as translation, they are generated from image data covering a region wider than the reference state, by cutting out an area of the reference-state size while shifting it in 5-pixel steps.
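As a rough illustration of the sample generation described above, the following sketch cuts a reference-sized window out of a wider image while shifting it in 5-pixel steps, and lists rotation amounts in 5-degree steps. The function names, the `max_shift` parameter, and the nested-list image representation are assumptions made only for this example.

```python
def crop(image, top, left, height, width):
    """Cut a height x width window out of a 2-D list of pixel values."""
    return [row[left:left + width] for row in image[top:top + height]]


def translation_samples(wide_image, ref_top, ref_left, ref_h, ref_w,
                        step=5, max_shift=10):
    """Generate (label, patch) pairs for the translation degree of freedom.

    The label (dx, dy) is the conversion amount applied to the reference
    window; max_shift is a hypothetical bound chosen for this sketch.
    """
    samples = []
    for dy in range(-max_shift, max_shift + 1, step):
        for dx in range(-max_shift, max_shift + 1, step):
            patch = crop(wide_image, ref_top + dy, ref_left + dx, ref_h, ref_w)
            samples.append(((dx, dy), patch))
    return samples


def rotation_labels(step=5, limit=180):
    """Rotation amounts (degrees) used as categories, from -limit to +limit."""
    return list(range(-limit, limit + 1, step))


# Toy 40 x 40 image whose pixel value encodes its own position.
wide = [[x + 100 * y for x in range(40)] for y in range(40)]
samples = translation_samples(wide, ref_top=10, ref_left=10, ref_h=20, ref_w=20)
```

Each sample keeps its conversion amount as a label, which corresponds to the category learned for that degree of freedom.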
[0032]
In this way, for each of a plurality of image data examples, a plurality of image data that have undergone a plurality of conversions for each degree of freedom are generated, and each is associated with information indicating what conversion was applied to it (the conversion amount, etc.).
[0033]
Here, in order to obtain image data that have undergone conversions of different amounts, image data converted while changing the conversion amount by a predetermined step (for example, 5 degrees for rotation) are included in the learning samples. Instead of changing the amount by a predetermined step, however, the conversion amount may be determined by a random number, and each image data converted in that way may be included in the learning samples.
[0034]
Next, the conversion database 13a corresponding to each degree of freedom is learned using the learning sample for each degree of freedom.
[0035]
[Operation of the first conversion process]
Using each of the conversion databases 13a learned in this way, the control unit 11 performs the conversion process as follows for each of the search areas defined by the search area defining process. The image data portion included in the search area to be processed (which can be regarded, for example, as a vector whose elements are the pixel values) is mapped to N1-dimensional feature vector information in the space F1 (a set of feature values defined for each conversion database 13a, that is, for each degree of freedom of conversion), and the mapping is projected onto each subspace Ω. Then the conversion amount that minimizes the distance E between the feature vector information before projection and the feature vector information after projection is determined. Further, the control unit 11 calculates the square L of the distance E and stores it in the storage unit 12 as an error.
[0036]
Here, a conversion amount is determined for each degree of freedom based on each conversion database 13a. The control unit 11 selects one of the conversion amounts corresponding to the respective degrees of freedom based on a predetermined condition (for example, the condition that the distance E corresponding to the conversion amount is the smallest), and applies the conversion of the selected degree of freedom by the selected conversion amount.
[0037]
That is, from the image data portion included in the search area, the information acquired by learning in the conversion databases 13a corresponding to the respective degrees of freedom yields, for example, the information that for the rotation degree of freedom a rotation of 10 degrees brings the portion closer to the reference state with error Lr, and that for the translation degree of freedom a shift of 5 pixels to the left brings it closer to the reference state with error Lp. The conversion of the degree of freedom with the smallest error is selected from among these. In the above example, if Lr < Lp, the search area is rotated by 10 degrees to define a new search area. The image data portion included in the new search area is then mapped to feature vector information in the space F1, the mapping is projected onto each subspace Ω, and the processing is repeated from the step of determining the conversion amount that minimizes the distance E between the feature vector information before projection and the feature vector information after projection.
[0038]
When the conversion amounts corresponding to the respective degrees of freedom all represent "0" (that is, no conversion), the process for that search area ends at that stage. If there is an unprocessed search area, the conversion process is then performed with one of the unprocessed search areas as the processing target.
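The stepwise procedure of paragraphs [0035] to [0038] can be sketched as a loop that, at each stage, asks one estimator per degree of freedom for a conversion amount and an error, applies the conversion with the smallest error, and stops when every amount is 0. The estimators below are hypothetical stand-ins for the learned conversion databases 13a. The `max_steps` limit and the choice to pick only among nonzero amounts are assumptions added so that this sketch always terminates.

```python
def convert_to_reference(area, estimators, apply_conversion, max_steps=50):
    """Apply one single-degree-of-freedom conversion per stage.

    estimators: {dof_name: callable(area) -> (amount, error_L)}
    apply_conversion: callable(area, dof_name, amount) -> new area
    """
    history = []
    for _ in range(max_steps):
        estimates = {dof: est(area) for dof, est in estimators.items()}
        # Assumption: only nonzero amounts compete, to avoid a no-op stage.
        nonzero = {dof: e for dof, e in estimates.items() if e[0] != 0}
        if not nonzero:
            break  # all amounts are 0: the area is taken to be in the reference state
        dof = min(nonzero, key=lambda d: nonzero[d][1])  # smallest error L
        amount = nonzero[dof][0]
        area = apply_conversion(area, dof, amount)
        history.append((dof, amount))
    return area, history


# Toy model: the "area" records how far it still is from the reference state.
estimators = {
    "rotation": lambda a: (-10 if a["rotation"] > 0 else 0, 1.0),
    "translation_x": lambda a: (-5 if a["translation_x"] > 0 else 0, 2.0),
}


def apply_conversion(area, dof, amount):
    updated = dict(area)
    updated[dof] += amount
    return updated


final_area, history = convert_to_reference(
    {"rotation": 20, "translation_x": 5}, estimators, apply_conversion
)
```

In the toy run the rotation estimator reports the smaller error, so rotation is corrected first and translation afterwards.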
[0039]
Here, the image data portion included in the defined search area is used as it appears in the target image data. However, the resolution of the image data portion may be reduced to obtain coarse-grained data, and the conversion process may be performed using the coarse-grained data. In this case, each conversion database 13a is learned and acquired using learning samples corresponding to the coarse-grained data.
[0040]
[Second conversion process]
Next, the contents of the second conversion process will be described. One of the characteristic features of the present embodiment is that the second conversion process is also performed step by step, that is, conversions each corresponding to one degree of freedom of conversion are performed one at a time in sequence.
[0041]
In order to determine the content and amount of the conversion to be performed at each stage, the control unit 11 calculates predetermined N2-dimensional feature vector information based on the image data portion included in the currently selected search area. Here, the feature vector information is a vector quantity made up of N2 feature elements selected in accordance with the properties of the search target. In the present embodiment, the first conversion process first brings the image data portion roughly close to the reference state, and the second conversion process then brings it more precisely close to the reference state. For this reason, the number of dimensions of the feature vector information in the second conversion process is made larger than that in the first conversion process (N2 > N1). Accordingly, the second conversion process can be performed with higher accuracy.
[0042]
As in the first conversion process, the control unit 11 specifies the conversion by the kernel nonlinear subspace method, using the N2-dimensional feature vector information and the information stored in the second conversion database 13b.
[0043]
[Contents of second conversion database 13b]
The second conversion database 13b is learned and acquired by the same method as the first conversion database 13a. However, whereas the learning process of the first conversion database 13a converts each learning sample into predetermined N1-dimensional feature vector information, the learning process of the second conversion database 13b differs in that each learning sample is converted into predetermined N2-dimensional feature vector information, which is then used for learning.
[0044]
[Operation of second conversion process]
Using the second conversion databases 13b learned and acquired for the respective degrees of freedom of conversion, the control unit 11 further performs the second conversion process on each of the search areas converted by the first conversion process. The image data portion included in the search area to be processed (which can be regarded, for example, as a vector whose elements are the pixel values) is mapped to N2-dimensional feature vector information in the space F2 (a set of feature values defined for each second conversion database 13b, that is, for each degree of freedom of conversion), and the mapping is projected onto each subspace Ω. Then the conversion amount that minimizes the distance E between the feature vector information before projection and the feature vector information after projection is determined. Further, the control unit 11 calculates the square L of the distance E and stores it in the storage unit 12 as an error.
[0045]
Here, as in the first conversion process, a conversion amount is determined for each degree of freedom based on each second conversion database 13b. The control unit 11 selects one of the conversion amounts corresponding to the respective degrees of freedom based on a predetermined condition (for example, the condition that the distance E corresponding to the conversion amount is the smallest), and applies the conversion of the selected degree of freedom by the selected conversion amount.
[0046]
That is, in this case as well, as in the first conversion process, the information acquired by learning in the second conversion databases 13b corresponding to the respective degrees of freedom yields, from the image data portion included in the search area, for example, the information that for the rotation degree of freedom a rotation of 10 degrees brings the portion closer to the reference state with error Lr, and that for the translation degree of freedom a shift of 5 pixels to the left brings it closer to the reference state with error Lp. The conversion of the degree of freedom with the smallest error is selected from among these. In the above example, if Lr < Lp, the search area is rotated by 10 degrees to define a new search area. The image data portion included in the new search area is then mapped to feature vector information in the space F2, the mapping is projected onto each subspace Ω, and the processing is repeated from the step of determining the conversion amount that minimizes the distance E between the feature vector information before projection and the feature vector information after projection.
[0047]
On the other hand, when the conversion amounts corresponding to the respective degrees of freedom all represent "0" (that is, no conversion), the process for that search area ends at that stage. If there is an unprocessed search area, the conversion process is then performed with one of the unprocessed search areas as the processing target.
[0048]
In this case as well, the image data portion included in the defined search area is used as it appears in the target image data. However, the resolution of the image data portion may be reduced to obtain coarse-grained data, and the second conversion process may be performed using the coarse-grained data. In this case, each second conversion database 13b is learned and acquired using learning samples corresponding to the coarse-grained data.
[0049]
In the first and second conversion processes, the control unit 11 need not perform the calculation of feature vector information, the projection onto the subspaces, the distance evaluation, and the error evaluation sequentially for each degree of freedom. These may instead be performed in parallel.
[0050]
Furthermore, although the case where the kernel nonlinear subspace method is used has been described here as an example, other methods such as an auto encoder may be used as long as data classification and error evaluation at the time of classification are possible.
[0051]
Further, it has been described here that conversions of every degree of freedom are performed in the second conversion process, as in the first conversion process. However, the number of degrees of freedom of conversion may be reduced in the second conversion process. For example, only the conversion of the same degree of freedom as the conversion performed last in the first conversion process may be performed, or a degree of freedom may be selected in the first stage of the second conversion process and the conversion may thereafter be repeated only for that selected degree of freedom.
[0052]
Further, the maximum value of the conversion amount as a category determined in the second conversion process may be set smaller than the maximum value of the conversion amount determined in the first conversion process. In other words, the learning samples of the second conversion database 13b used in the second conversion process may cover a narrower range of conversion amounts than the learning samples of the first conversion database 13a. For example, the learning samples for the first conversion database 13a may use image data rotated in 10-degree steps over the range from -180 degrees to 180 degrees, while the learning samples for the second conversion database 13b use a reduced range and step, namely image data rotated in 3-degree steps over the range from -30 degrees to 30 degrees.
[0053]
As described above, the conversion condition (at least one of the degree of freedom of conversion or the conversion amount) in the second conversion process may be limited as compared with the first conversion process.
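The difference between the two sets of rotation categories in the example of paragraph [0052] can be written out directly. The helper name `conversion_categories` is hypothetical. The ranges and steps are the ones given in that example.

```python
def conversion_categories(limit, step):
    """Conversion amounts used as categories, from -limit to +limit inclusive."""
    return list(range(-limit, limit + 1, step))


# First conversion database 13a: wide range, coarse step.
first_db_rotations = conversion_categories(limit=180, step=10)
# Second conversion database 13b: narrow range, fine step.
second_db_rotations = conversion_categories(limit=30, step=3)
```

Both lists contain 0, the category that stands for "no conversion", and the second list has fewer categories spread over a much smaller range.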
[0054]
[Processing flow of control unit 11]
Specifically, the control unit 11 first sets the reduction rate S to the minimum reduction rate (for example, a magnification of 1, that is, no reduction) (S1), and reduces the target image data at the reduction rate S (S2). It then secures in the storage unit 12 an area for map data whose size equals that of the reduced target image data, and initializes the map data by setting the values in the area to "false" (S3). For example, if the reduced target image data is image data of 1000 × 1000 pixels, an area of 1000 × 1000 bits is secured, and each bit is initialized to "0" (false).
[0055]
Next, the control unit 11 performs a process of defining at least one search area for the reduced target image data (S4). The search area defining process will be described in detail later. Then, one of the search areas is selected (S5), and the first conversion process is executed for the search area (S6).
[0056]
Subsequently, the control unit 11 executes the second conversion process for the search area resulting from the first conversion process (S7), and executes the search process to determine whether or not the search target is included in the image data portion included in the search area after the first and second conversion processes (S8). If it is determined that the search target is included (Yes), the value of the area on the map data corresponding to the search area after the conversion processes is set to "true" (S9). Then it is checked whether or not there is an unselected search area (S10). If there is (Yes), the process returns to process S5 (A), one of the unselected search areas is selected, and processing continues.
[0057]
On the other hand, when it is determined as a result of the search process in process S8 that the search target is not included (No), the process proceeds directly to process S10. If there is no unselected search area in process S10, that is, if the conversion processes and the search process have been completed for all search areas (No), it is checked whether or not the currently set reduction rate S has reached a predetermined maximum reduction rate (S11). If it has not (No), the reduction rate S is increased (S12), and the process returns to process S2 and continues (B). Here, process S12 for increasing the reduction rate S may be, for example, a process of increasing the reduction rate S by a predetermined ratio, or a process of multiplying the magnification that is the reduction rate S by a predetermined multiplier ΔS and setting the new reduction rate as S = S × ΔS.
[0058]
If the currently set reduction rate S has reached the predetermined maximum reduction rate in process S11 (Yes), an area including the search target is determined in the original target image data (before reduction) based on the map data corresponding to the target image data at each reduction rate (S13), and the process ends.
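The flow of processes S1 to S13 can be summarized in a short sketch. Every callable passed in is a hypothetical placeholder for the corresponding process of the control unit 11, and the map data is simplified here to a list of areas judged to contain the target. The reduction rate is modelled as a magnification that starts at 1.0 and is multiplied by an assumed factor at each pass. The factor 0.5 and the lower bound 0.1 are illustrative values only.

```python
def scale_schedule(multiplier=0.5, smallest=0.1):
    """Magnifications to process: 1.0, then repeatedly multiplied (S1, S11, S12)."""
    scales, s = [], 1.0
    while s >= smallest:
        scales.append(s)
        s *= multiplier
    return scales


def search_all_scales(image, scales, reduce_image, define_areas,
                      first_convert, second_convert, contains_target):
    """Run the conversion and search processes at every reduction rate."""
    results = {}
    for s in scales:
        reduced = reduce_image(image, s)              # S2
        hits = []                                     # S3 (simplified map data)
        for area in define_areas(reduced):            # S4, S5, S10
            area = first_convert(reduced, area)       # S6
            area = second_convert(reduced, area)      # S7
            if contains_target(reduced, area):        # S8
                hits.append(area)                     # S9
        results[s] = hits
    return results                                    # input to S13


# Toy run with stand-in callables; only one area at full size "contains" the target.
results = search_all_scales(
    image="photo",
    scales=scale_schedule(),
    reduce_image=lambda img, s: (img, s),
    define_areas=lambda reduced: [(0, 0), (6, 0)],
    first_convert=lambda reduced, area: area,
    second_convert=lambda reduced, area: area,
    contains_target=lambda reduced, area: area == (6, 0) and reduced[1] == 1.0,
)
```

Because the passes over different reduction rates do not share state in this sketch, they could equally be run in any order.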
[0059]
Here, the second conversion process is executed immediately after the first conversion process. However, a preliminary search process (described later) may be performed after the first conversion process, and the second conversion process may be performed only when the preliminary search process determines that the search target may be included in the image data portion included in the search area after the first conversion process. In this case, if it is determined that there is no possibility that the search target is included in that image data portion, the process may proceed to the above-described process S10 and continue. In this way, the second conversion process is not performed on areas that do not include the search target, so the processing load can be further reduced.
[0060]
[Search area definition processing]
Here, a process in which the control unit 11 demarcates a search area (search area demarcation process) will be described. The search area defining process can be performed using information on the start point input by the user or by autonomously defining an area that satisfies a predetermined condition.
[0061]
For example, when using information input by the user, the control unit 11 defines as search areas, based on the information on at least one start point coordinate input from the operation unit 15 or the like, areas of a predetermined size each having one of the start point coordinates as its upper left corner.
[0062]
When areas satisfying a predetermined condition are searched for and defined autonomously, the control unit 11 takes the coordinates of the upper left corner of the (reduced) target image data (for example, X = 0, Y = 0) as the start point and checks whether or not a rectangular area of a predetermined size starting there satisfies a predetermined demarcation condition. If the demarcation condition is satisfied, the rectangular area is set as a search area. The start point is then moved by a predetermined amount in the width direction (X = X + ΔX) and the check is repeated. When the start point passes beyond the width of the target image data (X > the width of the target image data), the start point is moved by a predetermined amount in the height direction (X = 0, Y = Y + ΔY) and the processing in the width direction is repeated. In this way, areas that satisfy the demarcation condition are defined as search areas over the entire target image data.
[0063]
Here, the amounts by which the start point is moved are ΔX and ΔY in the width direction and the height direction, respectively. Taking ΔX and ΔY as the amounts used when the image data is at a magnification of 1, the amounts ΔX / S and ΔY / S may be used instead, depending on the reduction rate S of the target image data.
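The autonomous demarcation of paragraph [0062] amounts to a raster scan of the start point. A minimal sketch follows. The function name and the `condition` callable are hypothetical, and, as a simplifying assumption, the scan stops as soon as the rectangle would cross the image edge rather than when the start point itself leaves the image.

```python
def define_search_areas(width, height, win_w, win_h, dx, dy, condition):
    """Scan start points left to right, top to bottom, in steps of dx and dy.

    condition: callable(x, y, win_w, win_h) -> bool, the demarcation condition.
    Returns a list of (x, y, win_w, win_h) rectangles that satisfy it.
    """
    areas = []
    y = 0
    while y + win_h <= height:
        x = 0
        while x + win_w <= width:
            if condition(x, y, win_w, win_h):
                areas.append((x, y, win_w, win_h))
            x += dx           # X = X + dX
        y += dy               # X = 0, Y = Y + dY
    return areas


# Toy run: a 32 x 20 image, 20 x 20 windows, 6-pixel steps, every window accepted.
areas = define_search_areas(32, 20, 20, 20, 6, 6, lambda x, y, w, h: True)
```

A larger step directly reduces the number of candidate areas that must pass through the conversion and search processes.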
[0064]
[Search process]
Next, the search process performed in process S8 will be described. In this search process, it is determined whether or not a search target is included using the search database 13c for each image data portion included in each search area for which the conversion process has been completed. As a specific example of the search process, there is a method disclosed in Japanese Patent Laid-Open No. 2002-329188. Next, the outline will be described.
[0065]
[Learning process of search database]
This search database 13c is formed by training a neural network using examples of image data of the search target in the reference state as learning samples. That is, the control unit 11 receives a plurality of learning samples and, for each of them, calculates a set of feature values (a feature vector) selected in advance according to the properties of the search target, thereby generating learning data. Next, using this learning data, a lattice space map is formed by SOM (self-organizing map) on an M × M' lattice space stored in the storage unit 12. The control unit 11 calculates the distance between the feature vector given as input learning data and the weight vector assigned to each lattice point using a predetermined measure (for example, the Euclidean measure), and detects the lattice point c for which the distance is smallest (the best matching node). It then updates the weight vectors of the lattice points in the neighborhood of the best matching node using the input feature vector. By repeating this process, a lattice space map is formed in the storage unit 12 in which the best matching nodes for mutually similar feature vectors form a continuous region. In other words, a nonlinear projection from the feature vector, which is a multidimensional input signal, onto a two-dimensional map is formed in this lattice space while preserving topology. By updating the weights, the characteristic portions of the data are organized, and as a result of learning, lattice points that respond to similar data come to lie close to one another.
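One update step of the self-organizing map described above can be sketched as follows. This is a simplified illustration with a fixed learning rate and a fixed square neighborhood. Practical SOM training usually shrinks both over time, and none of the names or parameter values here come from the specification.

```python
import math
import random


def best_matching_node(weights, x):
    """Lattice point whose weight vector has the smallest Euclidean distance to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def som_update(weights, x, learning_rate=0.5, radius=1):
    """Move the best matching node and its lattice neighbours toward x."""
    c = best_matching_node(weights, x)
    for (row, col), w in list(weights.items()):
        if max(abs(row - c[0]), abs(col - c[1])) <= radius:
            weights[(row, col)] = [wi + learning_rate * (xi - wi)
                                   for wi, xi in zip(w, x)]
    return c


def train_som(samples, rows, cols, dim, epochs=20, seed=0):
    """Form a rows x cols lattice space map from feature vectors."""
    rng = random.Random(seed)
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for _ in range(epochs):
        for x in samples:
            som_update(weights, x)
    return weights


# Toy check of a single update on a hand-made lattice.
toy = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0], (5, 5): [10.0, 10.0]}
winner = som_update(toy, [0.2, 0.2])
```

In the toy lattice the winner and its adjacent node move toward the input, while the distant node is left unchanged.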
[0066]
When learning based on all the learning data is completed, the control unit 11 classifies each lattice point of the lattice space map into a category. This classification can be performed based on, for example, the distance between lattice points (the distance between the weight vectors associated with them). The lattice points are classified into the category of the group that responds to image data similar to the search target (the search target category) and the category of the other group (the non-search target category).
[0067]
[Search operation]
The control unit 11 secures an area for storing map data having the same size as the target image data in the storage unit 12 and initializes the value of the area to “false”.
[0068]
Using the learned and acquired search database 13c, the control unit 11 calculates a predetermined feature vector based on the image data portion of the search area for which the conversion processes have been completed. It then obtains the distance between the calculated feature vector and the weight vector associated with each lattice point in the search database 13c, and identifies the lattice point with the smallest distance from the feature vector (the best matching node). If the identified lattice point belongs to the search target category, the control unit 11 determines that the search area includes the search target. If the identified lattice point belongs to the non-search target category, it determines that the search area does not include the search target.
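The decision rule of the search operation can be sketched in a few lines. The dictionaries below stand in for the learned search database 13c. Their layout, the category strings, and the function name are assumptions made for illustration.

```python
import math


def includes_search_target(feature, lattice_weights, lattice_category):
    """Find the best matching node and report whether it is a target node.

    lattice_weights: {node: weight vector}
    lattice_category: {node: "target" or "non-target"}
    """
    node = min(lattice_weights,
               key=lambda n: math.dist(lattice_weights[n], feature))
    return lattice_category[node] == "target", node


# Toy search database with one node per category.
lattice_weights = {(0, 0): [0.9, 0.1], (0, 1): [0.1, 0.9]}
lattice_category = {(0, 0): "target", (0, 1): "non-target"}
found, node = includes_search_target([0.8, 0.3], lattice_weights, lattice_category)
```

The feature vector in the toy run is closest to the node labelled as belonging to the search target category, so the area is judged to include the target.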
[0069]
[Preliminary search processing]
A preliminary search process for determining whether to perform the second conversion process will be described. This preliminary search process may be the same process as the search process performed in the above-described process S8, but it is also preferable to make the process simpler in consideration of the processing load of the search process itself.
[0070]
For example, as the preliminary search process, a predetermined feature value may be calculated based on the image data portion included in the search area after the first conversion process, and whether there is a possibility that the search target is included may be determined based on whether the feature value satisfies a predetermined condition. As an example, when a human face is the search target, the entropy is higher in the contour portion of the face than in other portions. The entropy is therefore calculated as the feature value, and when the calculated entropy is higher than a predetermined threshold (for example, a threshold determined based on the average entropy of the entire target image data), it is determined that the search target may be included.
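An entropy-based preliminary check of this kind might look as follows. The sketch uses the Shannon entropy of the pixel-value histogram, which is one common definition. The specification does not state which entropy measure is used, and the threshold value in the example is arbitrary.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the distribution of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def may_contain_target(region_pixels, threshold):
    """Preliminary check: entropy above the threshold means 'possibly a target'."""
    return shannon_entropy(region_pixels) > threshold


flat_region = [128] * 16        # uniform area, no structure
edge_region = [0, 255] * 8      # two equally frequent values
```

A uniform region has zero entropy and is rejected, while a region with two equally frequent pixel values has an entropy of one bit and passes a threshold of 0.5.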
[0071]
[Operation of Control Unit 11]
As described above, the control unit 11 according to the present embodiment sequentially reduces the target image data to be searched, extracts the areas to be searched from each item of reduced target image data, executes the first conversion process to bring each image data portion closer to the reference state, and then performs a preliminary search as to whether or not the search target is included in the image data portion after the first conversion process.
[0072]
If the preliminary search determines that the search target may be included, the second conversion process is executed to bring the image data portion still closer to the reference state, and it is determined whether the search target is included in the image data in the search area after the conversion. In other words, the control unit 11 first brings the image data to be searched roughly toward the reference state and only then does so finely, thereby reducing the overall processing load.
[0073]
Because the conversion to the reference state is performed in this way, in the present embodiment it does not matter if the position of the search area before conversion is slightly shifted or if the reduction rate deviates slightly from the reference state. Therefore, whereas conventionally multi-stage reduced image data reduced by 0.8 times at each stage was generated and the search area was extracted while being shifted by one pixel, here it is possible to reduce by 5 times, and ΔX and ΔY can be set to, for example, 6 pixels even when areas satisfying a predetermined condition are extracted autonomously in defining the search areas. As a result, the number of patterns to be searched can be significantly reduced, and the processing load of searching a photograph or the like for the search target can be reduced.
[0074]
The control unit 11 generates, as search result information, map data representing the areas determined to include the search target. Map data is generated for each item of target image data reduced at each reduction rate. An area including the search target is therefore determined by using these plural map data (each having the size of the corresponding reduced target image data) in an integrated manner.
[0075]
For example, each map data may be enlarged at an enlargement ratio corresponding to its reduction rate so as to match the size of the original target image data, and a region that is "true" in common across all the map data (a region determined to include the search target based on the target image data at every reduction rate) may be determined to include the search target. Alternatively, a region that is "true" in any one of the map data may be determined to be a region that includes the search target.
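The two ways of integrating the map data can be sketched with boolean grids. The nearest-neighbour enlargement by an integer factor and the function names are assumptions made for this example. Real reduction rates need not be integer ratios.

```python
def enlarge_map(map_data, factor):
    """Enlarge a 2-D boolean map by an integer factor (nearest neighbour)."""
    enlarged = []
    for row in map_data:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend([list(wide_row) for _ in range(factor)])
    return enlarged


def combine_maps(maps, mode="all"):
    """Combine same-sized maps cell by cell: 'all' = common, 'any' = either."""
    op = all if mode == "all" else any
    return [[op(cells) for cells in zip(*rows)] for rows in zip(*maps)]


# Map at full size and map obtained at half size.
full = [[True, True, False, False],
        [True, False, False, False],
        [False, False, False, False],
        [False, False, False, True]]
half = [[True, False],
        [False, False]]

enlarged = enlarge_map(half, 2)
common = combine_maps([full, enlarged], mode="all")
either = combine_maps([full, enlarged], mode="any")
```

Requiring agreement at every reduction rate gives a stricter result, whereas accepting a hit at any reduction rate gives a more permissive one.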
[0076]
[Variation of search area definition processing]
Further, in the search area defining process, the control unit 11 may use the entropy, hierarchical entropy, color, or luminance distribution of the image data included in an area that is a candidate search area, or a combination of two or more of these values, to determine whether or not the area is actually defined as a search area. For example, when searching for the face of a person, the entropy is high in the peripheral portion (contour portion) of the face, so if the entropy is larger than a predetermined threshold, the candidate area is actually defined as a search area. Otherwise, the area is not set as a search area and the other processing continues. In this way, the search areas subjected to the conversion processes and the search process can be reasonably reduced, and the processing load can be reduced.
[0077]
[Other variations]
In the description so far, the conversions performed in the conversion process are two-dimensional rotation, translation, and enlargement/reduction of the search area (the search area is enlarged or reduced, so that the original image data portion, that is, the portion before enlargement or reduction, is scaled), and the image data portion included in the search area is converted as a result. In addition to these, however, conversions applied to the image data portion itself may be included, such as, in the case of a human face, a three-dimensional rotation that depends on the posture (for example, the degree to which the face is tilted or turned). Specifically, such a three-dimensional rotation can be realized by assuming an average three-dimensional model of the search target and projecting the image data portion onto that average three-dimensional model.
[0078]
Further, in the search process, the search target, even when it is a human face, may be further classified into categories, with conditions such as age, sex, and whether or not the mouth is open included.
[0079]
Furthermore, in the flowchart of FIG. 2, the control unit 11 performs the processing at each reduction ratio sequentially. However, since the processing at each reduction ratio is independent of the others, the control unit 11 may perform the processing at the respective reduction ratios in parallel.
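Because the per-ratio processing is independent, it maps naturally onto a worker pool. The following is only a sketch: `search_at_ratio` is a hypothetical stand-in for the processing of FIG. 2 at one reduction ratio, and the use of a thread pool is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_at_ratio(ratio):
    """Hypothetical stand-in for the processing of one reduction ratio.

    A real implementation would reduce the target image data, define search
    areas, convert them, and return the resulting map data.
    """
    return {"ratio": ratio, "map": []}  # placeholder result


def search_all_ratios(ratios, worker=search_at_ratio):
    """Run the per-ratio processing in parallel, keeping the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(worker, ratios))
```

For CPU-bound image processing in Python, `concurrent.futures.ProcessPoolExecutor` may be a better fit than threads; it offers the same `map` interface.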
[0080]
[Operation]
Next, the operation of the image search device of the present embodiment will be described by taking as an example a case where a human face portion is searched for as a search target from image data of a photograph given as target image data. In the following example, for the sake of simplicity, it is assumed that the degree of freedom of conversion is only translation (x, y) and rotation (θ).
[0081]
The control unit 11 stores target image data input from the outside, such as from an external interface or a communication unit (not shown), in the storage unit 12, and defines search areas by the method described above. For example, the control unit 11 divides the target image data into image blocks of a predetermined size (which may partly overlap one another) and defines as a search area each image block whose entropy is higher than a predetermined value. The control unit 11 then selects one of the defined search areas and executes the first conversion process on the selected search area. In this first conversion process, conceptually, from the reference state O, the image data portion in the range R1 as shown in FIG.
[0082]
The control unit 11 further performs a preliminary search to determine whether or not a person's face, which is the search target, is included in the image data portion after the first conversion process. This preliminary search process may be the same as the main search process using a neural network, or may be a simple process using a feature quantity such as entropy. When it is determined as a result of the preliminary search process that the image data portion after the first conversion process includes the face of a person, the control unit 11 then executes the second conversion process on that search area. In this second conversion process, conceptually, as shown in FIG. 3B in correspondence with FIG. 3A, a conversion process (R2) with a narrow conversion range, whose degree of freedom is, for example, only the θ direction (rotation), is applied, and fine adjustment is performed to bring the image data portion closer to the reference state O.
[0083]
The control unit 11 performs the search process on the image data portion included in the search area after the second conversion process and checks whether the search target is included. If the search target is included, the control unit 11 generates information specifying the portion of the target image data corresponding to the search area on which the search process was performed.
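The flow of the operation described above (coarse conversion, preliminary search, fine conversion, main search) can be sketched as follows. This is a toy model under strong assumptions: the image data portion is represented only by its deviation (x, y, θ) from the reference state O, `estimate_unit_step` is a hypothetical stand-in for a lookup in a learned conversion database, and the unit amounts, iteration limits, and check functions are invented for illustration.

```python
def estimate_unit_step(value, unit):
    """Stand-in for a conversion-database lookup.

    Returns the unit correction (-unit, 0.0, or +unit) that moves `value`
    toward 0. A real device would estimate this from feature vector
    information; here the deviation is simply assumed to be known.
    """
    if abs(value) <= unit / 2:
        return 0.0
    return -unit if value > 0 else unit


def convert(pose, units, max_iters):
    """Apply unit corrections to the listed degrees of freedom until none is proposed."""
    pose = dict(pose)
    for _ in range(max_iters):
        steps = {k: estimate_unit_step(pose[k], u) for k, u in units.items()}
        if all(s == 0.0 for s in steps.values()):
            break
        for k, s in steps.items():
            pose[k] += s
    return pose


def two_stage_search(pose, preliminary_check, final_check):
    """Coarse conversion, preliminary search, fine conversion, main search."""
    # First conversion: all degrees of freedom (x, y, theta), coarse units.
    pose = convert(pose, {"x": 4.0, "y": 4.0, "theta": 10.0}, max_iters=10)
    if not preliminary_check(pose):
        return False, pose  # the second conversion is skipped
    # Second conversion: rotation only, with a finer unit and narrower range.
    pose = convert(pose, {"theta": 1.0}, max_iters=5)
    return final_check(pose), pose
```

Starting from a deviation of x = 9, y = -5, θ = 23, the first stage of this toy model leaves (1, -1, 3), and the second stage then brings θ to 0 while leaving x and y untouched, which mirrors the narrower range R2 of the second conversion.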
[0084]
As described above, in this embodiment, performing the conversion process in two stages reduces the processing load compared with a conversion process using only one conversion database, and the search is performed in a state closer to the reference state, so the search accuracy can be improved.
[0085]
In the above description, the case where the conversion process is performed in two stages is shown as an example, but it may be performed in three or more stages. In this case, the conversion conditions in the conversion process may be limited further at each successive stage, or the number of dimensions of the feature vector information may be increased further.
[0086]
[Example]
FIG. 4A is a distribution diagram showing how much the image data, after conversion to the reference state by a conversion process using one type of conversion database, deviates from the actual reference state. FIG. 4B is a distribution diagram showing how much the image data deviates from the actual reference state when, as in this embodiment, two types of conversion databases are used and the conversion process is performed in two stages, first coarsely and then finely. As can be understood from the drawings, with the method of the present embodiment the average error is reduced, and the converted data is distributed closer to the reference state.
[0087]
Further, the conversion process using one type of conversion database requires about 20 conversion processes on average before convergence (that is, the unit conversion for each degree of freedom is performed about 20 times). When two types of conversion databases are used, the coarse conversion process is performed about 10 times and the fine conversion process about 5 times, so the processing load of the conversion process itself is reduced as a whole.
[Brief description of the drawings]
FIG. 1 is a configuration block diagram of an image search apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an example of processing of a control unit 11.
FIG. 3 is an explanatory diagram showing an outline of conversion processing performed by the image search apparatus according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing an experimental result comparing a result of a conversion process using one type of conversion database and a result of a two-stage conversion process using two types of conversion databases.
[Explanation of symbols]
11 control unit, 12 storage unit, 13 database unit, 14 display unit, 15 operation unit, 16 external storage unit.

Claims

An image search device for searching target image data to be processed for an image data portion of a search target, the device comprising:
means for defining at least one search area in the target image data;
first conversion means for referring to a plurality of first conversion databases learned and acquired in advance for each conversion method to acquire, for each search partial data included in each of the defined search areas, a conversion condition including N1 conversion methods and conversion amounts to be applied, and for performing conversion based on the acquired conversion condition at least once on each search partial data; and
second conversion means for acquiring, for each search partial data converted by the first conversion means, by referring to a plurality of second conversion databases learned and acquired in advance for each conversion method, a limited conversion condition in which, compared with the conversion in the first conversion means, the number of conversion methods is N2 (N2 being different from N1) and the conversion amount is limited, for performing conversion based on the acquired limited conversion condition at least once on each of the converted search partial data, and for thereby converting each search partial data into a reference state,
wherein the device refers to a search database learned and acquired in advance using image data in the reference state, determines whether or not the search target is included in each of the converted search partial data, and outputs the determination result.

The image search device according to claim 1,
further comprising preliminary search means for determining, for each search partial data converted by the first conversion means, whether or not the search target is included,
wherein the second conversion means converts only the search partial data determined by the preliminary search means to include the search target.

The image search device according to claim 2,
wherein the preliminary search means refers, for the search partial data converted by the first conversion means, to a search database learned and acquired using examples of image data of the search target, and determines whether or not the search target is included in each of the converted search partial data.

The image search device according to claim 3,
wherein the means for defining the search area
determines whether or not to actually define an area as a search area by using at least one of the entropy, hierarchical entropy, color, and luminance dispersion of the image data included in the area to be defined as the search area.

The image search device according to claim 3,
wherein the means for defining the search area
actually defines an area as a search area when the entropy of the image data included in the area to be defined as the search area is larger than a predetermined threshold.

An image search program for searching target image data to be processed for an image data portion of a search target, the program causing a computer to function as:
means for defining at least one search area in the target image data;
first conversion means for referring to a plurality of first conversion databases learned and acquired in advance for each conversion method to acquire, for each search partial data included in each of the defined search areas, a conversion condition including N1 conversion methods and conversion amounts to be applied, and for performing conversion based on the acquired conversion condition at least once on each search partial data;
second conversion means for acquiring, for each search partial data converted by the first conversion means, by referring to a plurality of second conversion databases learned and acquired in advance for each conversion method, a limited conversion condition in which, compared with the conversion in the first conversion means, the number of conversion methods is N2 (N2 being different from N1) and the conversion amount is limited, for performing conversion based on the acquired limited conversion condition at least once on each of the converted search partial data, and for thereby converting each search partial data into a reference state; and
means for referring to a search database learned and acquired in advance using image data in the reference state, determining whether or not the search target is included in each of the converted search partial data, and outputting the determination result.

The image search program according to claim 6,
further causing the computer to function so as to determine, for each search partial data converted by the function as the first conversion means, whether or not the search target is included, and to function as the second conversion means only for the search partial data determined to include the search target.

The image search program according to claim 7,
wherein, when functioning as the preliminary search means, the computer refers, for the search partial data converted by the first conversion means, to a search database learned and acquired using examples of image data of the search target, and determines whether or not the search target is included in each of the converted search partial data.