JP4298283B2

JP4298283B2 - Pattern recognition apparatus, pattern recognition method, and program

Info

Publication number: JP4298283B2
Application number: JP2002364369A
Authority: JP
Inventors: Katsuhiko Mori; Masakazu Matsugu; Mie Ishii; Yusuke Mitarai
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-12-16
Filing date: 2002-12-16
Publication date: 2009-07-15
Anticipated expiration: 2022-12-16
Also published as: JP2004199200A

Description

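[0001]
[Technical Field of the Invention]
The present invention relates to a pattern recognition apparatus, a pattern recognition method, and a program that perform pattern recognition, detection of a specific subject, and the like in a target image by, for example, applying hierarchical arithmetic processing to the target image.
[0002]
[Prior Art]
In fields such as image recognition and speech recognition, techniques are known that detect a recognition target in an image containing the target and a background by executing a recognition algorithm specialized for that target, either in computer software or in hardware using a dedicated parallel image processor.
[0003]
In particular, configurations for detecting a face region in a target image as the specific recognition target are disclosed in, for example, Japanese Patent No. 2767814, Japanese Patent Application Laid-Open No. 9-251534, Japanese Patent Application Laid-Open No. 9-44676, Japanese Patent No. 2973676, and Japanese Patent Application Laid-Open No. 11-283036.
[0004]
Specifically, the configuration described in Japanese Patent Application Laid-Open No. 9-251534 and the like applies a template called a standard face to the input image (target image) to search for a face region, and then applies partial templates to feature point candidates such as the eyes, nostrils, or mouth within that face region to authenticate the person in the target image.
[0005]
The configuration described in Japanese Patent No. 2767814 and the like obtains eye and mouth candidate groups from a face image and collates the face candidate group formed by combining them with a face structure recorded in advance, thereby finding the regions corresponding to the eyes and mouth in the face image.
[0006]
The configuration described in Japanese Patent Application Laid-Open No. 9-44676 and the like obtains a plurality of eye, nose, and mouth candidates from the target image and detects the face region from the positional relationship between these candidates and feature points prepared in advance.
[0007]
The configuration described in Japanese Patent No. 2973676 and the like changes the shape data when checking the degree of match between the shape data of each face part and the input image, and determines the search region for each face part based on the positional relationship of the parts already found.
[0008]
The configuration described in Japanese Patent Application Laid-Open No. 11-283036 and the like moves a region model, in which a plurality of determination element acquisition regions are set, over the input image (target image) and, at each point, determines whether a determination element is present in each of those regions, thereby recognizing the face region in the target image.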
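[0009]
[Problems to Be Solved by the Invention]
However, the conventional configurations for image recognition (pattern recognition) described above have the following problems.
[0010]
First, the conventional configuration described in Japanese Patent Application Laid-Open No. 9-251534 and the like first applies a template called a standard face to the target image and searches for the face region by matching the whole face, so it has difficulty handling faces of various sizes and changes in face orientation. Handling them requires preparing a plurality of standard faces for the different face sizes and orientations and running detection with each one, which increases the total size of the whole-face templates and, with it, the processing cost.
[0011]
The conventional configuration described in Japanese Patent No. 2767814 and the like finds the regions corresponding to the eyes and mouth in a face image by collating a face candidate group, formed by combining the eye and mouth candidate groups obtained from the face image, with a face structure recorded in advance. When the application is, for example, creating a portrait from the target image, the image usually contains one face or only a few, the faces are fairly large, and most of the image is face with little background. For such an image, the number of face candidates stays limited even when they are formed from all eye and mouth candidates. However, when the target image is, for example, captured with an ordinary camera or video camera, the faces may be small and the background correspondingly large. In that case many eye and mouth candidates are falsely detected in the background, so forming face candidates from all of them yields an enormous number of candidates, and the processing cost of collation with the face structure increases.
[0012]
The conventional configuration described in Japanese Patent Application Laid-Open No. 9-44676 and the like obtains a plurality of eye, nose, and mouth candidates from the target image and detects the face region from the positional relationship between these candidates and feature points prepared in advance. As with the configuration of Japanese Patent No. 2767814, when many eye, nose, and mouth candidates exist in the background, the processing cost of collating their positional relationships becomes enormous.
[0013]
The conventional configuration described in Japanese Patent No. 2973676 and the like holds shape data for each face part (the irises, mouth, nose, and so on), first finds the two irises, and then, when finding the mouth, nose, and other parts, limits their search regions based on the iris positions. In other words, the algorithm does not detect the face parts such as the irises (eyes), mouth, and nose in parallel; it detects the irises (eyes) first and then uses that result to detect the mouth, nose, and other parts in turn. The configuration therefore assumes that the target image contains only one face and that the irises are found correctly. If a detected iris is a false detection, the search regions for the other features such as the mouth and nose cannot be set correctly.
[0014]
The conventional configuration described in Japanese Patent Application Laid-Open No. 11-283036 and the like moves a region model, in which a plurality of determination element acquisition regions are set, over the input image (target image) and determines at each point whether a determination element is present in each region, thereby recognizing the face region. To handle many face sizes, region models of different sizes must be prepared, and when no face of a given size is actually present, a large amount of wasted processing is performed, which is inefficient.
[0015]
The present invention has been made to eliminate these drawbacks. Its object is to provide a pattern recognition apparatus, a pattern recognition method, and a program that can detect an arbitrary region in a target image as a specific recognition target efficiently and at low processing cost, whatever the recognition target may be.
[0016]
Specifically, for example, even when a plurality of recognition targets of different sizes are present in the target image, all of them are extracted at low processing cost. In addition, false detections, in which something that is not the pattern to be recognized is mistakenly detected as that pattern, are prevented.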
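[0017]
[Means for Solving the Problems]
To this end, a pattern recognition apparatus according to the present invention, which detects a predetermined pattern contained in an input signal, comprises: feature detection means for hierarchically detecting a plurality of features of the predetermined pattern; reference data holding means for holding a plurality of reference data corresponding to the plurality of features, the reference data of each stage being composed of a combination of regions each corresponding to a specific feature detected in the preceding stage; and data setting means for transforming the reference data of each stage held in the reference data holding means in accordance with the positional relationship of the plurality of preceding-stage features of the target feature obtained by the feature detection means, and for setting the transformed reference data as the data that the feature detection means uses to detect the target feature. If, in each region constituting the reference data of a stage, the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold, the feature detection means determines that the feature to be detected with that reference data exists at that position, and takes the average of the maximum values of the regions as the value of that feature.
[0020]
A pattern recognition method according to the present invention, for detecting a predetermined pattern contained in an input signal, comprises: a feature detection step of hierarchically detecting a plurality of features constituting the predetermined pattern; a reference data holding step of holding a plurality of reference data of each stage, each composed of a combination of regions corresponding to specific features detected in the preceding stage, for detecting the respective features in the feature detection step; and a data setting step of setting the data used for feature detection in the feature detection step based on the reference data held in the reference data holding step. When setting the data for detecting a feature in the feature detection step, the data setting step includes a step of transforming the reference data of each stage held in the reference data holding step in accordance with the positional relationship of the plurality of preceding-stage features of the feature to be detected obtained in the feature detection step, and setting the transformed reference data as the data used to detect the target feature in the feature detection step. In the feature detection step, if, in each region constituting the reference data of a stage, the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold, the feature to be detected with that reference data is determined to exist at that position, and the average of the maximum values of the regions is taken as the value of that feature.
[0026]
A program according to the present invention causes a computer to function as predetermined means, the predetermined means comprising: feature detection means for hierarchically detecting a plurality of features of a predetermined pattern contained in an input signal; reference data holding means for holding a plurality of reference data corresponding to the plurality of features, the reference data of each stage being composed of a combination of regions each corresponding to a specific feature detected in the preceding stage; and data setting means for transforming the reference data of each stage held in the reference data holding means in accordance with the positional relationship of the plurality of preceding-stage features of the target feature obtained by the feature detection means, and for setting the transformed reference data as the data that the feature detection means uses to detect the target feature. If, in each region constituting the reference data of a stage, the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold, the feature detection means determines that the feature to be detected with that reference data exists at that position, and takes the average of the maximum values of the regions as the value of that feature.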
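[0030]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
[0031]
[First Embodiment]
The present invention is applied to, for example, a pattern recognition apparatus 100 as shown in FIG. 1.
The pattern recognition apparatus 100 of this embodiment is applicable to an imaging apparatus or the like. To detect every recognition target (pattern) present in a target image, it holds a plurality of reference data for hierarchically detecting, from the target image, the features that make up a recognition target and, based on that reference data, sets the data for detecting each target feature using parameters obtained from the detection results of the preceding-stage features. With this configuration, even when a plurality of recognition targets of different sizes are present in the target image, all of them are detected efficiently at low processing cost.
The configuration and operation of the pattern recognition apparatus 100 of this embodiment are described below.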
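[0032]
<Configuration of the pattern recognition apparatus 100>
As shown in FIG. 1, the pattern recognition apparatus 100 comprises a signal input unit 130, a primary feature detection unit 101, a primary feature detection filter setting unit 111, a secondary feature detection unit 102, a secondary feature detection model setting unit 112, a secondary feature reference model holding unit 122, a tertiary feature detection unit 103, a tertiary feature detection model setting unit 113, a tertiary feature reference model holding unit 123, a quaternary feature detection unit 104, a quaternary feature detection model setting unit 114, a quaternary feature reference model holding unit 124, a pattern confirmation unit 105, a confirmation pattern setting unit 115, and a reference confirmation pattern holding unit 125.
[0033]
The signal input unit 130 inputs the signal to be processed, such as an image signal or an audio signal (here, the signal of the target image).
[0034]
The primary feature detection unit 101 applies processing for detecting primary features to the signal input from the signal input unit 130, supplies the result (the primary feature detection result) to the secondary feature detection unit 102, and supplies the primary feature detection result and its parameters to the secondary feature detection model setting unit 112.
[0035]
At this time, the primary feature detection filter setting unit 111 sets the filter characteristics or parameters with which the primary feature detection unit 101 detects the primary features.
[0036]
The secondary feature detection unit 102 applies processing for detecting secondary features to the primary feature detection result from the primary feature detection unit 101, using the detection model set by the secondary feature detection model setting unit 112. It supplies the result (the secondary feature detection result) to the tertiary feature detection unit 103, and supplies the secondary feature detection result and its parameters to the tertiary feature detection model setting unit 113.
[0037]
At this time, the secondary feature detection model setting unit 112 sets the model, indicating the positional relationship of the primary features, that the secondary feature detection unit 102 uses to detect a secondary feature. It does so using the reference model held in the secondary feature reference model holding unit 122, the primary feature detection result from the primary feature detection unit 101, and its parameters.
The secondary feature reference model holding unit 122 holds the reference models for the detection models set by the secondary feature detection model setting unit 112.
[0038]
The tertiary feature detection unit 103 applies processing for detecting tertiary features to the secondary feature detection result from the secondary feature detection unit 102, using the detection model set by the tertiary feature detection model setting unit 113. It supplies the result (the tertiary feature detection result) to the quaternary feature detection unit 104, and supplies the tertiary feature detection result and its parameters to the quaternary feature detection model setting unit 114.
[0039]
At this time, the tertiary feature detection model setting unit 113 sets the model, indicating the positional relationship of the secondary features, that the tertiary feature detection unit 103 uses to detect a tertiary feature. It does so using the reference model held in the tertiary feature reference model holding unit 123, the secondary feature detection result from the secondary feature detection unit 102, and its parameters.
The tertiary feature reference model holding unit 123 holds the reference models for the detection models set by the tertiary feature detection model setting unit 113.
[0040]
The quaternary feature detection unit 104 applies processing for detecting quaternary features to the tertiary feature detection result from the tertiary feature detection unit 103, using the detection model set by the quaternary feature detection model setting unit 114. It supplies the result (the quaternary feature detection result) to the pattern confirmation unit 105, and supplies the quaternary feature detection result and its parameters to the confirmation pattern setting unit 115.
[0041]
At this time, the quaternary feature detection model setting unit 114 sets the model, indicating the positional relationship of the tertiary features, that the quaternary feature detection unit 104 uses to detect a quaternary feature. It does so using the reference model held in the quaternary feature reference model holding unit 124, the tertiary feature detection result from the tertiary feature detection unit 103, and its parameters.
The quaternary feature reference model holding unit 124 holds the reference models for the detection models set by the quaternary feature detection model setting unit 114.
[0042]
The pattern confirmation unit 105 checks whether the confirmation pattern set by the confirmation pattern setting unit 115 is present in the signal input by the signal input unit 130. The confirmation pattern setting unit 115 sets the confirmation pattern used by the pattern confirmation unit 105, using the reference pattern held in the reference confirmation pattern holding unit 125, the quaternary feature detection result from the quaternary feature detection unit 104, and its parameters.
The reference confirmation pattern holding unit 125 holds the reference patterns for the confirmation patterns set by the confirmation pattern setting unit 115.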
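[0043]
<Operation of the pattern recognition apparatus 100>
FIG. 2 is a flowchart of the operation of the pattern recognition apparatus 100.
[0044]
As an example of pattern recognition processing, it is assumed here that an image signal is input from the signal input unit 130 and that a face region in the image is to be detected.
[0045]
Step S201:
The signal input unit 130 inputs an image signal as the signal to be processed.
[0046]
Step S202:
The primary feature detection unit 101 detects primary features at each position of the image (target image) formed by the image signal input by the signal input unit 130, using, for example, the filters set by the primary feature detection filter setting unit 111.
[0047]
Specifically, as shown in FIG. 3(a), the primary feature detection unit 101 detects features of different orientations and sizes in the target image, such as the large vertical feature (1-1-1), large horizontal feature (1-2-1), large upward-right diagonal feature (1-3-1), large downward-right diagonal feature (1-4-1), small vertical feature (1-1-2), small horizontal feature (1-2-2), small upward-right diagonal feature (1-3-2), and small downward-right diagonal feature (1-4-2). It outputs the result (the primary feature detection result) for each feature as a detection result image of the same size as the target image.
As a result, eight kinds of primary feature detection result images are obtained here. By referring to the value at each position of a feature's detection result image, it can be determined whether that feature is present at the corresponding position of the target image.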
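As a rough illustration of step S202, the following sketch produces one result map per orientation and size, each the same size as the input image. The kernel shape, the sizes 3 and 5, zero padding, and the clipping of negative responses are all assumptions made for the example; the source does not specify the filters.

```python
DIRECTIONS = ("vertical", "horizontal", "up_right", "down_right")
SIZES = (3, 5)  # stand-ins for "small" and "large"; the real sizes are not given


def line_kernel(direction, size):
    """Zero-sum size x size kernel that responds to a line in `direction`."""
    mid = size // 2
    kernel = [[-1.0 / size] * size for _ in range(size)]
    for i in range(size):
        if direction == "vertical":
            r, c = i, mid
        elif direction == "horizontal":
            r, c = mid, i
        elif direction == "up_right":   # rises toward the right (rows grow downward)
            r, c = i, size - 1 - i
        else:                           # "down_right"
            r, c = i, i
        kernel[r][c] = 1.0 - 1.0 / size
    return kernel


def correlate(image, kernel):
    """Correlate with zero padding; negative responses are clipped to 0."""
    h, w, s = len(image), len(image[0]), len(kernel)
    half = s // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for r in range(s):
                for c in range(s):
                    yy, xx = y + r - half, x + c - half
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[r][c] * image[yy][xx]
            out[y][x] = max(acc, 0.0)
    return out


def detect_primary_features(image):
    """Return one result map per (direction, size) pair, each the size of the image."""
    return {(d, s): correlate(image, line_kernel(d, s))
            for d in DIRECTIONS for s in SIZES}
```

Calling `detect_primary_features` on a grayscale image given as a list of rows yields eight maps keyed by `(direction, size)`, mirroring the eight detection result images described above.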
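[0048]
The filters used by the primary feature detection unit 101 may be prepared in advance, or may be generated by the primary feature detection filter setting unit 111 with orientation and size as parameters.
As shown in FIGS. 3(b) to 3(d), the secondary features detected in the later processing are the right-open V-shaped feature (2-1), left-open V-shaped feature (2-2), horizontal parallel line feature (2-3), and vertical parallel line feature (2-4); the tertiary features are the eye feature (3-1) and mouth feature (3-2); and the quaternary feature is the face feature (4-1).
[0049]
Step S203:
The secondary feature detection model setting unit 112 sets the models with which the secondary feature detection unit 102 detects secondary features.
[0050]
As a concrete example, consider setting the detection model for the right-open V-shaped feature (2-1) shown in FIG. 3(b).
As shown in FIG. 4(A), the right-open V-shaped feature (2-1) has the primary feature "upward-right diagonal" in its upper part and "downward-right diagonal" in its lower part. To detect a right-open V-shaped feature, it therefore suffices to use the primary feature detection results obtained in step S202 and find the positions where an upward-right diagonal feature lies above and a downward-right diagonal feature lies below. The right-open V-shaped feature (2-1) exists at those positions.
In this way, a secondary feature can be detected by combining several kinds of primary features.
[0051]
However, the size of a face in the target image is not fixed, the size of the eyes and mouth differs from person to person, and the eyes and mouth open and close, so the size of the right-open V shape also varies.
[0052]
This embodiment therefore uses a right-open V detection reference model 400 as shown in FIG. 4(B).
In the right-open V detection reference model 400, 403 is the upward-right diagonal region and 404 is the downward-right diagonal region. When, among the primary features obtained in step S202, only the large or small upward-right diagonal feature is present in the upward-right diagonal region 403, and only the large or small downward-right diagonal feature is present in the downward-right diagonal region 404, the right-open V-shaped feature (2-1) is taken to exist at that position. With this configuration, the processing is robust to a certain amount of variation in the size and shape of the right-open V.
[0053]
However, right-open V-shaped features whose sizes differ considerably, as shown in FIGS. 5(A) and 5(B), are difficult to detect with the same V detection reference model 400.
[0054]
Of course, such features could be detected with a single V reference model 400 if, for example, the right-open V detection reference model 400 of FIG. 4(B) were made very large, so that the upward-right diagonal region 403 and the downward-right diagonal region 404 became very wide.
[0055]
However, because the search range for each primary feature then becomes large, false detections become likely, for example a case where the upward-right diagonal feature is large, the downward-right diagonal feature is small, and their positions are also far apart.
[0056]
In a genuine right-open V-shaped feature, the upward-right diagonal feature and the downward-right diagonal feature are each a component of the V, they are roughly the same size, and they lie close to each other. If the right-open V-shaped feature is large, both diagonal features are large as well.
[0057]
The size of the reference model for detecting a secondary feature is therefore adapted to the size of the primary features detected in step S202.
[0058]
It is likewise difficult to always detect the primary features, the upward-right and downward-right diagonal features, with a filter of the same size.
[0059]
Accordingly, when the face in the target image is small, as in FIG. 5(A), the primary features are detected with a small filter, and when the face is large, as in FIG. 5(B), they are detected with a large filter. The size of the model for detecting the secondary feature, the right-open V-shaped feature, is then changed according to the size of the filter that detected the primary features, as described above.
[0060]
In step S203, then, the model for detecting each secondary feature is set by enlarging or reducing the corresponding reference model, with the size of the filter that detected the primary features as the parameter.
[0061]
FIG. 5(C) shows the right-open V detection model for a small face, and FIG. 5(D) shows the one for a large face.
These models are the right-open V detection reference model 400 of FIG. 4(B) resized by different scale factors.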
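The scaling in step S203 can be sketched as follows. The region coordinates, the reference filter size of 6, and the names `Region` and `model_for_filter` are hypothetical; the source only states that the reference model is enlarged or reduced according to the size of the filter that detected the primary features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    feature: str    # previous-stage feature expected in this region
    top: float      # offsets from the model centre, in pixels
    left: float
    bottom: float
    right: float


# Hypothetical reference model for the right-open V-shaped feature:
# an upward-right diagonal region above a downward-right diagonal region.
REFERENCE_FILTER_SIZE = 6
RIGHT_OPEN_V_REFERENCE = (
    Region("up_right", top=-4, left=-2, bottom=0, right=2),
    Region("down_right", top=0, left=-2, bottom=4, right=2),
)


def model_for_filter(reference, filter_size,
                     reference_filter_size=REFERENCE_FILTER_SIZE):
    """Scale every region of the reference model by filter_size / reference size."""
    scale = filter_size / reference_filter_size
    return tuple(
        Region(r.feature, r.top * scale, r.left * scale,
               r.bottom * scale, r.right * scale)
        for r in reference
    )
```

For instance, `model_for_filter(RIGHT_OPEN_V_REFERENCE, 3)` halves every region, which corresponds to the small-face model of FIG. 5(C) under these assumed numbers.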
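[0062]
Of course, it is also effective to prepare filters of several sizes for detecting the primary features, prepare a processing channel for each size, and detect the secondary features, tertiary features, and so on of each size in its own channel.
However, when the face size in the target image varies widely, preparing a channel for each face size increases the number of channels, and therefore the processing cost.
[0063]
This embodiment solves that problem by changing the size of the detection model, for secondary feature detection and later, according to the detection result of the preceding layer.
[0064]
The right-open V detection reference model 400, the upward-right diagonal region 403, and the downward-right diagonal region 404 shown in FIG. 4(B) are set in advance to suit the feature to be detected and are held in the secondary feature reference model holding unit 122.
[0065]
Each of the features shown in FIG. 3 can be detected from a combination of the features detected in the preceding step.
For example, among the secondary features, the left-open V-shaped feature can be detected from the downward-right and upward-right diagonal features, the horizontal parallel line feature from the horizontal feature, and the vertical parallel line feature from the vertical feature. Among the tertiary features, the eye feature can be detected from the right-open V-shaped, left-open V-shaped, horizontal parallel line, and vertical parallel line features, and the mouth feature from the right-open V-shaped, left-open V-shaped, and horizontal parallel line features. The quaternary face feature can be detected from the eye and mouth features.
[0066]
Step S204:
The secondary feature detection unit 102 detects the secondary features of the target image using the secondary feature detection models set in step S203.
[0067]
Specifically, a secondary feature is detected using the values of the primary features that constitute it, for example by judging whether the value of each primary feature is at or above a given threshold.
[0068]
For example, when the right-open V detection model is used to detect the right-open V-shaped secondary feature at a given position, the feature is taken to exist at that position if the maximum value of the upward-right diagonal features in the upward-right diagonal region is higher than the threshold and the maximum value of the downward-right diagonal features in the downward-right diagonal region is also higher than the threshold. The value at that position is set to the average of those maxima. Conversely, if the primary feature values are lower than the threshold, the secondary feature is taken to be absent and the value at that position is set to "0".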
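The decision rule of step S204 (each region's maximum must exceed the threshold, and the output is the average of the maxima, otherwise 0) can be sketched as below. The tuple layout `(feature, dy0, dx0, dy1, dx1)` for a region and the function names are assumptions for illustration.

```python
def region_max(feature_map, y, x, region):
    """Maximum of feature_map inside the region placed at (y, x), clipped to the map."""
    _, dy0, dx0, dy1, dx1 = region
    h, w = len(feature_map), len(feature_map[0])
    values = [
        feature_map[yy][xx]
        for yy in range(max(y + dy0, 0), min(y + dy1, h - 1) + 1)
        for xx in range(max(x + dx0, 0), min(x + dx1, w - 1) + 1)
    ]
    return max(values, default=0.0)


def detect_at(feature_maps, model, y, x, threshold):
    """Value of the higher-order feature at (y, x) under the rule described above."""
    maxima = [region_max(feature_maps[r[0]], y, x, r) for r in model]
    if all(m > threshold for m in maxima):
        return sum(maxima) / len(maxima)   # average of the per-region maxima
    return 0.0                             # feature judged absent
```

Note that this step only reads the previous-stage result maps; it does not run any primary feature filter again.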
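[0069]
The secondary feature detection results obtained in this way are output, for each secondary feature, as a detection result image of the same size as the target image.
That is, for the secondary features of FIG. 3(b), four kinds of secondary feature detection result images are obtained. By referring to the value at each position of these images, it can be determined whether each secondary feature is present at the corresponding position of the target image.
[0070]
Note that the processing of step S204 does not detect primary features within the regions of the secondary feature detection model.
[0071]
For example, in detecting the right-open V-shaped feature, one of the secondary features, the upward-right and downward-right diagonal primary features are not detected within the upward-right and downward-right diagonal regions. Detection of these primary features was completed in step S202. Step S204 therefore only judges, using a threshold, whether each primary feature is present in its region. When the primary features are judged to be present in their respective regions, the secondary feature is taken to exist at that position. The tertiary and quaternary features are detected in the same way.
[0072]
Step S204 also obtains the parameters used to set the tertiary feature detection model that follows.
For example, as shown in FIG. 6, when the right-open V-shaped feature is detected, the distance between the point where the upward-right diagonal feature takes its maximum value and the point where the downward-right diagonal feature takes its maximum value is obtained as a parameter. This parameter is output together with each secondary feature detection result.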
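The parameter described above can be computed alongside detection. In this sketch, rectangles are given as inclusive `(top, left, bottom, right)` image coordinates and the distance is Euclidean; both choices are assumptions, since the source only says "distance".

```python
import math


def region_argmax(feature_map, top, left, bottom, right):
    """Return (value, y, x) of the largest entry in the inclusive rectangle."""
    best = (float("-inf"), top, left)
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if feature_map[y][x] > best[0]:
                best = (feature_map[y][x], y, x)
    return best


def size_parameter(up_right_map, up_region, down_right_map, down_region):
    """Distance between the points where the two diagonal features peak."""
    _, y1, x1 = region_argmax(up_right_map, *up_region)
    _, y2, x2 = region_argmax(down_right_map, *down_region)
    return math.hypot(y2 - y1, x2 - x1)
```

The returned distance is what the next stage would use to scale its reference model.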
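[0073]
Step S205:
The tertiary feature detection model setting unit 113 sets the model, indicating the positional relationship of the secondary features, that the tertiary feature detection unit 103 uses to detect a tertiary feature. It does so using the reference model held in the tertiary feature reference model holding unit 123, the secondary feature detection result from the secondary feature detection unit 102, and its parameters.
[0074]
For simplicity, consider setting the detection model for the eye feature (3-1) shown in FIG. 3(c).
FIG. 7 shows an example of an eye detection reference model 700 for detecting an eye. In the eye detection reference model 700, a right-open V-shaped region 701, where the secondary feature "right-open V-shaped feature" (see (2-1) in FIG. 3(b)) is expected, lies on the left; a left-open V-shaped region 702, where the left-open V-shaped feature (see (2-2) in FIG. 3(b)) is expected, lies on the right; and a horizontal parallel line region 703, where the horizontal parallel line feature (see (2-3) in FIG. 3(b)) is expected, and a vertical parallel line region 704, where the vertical parallel line feature (see (2-4) in FIG. 3(b)) is expected, lie between these V-shaped features.
[0075]
In step S205, as in step S203, this reference model is enlarged or reduced to cope with size variation, yielding a tertiary feature detection model suited to detecting the tertiary feature. The parameter obtained in step S204 is used for this enlargement or reduction.
[0076]
For example, the distance between the positions of the maximum values of the upward-right and downward-right diagonal features, obtained when the right-open V-shaped edge was detected, depends on the size of the eye. The eye feature detection model is therefore set from the eye reference model with this distance as the parameter.
[0077]
In this way, for each tertiary feature, a detection model suited to each position is set from the corresponding reference model using the parameters of the secondary features.
For example, when faces of different sizes (that is, with eyes of different sizes) are present in the target image as shown in FIG. 8(A), an eye feature detection model suited to each position is set as shown in FIG. 8(B), with the size of the right-open V-shaped secondary feature as the parameter, as described above.
[0078]
FIG. 8(B) shows conceptually that the eye feature detection model 801 takes the size obtained from the parameter value of the secondary feature at its position, and that the eye feature detection model 802 likewise takes the size obtained from the parameter value of the secondary feature at its own position.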
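[0079]
Step S206:
The tertiary feature detection unit 103 detects the tertiary features using the tertiary feature detection models set in step S205.
Each tertiary feature is detected in the same way as in step S204, so a detailed description is omitted. As for the parameter, in the case of eye detection, for example, the distance between the right-open V-shaped feature and the left-open V-shaped feature that gave the maximum values (a distance corresponding to the width of the eye) is obtained and used as the parameter.
[0080]
Step S207:
The quaternary feature detection model setting unit 114 sets the model, indicating the positional relationship of the tertiary features, that the quaternary feature detection unit 104 uses to detect a quaternary feature. It does so using the reference model held in the quaternary feature reference model holding unit 124, the tertiary feature detection result from the tertiary feature detection unit 103, and its parameters.
[0081]
Specifically, in the case of face feature detection, the size of the face is generally related to the width of the eyes. The face feature detection model is therefore set from the reference model of the face feature (4-1) shown in FIG. 3(d), using the parameter indicating the eye width obtained in step S206.
[0082]
Step S208:
The quaternary feature detection unit 104 detects the quaternary feature using the quaternary feature detection model set in step S207.
The detection method is the same as in steps S204 and S206, so a detailed description is omitted. As for the parameters, in the case of face feature detection, for example, the positions of both eyes and the mouth are used. These parameters are used in the next step, S209.
[0083]
Step S209:
The confirmation pattern setting unit 115 sets the confirmation pattern used by the pattern confirmation unit 105, using the reference pattern held in the reference confirmation pattern holding unit 125, the quaternary feature detection result from the quaternary feature detection unit 104, and its parameters.
[0084]
Quaternary feature detection is performed in steps S201 to S208. However, if the background of the target image contains regions that resemble the tertiary features constituting the quaternary feature, and their positional relationship is also similar, the quaternary feature detection may produce a false detection.
[0085]
For example, in face detection, if the background of the target image contains regions resembling both eyes and a mouth, in a similar positional relationship, the face feature detection may produce a false detection.
[0086]
A general reference pattern of the pattern to be detected is therefore prepared, and a confirmation pattern is obtained by modifying the size and shape of this pattern based on the parameters obtained in step S208. The confirmation pattern is then used to judge finally whether the pattern to be detected is present in the target image.
[0087]
Since the face is the detection pattern in this example, a general reference pattern of a face is prepared and modified to obtain a face confirmation pattern, which is used to judge whether a face pattern is present in the target image.
[0088]
In step S209, therefore, the confirmation pattern is first set from the reference pattern using the parameters obtained in step S208. That is, the face confirmation pattern is set from the face reference pattern using the parameters obtained in step S208 that indicate the positions of both eyes and the mouth.
[0089]
FIGS. 9(A) and 9(B) show an example of the confirmation pattern.
FIG. 9(A) shows the face reference pattern, which is obtained, for example, by preparing a plurality of faces, normalizing their sizes, and averaging their luminance values.
The face reference pattern of FIG. 9(A) is transformed in size and rotation, as shown in FIG. 9(B), using the parameters obtained in step S208, namely the positions of both eyes and the mouth. Specifically, the size is transformed using, for example, the distance between the eyes and the distance between the midpoint of the eyes and the mouth, and the rotation is transformed using the tilt of the line between the eyes, thereby setting the face confirmation pattern.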
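The size and rotation used to transform the face reference pattern can be derived from the three positions as in the sketch below. Points are `(x, y)` pairs, and the separate horizontal and vertical scale factors, the dictionary keys, and the reference distances are assumptions made for illustration; the source does not say how the two distances are combined.

```python
import math


def verification_transform(left_eye, right_eye, mouth,
                           ref_eye_dist, ref_eye_mouth_dist):
    """Scale factors and rotation angle for warping the face reference pattern."""
    (lx, ly), (rx, ry), (mx, my) = left_eye, right_eye, mouth
    eye_dist = math.hypot(rx - lx, ry - ly)
    mid_x, mid_y = (lx + rx) / 2.0, (ly + ry) / 2.0
    eye_mouth_dist = math.hypot(mx - mid_x, my - mid_y)
    return {
        "scale_x": eye_dist / ref_eye_dist,               # from the eye-to-eye distance
        "scale_y": eye_mouth_dist / ref_eye_mouth_dist,   # from eye midpoint to mouth
        "angle": math.atan2(ry - ly, rx - lx),            # tilt of the line between the eyes
    }
```

The resulting values would then be applied to the reference pattern with whatever image warping routine is available.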
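[0090]
The method of setting the confirmation pattern is not limited to the one described above. For example, a plurality of reference patterns differing in size and rotation may be prepared, and one of them selected using the parameters of step S208. Alternatively, the parameters may be used to synthesize the confirmation pattern from the plurality of reference patterns by a technique such as morphing.
[0091]
Step S210:
The pattern confirmation unit 105 obtains the detection pattern from the target image using the confirmation pattern set in step S209.
[0092]
Specifically, at each position in the target image where the quaternary feature was detected in step S208, the correlation between the confirmation pattern obtained in step S209 and the partial region of the target image at that position is computed. If the value exceeds a given threshold, the detection pattern is taken to exist at that position.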
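The confirmation step can be sketched with a zero-mean normalized correlation. The source only says "correlation", so the normalized form, the function names, and the patch addressing by `(top, left)` are assumptions.

```python
import math


def normalized_correlation(pattern, patch):
    """Zero-mean normalized correlation of two equally sized 2D lists, in [-1, 1]."""
    a = [v for row in pattern for v in row]
    b = [v for row in patch for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if norm == 0.0:
        return 0.0          # a flat patch carries no pattern information
    return sum(p * q for p, q in zip(da, db)) / norm


def confirm_pattern(pattern, image, top, left, threshold):
    """True if the image patch at (top, left) correlates with the confirmation pattern."""
    h, w = len(pattern), len(pattern[0])
    patch = [row[left:left + w] for row in image[top:top + h]]
    return normalized_correlation(pattern, patch) > threshold
```

Only positions where the quaternary feature was detected need to be passed to `confirm_pattern`, so the comparatively expensive correlation runs at a small number of locations.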
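[0093]
As described above, this embodiment prepares a reference model for detecting each feature and sets the detection model from the reference model using parameters obtained from the detection results of the preceding-stage features. This improves the detection accuracy of each feature and thus of the pattern finally detected. In addition, when the correlation with the average pattern is examined in the final confirmation processing, the average pattern is transformed, for example rotated or resized, according to the positions of the features found so far, which improves the confirmation accuracy.
[0094]
<Imaging apparatus 1000 incorporating the pattern recognition apparatus 100>
This section describes a case where the functions of the pattern recognition (detection) apparatus 100 of FIG. 1 are mounted in an imaging apparatus 1000 as shown in FIG. 10, for example, to focus on a specific subject, correct the color of a specific subject, or control exposure.
[0095]
As shown in FIG. 10, the imaging apparatus 1000 includes an imaging optical system 1002 containing a taking lens and a drive control mechanism for zoom shooting, a CCD (or CMOS) image sensor 1003, an imaging parameter measurement unit 1004, a video signal processing circuit 1005, a storage unit 1006, a control signal generation unit 1007 that generates control signals for controlling the imaging operation, the imaging conditions, and so on, a display 1008 that doubles as a viewfinder such as an EVF, a strobe light emitting unit 1009, and a recording medium 1010. It further includes a subject detection (recognition) unit 1011 having the functions of the pattern recognition apparatus 100 shown in FIG. 1.
[0096]
In this imaging apparatus 1000, the subject detection (recognition) unit 1011 detects, for example, the face image of a person (its position and size) in the captured video.
[0097]
On receiving the detection result (the position and size of the person) from the subject detection (recognition) unit 1011, the control signal generation unit 1007 generates, based on the output of the imaging parameter measurement unit 1004, control signals for optimally performing focus control, exposure condition control, white balance control, and the like for that person.
[0098]
By using the functions of the pattern recognition (detection) apparatus 100 of FIG. 1 in the imaging apparatus 1000 in this way, a person in the captured video can be detected and shooting can be optimally controlled on that basis.
[0099]
In the imaging apparatus 1000 of FIG. 10, the functions of the pattern recognition apparatus 100 of FIG. 1 are provided as the subject detection (recognition) unit 1011, but the configuration is not limited to this. For example, the algorithm of the pattern recognition apparatus 100 may be implemented in the imaging apparatus 1000 as a program and executed by a CPU (not shown) mounted in the imaging apparatus 1000. The same applies to the second and third embodiments described below.
[0100]
In this embodiment, the features of the pattern to be detected are divided into four layers, the primary to quaternary features are detected in order, and the pattern to be detected is confirmed at the end. The number of layers is not limited to four; any number, such as three or five, may be used. The same applies to the second and third embodiments described below.
[0101]
In this embodiment the face pattern is used as the detection pattern and a face region is found in the target image, but the present invention is not limited to face detection.
For example, the digit string "24" shown in FIG. 11(A) can also be detected in a target image.
[0102]
In this digit string detection, as shown in FIG. 11(B), "2" is composed of a secondary feature consisting of a horizontal line segment and a downward-right diagonal line segment (upper feature), a secondary feature consisting of a vertical line segment and an upward-right diagonal line segment (middle feature), and a secondary feature consisting of an upward-right diagonal line segment and a horizontal line segment (lower feature). These secondary features are in turn composed of primary features such as those shown in FIG. 3(a).
[0103]
Accordingly, the primary features are first detected in the target image, the secondary features are detected from the primary feature detection results, and "2" is then detected as a tertiary feature using the secondary feature detection results. Likewise, "4" is detected as a tertiary feature from the secondary feature detection results.
Next, "24" is obtained as a quaternary feature from the tertiary feature detection results for "2" and "4".
Then, with the positional relationship between the "2" and "4" detected as tertiary features as the parameter, a confirmation pattern for "24" is set from the reference pattern of the digit string "24" using that parameter, and the digit string "24" is finally detected.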
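[0104]
[Second Embodiment]
The present invention is applied to, for example, an information processing apparatus 1200 as shown in FIG. 12.
The information processing apparatus 1200 of this embodiment has the functions of the pattern recognition apparatus 100 shown in FIG. 1.
[0105]
<Configuration of the information processing apparatus 1200>
As shown in FIG. 12, the information processing apparatus 1200 includes a control unit 1270, an arithmetic unit 1210, a weight setting unit 1220, a reference weight holding unit 1230, a parameter detection unit 1240, an input signal memory 1250, an input signal memory control unit 1251, an intermediate result memory 1260, and an intermediate result memory control unit 1261.
[0106]
In this information processing apparatus 1200, the control unit 1270 controls the operation of the whole apparatus.
In particular, the control unit 1270 carries out the pattern recognition operation by controlling the arithmetic unit 1210, the weight setting unit 1220, the reference weight holding unit 1230, the parameter detection unit 1240, the input signal memory control unit 1251, and the intermediate result memory control unit 1261.
[0107]
The arithmetic unit 1210 performs a product-sum operation on the data from the input signal memory 1250 or the intermediate result memory 1260 and the weight data from the weight setting unit 1220, applies a nonlinear operation such as a logistic function, and stores the result in the intermediate result memory 1260.
[0108]
The weight setting unit 1220 sets weight data from the reference weight data supplied by the reference weight holding unit 1230, using the parameters from the parameter detection unit 1240, and supplies the weight data to the arithmetic unit 1210.
[0109]
The reference weight holding unit 1230 holds, for each feature, the reference weight data that serves as the basis for detecting that feature in the input signal, and supplies the reference weight data to the weight setting unit 1220.
[0110]
The parameter detection unit 1240 detects, from the data in the intermediate result memory 1260, the parameters that the weight setting unit 1220 uses when setting weight data, and supplies them to the weight setting unit 1220.
[0111]
The input signal memory 1250 holds the input signal to be processed, such as an image signal or an audio signal.
The input signal memory control unit 1251 controls the input signal memory 1250 when an input signal is stored in it and when the stored input signal is supplied to the arithmetic unit 1210.
[0112]
The intermediate result memory 1260 holds the operation results obtained by the arithmetic unit 1210.
The intermediate result memory control unit 1261 controls the intermediate result memory 1260 when an operation result from the arithmetic unit 1210 is stored in it and when the stored intermediate results are supplied to the arithmetic unit 1210 or the parameter detection unit 1240.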
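[0113]
<Operation of the information processing apparatus 1200>
As an example of the operation of the information processing apparatus 1200, this section describes the case where a neural network that performs image recognition by parallel hierarchical processing is formed. As in the first embodiment, the input signal to be processed is an image signal.
[0114]
First, the processing of the neural network is described in detail with reference to FIG. 13.
The neural network hierarchically handles information involved in the recognition (detection) of objects, geometric features, and the like in local regions of the input signal. Its basic structure is the so-called Convolutional network structure (LeCun, Y. and Bengio, Y., 1995, "Convolutional Networks for Images, Speech, and Time Series" in Handbook of Brain Theory and Neural Networks (M. Arbib, Ed.), MIT Press, pp.255-258). The output of the final (top) layer is the recognition result: the category of the recognized object and its position in the input data.
[0115]
In FIG. 13, the data input layer 1301 is the layer that inputs local region data from a photoelectric conversion element such as a CMOS sensor or a CCD element.
[0116]
The first feature detection layer 1302 (1,0) detects local low-order features of the image pattern input from the data input layer 1301 (these may include color component features as well as geometric features such as specific orientation components and specific spatial frequency components). It does so in local regions centered on each position of the whole image (or on each of a set of predetermined sampling points over the whole image), at a plurality of scale levels or resolutions at the same location, for as many feature categories as there are.
[0117]
The feature integration layer 1303 (2,0) has a predetermined receptive field structure (hereinafter, "receptive field" means the range of connections to the output elements of the immediately preceding layer, and "receptive field structure" means the distribution of the connection weights). It integrates the outputs of a plurality of neuron elements of the feature detection layer 1302 (1,0) that lie within the same receptive field (integration by operations such as subsampling through local averaging or maximum output detection).
[0118]
This integration processing spatially blurs the output of the feature detection layer 1302 (1,0) and thereby tolerates positional shifts, deformation, and the like. The receptive fields of the neurons in a feature integration layer share a common structure among the neurons of the same layer.
[0119]
In general, the receptive fields of the neurons in a feature detection layer also share a common structure among the neurons of the same layer. The gist of this embodiment is to change that receptive field structure, with respect to size, according to the output (detection result) of the preceding-stage neurons.
[0120]
The subsequent feature detection layers 1302 ((1,1), (1,2), ..., (1,M)) and feature integration layers 1303 ((2,1), (2,2), ..., (2,M)) work like the layers described above. The former ((1,1), ...) detect a plurality of different features in their feature detection modules, and the latter ((2,1), ...) integrate the detection results for the plurality of features from the preceding feature detection layer.
[0121]
The former feature detection layers are connected (wired) so as to receive the cell element outputs of the preceding feature integration layer belonging to the same channel. Subsampling, the processing performed in a feature integration layer, averages or otherwise combines the outputs from a local region (the local receptive field of the feature integration layer neuron) of the feature detection cell population of the same feature category.
[0122]
FIG. 14 is a flowchart showing, as a concrete example of the operation of the information processing apparatus 1200, the recognition of a face pattern in a target image as in the first embodiment.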
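[0123]
Step S1401:
Under the control of the control unit 1270, the input signal memory control unit 1251 stores the input signal (here, an image signal) in the input signal memory 1250.
Step S1401 corresponds to the data input layer 1301 shown in FIG. 13.
[0124]
Step S1402:
The weight setting unit 1220 sets in the arithmetic unit 1210 the primary feature detection weight data held in the reference weight holding unit 1230 (weight data for extracting edges of each orientation and size), for example for the primary features shown in FIG. 3(a).
[0125]
The primary feature detection weight data may instead be generated by the weight setting unit 1220 with size and orientation as parameters.
For the secondary, tertiary, and quaternary features that follow, the same features as described in the first embodiment can be used, for example.
[0126]
Step S1403:
The arithmetic unit 1210 detects the primary features.
The primary feature detection in step S1403 corresponds to the processing of the feature detection layer 1302 (1,0) shown in FIG. 13, and the arithmetic unit 1210 executes the processing corresponding to the detection module 1304 for each feature f.
[0127]
Specifically, each set of primary feature detection weight data set in step S1402 corresponds to the structure of the receptive field 1305 that detects a feature f. The arithmetic unit 1210 acquires the image signal from the input signal memory 1250 and executes a product-sum operation between the local region at each position of the image signal (the region corresponding to the receptive field 1305) and each set of primary feature detection weight data.
[0128]
An example of the input-output characteristic of a feature detection layer neuron computed by the arithmetic unit 1210 is given by expression (1) below. The output u_SL(n, k) of the neuron at position n on the cell plane that detects the k-th feature in the L-th stage is expressed as
[0129]
[Equation 1]
$$u_{SL}(n,k) = f\left(\sum_{\kappa=1}^{K_{CL-1}} \sum_{v \in W_L} w_L(v,\kappa,k)\, u_{CL-1}(n+v,\kappa)\right)$$
[0130]
which is expression (1).
[0131]
In expression (1), u_CL(n, κ) denotes the output of the neuron at position n on the κ-th cell plane of the L-th stage feature integration layer. K_CL denotes the number of feature types in the L-th stage feature integration layer. w_L(v, κ, k) is the input connection to the neuron at position n on the k-th cell plane of the L-th stage feature detection cell layer from the neuron at position n+v on the κ-th cell plane of the (L-1)-th stage feature integration layer. W_L is the receptive field of the detection cell, and its size is finite.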
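The product-sum and nonlinearity of a feature detection neuron can be sketched as follows, using the logistic function that the text names as the nonlinearity. The data layout (`prev_maps[kappa][y][x]`, and weights stored as a dictionary from offset `(dy, dx)` to coefficient) and the zero treatment of positions outside the map are assumptions for the example.

```python
import math


def logistic(x):
    """Logistic nonlinearity f applied to the product-sum result."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_output(prev_maps, weights, n, k, f=logistic):
    """Output of the neuron at position n on the k-th detection cell plane.

    prev_maps[kappa][y][x] holds the previous integration layer (or the input
    image when there is only one map); weights[k][kappa] maps an offset
    (dy, dx) inside the receptive field to its connection weight.
    """
    y, x = n
    total = 0.0
    for kappa, u in enumerate(prev_maps):
        h, w = len(u), len(u[0])
        for (dy, dx), coeff in weights[k][kappa].items():
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:   # outside the map counts as 0
                total += coeff * u[yy][xx]
    return f(total)
```

Because the weights are passed in as data, the same routine serves every layer; only the weight set changes between calls.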
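[0132]
Since the processing of step S1403 is primary feature detection, L is "1", so u_CL-1 corresponds to the data input layer and there is one kind of preceding-stage feature. Since eight kinds of features are detected, eight kinds of results are obtained.
[0133]
In expression (1), f() denotes nonlinear processing applied to the result of the product-sum operation. For this nonlinear processing, for example, the logistic function
[0134]
[Equation 2]
$$f(x) = \frac{1}{1 + e^{-x}}$$
[0135]
expressed by expression (2) is used.
[0136]
The result of the nonlinear processing is stored in the intermediate result memory 1260. Since eight kinds of features are detected here, as described above, the detection results of all these features are stored in the intermediate result memory 1260.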
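[0137]
Step S1404:
The weight setting unit 1220 sets in the arithmetic unit 1210 the primary feature integration weight data held in the reference weight holding unit 1230.
The primary feature integration weight data is weight data for performing processing such as local averaging or maximum value detection on the primary features detected in step S1403.
[0138]
Step S1405:
The arithmetic unit 1210 executes a product-sum operation between each primary feature detection result held in the intermediate result memory 1260 and the corresponding primary feature integration weight data set in step S1404 (integration processing of the primary feature detection results).
[0139]
The processing in step S1405 corresponds to the processing of the feature integration layer 1303 (2,0) shown in FIG. 13 and to the integration module of each feature f. Specifically, it corresponds to integrating the outputs of a plurality of neuron elements of the feature detection layer 1302 (1,0) that lie within the same receptive field (operations such as subsampling by local averaging or maximum output detection).
[0140]
That is, for each primary feature detection result, the arithmetic unit 1210 executes processing such as averaging or maximum value detection over local regions. For example, the arithmetic unit 1210 executes the local averaging given by
[0141]
[Equation 3]
$$u_{CL}(n,k) = \sum_{v \in D_L} d_L(v)\, u_{SL}(n+v,k)$$
[0142]
which is expression (3).
[0143]
In expression (3), d_L(v) is the input connection from a neuron of the L-th stage feature detection layer to a neuron on the cell plane of the L-th stage feature integration cell layer, and is a function that decreases monotonically with |v|. D_L denotes the receptive field of the integration cell, and its size is finite.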
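The local averaging of the integration layer can be sketched as a weighted sum whose weights fall off with the distance |v| from the centre. The particular fall-off 1/(1+|v|), the normalization to unit sum, and zero treatment at the borders are assumptions; the source only requires that the weights decrease with |v| over a finite receptive field.

```python
def integration_weights(radius):
    """Weights d(v) that decrease with |v| and sum to 1 (shape 1/(1+|v|) is assumed)."""
    raw = {
        (dy, dx): 1.0 / (1.0 + (dy * dy + dx * dx) ** 0.5)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    }
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()}


def integrate(detection_map, d):
    """Weighted local average of one detection result map (outside the map counts as 0)."""
    h, w = len(detection_map), len(detection_map[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                coeff * detection_map[y + dy][x + dx]
                for (dy, dx), coeff in d.items()
                if 0 <= y + dy < h and 0 <= x + dx < w
            )
    return out
```

Applying `integrate` to each detection result map separately blurs it spatially, which is what gives the next detection layer its tolerance to small shifts.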
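[0144]
The arithmetic unit 1210 stores the result of the product-sum operation of expression (3) in the intermediate result memory 1260.
The arithmetic unit 1210 may also apply nonlinear processing to the result of the product-sum operation and store that result in the intermediate result memory 1260.
[0145]
After the processing up to step S1405, the intermediate result memory 1260 holds the primary feature integration results for each size and orientation, obtained by integrating the primary feature detection results over local regions for each feature.
[0146]
Step S1406:
The weight setting unit 1220 sets the secondary feature detection weight data.
The secondary feature detection weight data is weight data for detecting each of the secondary features shown in FIG. 3(b) and used in the first embodiment.
[0147]
As explained in the first embodiment, the size of each feature from the secondary features onward correlates with the size of the features found before it. When detecting each feature from the secondary features onward, the weight setting unit 1220 therefore sets the feature detection weight data according to the size of the features detected in the preceding layer.
[0148]
Specifically, the weight setting unit 1220 first sets as the parameter the receptive field size, obtained by the parameter detection unit 1240, that is indicated by the primary feature detection weight data with which each primary feature was detected.
The weight setting unit 1220 then modifies the reference secondary feature detection weight data held in the reference weight holding unit 1230, with respect to receptive field size, using the parameter set through the parameter detection unit 1240, and uses the result as the secondary feature detection weight data.
[0149]
For example, suppose the reference secondary feature detection weight data is set for the larger primary features of FIG. 3(a) (the larger receptive field size). When detecting a secondary feature from primary feature detection results obtained with weight coefficients of the smaller receptive field size, the weight setting unit 1220 reduces the receptive field size of the secondary feature detection weight data, as shown for example in FIG. 15.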
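A minimal sketch of shrinking or enlarging the receptive field of reference weight data is given below. Nearest-neighbour resampling of a square weight array and the names `resize_weights` and `weights_for_filter` are assumptions; the source does not say how the weight data is resampled, and a real implementation might also renormalize the weights.

```python
def resize_weights(weights, new_size):
    """Nearest-neighbour resize of a square weight array to new_size x new_size."""
    old_size = len(weights)
    index = [min(i * old_size // new_size, old_size - 1) for i in range(new_size)]
    return [[weights[r][c] for c in index] for r in index]


def weights_for_filter(reference_weights, reference_filter_size, filter_size):
    """Scale the receptive field by the ratio of the detecting filter sizes."""
    scaled = len(reference_weights) * filter_size / reference_filter_size
    new_size = max(1, round(scaled))
    return resize_weights(reference_weights, new_size)
```

With a reference defined for the larger primary filter, passing the smaller filter size yields a proportionally smaller receptive field, as in the FIG. 15 example.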
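[0150]
Step S1407:
The arithmetic unit 1210 detects the secondary features. This corresponds to the processing of the feature detection layer 1302 (1,1) shown in FIG. 13.
[0151]
The processing in step S1407 is itself the same as the primary feature detection in step S1403.
For example, the arithmetic unit 1210 executes the product-sum operation of expression (1) and the nonlinear operation on its result. Here, however, the arithmetic unit 1210 uses the secondary feature detection weight data set in step S1406 and the primary feature integration results held in the intermediate result memory 1260 for the product-sum operation, applies the nonlinear operation to the result, and stores the result (the secondary feature detection result) in the intermediate result memory 1260.
[0152]
Step S1408:
The weight setting unit 1220 sets in the arithmetic unit 1210 the secondary feature integration weight data held in the reference weight holding unit 1230.
The secondary feature integration weight data is weight data for executing processing such as local averaging or maximum value detection on the secondary feature results detected in step S1407.
[0153]
Step S1409:
The arithmetic unit 1210 integrates the detection results of the secondary features. This corresponds to the processing of the feature integration layer 1303 (2,1) shown in FIG. 13.
[0154]
Specifically, the arithmetic unit 1210 executes a product-sum operation between each secondary feature detection result held in the intermediate result memory 1260 and the corresponding secondary feature integration weight data set in step S1408, for example according to expression (3), and stores the result in the intermediate result memory 1260. The arithmetic unit 1210 may also apply nonlinear processing to the result of the product-sum operation and store that result in the intermediate result memory 1260.
[0155]
Step S1410:
The weight setting unit 1220 sets the tertiary feature detection weight data in the arithmetic unit 1210.
The tertiary feature detection weight data is weight data for detecting each of the tertiary features shown in FIG. 3(c) of the first embodiment.
[0156]
Specifically, the weight setting unit 1220 first has the parameter detection unit 1240 set as the parameter a value based on the size of the secondary feature, obtained from the primary and secondary feature detection results held in the intermediate result memory 1260. For the right-open V-shaped feature, for example, the vertical distance between the upward-right and downward-right diagonal features can be used as this parameter, as explained in the first embodiment.
The weight setting unit 1220 then modifies the reference tertiary feature detection weight data held in the reference weight holding unit 1230, with respect to its receptive field size, using the parameter obtained by the parameter detection unit 1240, and uses the result as the tertiary feature detection weight data.
[0157]
Step S1411:
The arithmetic unit 1210 detects the tertiary features. This corresponds to the processing of the feature detection layer 1302 (1,2) shown in FIG. 13.
[0158]
Specifically, the arithmetic unit 1210 executes a product-sum operation between the tertiary feature detection weight data set in step S1410 and the secondary feature integration results held in the intermediate result memory 1260, applies the nonlinear operation to the result, and stores the result (the tertiary feature detection result) in the intermediate result memory 1260.
[0159]
Step S1412:
The weight setting unit 1220 sets in the arithmetic unit 1210 the tertiary feature integration weight data held in the reference weight holding unit 1230.
The tertiary feature integration weight data is weight data for performing processing such as local averaging or maximum value detection on the tertiary feature results detected in step S1411.
[0160]
Step S1413:
The arithmetic unit 1210 integrates the detection results of the tertiary features. This corresponds to the processing of the feature integration layer 1303 (2,2) shown in FIG. 13.
[0161]
Specifically, the arithmetic unit 1210 executes a product-sum operation between each tertiary feature detection result held in the intermediate result memory 1260 and the corresponding tertiary feature integration weight data set in step S1412, and stores the result in the intermediate result memory 1260. The arithmetic unit 1210 may also apply nonlinear processing to the result of the product-sum operation and store that result in the intermediate result memory 1260.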
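[0162]
Step S1414:
The weight setting unit 1220 sets the quaternary feature detection weight data in the arithmetic unit 1210.
The quaternary feature detection weight data is weight data for detecting the quaternary feature shown in FIG. 3(d) and used in the first embodiment.
[0163]
Specifically, the weight setting unit 1220 first has the parameter detection unit 1240 set as the parameter a value based on the size of the tertiary feature, obtained from the secondary and tertiary feature detection results held in the intermediate result memory 1260. For the eye feature, for example, the horizontal distance between the right-open and left-open V-shaped features can be used as this parameter, as explained in the first embodiment.
The weight setting unit 1220 then modifies the reference quaternary feature detection weight data held in the reference weight holding unit 1230, with respect to its receptive field size, using the parameter obtained by the parameter detection unit 1240, and uses the result as the quaternary feature detection weight data.
[0164]
Step S1415:
The arithmetic unit 1210 detects the quaternary feature. This corresponds to the processing of the feature detection layer 1302 (1,3) shown in FIG. 13.
[0165]
Specifically, the arithmetic unit 1210 executes a product-sum operation between the quaternary feature detection weight data set in step S1414 and the tertiary feature integration results held in the intermediate result memory 1260, applies the nonlinear operation to the result, and stores the result (the quaternary feature detection result) in the intermediate result memory 1260.
[0166]
Step S1416:
The weight setting unit 1220 sets in the arithmetic unit 1210 the quaternary feature integration weight data held in the reference weight holding unit 1230.
The quaternary feature integration weight data is weight data for performing processing such as local averaging or maximum value detection on the quaternary feature results detected in step S1415.
[0167]
Step S1417:
The arithmetic unit 1210 integrates the detection results of the quaternary feature. This corresponds to the processing of the feature integration layer 1303 (2,3) shown in FIG. 13.
[0168]
Specifically, the arithmetic unit 1210 executes a product-sum operation between the quaternary feature detection results held in the intermediate result memory 1260 and the quaternary feature integration weight data set in step S1416, and stores the result in the intermediate result memory 1260. The arithmetic unit 1210 may also apply nonlinear processing to the result of the product-sum operation and store that result in the intermediate result memory 1260.
[0169]
Step S1418:
The arithmetic unit 1210 sets the pattern confirmation weight data.
[0170]
The quaternary feature is detected by the processing up to step S1417. As explained in the first embodiment, however, if the background of the target image (input image) contains regions resembling the tertiary features that constitute the quaternary feature, and their positional relationship is also similar, the quaternary feature detection may produce a false detection. In face detection, for example, if the background of the input image contains regions resembling both eyes and a mouth, in a similar positional relationship, the face feature detection may produce a false detection.
[0171]
In this embodiment, therefore, reference pattern confirmation weight data for detecting a typical type (size, orientation, and so on) of the pattern to be detected is prepared and modified, the modified pattern confirmation weight data is set, and the set pattern confirmation weight data is used to judge finally whether the pattern to be detected is present in the input image.
[0172]
Since the face is the detection pattern in this example, reference face pattern confirmation weight data for detecting a typical face is prepared and modified, the modified face pattern confirmation weight data is set, and the set face pattern confirmation weight data is used to judge whether a face pattern is present in the input image.
[0173]
In step S1418, therefore, the arithmetic unit 1210 first has the parameter detection unit 1240 set as the parameter, at each position of a detected quaternary feature, a value based on the tertiary feature detection results, obtained from the tertiary and quaternary feature detection results held in the intermediate result memory 1260. For the face feature, for example, the positions of the eye features and the mouth feature can be used as this parameter, as explained in the first embodiment.
The arithmetic unit 1210 then modifies the reference pattern confirmation weight data held in the reference weight holding unit 1230, with respect to its receptive field size and rotation, using the parameter obtained by the parameter detection unit 1240, and uses the result as the pattern confirmation weight data.
[0174]
Step S1419:
The arithmetic unit 1210 confirms the detection pattern.
Specifically, the arithmetic unit 1210 executes a product-sum operation between the pattern confirmation weight data set in step S1418 and the input signal held in the input signal memory 1250, applies the nonlinear operation to the result, and stores the result in the intermediate result memory 1260. The result stored in the intermediate result memory 1260 is the final detection result for the pattern to be detected.
[0175]
As described above, this embodiment prepares reference weight data for detecting each feature and sets the detection weight data from the reference weight data using parameters obtained from the preceding-stage detection results. This improves the detection accuracy of each feature and thus of the pattern finally detected.
[0176]
In addition, the arithmetic unit 1210 performs a product-sum operation between the detection or integration weight data and the data from the intermediate result memory 1260 or the input signal memory 1250, followed by a nonlinear transformation of the result, and the weight data used in the product-sum operation is set each time. The same arithmetic unit 1210 can therefore be used repeatedly. Furthermore, because both the input signal and the intermediate results are held, the final confirmation processing is also easy to perform.
[0177]
In this embodiment, as one example, the integration weight data used in the integration processing is not set according to the detection results, but its receptive field size can also be set in the same way as for the detection weight data. The integration processing for the quaternary feature in steps S1416 and S1417 of FIG. 14 may also be omitted.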
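[0178]
[Third Embodiment]
The present invention is applied to, for example, an information processing apparatus 1600 as shown in FIG. 16. The information processing apparatus 1600 of this embodiment has the functions of the pattern recognition apparatus 100 shown in FIG. 1.
[0179]
As shown in FIG. 16, the information processing apparatus 1600 includes a control unit 1670, an arithmetic unit 1610, a reference weight holding unit 1630, a parameter detection unit 1640, an input signal memory 1650, an input signal memory control unit 1651, an intermediate result memory 1660, and an intermediate result memory control unit 1661.
[0180]
The information processing apparatus 1600 of this embodiment basically has the same functions as the information processing apparatus 1200 of the second embodiment (see FIG. 12). It differs in that it has no function corresponding to the weight setting unit 1220, and the parameters obtained by the parameter detection unit 1640 are supplied to the intermediate result memory control unit 1661 and the arithmetic unit 1610.
[0181]
In the second embodiment, parameters are obtained from the preceding-stage processing results and the weight data for detecting a feature is set from those parameters. In this embodiment, the reference weight data held in the reference weight holding unit 1630 is used as the weight data without modification. Instead, the preceding-stage detection result held in the intermediate result memory 1660, which corresponds to the receptive field, is resized using interpolation or the like.
[0182]
For example, when detecting the eye feature, a tertiary feature, the information processing apparatus 1600 resizes the normal receptive field on the input image 1700 to generate a resized local image 1710, as shown in FIG. 17, and executes a product-sum operation between this resized local image 1710 and the reference weight data held in the reference weight holding unit 1630.
[0183]
When a tertiary feature is obtained, the secondary feature detection results held in the intermediate result memory 1660 are used, but FIG. 17 shows the resizing of a local image of the input image 1700 for simplicity. In practice, the local region of the secondary feature detection result image is resized and used.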
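The third embodiment's approach, resizing the local region of the previous-stage result to fit fixed reference weights, can be sketched as below. Bilinear interpolation with aligned corners is an assumed choice (the source says only "interpolation or the like"), and the function names and square-region addressing are hypothetical.

```python
def resize_bilinear(patch, out_h, out_w):
    """Resize a 2D list to out_h x out_w by bilinear interpolation (corners aligned)."""
    in_h, in_w = len(patch), len(patch[0])
    out = []
    for i in range(out_h):
        sy = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for j in range(out_w):
            sx = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = patch[y0][x0] * (1 - fx) + patch[y0][x1] * fx
            bottom = patch[y1][x0] * (1 - fx) + patch[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def detect_with_fixed_weights(result_map, top, left, size, reference_weights):
    """Read a size x size region, resize it to the weight size, then product-sum."""
    patch = [row[left:left + size] for row in result_map[top:top + size]]
    k = len(reference_weights)
    resized = resize_bilinear(patch, k, k)
    return sum(resized[r][c] * reference_weights[r][c]
               for r in range(k) for c in range(k))
```

Here `size` plays the role of the parameter from the preceding stage: it selects how large a region is read from memory, while the weights themselves never change.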
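[0184]
As described above, in this embodiment the size of the preceding-stage detection result used to detect a feature is changed and reset using parameters obtained from the preceding-stage detection results. This improves the detection accuracy of each feature and thus of the pattern finally detected. Moreover, changing the size of the detection result only requires changing the region read from memory and performing interpolation, so it is easy to implement.
[0185]
The functions of the information processing apparatuses 1200 and 1600 of the second and third embodiments can also be mounted in an imaging apparatus, for example, as in the first embodiment.
[0186]
It goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which is recorded the program code of software that realizes the functions of the host and terminal of the first to third embodiments, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium.
In this case, the program code read from the recording medium itself realizes the functions of the first to third embodiments, and the recording medium on which the program code is recorded and the program code constitute the present invention.
As the recording medium for supplying the program code, a ROM, flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, or the like can be used.
It also goes without saying that the invention covers not only the case where the functions of the first to third embodiments are realized by the computer executing the program code it has read, but also the case where an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the first to third embodiments are realized by that processing.
It further goes without saying that the invention covers the case where the program code read from the recording medium is written to a memory provided on a function expansion board inserted in the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the first to third embodiments are realized by that processing.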
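[0187]
FIG. 18 shows the function 1800 of the above computer.
As shown in FIG. 18, the computer function 1800 comprises a CPU 1801, a ROM 1802, a RAM 1803, a keyboard controller (KBC) 1805 for a keyboard (KB) 1809, a CRT controller (CRTC) 1806 for a CRT display (CRT) 1810 serving as the display unit, a disk controller (DKC) 1807 for a hard disk (HD) 1811 and a flexible disk (FD) 1812, and a network interface controller (NIC) 1808 for connection to a network 1820, all connected via a system bus 1804 so that they can communicate with one another.
[0188]
The CPU 1801 comprehensively controls the components connected to the system bus 1804 by executing software recorded in the ROM 1802 or on the HD 1811, or software supplied from the FD 1812.
That is, the CPU 1801 reads a processing program following a predetermined processing sequence from the ROM 1802, the HD 1811, or the FD 1812 and executes it, thereby performing control for realizing the operations of the first to third embodiments.
[0189]
The RAM 1803 functions as the main memory, work area, or the like of the CPU 1801.
The KBC 1805 controls instruction input from the KB 1809, a pointing device (not shown), and the like.
The CRTC 1806 controls the display of the CRT 1810.
The DKC 1807 controls access to the HD 1811 and the FD 1812, which record the boot program, various applications, edit files, user files, the network management program, the predetermined processing program of this embodiment, and so on.
The NIC 1808 exchanges data bidirectionally with apparatuses or systems on the network 1820.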
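[0190]
[Effects of the Invention]
As described above, in the present invention, when a plurality of features (eyes, mouth, and so on) constituting a predetermined pattern (a face pattern or the like) contained in an input signal (an image signal or the like) are detected hierarchically, the data used to detect a target feature is set based on reference data corresponding to that feature (reference face data or the like) and on the detection results of the features in the stage preceding the target feature.
[0191]
As a result, for example, the features of one layer are detected independently, and when the features of the next layer are detected, the data used for detecting each feature, such as a model or weights, can be set adaptively using parameters obtained from the detection results of the plurality of features in the preceding layer. Alternatively, the preceding-stage detection result used for feature detection can be reset adaptively using those parameters. The detection accuracy of each feature can therefore be improved, and even when a plurality of recognition targets of different sizes are present in the input signal, all of them can be detected at low processing cost.
[0192]
In addition, when, for example, the confirmation pattern is transformed (rotated, resized, and so on) according to the positions of the features found so far before its correlation is computed in the final confirmation processing, the confirmation accuracy can be improved.
[0193]
When the above functions are applied to, for example, an imaging apparatus, color correction of a specific region such as a face in the image, focus setting, and the like can be performed easily.
[0194]
According to the present invention, therefore, an arbitrary region in a target signal can be detected as a specific recognition target efficiently and at low processing cost, whatever the recognition target may be.
In particular, even when a plurality of recognition targets of different sizes are present in the target image, all of them can be extracted at low processing cost, and false detections, in which something that is not the pattern to be recognized is mistakenly detected as that pattern, can be prevented.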
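[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a pattern recognition (detection) apparatus to which the present invention is applied in the first embodiment.
[FIG. 2] A flowchart for explaining the operation of the pattern recognition (detection) apparatus.
[FIG. 3] A diagram for explaining an example of the features used when the pattern recognition (detection) apparatus detects a face region.
[FIG. 4] A diagram for explaining an example of the detection reference data used in the face region detection.
[FIG. 5] A diagram for explaining an example of a target image for the face region detection.
[FIG. 6] A diagram for explaining an example of the parameters used in the face region detection.
[FIG. 7] A diagram for explaining an example of the detection reference model for the features used when detecting the eye region of the face region.
[FIG. 8] A diagram for explaining how the eye feature detection model differs by position in a target image for the eye region detection.
[FIG. 9] A diagram for explaining the setting of the confirmation pattern for the face region detection.
[FIG. 10] A block diagram showing the configuration of an imaging apparatus having the functions of the pattern recognition (detection) apparatus.
[FIG. 11] A diagram for explaining detection of a character string by the functions of the pattern recognition (detection) apparatus.
[FIG. 12] A block diagram showing the configuration of an information processing apparatus to which the present invention is applied in the second embodiment.
[FIG. 13] A diagram for explaining the Convolutional neural network structure in the information processing apparatus.
[FIG. 14] A flowchart for explaining the operation of the information processing apparatus.
[FIG. 15] A diagram for schematically explaining the feature detection weight data in the information processing apparatus.
[FIG. 16] A block diagram showing the configuration of an information processing apparatus to which the present invention is applied in the third embodiment.
[FIG. 17] A diagram for schematically explaining the functions of the information processing apparatus.
[FIG. 18] A block diagram showing the configuration of a computer that reads, from a computer-readable recording medium, a program for causing the computer to realize the functions of the apparatuses of the first to third embodiments, and executes it.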
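[Description of Reference Numerals]
100 Pattern recognition (detection) apparatus
101 Primary feature detection unit
102 Secondary feature detection unit
103 Tertiary feature detection unit
104 Quaternary feature detection unit
105 Pattern confirmation unit
111 Primary feature detection filter setting unit
112 Secondary feature detection model setting unit
113 Tertiary feature detection model setting unit
114 Quaternary feature detection model setting unit
115 Confirmation pattern setting unit
122 Secondary feature reference model holding unit
123 Tertiary feature reference model holding unit
124 Quaternary feature reference model holding unit
125 Reference confirmation pattern holding unit
130 Signal input unit
[0001]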
BACKGROUND OF THE INVENTION
The present invention performs pattern recognition on a target image, detection of a specific subject, and the like by performing, for example, hierarchical calculation processing on the target image. Pa Turn recognition device, Pa Turn recognition method, And Program.
[0002]
[Prior art]
Conventionally, for example, in the field of image recognition and voice recognition, a recognition processing algorithm specialized for a specific recognition target is executed by hardware using a computer software or a dedicated parallel image processor, so that the recognition target and background A technique for detecting a recognition target from an image including the image is known.
[0003]
In particular, configurations for detecting a face region in a target image as the specific recognition target are disclosed in, for example, Japanese Patent No. 2767814, Japanese Patent Application Laid-Open No. 9-251534, Japanese Patent Application Laid-Open No. 9-44676, Japanese Patent No. 2973676, and Japanese Patent Application Laid-Open No. 11-283036.
[0004]
Specifically, the configuration described in Japanese Patent Application Laid-Open No. 9-251534 and the like first searches for a face region in the input image (target image) using a template called a standard face, and then authenticates the person in the target image by applying partial templates to feature point candidates such as the eyes, nostrils, or mouth within that face region.
[0005]
The configuration described in Japanese Patent No. 2767814 and the like obtains eye and mouth candidate groups from a face image, and finds the regions corresponding to the eyes and mouth in the face image by collating face candidate groups, formed by combining those candidates, against a face structure recorded in advance.
[0006]
The configuration described in Japanese Patent Application Laid-Open No. 9-44676 and the like obtains a plurality of eye, nose, and mouth candidates from the target image, and detects the face region in the target image from the positional relationship between these candidates and feature points prepared in advance.
[0007]
The configuration described in Japanese Patent No. 2973676 and the like varies the shape data when checking the degree of coincidence between the shape data of each facial part and the input image, and determines the search area of each facial part based on the positional relationship of the parts already found.
[0008]
The configuration described in Japanese Patent Application Laid-Open No. 11-283036 and the like moves a region model, in which a plurality of determination element acquisition regions are set, over the input image (target image) and, at each point, determines the presence or absence of determination elements within those regions, thereby recognizing the face region in the target image.
[0009]
[Problems to be solved by the invention]
However, the conventional configuration for image recognition (pattern recognition) as described above has the following problems.
[0010]
First, in the conventional configuration described in Japanese Patent Laid-Open No. 9-251534 and the like, a template called a standard face is first used for the target image, and the face area in the target image is searched by matching the entire face. Therefore, it is difficult to cope with various sizes of faces and changes in face orientation. In addition, in order to cope with this, it is necessary to prepare multiple standard faces corresponding to the face size and face orientation, and to detect them using each of them, which increases the size of the template for the entire face. As a result, the processing cost also increases.
[0011]
The conventional configuration described in Japanese Patent No. 2767814 and the like finds the regions corresponding to the eyes and mouth in a face image by collating face candidate groups, formed by combining the eye and mouth candidate groups obtained from the face image, against a face structure recorded in advance. When the purpose is, for example, to create a caricature from the target image, the target image usually contains one face or only a few, the face is reasonably large, and most of the image is face with little background. For such a target image, the number of face candidates remains limited even if they are created from all eye and mouth candidate groups. However, when the target image is one captured with an ordinary camera or video camera, the faces may be small and the background correspondingly large. In such a case, many eye and mouth candidates are falsely detected in the background, so creating face candidates from all eye and mouth candidate groups yields an enormous number of candidates, and the processing cost of collation with the face structure increases.
[0012]
The conventional configuration described in Japanese Patent Application Laid-Open No. 9-44676 and the like obtains a plurality of eye, nose, and mouth candidates from the target image, and detects the face region in the target image from the positional relationship between these candidates and feature points prepared in advance. As with the configuration described in Japanese Patent No. 2767814, when many eye, nose, and mouth candidates are detected in the background, the processing cost of collating their positional relationships becomes enormous.
[0013]
In the conventional configuration described in Japanese Patent No. 2973676 and the like, shape data for each facial part (irises, mouth, nose, and so on) is held; the two irises are found first, and when the mouth, nose, and other parts are then sought, their search areas are limited based on the iris positions. In other words, this algorithm does not detect the facial parts (irises (eyes), mouth, nose) in parallel; it first detects the irises (eyes) and then uses that result to detect the mouth, nose, and other parts in order. The configuration therefore assumes that the target image contains only one face and that the irises are found accurately; if a detected iris is a false detection, the search areas for other features such as the mouth and nose cannot be set correctly.
[0014]
The conventional configuration described in Japanese Patent Application Laid-Open No. 11-283036 and the like recognizes the face region in the target image by moving a region model, in which a plurality of determination element acquisition regions are set, over the input image (target image) and determining, at each point, the presence or absence of determination elements within those regions. To handle faces of various sizes, region models of different sizes must be prepared, and when no face of a given size actually exists, much unnecessary processing is executed, which is inefficient.
[0015]
The present invention was therefore made to eliminate the drawbacks described above, and its object is to provide a pattern recognition apparatus, a pattern recognition method, and a program that, in detecting an arbitrary region in a target image as a specific recognition target, can detect the recognition target efficiently at a low processing cost.
[0016]
Specifically, for example, even when a plurality of recognition targets of different sizes exist in the target image, all of them can be extracted at a low processing cost. In addition, false detection, in which a pattern that is not the recognition target is mistakenly detected as the recognition target pattern, can be prevented.
[0017]
[Means for Solving the Problems]
To achieve this object, a pattern recognition apparatus according to the present invention, which detects a predetermined pattern included in an input signal, comprises: feature detection means for hierarchically detecting a plurality of features of the predetermined pattern; reference data holding means for holding a plurality of reference data corresponding to the plurality of features, the reference data of each stage being composed of a combination of regions corresponding to specific features detected in the preceding stage; and data setting means for converting the reference data of each stage held in the reference data holding means in accordance with the positional relationship of the plurality of preceding-stage features of the target feature obtained by the feature detection means, and setting the converted reference data as the data used by the feature detection means to detect the target feature. Using the reference data, if the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold in each region constituting the reference data of each stage, the feature detection means determines that the feature to be detected exists at that position and takes the average of the maximum values of the regions as the value of the feature to be detected.
[0020]
A pattern recognition method according to the present invention, for detecting a predetermined pattern included in an input signal, includes: a feature detection step of hierarchically detecting a plurality of features constituting the predetermined pattern; a reference data holding step of holding a plurality of reference data for individually detecting the plurality of features in the feature detection step, the reference data of each stage being composed of a combination of regions corresponding to specific features detected in the preceding stage; and a data setting step of setting, based on the reference data held in the reference data holding step, the data used for feature detection in the feature detection step. When setting the data for detecting a feature in the feature detection step, the data setting step converts the reference data of each stage held in the reference data holding step in accordance with the positional relationship of the plurality of preceding-stage features of the detection target feature obtained in the feature detection step, and sets the converted reference data as the data used to detect the target feature in the feature detection step. In the feature detection step, using the reference data, if the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold in each region constituting the reference data of each stage, the feature to be detected is determined to exist at that position, and the average of the maximum values of the regions is taken as the value of the feature to be detected.
[0026]
Further, the present invention provides a program for causing a computer to function as predetermined means, the predetermined means comprising: feature detection means for hierarchically detecting a plurality of features of a predetermined pattern included in an input signal; reference data holding means for holding a plurality of reference data corresponding to the plurality of features, the reference data of each stage being composed of a combination of regions corresponding to specific features detected in the preceding stage; and data setting means for converting the reference data of each stage held in the reference data holding means in accordance with the positional relationship of the plurality of preceding-stage features of the target feature obtained by the feature detection means, and setting the converted reference data as the data used by the feature detection means to detect the target feature. Using the reference data, if the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold in each region constituting the reference data of each stage, the feature detection means determines that the feature to be detected exists at that position and takes the average of the maximum values of the regions as the value of the feature to be detected.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0031]
[First embodiment]
The present invention is applied to, for example, a pattern recognition apparatus 100 as shown in FIG.
The pattern recognition apparatus 100 of the present embodiment can be applied to an imaging apparatus or the like. To detect all recognition targets (patterns) in a target image, it holds a plurality of reference data for hierarchically detecting, from the target image, the plurality of features constituting a recognition target, and, based on that reference data, sets the data for detecting the target feature using parameters obtained from the detection results of the preceding-stage features. With this configuration, even when the target image contains a plurality of recognition targets of different sizes, all of them are detected efficiently at a low processing cost.
Hereinafter, the configuration and operation of the pattern recognition apparatus 100 of the present embodiment will be described.
[0032]
<Configuration of Pattern Recognition Device 100>
As shown in FIG. 1, the pattern recognition apparatus 100 includes a signal input unit 130, a primary feature detection unit 101, a primary feature detection filter setting unit 111, a secondary feature detection unit 102, a secondary feature detection model setting unit 112, a secondary feature reference model holding unit 122, a tertiary feature detection unit 103, a tertiary feature detection model setting unit 113, a tertiary feature reference model holding unit 123, a quaternary feature detection unit 104, a quaternary feature detection model setting unit 114, a quaternary feature reference model holding unit 124, a pattern confirmation unit 105, a confirmation pattern setting unit 115, and a reference confirmation pattern holding unit 125.
[0033]
The signal input unit 130 inputs a signal to be processed (in this case, a signal of the target image) such as an image signal or an audio signal.
[0034]
The primary feature detection unit 101 performs processing for detecting primary features on the signal input from the signal input unit 130, supplies the processing result (primary feature detection result) to the secondary feature detection unit 102, and supplies the primary feature detection result and its parameters to the secondary feature detection model setting unit 112.
[0035]
At this time, the primary feature detection filter setting unit 111 sets filter characteristics or parameters for the primary feature detection unit 101 to detect the primary feature.
[0036]
The secondary feature detection unit 102 detects the secondary feature using the detection model set by the secondary feature detection model setting unit 112 with respect to the primary feature detection result from the primary feature detection unit 101. This processing result (secondary feature detection result) is supplied to the tertiary feature detection unit 103, and the secondary feature detection result and its parameters are supplied to the tertiary feature detection model setting unit 113.
[0037]
At this time, the secondary feature detection model setting unit 112 sets the model, indicating the positional relationship between the primary features, that the secondary feature detection unit 102 uses to detect secondary features. It does so using the reference model held in the secondary feature reference model holding unit 122, the primary feature detection result from the primary feature detection unit 101, and its parameters.
The secondary feature reference model holding unit 122 holds the reference model of the detection model set by the secondary feature detection model setting unit 112.
[0038]
The tertiary feature detection unit 103 performs processing for detecting tertiary features on the secondary feature detection result from the secondary feature detection unit 102, using the detection model set by the tertiary feature detection model setting unit 113. It supplies the processing result (tertiary feature detection result) to the quaternary feature detection unit 104, and supplies the tertiary feature detection result and its parameters to the quaternary feature detection model setting unit 114.
[0039]
At this time, the tertiary feature detection model setting unit 113 sets the model, indicating the positional relationship between the secondary features, that the tertiary feature detection unit 103 uses to detect tertiary features. It does so using the reference model held in the tertiary feature reference model holding unit 123, the secondary feature detection result from the secondary feature detection unit 102, and its parameters.
The tertiary feature reference model holding unit 123 holds a reference model of the detection model set by the tertiary feature detection model setting unit 113.
[0040]
The quaternary feature detection unit 104 performs processing for detecting quaternary features on the tertiary feature detection result from the tertiary feature detection unit 103, using the detection model set by the quaternary feature detection model setting unit 114. It supplies the processing result (quaternary feature detection result) to the pattern confirmation unit 105, and supplies the quaternary feature detection result and its parameters to the confirmation pattern setting unit 115.
[0041]
At this time, the quaternary feature detection model setting unit 114 sets the model, indicating the positional relationship between the tertiary features, that the quaternary feature detection unit 104 uses to detect quaternary features. It does so using the reference model held in the quaternary feature reference model holding unit 124, the tertiary feature detection result from the tertiary feature detection unit 103, and its parameters.
The quaternary feature reference model holding unit 124 holds a reference model of the detection model set by the quaternary feature detection model setting unit 114.
[0042]
The pattern confirmation unit 105 confirms whether the confirmation pattern set by the confirmation pattern setting unit 115 exists in the signal input by the signal input unit 130. The confirmation pattern setting unit 115 sets the confirmation pattern used by the pattern confirmation unit 105, using the reference pattern held in the reference confirmation pattern holding unit 125, the quaternary feature detection result from the quaternary feature detection unit 104, and its parameters.
The reference confirmation pattern holding unit 125 holds the reference pattern of the confirmation pattern set by the confirmation pattern setting unit 115.
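The data flow among the units described above can be summarized in a short Python sketch. It is only an illustration of the wiring: the function names, the toy stand-in functions, and the shape of the results and parameters are all hypothetical and are not taken from the description.

```python
# Hypothetical sketch of the data flow in FIG. 1. Each stage after the first
# builds its detection model from a held reference model plus the previous
# stage's result and parameters, then detects on the previous result.

def run_hierarchy(signal, detect_primary, stages, confirm):
    """stages: list of (reference_model, set_model, detect) tuples,
    one per level from the secondary level upward."""
    result, params = detect_primary(signal)
    for reference_model, set_model, detect in stages:
        model = set_model(reference_model, result, params)
        result, params = detect(result, model)
    # The confirmation step looks at the original input signal again.
    return confirm(signal, result, params)

# Toy stand-ins that only record which levels ran and pass a scale along.
def detect_primary(signal):
    return ["primary"], {"scale": 2}

def set_model(reference_model, prev_result, prev_params):
    return {"name": reference_model, "scale": prev_params["scale"]}

def detect(prev_result, model):
    return prev_result + [model["name"]], {"scale": model["scale"]}

def confirm(signal, result, params):
    return {"levels": result, "scale": params["scale"]}

stages = [(name, set_model, detect)
          for name in ("secondary", "tertiary", "quaternary")]
outcome = run_hierarchy("image", detect_primary, stages, confirm)
```

The point of the structure is that no stage uses a fixed model: every model is derived from its reference model and from what the preceding stage found.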
[0043]
<Operation of Pattern Recognition Device 100>
FIG. 2 is a flowchart showing the operation of the pattern recognition apparatus 100.
[0044]
Here, as an example of pattern recognition processing, the case where an image signal is input from the signal input unit 130 and a face region in the image is detected is described.
[0045]
Step S201:
The signal input unit 130 inputs an image signal as a processing target signal.
[0046]
Step S202:
The primary feature detection unit 101 uses, for example, a filter set by the primary feature detection filter setting unit 111 at each position of an image (target image) configured from an image signal input by the signal input unit 130. Detect primary features.
[0047]
Specifically, for example, as illustrated in FIG. 3A, the primary feature detection unit 101 detects, in the target image, features of different directions and sizes: a large vertical feature (1-1-1), a large horizontal feature (1-2-1), a large right-up diagonal feature (1-3-1), a large right-down diagonal feature (1-4-1), a small vertical feature (1-1-2), a small horizontal feature (1-2-2), a small right-up diagonal feature (1-3-2), and a small right-down diagonal feature (1-4-2). It outputs the detection result (primary feature detection result) for each feature in the form of a detection result image of the same size as the target image.
As a result, eight types of primary feature detection result images are obtained here. Thereby, by referring to the value of each position of the detection result image of each feature, it can be determined whether or not each feature exists at the corresponding position of the target image.
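As a rough illustration of step S202, the following Python sketch builds line-shaped filters for four directions at two sizes and produces one detection result image per filter, each the same size as the target image. The filter shapes, the sizes 5 and 3, the zero padding, the normalization, and all function names are assumptions made for this sketch; the description does not specify the actual filters.

```python
# Hypothetical sketch of step S202: eight oriented line filters
# (4 directions x 2 sizes), each yielding a result image of the input's size.

def line_kernel(direction, size):
    """Return a size x size kernel with 1s along a line in the given direction."""
    k = [[0.0] * size for _ in range(size)]
    mid = size // 2
    for i in range(size):
        if direction == "vertical":
            k[i][mid] = 1.0
        elif direction == "horizontal":
            k[mid][i] = 1.0
        elif direction == "right_up":      # rises toward the right
            k[size - 1 - i][i] = 1.0
        elif direction == "right_down":    # falls toward the right
            k[i][i] = 1.0
        else:
            raise ValueError(direction)
    return k

def correlate(image, kernel):
    """Zero-padded correlation; the output has the same size as the image."""
    h, w, n = len(image), len(image[0]), len(kernel)
    mid = n // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(n):
                for kx in range(n):
                    yy, xx = y + ky - mid, x + kx - mid
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[ky][kx]
            out[y][x] = acc / n   # a fully covered line of 1s scores 1.0
    return out

def detect_primary_features(image, sizes=(5, 3)):
    """Return {(direction, size): result image} for all eight filters."""
    directions = ("vertical", "horizontal", "right_up", "right_down")
    return {(d, s): correlate(image, line_kernel(d, s))
            for d in directions for s in sizes}

# A 7x7 image containing a single vertical stroke in column 3.
image = [[1.0 if x == 3 else 0.0 for x in range(7)] for y in range(7)]
results = detect_primary_features(image)
```

In this toy input, the vertical filters respond most strongly along the stroke, so reading the value at a position in each result image indicates which primary feature is present there.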
[0048]
Note that the plurality of filters used in the primary feature detection unit 101 may be prepared from the beginning, or may be created by the primary feature detection filter setting unit 111 using the direction and size as parameters.
Further, as shown in FIGS. 3B to 3D, the secondary features detected in the processing described later are a right-open V-shaped feature (2-1), a left-open V-shaped feature (2-2), a horizontal parallel line feature (2-3), and a vertical parallel line feature (2-4); the tertiary features are an eye feature (3-1) and a mouth feature (3-2); and the quaternary feature is a face feature (4-1).
[0049]
Step S203:
The secondary feature detection model setting unit 112 sets a model for the secondary feature detection unit 102 to detect secondary features.
[0050]
Specifically, consider as an example the setting of a detection model for detecting the right-open V-shaped feature (2-1) shown in FIG. 3B.
For example, as shown in FIG. 4A, the right-open V-shaped feature (2-1) has a right-up diagonal feature in its upper part and a right-down diagonal feature in its lower part. That is, to detect the right-open V-shaped feature, the primary feature detection results obtained in step S202 are used to find a position where a right-up diagonal feature exists in the upper part and a right-down diagonal feature exists in the lower part; the right-open V-shaped feature (2-1) is taken to exist at that position.
Thus, a secondary feature can be detected by combining a plurality of types of primary features.
[0051]
However, the size of a face in the target image is not fixed, eye and mouth sizes differ between individuals, and the eyes and mouth open and close, so the size of the right-open V shape also varies.
[0052]
Therefore, in the present embodiment, a right-open V-shape detection reference model 400 as shown in FIG. 4B is used.
In the right-open V-shape detection reference model 400, reference numeral 403 denotes a right-up diagonal region and 404 denotes a right-down diagonal region. Among the primary features obtained in step S202, if only the large or small right-up diagonal feature exists in the right-up diagonal region 403, and only the large or small right-down diagonal feature exists in the right-down diagonal region 404, the right-open V-shaped feature (2-1) is taken to exist at that position. This configuration makes the processing robust to a certain degree of variation in the size or shape of the right-open V shape.
[0053]
However, it is difficult to detect right-open V-shaped features of considerably different sizes, such as those shown in FIGS. 5A and 5B, using the same V-shape detection reference model 400.
[0054]
Of course, right-open V-shaped features of considerably different sizes, as in FIGS. 5A and 5B, could be detected with the same V-shape detection reference model 400 if, for example, the right-open V-shape detection reference model 400 shown in FIG. 4B were made very large, so that the right-up diagonal region 403 and the right-down diagonal region 404 became very wide.
[0055]
However, because the search range for each primary feature then becomes large, false detections become likely: for example, a large right-up diagonal feature and a small right-down diagonal feature whose positions are far apart may nevertheless be detected as a right-open V-shaped feature.
[0056]
That is, in a right-open V-shaped feature, the right-up diagonal feature and the right-down diagonal feature are both components of the same feature, so they are of substantially the same size and lie close to each other; and when the right-open V-shaped feature is large, both the right-up diagonal feature and the right-down diagonal feature are large as well.
[0057]
Therefore, the size of the reference model for detecting the secondary feature is made suitable for the size of the primary feature detected in step S202.
[0058]
Likewise, it is difficult to always detect the primary features, that is, the right-up diagonal feature and the right-down diagonal feature, with a filter of the same size.
[0059]
Therefore, when the face in the target image is small, as shown in FIG. 5A, the primary features are detected with a small filter, and when the face in the target image is large, as shown in FIG. 5B, the primary features are detected with a large filter. As described above, the size of the model for detecting the right-open V-shaped feature, which is a secondary feature, is also changed depending on the size of the filter that detected the primary features.
[0060]
As described above, in step S203, the model for detecting each secondary feature is enlarged or reduced using, as a parameter, the size of the filter that detected the primary features, and the resulting secondary feature detection model is set for detecting each secondary feature.
[0061]
FIG. 5C shows the right-open V-shape detection model for a small face, and FIG. 5D shows the right-open V-shape detection model for a large face.
These models are obtained by resizing the right-open V-shape detection reference model 400 shown in FIG. 4B by different scale factors.
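The resizing described for FIGS. 5C and 5D can be sketched as a uniform scaling of the regions of a reference model. The region coordinates, the reference filter size, the linear scaling rule, and the names below are hypothetical; the description states only that the reference model is enlarged or reduced using, as a parameter, the size of the filter that detected the primary features.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """Rectangle given by its centre offset (dx, dy) from the model centre
    and its extent; y grows downward as in image coordinates."""
    feature: str      # primary feature expected in this region
    dx: float
    dy: float
    width: float
    height: float

# Hypothetical reference model for the V shape that opens to the right:
# a right-up diagonal region above the centre, a right-down one below it.
REFERENCE_FILTER_SIZE = 4.0   # assumed filter size the reference corresponds to
REFERENCE_MODEL = (
    Region("right_up", dx=0.0, dy=-4.0, width=8.0, height=6.0),
    Region("right_down", dx=0.0, dy=4.0, width=8.0, height=6.0),
)

def scale_model(reference, filter_size,
                reference_filter_size=REFERENCE_FILTER_SIZE):
    """Enlarge or reduce every region by filter_size / reference_filter_size."""
    s = filter_size / reference_filter_size
    return tuple(
        Region(r.feature, r.dx * s, r.dy * s, r.width * s, r.height * s)
        for r in reference
    )

small_model = scale_model(REFERENCE_MODEL, filter_size=2.0)   # small face
large_model = scale_model(REFERENCE_MODEL, filter_size=8.0)   # large face
```

Because both the offsets and the extents are scaled together, the two regions stay adjacent at every size, which keeps the search range tight instead of widening it.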
[0062]
Of course, it is also effective to prepare filters of a plurality of sizes for detecting the primary features, prepare a plurality of processing channels corresponding to those sizes, and detect the secondary features, tertiary features, and so on of each size in the respective processing channel.
However, when the faces in the target image vary greatly in size, preparing a processing channel for each face size increases the number of processing channels, and hence the processing cost.
[0063]
Therefore, in the present embodiment, in the feature detection after the secondary feature detection, the above problem is solved by changing the size of the detection model in accordance with the detection result of the previous level.
[0064]
Note that the right-open V-shape detection reference model 400, the right-up diagonal region 403, and the right-down diagonal region 404 shown in FIG. 4B are assumed to be held in the secondary feature reference model holding unit 122.
[0065]
Each feature as shown in FIG. 3 can be detected by a combination of features detected in the previous step process.
For example, among the secondary features, the left-open V-shaped feature can be detected from the right-down diagonal feature and the right-up diagonal feature, the horizontal parallel line feature from the horizontal feature, and the vertical parallel line feature from the vertical feature. Among the tertiary features, the eye feature can be detected from the right-open V-shaped feature, the left-open V-shaped feature, the horizontal parallel line feature, and the vertical parallel line feature, and the mouth feature from the right-open V-shaped feature, the left-open V-shaped feature, and the horizontal parallel line feature. As for the quaternary feature, the face feature can be detected from the eye feature and the mouth feature.
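The composition of features listed above can be written down as a small lookup table. The following Python sketch encodes it and derives each feature's level and the primary features it ultimately depends on; the identifier names and the two helper functions are hypothetical, while the table entries follow the text.

```python
# Which lower-level features each feature is composed of, following the text.
# Primary features do not appear as keys.
COMPOSITION = {
    "right_open_v": ("right_up_diagonal", "right_down_diagonal"),
    "left_open_v": ("right_down_diagonal", "right_up_diagonal"),
    "horizontal_parallel_lines": ("horizontal",),
    "vertical_parallel_lines": ("vertical",),
    "eye": ("right_open_v", "left_open_v",
            "horizontal_parallel_lines", "vertical_parallel_lines"),
    "mouth": ("right_open_v", "left_open_v", "horizontal_parallel_lines"),
    "face": ("eye", "mouth"),
}

def level(feature):
    """Primary features are level 1; others sit one level above their parts."""
    parts = COMPOSITION.get(feature)
    if parts is None:
        return 1
    return 1 + max(level(p) for p in parts)

def primary_features(feature):
    """Return the set of primary features a feature ultimately depends on."""
    parts = COMPOSITION.get(feature)
    if parts is None:
        return {feature}
    found = set()
    for p in parts:
        found |= primary_features(p)
    return found
```

For instance, `level("face")` evaluates to 4, and `primary_features("mouth")` contains the two diagonal features and the horizontal feature but not the vertical one.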
[0066]
Step S204:
The secondary feature detection unit 102 detects the secondary feature of the target image using the secondary feature detection model set in step S203.
[0067]
Specifically, each secondary feature is detected using the values of the primary features that constitute it; for example, the determination is made according to whether the value of each primary feature is equal to or greater than an arbitrary threshold.
[0068]
For example, when detecting the right-open V-shaped feature, a secondary feature, at a given position using the right-open V-shape detection model, if the maximum value of the right-up diagonal features in the right-up diagonal region is higher than the threshold and the maximum value of the right-down diagonal features in the right-down diagonal region is also higher than the threshold, a right-open V-shaped feature is taken to exist at that position, and the value at that position is the average of those maximum values. Conversely, if the value of any of the primary features is lower than the threshold, no secondary feature is taken to exist at that position, and the value at that position is set to "0".
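The decision rule in this paragraph can be sketched in Python as follows: take the maximum of each previous-level result image inside its region, require every maximum to exceed the threshold, and output the average of the maxima (or 0 otherwise). The region representation (inclusive offsets relative to the tested position), the clipping at the image border, and the function names are assumptions made for this sketch.

```python
def region_max(result_image, top, left, bottom, right):
    """Maximum value in the inclusive region, clipped to the image.
    An empty clipped region yields 0.0."""
    h, w = len(result_image), len(result_image[0])
    values = [result_image[y][x]
              for y in range(max(top, 0), min(bottom, h - 1) + 1)
              for x in range(max(left, 0), min(right, w - 1) + 1)]
    return max(values) if values else 0.0

def detect_at(prev_results, model, y, x, threshold):
    """model: list of (feature_name, (dy0, dx0, dy1, dx1)) regions given as
    offsets from (y, x). Returns the feature value at (y, x)."""
    maxima = [
        region_max(prev_results[name], y + dy0, x + dx0, y + dy1, x + dx1)
        for name, (dy0, dx0, dy1, dx1) in model
    ]
    if all(m > threshold for m in maxima):
        return sum(maxima) / len(maxima)   # average of the region maxima
    return 0.0                             # feature absent at this position

def blank(h, w):
    return [[0.0] * w for _ in range(h)]

# Toy primary-feature result images (5x5) with one response each.
right_up = blank(5, 5)
right_up[1][2] = 0.75
right_down = blank(5, 5)
right_down[3][2] = 0.5
prev_results = {"right_up": right_up, "right_down": right_down}

# Right-up region above the tested position, right-down region below it.
v_model = [("right_up", (-2, -1, -1, 1)), ("right_down", (1, -1, 2, 1))]

value_present = detect_at(prev_results, v_model, 2, 2, threshold=0.25)
value_absent = detect_at(prev_results, v_model, 2, 2, threshold=0.6)
```

With a threshold of 0.25 both maxima pass and the value is their average; with a threshold of 0.6 the right-down maximum fails and the value is 0. Note that the routine only reads values from the result images already produced at the previous level.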
[0069]
The secondary feature detection result obtained as described above is output for each secondary feature in the form of a detection result image having the same size as the target image.
That is, if the secondary feature is as shown in FIG. 3B, four types of secondary feature detection result images are obtained. By referring to the value of each position of these detection result images, it can be determined whether or not each secondary feature exists at the corresponding position of the target image.
[0070]
Note that, in the processing of step S204, the primary features are not detected again in each region of the secondary feature detection model.
[0071]
That is, for example, in detecting the right-open V-shaped feature, which is one of the secondary features, the right-up diagonal feature and the right-down diagonal feature, which are primary features, are not detected anew in the right-up diagonal region and the right-down diagonal region. The detection of these primary features was completed in step S202; in step S204, therefore, a threshold is used only to determine whether each primary feature exists in these regions. When the required primary features are determined to exist in their respective regions, the secondary feature is taken to exist at that position. The same feature detection method applies to the tertiary and quaternary features that follow.
[0072]
In the process of step S204, a parameter used for setting the next tertiary feature detection model is obtained.
For example, as shown in FIG. 6, at the same time as the right-open V-shaped feature is detected, the distance between the point giving the maximum value of the right-up diagonal feature and the point giving the maximum value of the right-down diagonal feature is obtained and kept as a parameter. This parameter is then output together with each secondary feature detection result.
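A minimal Python sketch of this parameter, assuming it is the Euclidean distance between the two peak positions (the description says only "distance"): locate the position of the maximum in each region and measure the distance between the two positions. The region format and function names are hypothetical.

```python
import math

def argmax_in_region(result_image, top, left, bottom, right):
    """Return (value, y, x) of the largest value in the inclusive region,
    clipped to the image, or None if the clipped region is empty."""
    h, w = len(result_image), len(result_image[0])
    best = None
    for y in range(max(top, 0), min(bottom, h - 1) + 1):
        for x in range(max(left, 0), min(right, w - 1) + 1):
            v = result_image[y][x]
            if best is None or v > best[0]:
                best = (v, y, x)
    return best

def peak_distance(image_a, region_a, image_b, region_b):
    """Distance between the maximum points of two result images,
    each searched within its own region; None if either region is empty."""
    a = argmax_in_region(image_a, *region_a)
    b = argmax_in_region(image_b, *region_b)
    if a is None or b is None:
        return None
    return math.hypot(a[1] - b[1], a[2] - b[2])

def blank(h, w):
    return [[0.0] * w for _ in range(h)]

right_up = blank(6, 8)
right_up[1][2] = 0.9      # peak of the right-up diagonal feature
right_down = blank(6, 8)
right_down[4][6] = 0.7    # peak of the right-down diagonal feature

distance = peak_distance(right_up, (0, 0, 2, 7), right_down, (3, 0, 5, 7))
```

Here the peaks lie 3 rows and 4 columns apart, so the parameter is 5 pixels; a larger V shape would give a larger distance.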
[0073]
Step S205:
The tertiary feature detection model setting unit 113 sets the model, indicating the positional relationship between the secondary features, that the tertiary feature detection unit 103 uses to detect tertiary features. It does so using the reference model held in the tertiary feature reference model holding unit 123, the secondary feature detection result from the secondary feature detection unit 102, and its parameters.
[0074]
Specifically, for simplicity of explanation, consider the setting of a detection model for detecting the eye feature (3-1) shown in FIG. 3.
FIG. 7 shows an example of an eye detection reference model 700 for detecting eyes. In the eye detection reference model 700, a right-open V-shaped region 701, in which the right-open V-shaped feature (see (2-1) in FIG. 3B), a secondary feature, exists, lies on the left side; a left-open V-shaped region 702, in which the left-open V-shaped feature (see (2-2) in FIG. 3B) exists, lies on the right side; and a horizontal parallel line region 703, in which the horizontal parallel line feature (see (2-3) in FIG. 3B) exists, and a vertical parallel line region 704, in which the vertical parallel line feature (see (2-4) in FIG. 3B) exists, lie between these V-shaped regions.
[0075]
In step S205, as in step S203, a tertiary feature detection model suitable for detecting tertiary features is set by enlarging or reducing the reference model in order to cope with the size variation. The parameters obtained in step S204 are used for enlarging or reducing the reference model.
[0076]
For example, the distance between the positions indicating the maximum values of the right-upward diagonal feature and the right-down diagonal feature obtained when detecting the right empty V-shaped edge depends on the size of the eye. Therefore, using this distance as a parameter, an eye feature detection model is set based on the eye reference model.
[0077]
As described above, a detection model corresponding to each position is set for each tertiary feature using the parameters of the secondary feature based on each reference model.
That is, for example, as shown in FIG. 8A, when faces of different sizes (that is, with different eye sizes) exist in the target image, the size of the right empty V-shaped feature, which is a secondary feature, is used as a parameter as described above, and an eye feature detection model suitable for each position is set as shown in FIG.
[0078]
FIG. 8B conceptually shows that the eye feature detection model 801 has the size obtained from the parameter value of the secondary feature at its position, and that the eye feature detection model 802 likewise has the size obtained from the parameter value of the secondary feature at its position.
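As a rough illustration of how a reference model might be enlarged or reduced with the secondary feature parameter, the sketch below scales every region of a reference model by the ratio of the measured distance to the distance assumed by the reference model. The function name, the dictionary layout, and the `(x_offset, y_offset, width, height)` region format are hypothetical choices for this example only.

```python
def scale_reference_model(reference_regions, reference_distance, measured_distance):
    """Return a detection model scaled from a reference model.

    reference_regions: dict mapping a region name to a tuple
        (x_offset, y_offset, width, height) relative to the model centre.
    reference_distance: distance parameter the reference model was built for.
    measured_distance: distance parameter obtained at the current position.
    """
    scale = measured_distance / reference_distance
    # Offsets and extents are scaled by the same factor so that the
    # positional relationship between the regions is preserved.
    return {
        name: tuple(value * scale for value in box)
        for name, box in reference_regions.items()
    }
```

Calling the function once per candidate position, each time with the parameter measured at that position, yields a differently sized model per position, which is the behaviour the figure is described as showing.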
[0079]
Step S206:
The tertiary feature detection unit 103 detects tertiary features using the tertiary feature detection model set in step S205.
The method of detecting each tertiary feature here is the same as that in step S204, and thus a detailed description is omitted. As for the parameter, in the case of eye detection, for example, the distance between the points where the right empty V-shaped feature and the left empty V-shaped feature show their maximum values (a distance corresponding to the lateral width of the eye) is obtained and used as the parameter.
[0080]
Step S207:
The quaternary feature detection model setting unit 114 sets a model indicating the positional relationship of the tertiary features, which the quaternary feature detection unit 104 uses when detecting a quaternary feature. The model is set using the reference model held in the quaternary feature reference model holding unit 124, the tertiary feature detection result from the tertiary feature detection unit 103, and its parameters.
[0081]
Specifically, for example, in the case of face feature detection, the face size and the lateral width of the eyes are generally related. Therefore, a face feature detection model is set by applying the parameter indicating the lateral width of the eye, obtained in step S206, to the reference model of the face feature (4-1) as shown in FIG.
[0082]
Step S208:
The quaternary feature detection unit 104 detects a quaternary feature using the quaternary feature detection model set in step S207.
Since the detection method here is the same method as steps S204 and S206, detailed description thereof will be omitted. As for parameters, for example, in the case of detecting facial features, the positions of both eyes and the mouth are used as parameters. This parameter is used in the next step S209.
[0083]
Step S209:
The confirmation pattern setting unit 115 sets the confirmation pattern used by the pattern confirmation unit 105, using the reference pattern held in the reference confirmation pattern holding unit 125, the quaternary feature detection result from the quaternary feature detection unit 104, and its parameters.
[0084]
Specifically, quaternary feature detection is first performed in the processing of steps S201 to S208. However, when the background of the target image contains regions similar to the plurality of tertiary features constituting a quaternary feature, and their positional relationship is also similar, quaternary feature detection may produce a false detection.
[0085]
For example, in the case of face detection, if the background of the target image contains regions similar to both eyes and the mouth, and the positional relationship between them is also similar, face feature detection may produce a false detection.
[0086]
Therefore, a general reference pattern of the pattern to be detected is prepared, and the size and shape of this pattern are corrected based on the parameters obtained in step S208 to obtain a confirmation pattern. Using this confirmation pattern, it is determined whether or not the pattern to be finally detected exists in the target image.
[0087]
Here, as an example, the face is used as the detection pattern. Therefore, a general reference pattern of a face is prepared, a face confirmation pattern is obtained by correcting this reference pattern, and the face confirmation pattern is used to determine whether or not a face pattern is present in the target image.
[0088]
For this reason, in step S209, a confirmation pattern is first set from the reference pattern using the parameters obtained in step S208. That is, when setting the face pattern, the face confirmation pattern is set from the reference pattern of the face using the parameters indicating the positions of both eyes and the mouth obtained in step S208.
[0089]
FIGS. 9A and 9B show an example of a confirmation pattern.
FIG. 9A shows a face reference pattern, obtained, for example, by preparing a plurality of faces, normalizing their sizes, and then averaging their luminance values.
Using the parameters obtained in step S208, that is, the positions of both eyes and the position of the mouth, the face reference pattern in FIG. 9A is converted in size and rotation as shown in FIG. Specifically, for example, size conversion is performed using the distance between both eyes or the distance between the midpoint of both eyes and the mouth, and rotation conversion is performed using the inclination of the line through both eyes, thereby setting the face confirmation pattern.
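The size and rotation conversions described above can be derived from the eye and mouth positions roughly as in the following sketch. The function name `confirmation_transform`, the `(x, y)` coordinate convention, and the `ref_eye_distance` argument (the eye spacing assumed in the reference pattern) are hypothetical and introduced only for illustration.

```python
import math


def confirmation_transform(left_eye, right_eye, mouth, ref_eye_distance):
    """Derive size and rotation parameters for the confirmation pattern.

    left_eye, right_eye, mouth: (x, y) positions from quaternary detection.
    ref_eye_distance: eye spacing in the reference pattern (assumed known).
    Returns (scale, angle_degrees, eye_mouth_distance).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]

    # Size conversion factor from the distance between both eyes.
    scale = math.hypot(dx, dy) / ref_eye_distance

    # Rotation from the inclination of the line through both eyes.
    angle = math.degrees(math.atan2(dy, dx))

    # Alternative size cue: distance from the eye midpoint to the mouth.
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    mid_y = (left_eye[1] + right_eye[1]) / 2.0
    eye_mouth = math.hypot(mouth[0] - mid_x, mouth[1] - mid_y)

    return scale, angle, eye_mouth
```

The scale and angle would then be applied to the averaged reference pattern by whatever image resampling routine the implementation provides; that step is left out here.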
[0090]
The confirmation pattern setting method is not limited to the one described above. For example, a plurality of reference patterns having different sizes and rotation amounts may be prepared, and one of them may be selected using the parameter of step S206. Alternatively, the plurality of reference patterns may be combined by a morphing technique or the like, using the parameters, to set the confirmation pattern.
[0091]
Step S210:
The pattern confirmation unit 105 obtains a detection pattern from the target image using the confirmation pattern set in step S209.
[0092]
Specifically, for example, at each position in the target image where the quaternary feature was detected in step S208, the correlation between the confirmation pattern obtained in step S209 and the partial region at the corresponding position in the target image is computed. When the correlation value exceeds an arbitrary threshold value, the detection pattern is regarded as present at that position.
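The confirmation step can be pictured with the following sketch. The specification only speaks of "correlation"; the zero-mean normalized correlation used here is one common choice and is an assumption, as are the function names and the row-major list-of-lists image format.

```python
import math


def normalized_correlation(patch, pattern):
    """Zero-mean normalized correlation of two equally sized 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in pattern for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A flat patch or pattern carries no structure; treat it as no match.
    return num / den if den else 0.0


def confirm_pattern(image, pattern, top, left, threshold):
    """Check the partial region at (top, left) against the confirmation pattern."""
    height, width = len(pattern), len(pattern[0])
    patch = [row[left:left + width] for row in image[top:top + height]]
    return normalized_correlation(patch, pattern) > threshold
```

Only positions already reported by the quaternary feature detection would be passed to `confirm_pattern`, so the comparatively costly correlation is evaluated at few locations.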
[0093]
As described above, in the present embodiment, a reference model for detecting each feature is prepared, and a detection model is set from the reference model using parameters obtained from the detection result of the preceding-stage feature. This improves the detection accuracy of each feature and, in turn, the detection accuracy of the pattern to be finally detected. In addition, in the final confirmation process, where the correlation with the average pattern is examined, the average pattern is modified, for example by rotation or size change, according to the positions of the features obtained so far, which improves the accuracy of the confirmation.
[0094]
<Imaging Device 1000 of Pattern Recognition Device 100>
Here, a case is described in which the function of the pattern recognition (detection) device 100 shown in FIG. 1 is mounted on, for example, an imaging device 1000 as shown in FIG. 10 to perform focusing on a specific subject, color correction of a specific subject, or exposure control.
[0095]
First, as shown in FIG. 10, the imaging apparatus 1000 includes an imaging optical system 1002 including a photographing lens and a zoom photographing drive control mechanism, a CCD (or CMOS) image sensor 1003, an imaging parameter measuring unit 1004, a video signal processing circuit 1005, a storage unit 1006, a control signal generation unit 1007 that generates control signals for controlling imaging operations and imaging conditions, a display 1008 that also serves as a viewfinder such as an EVF, a strobe light emitting unit 1009, a recording medium 1010, and the like, and further includes a subject detection (recognition) unit 1011 having the function of the pattern recognition apparatus 100 shown in FIG.
[0096]
In the imaging apparatus 1000 as described above, in particular, the subject detection (recognition) unit 1011 detects a person's face image (detection of the presence position and size) from, for example, a video obtained by shooting.
[0097]
When the control signal generation unit 1007 receives the detection result (person position and size information) from the subject detection (recognition) unit 1011, it generates, based on the output of the imaging parameter measurement unit 1004, control signals for optimally performing focus control, exposure condition control, white balance control, and the like for the person.
[0098]
As described above, by using the function of the pattern detection (recognition) device 100 in FIG. 1 in the imaging device 1000, it is possible to detect a person in the video obtained by shooting and to optimally control shooting based on the detection.
[0099]
Note that the imaging apparatus 1000 in FIG. 10 is configured to include the function of the pattern detection apparatus 100 in FIG. 1 as the subject detection (recognition) unit 1011. However, the present invention is not limited to this. For example, the algorithm of the pattern detection apparatus 100 may be implemented in the imaging apparatus 1000 as a program, and the program may be executed by a CPU (not shown) installed in the imaging apparatus 1000. Such a configuration can be similarly implemented in the second and third embodiments described below.
[0100]
In this embodiment, the features of the pattern to be detected from the target image are divided into four hierarchies, the primary through quaternary features are detected in order, and the pattern to be detected is confirmed last. However, the present invention is not limited to four hierarchies, and any number of hierarchies, such as three or five, is applicable. This can be similarly implemented in the second embodiment and the third embodiment described below.
[0101]
In this embodiment, as an example, the face area is obtained from the target image using the face pattern as a detection pattern. However, the present invention is not limited to face detection.
For example, a numeric string “24” as shown in FIG. 11A can be detected from the target image.
[0102]
In the case of the above-described digit string detection, as shown in FIG. 11B, “2” consists of a secondary feature (upper feature) composed of a horizontal line segment and a diagonally lower right line segment, a secondary feature (intermediate feature) composed of a vertical line segment and a diagonally upper right line segment, and a secondary feature (lower feature) composed of a diagonally upper right line segment and a horizontal line segment. These secondary features are in turn composed of primary features as shown in FIG.
[0103]
Therefore, first, the primary features are detected from the target image, the secondary features are detected from the primary feature detection results, and “2” is detected as a tertiary feature using the secondary feature detection results. Similarly, “4” is also detected as a tertiary feature from the secondary feature detection results.
Next, “24” is obtained as the quaternary feature from the tertiary feature detection results of “2” and “4”.
Then, using the positional relationship between “2” and “4” detected as the tertiary feature as a parameter, based on the reference pattern of the number string indicating “24”, a confirmation pattern of “24” is set using the parameter. Finally, a number string indicating “24” is detected.
[0104]
[Second Embodiment]
The present invention is applied to, for example, an information processing apparatus 1200 as shown in FIG.
In particular, the information processing apparatus 1200 according to the present embodiment has the function of the pattern recognition apparatus 100 shown in FIG.
[0105]
<Configuration of Information Processing Apparatus 1200>
As shown in FIG. 12, the information processing apparatus 1200 includes a control unit 1270, a calculation unit 1210, a weight setting unit 1220, a reference weight holding unit 1230, a parameter detection unit 1240, an input signal memory 1250, an input signal memory control unit 1251, An intermediate result memory 1260 and an intermediate result memory control unit 1261 are included.
[0106]
In the information processing apparatus 1200 as described above, first, the control unit 1270 controls operation of the entire information processing apparatus 1200.
In particular, the control unit 1270 carries out the pattern recognition operation by controlling the calculation unit 1210, the weight setting unit 1220, the reference weight holding unit 1230, the parameter detection unit 1240, the input signal memory control unit 1251, and the intermediate result memory control unit 1261.
[0107]
The calculation unit 1210 performs a product-sum operation and a non-linear operation such as a logistic function, using the data from the input signal memory 1250 or the intermediate result memory 1260 and the weight data from the weight setting unit 1220, and holds the result in the intermediate result memory 1260.
[0108]
The weight setting unit 1220 sets weight data using parameters from the parameter detection unit 1240 based on the reference weight data from the reference weight holding unit 1230, and supplies the weight data to the calculation unit 1210.
[0109]
The reference weight holding unit 1230 holds reference weight data serving as a reference for detecting each feature in the input signal for each feature, and supplies the reference weight data to the weight setting unit 1220.
[0110]
The parameter detection unit 1240 detects parameters used when the weight setting unit 1220 sets weight data using the data in the intermediate result memory 1260, and supplies the parameters to the weight setting unit 1220.
[0111]
The input signal memory 1250 holds input signals to be processed such as image signals and audio signals.
The input signal memory control unit 1251 controls the input signal memory 1250 when an input signal is stored in the input signal memory 1250 and when the input signal held in the input signal memory 1250 is supplied to the calculation unit 1210.
[0112]
The intermediate result memory 1260 holds the calculation results obtained by the calculation unit 1210.
The intermediate result memory control unit 1261 controls the intermediate result memory 1260 when a calculation result from the calculation unit 1210 is stored in the intermediate result memory 1260 and when an intermediate result held in the intermediate result memory 1260 is supplied to the calculation unit 1210 or the parameter detection unit 1240.
[0113]
<Operation of Information Processing Apparatus 1200>
Here, as an example of the operation of the information processing apparatus 1200, an operation when a neural network that performs image recognition by parallel hierarchical processing is formed will be described. That is, as in the first embodiment, an input signal to be processed is an image signal.
[0114]
First, the processing contents of the neural network will be described in detail with reference to FIG.
The neural network hierarchically handles information related to the recognition (detection) of an object or geometric feature in a local region of the input signal. Its basic structure is the so-called convolutional network structure (LeCun, Y. and Bengio, Y., 1995, "Convolutional Networks for Images, Speech, and Time Series" in Handbook of Brain Theory and Neural Networks (M. Arbib, Ed.), MIT Press, pp. 255-258). The output from the final layer (uppermost layer) is the recognition result, namely the category of the recognized target and its position information on the input data.
[0115]
In FIG. 13, the data input layer 1301 is a layer for inputting local area data from a photoelectric conversion element such as a CMOS sensor or a CCD element.
[0116]
The first feature detection layer 1302 (1, 0) detects local low-order features of the image pattern input from the data input layer 1301 (geometric features such as a specific direction component and a specific spatial frequency component, among others). In a local region centered on each position of the entire screen (or a local region centered on each of predetermined sampling points across the entire screen), it detects these features at a plurality of scale levels or resolutions at the same location, for as many feature categories as are defined.
[0117]
The feature integration layer 1303 (2, 0) has a predetermined receptive field structure (hereinafter, “receptive field” means the range of connection with the output elements of the immediately preceding layer) and integrates the outputs of a plurality of neuron elements within the same receptive field from the feature detection layer 1302 (1, 0) (computation such as sub-sampling by local averaging, maximum output detection, and the like).
[0118]
The integration process has a role of allowing positional deviation and deformation by spatially blurring the output from the feature detection layer 1302 (1, 0). Each receptive field of neurons in the feature integration layer has a common structure among neurons in the same layer.
[0119]
In general, each receptive field of a neuron in the feature detection layer also has a common structure among neurons in the same layer. However, the main point of the present embodiment is that the size of the receptive field structure is changed according to the output result (detection result) of the preceding-stage neurons.
[0120]
Each subsequent feature detection layer 1302 ((1, 1), (1, 2), ..., (1, M)) and each subsequent feature integration layer 1303 ((2, 1), (2, 2), ..., (2, M)) works in the same way as the layers described above: the former ((1, 1), ...) detect a plurality of different features in their respective feature detection modules, and the latter ((2, 1), ...) integrate the detection results for the plurality of features from the preceding feature detection layer.
[0121]
However, each of the former feature detection layers is coupled (wired) so as to receive the cell element outputs of the preceding feature integration layer belonging to the same channel. Sub-sampling, which is the processing performed in the feature integration layer, averages the outputs from a local region (the local receptive field of the feature integration layer neuron) of the feature detection cell population of the same feature category.
[0122]
As a specific example of the operation of the information processing apparatus 1200, FIG. 14 is a flowchart illustrating the operation when a face pattern is recognized from a target image, as in the first embodiment.
[0123]
Step S1401:
The input signal memory control unit 1251 inputs a signal (here, an image signal) input by the control unit 1270 to the input signal memory 1250.
This step S1401 corresponds to the data input layer 1301 shown in FIG.
[0124]
Step S1402:
The weight setting unit 1220 sets, for the calculation unit 1210, the primary feature detection weight data (weight data for performing edge extraction in each direction and at each size) held in the reference weight holding unit 1230, for example, as shown in FIG.
[0125]
Note that the primary feature detection weight data may be generated by the weight setting unit 1220 using the size and direction as parameters.
As for the following secondary feature, tertiary feature, and quaternary feature, for example, the same features as those described in the first embodiment can be used.
[0126]
Step S1403:
The calculation unit 1210 detects a primary feature.
That is, the primary feature detection in step S1403 corresponds to the processing of the feature detection layer 1302 (1, 0) shown in FIG. 13, and the calculation unit 1210 executes the processing corresponding to the detection module 1304 for each feature f.
[0127]
Specifically, each primary feature detection weight data set in step S1402 corresponds to the structure of the receptive field 1305 for detecting each feature f. The calculation unit 1210 acquires the image signal from the input signal memory 1250 and performs a product-sum operation on the local region (the region corresponding to the receptive field 1305) at each position of the image signal and each primary feature detection weight data.
[0128]
Here, an example of the input/output characteristics of a feature detection layer neuron computed by the calculation unit 1210 is given by equation (1). That is, the output $u_{SL}(n, k)$ of the neuron located at position n of the cell plane for detecting the k-th feature of the L-th stage is
[0129]
[Expression 1]
$$u_{SL}(n, k) = f\left( \sum_{\kappa=1}^{K_{CL-1}} \sum_{v \in W_L} w_L(v, \kappa, k) \, u_{CL-1}(n + v, \kappa) \right) \qquad (1)$$
[0130]
as expressed by equation (1) above.
[0131]
In equation (1), $u_{CL}(n, \kappa)$ denotes the output of the neuron located at position n on the κ-th cell plane of the L-th stage feature integration layer. $K_{CL}$ denotes the number of feature types in the L-th stage feature integration layer. $w_L(v, \kappa, k)$ is the input coupling to the neuron at position n of the k-th cell plane of the L-th stage feature detection cell layer from the neuron at position n + v of the κ-th cell plane of the (L-1)-th stage feature integration layer. $W_L$ is the receptive field of the detection cell, and its size is finite.
[0132]
Since the process of step S1403 is primary feature detection, L is “1”. Accordingly, $u_{CL-1}$ corresponds to the data input layer, so the number of features in the preceding stage is one. Since there are eight types of features to be detected, eight types of results are obtained.
[0133]
Further, in equation (1), f() represents non-linear processing applied to the result of the product-sum operation. For example, as this non-linear processing,
[0134]
[Expression 2]
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
[0135]
the logistic function expressed by equation (2) is used.
[0136]
The non-linearly processed result is held in the intermediate result memory 1260. Here, since eight types of features are detected as described above, detection results of all these features are held in the intermediate result memory 1260.
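The product-sum operation followed by non-linear processing, as described for the feature detection layer neuron, can be sketched as below. The sketch assumes the standard logistic function for f(), odd-sized square receptive fields centred on the neuron position, and zero contribution from positions outside the plane; the function names and the list-of-lists layout are likewise assumptions for illustration.

```python
import math


def logistic(x):
    """Standard logistic function, assumed here as the non-linear f()."""
    return 1.0 / (1.0 + math.exp(-x))


def detect_feature(inputs, weights, n):
    """Output of one feature detection neuron at position n = (row, col).

    inputs: list of 2-D planes (one per preceding-stage feature kappa).
    weights: list of 2-D kernels, one per plane, with odd side lengths
        (the receptive field W_L of the detection neuron).
    """
    total = 0.0
    for plane, kernel in zip(inputs, weights):
        half_r = len(kernel) // 2
        half_c = len(kernel[0]) // 2
        for i, kernel_row in enumerate(kernel):
            for j, w in enumerate(kernel_row):
                r = n[0] + i - half_r
                c = n[1] + j - half_c
                # Positions outside the plane contribute nothing (assumption).
                if 0 <= r < len(plane) and 0 <= c < len(plane[0]):
                    total += w * plane[r][c]
    return logistic(total)
```

For primary feature detection there is a single input plane (the image), and the function would be evaluated once per position and once per primary feature kernel.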
[0137]
Step S1404:
The weight setting unit 1220 sets the primary feature integration weight data held in the reference weight holding unit 1230 for the calculation unit 1210.
The primary feature integration weight data here is weight data for performing processing such as local averaging of the primary features detected in step S1403 and detection of the maximum value.
[0138]
Step S1405:
The calculation unit 1210 performs a product-sum operation on each primary feature detection result held in the intermediate result memory 1260 and each primary feature integration weight data set in step S1404 (that is, it executes integration processing on the detection result of each primary feature).
[0139]
The processing in step S1405 corresponds to the processing of the feature integration layer 1303 (2, 0) shown in FIG. 13 and corresponds to the integration module of each feature f. Specifically, this corresponds to the integration of a plurality of neuron element outputs existing in the same receptive field from the feature detection layer 1302 (1, 0) (calculation such as local averaging, sub-sampling by maximum output detection, etc.).
[0140]
That is, the calculation unit 1210 executes processing such as averaging and maximum value detection in the local region for the detection result of each primary feature. For example, the calculation unit 1210 executes
[0141]
[Equation 3]
$$u_{CL}(n, k) = \sum_{v \in D_L} d_L(v) \, u_{SL}(n + v, k) \qquad (3)$$
[0142]
the averaging in the local region expressed by equation (3).
[0143]
In equation (3), $d_L(v)$ is the input coupling from a neuron of the L-th stage feature detection layer to a neuron on the cell plane of the L-th stage feature integration cell layer, and is a function that decreases monotonically with respect to |v|. $D_L$ denotes the receptive field of the integration cell, and its size is finite.
[0144]
The operation unit 1210 holds the result of the product-sum operation according to the above equation (3) in the intermediate result memory 1260.
At this time, the arithmetic unit 1210 may further perform non-linear processing on the result of the product-sum operation and hold the result in the intermediate result memory 1260.
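The two integration variants mentioned for the feature integration layer, weighted local averaging and maximum value detection, can be sketched as follows. The function names, the treatment of positions outside the plane, and the square neighbourhood used for the maximum are assumptions made for this example.

```python
def local_average(plane, d, n):
    """Weighted local average around position n = (row, col).

    plane: 2-D list of feature detection outputs for one feature.
    d: 2-D list of integration weights with odd side lengths, expected to
       decrease with distance from the centre (receptive field D_L).
    """
    half_r = len(d) // 2
    half_c = len(d[0]) // 2
    total = 0.0
    for i, row in enumerate(d):
        for j, weight in enumerate(row):
            r = n[0] + i - half_r
            c = n[1] + j - half_c
            # Positions outside the plane are skipped (assumption).
            if 0 <= r < len(plane) and 0 <= c < len(plane[0]):
                total += weight * plane[r][c]
    return total


def local_max(plane, radius, n):
    """Maximum detection output in a square neighbourhood around n."""
    rows = range(max(0, n[0] - radius), min(len(plane), n[0] + radius + 1))
    cols = range(max(0, n[1] - radius), min(len(plane[0]), n[1] + radius + 1))
    return max(plane[r][c] for r in rows for c in cols)
```

Both variants blur the detection output spatially, which is what gives the integration layer its tolerance to positional deviation and deformation.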
[0145]
Through the processing up to step S1405, the intermediate result memory 1260 holds the primary feature integration results for each size and each direction, in which the primary feature detection results have been integrated in the local region for each feature.
[0146]
Step S1406:
The weight setting unit 1220 sets secondary feature detection weight data.
The secondary feature detection weight data here is weight data for detecting each secondary feature shown in FIG. 3B used in the first embodiment, as described above.
[0147]
As described in the first embodiment, the size of each feature from the secondary feature onward is correlated with the size of the features obtained before it. For this reason, when detecting each feature from the secondary feature onward, the weight setting unit 1220 sets feature detection weight data that depends on the size of the feature detected in the preceding layer.
[0148]
Specifically, first, the weight setting unit 1220 has the parameter detection unit 1240 set, as a parameter, the preset receptive field size indicated by the primary feature detection weight data with which each primary feature was detected.
Then, the weight setting unit 1220 corrects the reference secondary feature detection weight data held in the reference weight holding unit 1230 using the parameters set by the parameter detection unit 1240 with respect to the receptive field size. The result is set as secondary feature detection weight data.
[0149]
That is, for example, suppose that the reference secondary feature detection weight data is set for the larger primary feature size (larger receptive field size) as shown in FIG. Then, when a secondary feature is detected from primary feature detection results obtained with weighting coefficients of a small receptive field size, the receptive field size of the secondary feature detection weight data is reduced, for example, as shown in FIG.
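One simple way to picture the correction of the receptive field size is to resample the reference weight data to a new side length chosen in proportion to the detected parameter. The sketch below uses nearest-neighbour resampling of a square kernel; the resampling method, the proportional sizing rule, and both function names are assumptions for illustration and are not stated in the specification.

```python
def scaled_size(reference_size, reference_parameter, measured_parameter):
    """Side length of the corrected receptive field (at least 1)."""
    return max(1, round(reference_size * measured_parameter / reference_parameter))


def resize_weights(reference, new_size):
    """Nearest-neighbour resampling of a square weight kernel.

    reference: square 2-D list holding the reference weight data.
    new_size: side length of the corrected receptive field.
    """
    old = len(reference)

    def source_index(i):
        # Map the centre of target cell i back onto the reference grid.
        return min(old - 1, int((i + 0.5) * old / new_size))

    return [
        [reference[source_index(i)][source_index(j)] for j in range(new_size)]
        for i in range(new_size)
    ]
```

A practical implementation might interpolate instead of picking the nearest weight and might renormalize the resized weights, since changing the number of weights also changes their sum.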
[0150]
Step S1407:
The arithmetic unit 1210 detects secondary features. This corresponds to the processing of the feature detection layer 1302 (1, 1) shown in FIG.
[0151]
The process itself in step S1407 is the same as the primary feature detection process in step S1403.
For example, the calculation unit 1210 executes the product-sum operation of equation (1) and non-linear processing of the result. Here, however, the calculation unit 1210 uses the secondary feature detection weight data set in step S1406 and the primary feature integration results held in the intermediate result memory 1260 for the product-sum operation, performs the non-linear operation on the result, and holds the operation result (secondary feature detection result) in the intermediate result memory 1260.
[0152]
Step S1408:
The weight setting unit 1220 sets the secondary feature integration weight data held in the reference weight holding unit 1230 for the calculation unit 1210.
The secondary feature integration weight data here is weight data for executing processing such as local averaging of the secondary feature result detected in step S1407 and detection of the maximum value.
[0153]
Step S1409:
The calculation unit 1210 integrates the detection results of the respective secondary features. This corresponds to the processing of the feature integration layer 1303 (2, 1) shown in FIG.
[0154]
Specifically, the calculation unit 1210 performs a product-sum operation on each secondary feature detection result held in the intermediate result memory 1260 and each secondary feature integration weight data set in step S1408, for example, according to equation (3), and holds the result of the product-sum operation in the intermediate result memory 1260. At this time, the calculation unit 1210 may further perform non-linear processing on the result of the product-sum operation and hold the processing result in the intermediate result memory 1260.
[0155]
Step S1410:
The weight setting unit 1220 sets tertiary feature detection weight data for the calculation unit 1210.
The tertiary feature detection weight data here is weight data for detecting each tertiary feature shown in FIG. 3C in the first embodiment as described above.
[0156]
Specifically, first, the weight setting unit 1220 has the parameter detection unit 1240 set, as a parameter, a value based on the size of the secondary feature, derived from each primary feature detection result and each secondary feature detection result held in the intermediate result memory 1260. As this parameter, for example, as described in the first embodiment, in the case of a right empty V-shaped feature, the vertical distance between the right-up diagonal feature and the right-down diagonal feature can be used.
Then, the weight setting unit 1220 modifies the receptive field size of the reference tertiary feature detection weight data held in the reference weight holding unit 1230 using the parameter obtained by the parameter detection unit 1240, and sets the result as the tertiary feature detection weight data.
[0157]
Step S1411:
The arithmetic unit 1210 performs tertiary feature detection. This corresponds to the processing of the feature detection layer 1302 (1, 2) shown in FIG.
[0158]
Specifically, the calculation unit 1210 calculates the product-sum of the tertiary feature detection weight data set in step S1410 and the integration result of the secondary features held in the intermediate result memory 1260, and performs nonlinear processing on the result. The calculation is executed, and the calculation result (tertiary feature detection result) is held in the intermediate result memory 1260.
[0159]
Step S1412:
The weight setting unit 1220 sets the tertiary feature integration weight data held in the reference weight holding unit 1230 for the calculation unit 1210.
The tertiary feature integration weight data here is weight data for performing processing such as local averaging and maximum value detection of the tertiary feature result detected in step S1411.
[0160]
Step S1413:
The calculation unit 1210 integrates the detection results of the respective tertiary features. This corresponds to the processing of the feature integration layer 1303 (2, 2) shown in FIG.
[0161]
Specifically, the calculation unit 1210 performs a product-sum operation on each tertiary feature detection result held in the intermediate result memory 1260 and each tertiary feature integration weight data set in step S1412. The result of the product-sum operation is held in the intermediate result memory 1260. At this time, the calculation unit 1210 may further perform non-linear processing on the result of the product-sum operation and hold the processing result in the intermediate result memory 1260.
[0162]
Step S1414:
The weight setting unit 1220 sets quaternary feature detection weight data for the calculation unit 1210.
The quaternary feature detection weight data here is weight data for detecting each quaternary feature shown in FIG. 3D used in the first embodiment, as described above.
[0163]
Specifically, first, the weight setting unit 1220 causes the parameter detection unit 1240 to obtain, as a parameter, a value based on the size of the tertiary feature from each secondary feature detection result and each tertiary feature detection result held in the intermediate result memory 1260. As this parameter, for example, as described in the first embodiment, in the case of an eye feature, the horizontal distance between the right-open V-shaped feature and the left-open V-shaped feature can be used.
Then, the weight setting unit 1220 corrects the reference quaternary feature detection weight data held in the reference weight holding unit 1230 with respect to the receptive field size, using the parameter obtained by the parameter detection unit 1240, and sets the corrected result as the quaternary feature detection weight data.
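One way to picture the correction of reference weight data with respect to the receptive field size is to resample the reference weight array by a scale factor derived from the measured parameter. The sketch below assumes bilinear interpolation and a scale equal to the measured distance divided by a reference distance; the names `resize_weights`, `corrected_weights`, and `reference_distance` are hypothetical, and any renormalization of the weights after resizing is left out.

```python
def resize_weights(reference, scale):
    """Bilinearly resample a 2-D weight array to round(size * scale) per axis."""
    h, w = len(reference), len(reference[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    out = []
    for y in range(nh):
        # Map the output row back onto the reference grid (end points coincide).
        fy = y * (h - 1) / (nh - 1) if nh > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, h - 1)
        ty = fy - y0
        row = []
        for x in range(nw):
            fx = x * (w - 1) / (nw - 1) if nw > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, w - 1)
            tx = fx - x0
            top = reference[y0][x0] * (1 - tx) + reference[y0][x1] * tx
            bottom = reference[y1][x0] * (1 - tx) + reference[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out

def corrected_weights(reference, measured_distance, reference_distance):
    """Scale the reference weights by the ratio of measured to reference distance (assumption)."""
    return resize_weights(reference, measured_distance / reference_distance)
```

For example, if the reference weights assume a distance of 20 pixels between the two V-shaped features and the measured distance is 30 pixels, the receptive field grows by a factor of 1.5 along each axis.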
[0164]
Step S1415:
The calculation unit 1210 performs quaternary feature detection. This corresponds to the processing of the feature detection layer 1302 (1, 3) shown in FIG.
[0165]
Specifically, the calculation unit 1210 performs a product-sum operation on the quaternary feature detection weight data set in step S1414 and the integration result of the tertiary features held in the intermediate result memory 1260, performs a non-linear operation on the result, and holds the calculation result (quaternary feature detection result) in the intermediate result memory 1260.
[0166]
Step S1416:
The weight setting unit 1220 sets the quaternary feature integration weight data held in the reference weight holding unit 1230 for the calculation unit 1210.
The quaternary feature integration weight data here is weight data for performing processing such as local averaging of the quaternary feature result detected in step S1415 and detection of the maximum value.
[0167]
Step S1417:
The calculation unit 1210 integrates the detection results of the quaternary features. This corresponds to the processing of the feature integration layer 1303 (2, 3) shown in FIG.
[0168]
Specifically, the calculation unit 1210 performs a product-sum operation on the detection result of the quaternary feature held in the intermediate result memory 1260 and the quaternary feature integration weight data set in step S1416, and holds the result of the product-sum operation in the intermediate result memory 1260. At this time, the calculation unit 1210 may further perform non-linear processing on the result of the product-sum operation and hold the processing result in the intermediate result memory 1260.
[0169]
Step S1418:
The calculation unit 1210 sets pattern confirmation weight data.
[0170]
Specifically, the quaternary feature has been detected by the processing up to step S1417 described above. However, as described in the first embodiment, if the background of the target image (input image) contains regions similar to the plurality of tertiary features that constitute the quaternary feature, and their positional relationship is also similar, the quaternary feature detection may produce a false detection. For example, in the case of detecting a face, if the background of the input image contains regions similar to both eyes and a mouth, and their positional relationship is also similar, the detection of the facial feature may produce a false detection.
[0171]
For this reason, in this embodiment, reference pattern confirmation weight data for detecting a typical type (size, orientation, etc.) of the pattern to be detected is prepared, this weight data is corrected, and the corrected data is set as the pattern confirmation weight data. Using the set pattern confirmation weight data, it is finally determined whether or not the pattern to be detected exists in the input image.
[0172]
Here, as an example, a face is used as the detection pattern. Therefore, reference face pattern confirmation weight data for detecting a typical face is prepared, this data is corrected, the corrected data is set as the face pattern confirmation weight data, and the set face pattern confirmation weight data is used to determine whether a face pattern exists in the input image.
[0173]
Therefore, in this step S1418, first, the calculation unit 1210 causes the parameter detection unit 1240 to obtain, as a parameter, a value based on the tertiary feature detection results at the position of each detected quaternary feature, from the tertiary feature detection results and quaternary feature detection results held in the intermediate result memory 1260. As this parameter, for example, as described in the first embodiment, in the case of a facial feature, the positions of the eye features and the mouth feature can be used.
Then, the calculation unit 1210 corrects the reference pattern confirmation weight data held in the reference weight holding unit 1230 with respect to the receptive field size and rotation, using the parameters obtained by the parameter detection unit 1240, and sets the corrected result as the pattern confirmation weight data.
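The size and rotation correction described above can be illustrated by deriving a scale factor and a rotation angle from detected feature positions. The sketch below is an assumption made for illustration: it uses only the two eye positions (the mouth position could be used in a similar way), takes the scale from the distance between the eyes and the angle from the slope of the line joining them, and the names `confirmation_parameters`, `transform_point`, and `reference_eye_distance` are hypothetical.

```python
import math

def confirmation_parameters(left_eye, right_eye, reference_eye_distance):
    """Return (scale, angle in radians) from two eye positions given as (x, y)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    scale = math.hypot(dx, dy) / reference_eye_distance  # size from the distance
    angle = math.atan2(dy, dx)                           # rotation from the slope
    return scale, angle

def transform_point(point, scale, angle):
    """Scale and rotate a reference-pattern coordinate about the origin."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y), scale * (s * x + c * y))
```

Applying `transform_point` to every coordinate of the reference confirmation pattern would give the positions at which the corrected pattern is evaluated.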
[0174]
Step S1419:
The calculation unit 1210 confirms the detection pattern.
Specifically, the calculation unit 1210 performs a product-sum operation on the pattern confirmation weight data set in step S1418 and the input signal held in the input signal memory 1250, performs a non-linear operation on the result, and holds the operation result in the intermediate result memory 1260. The result held in the intermediate result memory 1260 is the final detection result of the pattern to be detected.
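The confirmation step can be sketched as a product-sum between the corrected confirmation weights and the input image at each candidate position, followed by a non-linear decision. In this hedged example each candidate carries its own corrected weight array, and a simple threshold stands in for the non-linear operation; the function name `confirm_pattern`, the candidate format, and the threshold are assumptions, not details given in the embodiment.

```python
def confirm_pattern(image, candidates, threshold):
    """Return the candidate centres whose product-sum with their weights exceeds `threshold`.

    candidates: list of ((y, x), weights) pairs, where `weights` is the
    corrected confirmation weight array for that position (hypothetical format).
    """
    h, w = len(image), len(image[0])
    confirmed = []
    for (cy, cx), weights in candidates:
        kh, kw = len(weights), len(weights[0])
        acc = 0.0
        for j in range(kh):
            for i in range(kw):
                yy, xx = cy + j - kh // 2, cx + i - kw // 2
                if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as zero (assumption)
                    acc += weights[j][i] * image[yy][xx]
        if acc > threshold:  # a step function stands in for the non-linear operation
            confirmed.append((cy, cx))
    return confirmed
```

Because the product-sum is taken against the input signal itself rather than against intermediate feature maps, background regions that only resemble the arrangement of eyes and mouth are less likely to pass.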
[0175]
As described above, in this embodiment, reference weight data for detecting each feature is prepared, and the detection weight data is set from the reference weight data using the parameters obtained from the detection result of the previous stage. Consequently, the detection accuracy of each feature is improved, and the detection accuracy of the pattern to be finally detected is improved.
[0176]
In addition, the calculation unit 1210 is configured to perform a product-sum operation on the detection weight data or the integration weight data and the data from the intermediate result memory 1260 or the input signal memory 1250, followed by a non-linear conversion of the result, and the weight data used for the product-sum operation is set each time. The same calculation unit 1210 can therefore be used repeatedly. Further, since both the input signal and the intermediate results are held, the final confirmation process can be performed easily.
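The reuse of a single calculation unit with weights set anew for each stage can be pictured as a loop over stages. The sketch below is a simplified illustration under stated assumptions: `run_stages`, the stage tuple format, and the toy `scale_and_rectify` operation (a scalar weight followed by rectification) are all hypothetical stand-ins for the product-sum and non-linear conversion.

```python
def run_stages(input_signal, stages, compute):
    """Run every stage through the same `compute` function.

    stages: list of (name, set_weights, source); `set_weights(memory)` returns
    the weights for that stage, and `source` is "input" or an earlier stage name.
    """
    memory = {}  # plays the role of the intermediate result memory
    for name, set_weights, source in stages:
        weights = set_weights(memory)  # weights are set anew for every stage
        data = input_signal if source == "input" else memory[source]
        memory[name] = compute(data, weights)
    return memory

def scale_and_rectify(data, weight):
    """Toy stand-in for a product-sum followed by a non-linear conversion."""
    return [max(0.0, weight * v) for v in data]

stages = [
    ("detect", lambda memory: 2.0, "input"),
    ("integrate", lambda memory: 0.5, "detect"),
    # The confirmation weight depends on an earlier result and is applied to the input.
    ("confirm", lambda memory: max(memory["integrate"]), "input"),
]
results = run_stages([1.0, -2.0], stages, scale_and_rectify)
```

Keeping both the input signal and the intermediate results available is what allows the last stage in this sketch to read the input again while deriving its weight from an earlier result.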
[0177]
In this embodiment, as an example, the integration weight data used for the integration processing is not set according to the detection result. However, the receptive field size of the integration weight data may also be set in the same manner as for the detection weight data. Further, the integration processing for the quaternary features in steps S1416 and S1417 shown in FIG. 14 can be omitted.
[0178]
[Third embodiment]
The present invention is applied to, for example, an information processing apparatus 1600 as shown in FIG. The information processing apparatus 1600 according to the present embodiment particularly has the function of the pattern recognition apparatus 100 shown in FIG.
[0179]
Specifically, first, as shown in FIG. 16, the information processing apparatus 1600 includes a control unit 1670, a calculation unit 1610, a reference weight holding unit 1630, a parameter detection unit 1640, an input signal memory 1650, an input signal memory control unit 1651, an intermediate result memory 1660, and an intermediate result memory control unit 1661.
[0180]
Here, the information processing apparatus 1600 in the present embodiment basically has the same functions as the information processing apparatus 1200 (see FIG. 12 above) in the second embodiment. It differs in that the parameter obtained by the parameter detection unit 1640 is supplied not to the weight setting unit 1220 but to the intermediate result memory control unit 1661 and the calculation unit 1610.
[0181]
That is, in the second embodiment, a parameter is obtained from the processing result of the previous stage, and the weight data for detecting a feature is set from that parameter. In this embodiment, by contrast, the reference weight data held in the reference weight holding unit 1630 is used as the weight data as it is, and instead the detection result of the previous stage held in the intermediate result memory 1660 that corresponds to the receptive field is resized using interpolation or the like.
[0182]
Therefore, for example, when detecting an eye feature, which is a tertiary feature, the information processing apparatus 1600 changes the size of the local image in the normal receptive field of the input image 1700 to generate a resized local image 1710, as shown in FIG. It then performs a product-sum operation on the resized local image 1710 and the reference weight data held in the reference weight holding unit 1630.
[0183]
When the tertiary feature is obtained, the secondary feature detection result held in the intermediate result memory 1660 is used. In FIG. 17, however, the size change of a local image of the input image 1700 is shown for simplicity of explanation. In practice, a local area of the secondary feature detection result image is resized and used.
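The approach of this embodiment, in which the weights stay fixed and the local area of the previous-stage result is resized instead, can be sketched as follows. The code reads a region `scale` times the reference size around a centre position, resamples it to the reference size by bilinear interpolation, and takes the product-sum with the unchanged reference weights. The names `sample` and `product_sum_resized`, the clamping of coordinates at the map border, and the use of bilinear interpolation are assumptions for illustration.

```python
def sample(feature_map, y, x):
    """Bilinear interpolation at a fractional position; coordinates are clamped to the map."""
    h, w = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ty, tx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - tx) + feature_map[y0][x1] * tx
    bottom = feature_map[y1][x0] * (1 - tx) + feature_map[y1][x1] * tx
    return top * (1 - ty) + bottom * ty

def product_sum_resized(feature_map, reference_weights, center, scale):
    """Product-sum of fixed reference weights with a resized local area of `feature_map`."""
    kh, kw = len(reference_weights), len(reference_weights[0])
    cy, cx = center
    acc = 0.0
    for j in range(kh):
        for i in range(kw):
            # Offset from the kernel centre, stretched by `scale` in the source map.
            sy = cy + (j - (kh - 1) / 2) * scale
            sx = cx + (i - (kw - 1) / 2) * scale
            acc += reference_weights[j][i] * sample(feature_map, sy, sx)
    return acc
```

Only the read positions and the interpolation depend on the parameter in this sketch, which mirrors the point that the resizing can be realized by changing the area read from memory.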
[0184]
As described above, in the present embodiment, the size of the previous-stage detection result used for detecting a feature is changed and reset using a parameter obtained from the previous-stage detection result. As a result, the detection accuracy of each feature is improved, and the detection accuracy of the pattern to be finally detected is improved. In addition, since the size of the detection result can be changed by changing the area read from the memory and by interpolation processing, the configuration can be realized easily.
[0185]
Note that the functions of the information processing apparatuses 1200 and 1600 in the second and third embodiments can be mounted on an imaging apparatus, for example, as in the first embodiment.
[0186]
Needless to say, the object of the present invention can also be achieved by supplying a system or apparatus with a recording medium on which software program code for realizing the functions of the host and terminal according to the first to third embodiments is recorded, and by having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the recording medium.
In this case, the program code itself read from the recording medium realizes the functions of the first to third embodiments, and the recording medium on which the program code is recorded, as well as the program code itself, constitute the present invention.
As a recording medium for supplying the program code, ROM, flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, and the like can be used.
Needless to say, the present invention includes not only the case where the functions of the first to third embodiments are realized by the computer executing the read program code, but also the case where an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the first to third embodiments are realized by that processing.
Further, it goes without saying that the present invention also includes the case where, after the program code read from the recording medium is written to a memory provided in a function expansion board inserted in the computer or in a function expansion unit connected to the computer, a CPU or the like provided in the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the first to third embodiments are realized by that processing.
[0187]
FIG. 18 shows the function 1800 of the computer.
As shown in FIG. 18, the computer function 1800 includes a CPU 1801, a ROM 1802, a RAM 1803, a keyboard controller (KBC) 1805 for a keyboard (KB) 1809, a CRT controller (CRTC) 1806 for a CRT display (CRT) 1810 serving as a display unit, a disk controller (DKC) 1807 for a hard disk (HD) 1811 and a flexible disk (FD) 1812, and a network interface controller (NIC) 1808 for connection to a network 1820, which are connected via a system bus 1804 so that they can communicate with one another.
[0188]
The CPU 1801 comprehensively controls each component connected to the system bus 1804 by executing software recorded in the ROM 1802 or the HD 1811 or software supplied from the FD 1812.
That is, the CPU 1801 reads out a processing program according to a predetermined processing sequence from the ROM 1802, the HD 1811, or the FD 1812 and executes it, thereby performing control for realizing the operations in the first to third embodiments.
[0189]
The RAM 1803 functions as a main memory or work area for the CPU 1801.
The KBC 1805 controls an instruction input from a KB 1809 or a pointing device (not shown).
A CRTC 1806 controls the display of the CRT 1810.
The DKC 1807 controls access to the HD 1811 and the FD 1812 that record a boot program, various applications, an edit file, a user file, a network management program, a predetermined processing program in the present embodiment, and the like.
The NIC 1808 exchanges data bidirectionally with devices or systems on the network 1820.
[0190]
[Effect of the invention]
As described above, according to the present invention, when a plurality of features (such as eyes and a mouth) constituting a predetermined pattern (such as a face pattern) included in an input signal (such as an image signal) are detected hierarchically, the data used for detecting a target feature is set based on reference data (reference face data or the like) corresponding to the target feature and on the detection result of the features in the stage preceding the target feature.
[0191]
Thus, for example, the features of the same layer are detected independently, and when the features of the next layer are detected, data such as a model or weights used for detecting each feature can be set adaptively using parameters obtained from the detection results of a plurality of features in the previous layer, or the previous-stage detection result used for detecting the feature can be adaptively reset using such parameters. Therefore, the detection accuracy of each feature can be improved, and even if the input signal contains a plurality of recognition targets of different sizes, all of the recognition targets can be detected at a small processing cost.
[0192]
In addition, for example, if the final confirmation process, which obtains a correlation with a confirmation pattern, is configured to apply a transformation such as rotation or size change to the confirmation pattern according to the positions of the features obtained so far, the accuracy of confirmation can be improved.
[0193]
Further, when the above function is configured to be applied to an imaging device, for example, color correction of a specific area such as a face in an image, focus setting, and the like can be easily performed.
[0194]
Therefore, according to the present invention, in detecting an arbitrary region present in the target signal as a specific recognition target, any recognition target can be detected efficiently with a small processing cost.
In particular, for example, even when there are a plurality of recognition targets having different sizes in the target image, all the recognition targets can be extracted at a low processing cost, and false detections, in which a pattern that is not a recognition target is erroneously detected as a recognition target pattern, can be prevented.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a pattern recognition (detection) apparatus to which the present invention is applied in a first embodiment.
FIG. 2 is a flowchart for explaining the operation of the pattern recognition (detection) apparatus.
FIG. 3 is a diagram for explaining an example of features used when a face area is detected in the pattern recognition (detection) apparatus.
FIG. 4 is a diagram for explaining an example of detection reference data used in the face area detection.
FIG. 5 is a diagram for explaining an example of the target image for the face area detection.
FIG. 6 is a diagram for explaining an example of parameters used when the face area is detected.
FIG. 7 is a diagram for explaining an example of a feature detection reference model when detecting an eye region of the face region.
FIG. 8 is a diagram for explaining a difference in an eye feature detection model depending on a position in the eye region detection target image.
FIG. 9 is a diagram for explaining setting of a confirmation pattern for the face area detection.
FIG. 10 is a block diagram illustrating a configuration of an imaging apparatus with a function of the pattern recognition (detection) apparatus.
FIG. 11 is a diagram for explaining detection of a character string by the function of the pattern recognition (detection) device.
FIG. 12 is a block diagram showing a configuration of an information processing apparatus to which the present invention is applied in the second embodiment.
FIG. 13 is a diagram for explaining a convolutional neural network structure in the information processing apparatus.
FIG. 14 is a flowchart for explaining the operation of the information processing apparatus.
FIG. 15 is a diagram for schematically describing feature detection weight data in the information processing apparatus.
FIG. 16 is a block diagram showing a configuration of an information processing apparatus to which the present invention is applied in the third embodiment.
FIG. 17 is a diagram for schematically explaining functions of the information processing apparatus.
FIG. 18 is a block diagram illustrating a configuration of a computer that reads and executes a program for causing the computer to realize the functions of the apparatuses in the first to third embodiments from a computer-readable recording medium.
[Explanation of symbols]
100 Pattern recognition (detection) apparatus
101 Primary feature detection unit
102 Secondary feature detection unit
103 Tertiary feature detection unit
104 Quaternary feature detection unit
105 Pattern confirmation unit
111 Primary feature detection filter setting unit
112 Secondary feature detection model setting unit
113 Tertiary feature detection model setting unit
114 Quaternary feature detection model setting unit
115 Confirmation pattern setting unit
122 Secondary feature reference model holding unit
123 Tertiary feature reference model holding unit
124 Quaternary feature reference model holding unit
125 Reference confirmation pattern holding unit
130 Signal input unit

Claims

A pattern recognition apparatus for detecting a predetermined pattern included in an input signal, comprising:
Feature detection means for hierarchically detecting a plurality of features of the predetermined pattern;
Reference data holding means for holding a plurality of reference data corresponding to the plurality of features, the reference data of each stage being configured by a combination of areas corresponding to specific features detected in the previous stage; and
Data setting means for converting the reference data of each stage held in the reference data holding means in accordance with the positional relationship of a plurality of features in the stage preceding the target feature obtained by the feature detection means, and for setting the converted reference data as data used for detection of the target feature by the feature detection means,
wherein, if the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold value in each area constituting the reference data of each stage, the feature detection means determines that the feature to be detected using that reference data exists at that position, and sets the average of the maximum values of the respective areas as the value of the feature to be detected.

2. The pattern recognition apparatus according to claim 1, wherein the data setting means performs data setting for each spatial position of the input signal.

The pattern recognition apparatus according to claim 1, wherein the reference data holding means holds reference data for detecting a plurality of features constituting a typical pattern of the predetermined pattern, and
the feature detection means confirms the presence or absence of the predetermined pattern included in the input signal based on a correlation between the reference data converted by the data setting means and the input signal.

The positional relationship of the plurality of features is a distance between the plurality of features,
The pattern recognition apparatus according to claim 1, wherein the data setting unit converts the size of the reference data.

The positional relationship of the plurality of features is a slope of a straight line connecting the plurality of features,
The pattern recognition apparatus according to claim 1, wherein the data setting means rotates the reference data.

A pattern recognition method for detecting a predetermined pattern included in an input signal,
A feature detection step for hierarchically detecting a plurality of features constituting the predetermined pattern;
A reference data holding step for holding, in order to detect the plurality of features in the feature detection step, a plurality of reference data in which the reference data of each stage is configured by a combination of areas corresponding to specific features detected in the previous stage; and
A data setting step for setting data used for feature detection in the feature detection step based on the reference data held in the reference data holding step,
wherein the data setting step includes a step of, when setting the data for detecting a feature in the feature detection step, converting the reference data of each stage held in the reference data holding step in accordance with the positional relationship of a plurality of features in the stage preceding the detection target feature obtained in the feature detection step, and setting the converted reference data as the data used for detection of the target feature in the feature detection step, and
wherein, in the feature detection step, if the maximum value of the corresponding feature detected in the previous stage is higher than a threshold value in each area constituting the reference data of each stage, it is determined that the feature to be detected using that reference data exists at that position, and the average of the maximum values of the respective areas is set as the value of the feature to be detected.

A program for causing a computer to function as a predetermined means,
The predetermined means is:
Feature detection means for hierarchically detecting a plurality of features of a predetermined pattern included in the input signal;
Reference data holding means for holding a plurality of reference data corresponding to the plurality of features, the reference data of each stage being configured by a combination of areas corresponding to specific features detected in the previous stage; and
Data setting means for converting the reference data of each stage held in the reference data holding means in accordance with the positional relationship of a plurality of features in the stage preceding the target feature obtained by the feature detection means, and for setting the converted reference data as data used for detection of the target feature by the feature detection means,
wherein, if the maximum value of the corresponding feature detected in the preceding stage is higher than a threshold value in each area constituting the reference data of each stage, the feature detection means determines that the feature to be detected using that reference data exists at that position, and sets the average of the maximum values of the respective areas as the value of the feature to be detected.