JP4264599B2

JP4264599B2 - Image processing apparatus and image processing method

Info

Publication number: JP4264599B2
Application number: JP29032398A
Authority: JP
Inventors: Junichi Ishibashi (石橋 淳一)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-10-13
Filing date: 1998-10-13
Publication date: 2009-05-20
Anticipated expiration: 2018-10-13
Also published as: JP2000123171A

Description

[0001]
[Technical field of the invention]
The present invention relates to an image processing apparatus and an image processing method and, for example, to an image processing apparatus and an image processing method capable of detecting a predetermined object displayed in an image.
[0002]
[Prior art]
Various methods have been proposed for detecting an object displayed in an image. One of them is the slit method described, for example, on pages 128 to 130 of “音声と画像のディジタル信号処理” (Digital Signal Processing of Audio and Images) by 谷荻, Corona Publishing, 1996.
[0003]
In the slit method, a rectangular slit is placed at a predetermined position in the image to be processed, and the histogram distribution of the pixel values within the slit is obtained. By comparing this distribution with a reference histogram distribution, for example, parts of a human face (such as the eyes) displayed in the image are detected.
[0004]
[Problems to be solved by the invention]
However, with the slit method, if the position or brightness of the lighting changes when a person is photographed, the shape of the resulting histogram also changes and deviates from that of the reference histogram, which made such cases difficult to handle.
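To illustrate the illumination problem, the following minimal Python sketch computes a slit histogram. The bin layout (four equal-width bins over 8-bit values) and the L1 distance used for the comparison are assumptions made for illustration; the cited book's exact procedure is not reproduced here. Halving the brightness of an otherwise identical slit moves the bright pixels into a different bin, so the distance to the reference histogram grows even though the scene content has not changed.

```python
def slit_histogram(image, top, left, height, width, bins=4, max_value=256):
    """Count the pixel values inside a rectangular slit into equal-width bins."""
    hist = [0] * bins
    bin_width = max_value // bins
    for y in range(top, top + height):
        for x in range(left, left + width):
            hist[min(image[y][x] // bin_width, bins - 1)] += 1
    return hist


def histogram_distance(h1, h2):
    """L1 distance between two histograms (an assumed comparison measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# A 2x3 slit: one dark column (e.g. part of an eye) and two bright columns.
scene = [[10, 200, 200],
         [10, 200, 200]]
reference = slit_histogram(scene, 0, 0, 2, 3)

# The same scene under half the illumination.
dimmed = [[v // 2 for v in row] for row in scene]
observed = slit_histogram(dimmed, 0, 0, 2, 3)

print(reference, observed, histogram_distance(reference, observed))
# [2, 0, 0, 4] [2, 4, 0, 0] 8
```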
[0005]
In addition, there are individual differences in the shape of the face, in the positions of facial features such as the eyes, nose, and mouth, and in hairstyle. For some people, therefore, the resulting histogram differs greatly from the reference histogram, so the slit method had difficulty handling an unspecified large number of people.
[0006]
Furthermore, the slit method evaluates an image, which is spatially two-dimensional information, by observing it through a slit, effectively converting it into one-dimensional information. Consequently, when the image to be processed is rotated or the face of the person displayed in it is tilted, the resulting histogram differs greatly from the reference histogram, and such cases were difficult to handle.
[0007]
The present invention has been made in view of these circumstances and makes it possible to detect objects more accurately by reducing the influence of, for example, individual differences, image tilt, and changes in illumination.
[0008]
[Means for Solving the Problems]
An image processing apparatus according to a first aspect of the present invention performs processing for detecting the position of a predetermined pattern in an image and includes: template storage means for storing a plurality of templates, each having a plurality of windows for extracting pixel data arranged in a part thereof; extraction means for extracting, from the image, pixel data of a plurality of regions corresponding to the plurality of windows in accordance with a template selected from the plurality of templates; normalization means for normalizing the pixel data of each of the plurality of regions and outputting normalized data for each of the plurality of regions; and detection means for detecting the position of the predetermined pattern on the basis of the normalized data for each of the plurality of regions. The plurality of windows of each template are arranged in a shape corresponding to the pattern to be detected with that template, and in the template for detecting the position of a pupil, the windows are arranged at the pupil position of an eye shape, at a plurality of positions on the upper contour of the eye shape, and at a plurality of positions on a plurality of vertically extending line segments arranged in parallel below the eye shape.
[0009]
An image processing method according to the first aspect of the present invention performs processing for detecting the position of a predetermined pattern in an image and includes: an extraction step of extracting, from the image, pixel data of a plurality of regions corresponding to a plurality of windows in accordance with a template selected from a plurality of templates stored in template storage means, each template having a plurality of windows for extracting pixel data arranged in a part thereof; a normalization step of normalizing the pixel data of each of the plurality of regions and outputting normalized data for each of the plurality of regions; and a detection step of detecting the position of the predetermined pattern on the basis of the normalized data for each of the plurality of regions. The plurality of windows of each template are arranged in a shape corresponding to the pattern to be detected with that template, and in the template for detecting the position of a pupil, the windows are arranged at the pupil position of an eye shape, at a plurality of positions on the upper contour of the eye shape, and at a plurality of positions on a plurality of vertically extending line segments arranged in parallel below the eye shape.
[0010]
An image processing apparatus according to a second aspect of the present invention performs processing for detecting the position of a predetermined pattern in an image and includes: imaging means for photoelectrically converting incident light and outputting the image; template storage means for storing a plurality of templates, each having a plurality of windows for extracting pixel data arranged in a part thereof; extraction means for extracting, from the image, pixel data of a plurality of regions corresponding to the plurality of windows in accordance with a template selected from the plurality of templates; normalization means for normalizing the pixel data of each of the plurality of regions and outputting normalized data for each of the plurality of regions; and detection means for detecting the position of the predetermined pattern on the basis of the normalized data for each of the plurality of regions. The plurality of windows of each template are arranged in a shape corresponding to the pattern to be detected with that template, and in the template for detecting the position of a pupil, the windows are arranged at the pupil position of an eye shape, at a plurality of positions on the upper contour of the eye shape, and at a plurality of positions on a plurality of vertically extending line segments arranged in parallel below the eye shape.
[0011]
In the first and second aspects of the present invention, pixel data of a plurality of regions corresponding to the plurality of windows are extracted from the image in accordance with a template selected from a plurality of templates, each having a plurality of windows for extracting pixel data arranged in a part thereof, and the pixel data of each region are normalized. Normalized data are then output for each region, and the position of the predetermined pattern is detected on the basis of the normalized data for each region. The windows of each template are arranged in a shape corresponding to the pattern to be detected with that template; in the template for detecting the position of a pupil, the windows are arranged at the pupil position of an eye shape, at a plurality of positions on the upper contour of the eye shape, and at a plurality of positions on a plurality of vertically extending line segments arranged in parallel below the eye shape.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a configuration example of an embodiment of a position detection device to which the present invention is applied.
[0015]
In this position detection device, for example, the position of an object displayed in an image is detected.
[0016]
That is, digital image data to be processed is supplied to the frame memory 1, and the frame memory 1 stores the image data, for example, in units of one frame.
[0017]
The region extraction unit 2 (extraction means) extracts pixel data of a plurality of regions from the image stored in the frame memory 1 in accordance with the template supplied from the template selection unit 6, and supplies the pixel data to the normalization unit 3 and the threshold calculation unit 7. When extracting the pixel data, the region extraction unit 2 also supplies the position detection unit 4 with position information indicating where in the processing target image (the image stored in the frame memory 1) the template was placed.
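A minimal Python sketch of what the region extraction unit 2 does, assuming a template is represented as a list of windows and each window as a list of (dy, dx) pixel offsets from an anchor point; this representation and the function name are hypothetical, not taken from the text. The anchor (y, x) plays the role of the position information passed to the position detection unit 4.

```python
from typing import List, Tuple

Image = List[List[int]]
# A window is a set of pixel offsets (dy, dx) relative to the template's anchor point.
Window = List[Tuple[int, int]]


def extract_regions(image: Image, template: List[Window], y: int, x: int) -> List[List[int]]:
    """Return the pixel data seen through each window when the template anchor is at (y, x).

    Pixels that fall between windows are never read.
    """
    return [[image[y + dy][x + dx] for dy, dx in window] for window in template]


image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
# Hypothetical two-window template: a single pixel and a horizontal pair.
template = [[(0, 0)], [(1, 1), (1, 2)]]
regions = extract_regions(image, template, 0, 0)
print(regions)  # [[0], [4, 5]]
```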
[0018]
The normalization unit 3 (normalization means) normalizes the pixel data of each of the regions extracted from the processing target image, supplied from the region extraction unit 2, and supplies normalized data for each region to the position detection unit 4. That is, each region from the region extraction unit 2 contains a plurality of pixel data, and the normalization unit 3 obtains from them a representative value for each region. It then compares each representative value with a predetermined threshold supplied from the threshold calculation unit 7 to produce, for example, 1-bit normalized data, which it supplies to the position detection unit 4.
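The representative-value step can be sketched as follows, assuming the arithmetic mean, which is the example the text uses in step S4; the function name is hypothetical.

```python
from typing import List


def representative_values(regions: List[List[int]]) -> List[float]:
    """One representative value per region: the arithmetic mean of its pixel data.

    A region holding a single pixel yields that pixel's value unchanged.
    """
    return [sum(region) / len(region) for region in regions]


print(representative_values([[10, 20, 30], [200], [90, 110]]))  # [20.0, 200.0, 100.0]
```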
[0019]
The position detection unit 4 (detection means) detects the position of a predetermined pattern in the processing target image on the basis of the normalized data for each region supplied from the normalization unit 3. In addition to the normalized data from the normalization unit 3, the position detection unit 4 receives the position information from the region extraction unit 2 and, from the template selection unit 6, selection information indicating the type of template used by the region extraction unit 2. On the basis of this selection information, the position detection unit 4 reads a predetermined detection pattern from the detection pattern storage unit 8 and compares it with the pattern of normalized data from the normalization unit 3. Based on the comparison result, it determines whether the predetermined pattern exists at the position in the processing target image indicated by the position information from the region extraction unit 2 and, according to that determination, detects the position of the object displayed in the image.
[0020]
The template storage unit 5 stores a plurality of templates for extracting, from the processing target image, pixel data of regions corresponding to various patterns. The template selection unit 6 selects a template of a preset type from among the templates stored in the template storage unit 5 and supplies it to the region extraction unit 2; it also supplies selection information representing the type of the selected template to the position detection unit 4. The threshold calculation unit 7 (threshold calculation means) calculates, from the pixel data of each region extracted from the processing target image by the region extraction unit 2, the predetermined threshold that the normalization unit 3 uses to obtain normalized data, and supplies it to the normalization unit 3. The detection pattern storage unit 8 stores, for each template stored in the template storage unit 5, a detection pattern, that is, a reference pattern of normalized data that the position detection unit 4 uses to obtain the position of an object.
[0021]
Next, the operation will be described with reference to the flowchart of FIG. 2.
[0022]
When image data to be processed is supplied to the frame memory 1, the frame memory 1 stores it in step S1. Suppose, for example, that an image showing circles of two sizes, as in FIG. 3A, is stored in the frame memory 1, and that the position of one of them, for example the small circle, is to be detected from this image. It is assumed here that the image supplied to the frame memory 1 is expressed in grayscale and that the portions where the circles are displayed have lower luminance than the other portions (this assumption does not affect the generality of the processing).
[0023]
When, for example, one frame of image data has been stored in the frame memory 1, the process proceeds to step S2, where the template selection unit 6 selects a template of a preset type from among the templates stored in the template storage unit 5, supplies it to the region extraction unit 2, and supplies selection information representing the type of the selected template to the position detection unit 4. In this case, because the position of the small circle in FIG. 3A is to be detected, the template selection unit 6 selects, for example, the template shown in FIG. 3B, in which a window having the same shape as the small circle is placed at the center and eight windows of the same shape are arranged relatively densely around it to form a square. This template is supplied to the region extraction unit 2, and the corresponding selection information is supplied to the position detection unit 4.
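One way to build a template like that of FIG. 3B is sketched below, assuming digital discs as the circular windows; `radius` and `spacing` are hypothetical parameters, since the text specifies the layout only through the figure. With a spacing of 2 × radius + 1, neighbouring windows touch, which corresponds to the relatively dense arrangement described above.

```python
from typing import List, Tuple

Window = List[Tuple[int, int]]  # pixel offsets (dy, dx) from the template anchor


def circular_window(radius: int) -> Window:
    """Pixel offsets of a digital disc of the given radius, centred on (0, 0)."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def square_template(radius: int, spacing: int) -> List[Window]:
    """Nine identical circular windows on a 3x3 grid; index 4 is the central window.

    spacing is the centre-to-centre distance between neighbouring windows.
    """
    disc = circular_window(radius)
    return [[(dy + gy * spacing, dx + gx * spacing) for dy, dx in disc]
            for gy in (-1, 0, 1) for gx in (-1, 0, 1)]


dense = square_template(radius=1, spacing=3)
print(len(dense), len(dense[4]))  # 9 5
```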
[0024]
Upon receiving the template from the template selection unit 6, the region extraction unit 2 extracts, in step S3, pixel data of a plurality of regions from the image stored in the frame memory 1 in accordance with that template. That is, in step S3 the region extraction unit 2 places the template at a predetermined position, such as the upper left of the image stored in the frame memory 1, as shown in FIG. 3C, and extracts the pixel data of the regions corresponding to the template's pattern. Specifically, as shown in FIG. 3C, it extracts the pixel data making up the image region visible through each of the nine windows of the template placed at the upper left of the image (here, as described above, windows of the same shape as the small circle whose position is to be detected).
[0025]
Also in step S3, the region extraction unit 2 supplies the pixel data of the regions extracted according to the template (here, nine regions, equal to the number of windows in the template of FIG. 3B) to the normalization unit 3 and the threshold calculation unit 7. It also supplies position information representing where on the processing target image the template is placed (for example, the position of the center of the circle forming the central window of the template in FIG. 3B) to the position detection unit 4, and the process proceeds to step S4.
[0026]
In step S4, the threshold calculation unit 7 calculates a predetermined threshold. First, it calculates a representative value for the pixel data of each region supplied from the region extraction unit 2. Specifically, for a given region, it calculates, for example, the average of the region's pixel data as the representative value (if a region contains only one pixel, that pixel value is used directly as the representative value). Then, letting V_max and V_min be the maximum and minimum of the representative values of the regions, the threshold calculation unit 7 calculates ε = (V_max − V_min)/2 as the predetermined threshold. This threshold ε is supplied to the normalization unit 3.
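The threshold of step S4 can be written as below; the function name is hypothetical. The formula is reproduced literally from the text. Note that (V_max − V_min)/2 is half the spread of the representative values, not their midpoint (V_max + V_min)/2.

```python
from typing import List


def threshold(representatives: List[float]) -> float:
    """Threshold epsilon as given in step S4: (V_max - V_min) / 2."""
    return (max(representatives) - min(representatives)) / 2


# Nine representative values: a dark central region surrounded by bright ones.
reps = [200.0, 200.0, 200.0, 200.0, 30.0, 200.0, 200.0, 200.0, 200.0]
print(threshold(reps))  # 85.0
```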
[0027]
When the normalization unit 3 has received the pixel data of the regions from the region extraction unit 2 and the threshold ε from the threshold calculation unit 7, it normalizes, in step S5, the pixel data of each region to a single 1-bit value. First, the normalization unit 3 calculates a representative value for each region from the region extraction unit 2, for example in the same way as the threshold calculation unit 7 described above. The normalization unit 3 and the threshold calculation unit 7 may, however, calculate representative values by different methods; methods other than the one described above are discussed later.
[0028]
Having obtained the representative value of each region, the normalization unit 3 compares it with the threshold ε supplied from the threshold calculation unit 7 to obtain 1-bit normalized data. That is, if a region's representative value is greater than ε (or greater than or equal to ε), it is set to one of 0 and 1, for example 1; if it is less than or equal to ε (or less than ε), it is set to 0. The normalization unit 3 then outputs the 1-bit normalized data for each region to the position detection unit 4.
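A sketch of the 1-bit normalization of step S5, using the variant in which a value greater than ε becomes 1; the text also allows "greater than or equal to". The function name is hypothetical.

```python
from typing import List


def normalize(representatives: List[float], epsilon: float) -> List[int]:
    """1-bit normalization: 1 if a representative value exceeds epsilon, otherwise 0."""
    return [1 if v > epsilon else 0 for v in representatives]


reps = [200.0, 200.0, 200.0, 200.0, 30.0, 200.0, 200.0, 200.0, 200.0]
print(normalize(reps, 85.0))  # [1, 1, 1, 1, 0, 1, 1, 1, 1]
```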
[0029]
In step S6, the position detection unit 4 detects the position of the predetermined pattern in the processing target image on the basis of the normalized data for each region supplied from the normalization unit 3. First, the position detection unit 4 reads a predetermined detection pattern from the detection pattern storage unit 8 on the basis of the selection information from the template selection unit 6. Specifically, when it receives selection information representing the template of FIG. 3B, for example, it reads the detection pattern shown in FIG. 3D, which has nine windows arranged in the same pattern as the template of FIG. 3B, with 0 assigned to the central window and 1 to the other windows.
[0030]
The position detection unit 4 then compares the normalized data of each region with the value assigned to the corresponding window of the detection pattern. Only when all of them match does it output the position information from the region extraction unit 2 as the position detection result; otherwise, it discards that position information.
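The comparison of step S6 can be sketched like this. The dictionary stands in for the detection pattern storage unit 8, and the key `"small_circle_3x3"` is a hypothetical name for the selection information; the pattern itself follows FIG. 3D as described above (0 for the central window, 1 for the other eight).

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the detection pattern storage unit 8, keyed by template type.
DETECTION_PATTERNS = {"small_circle_3x3": [1, 1, 1, 1, 0, 1, 1, 1, 1]}


def detect(normalized: List[int], template_id: str,
           position: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Return the position only if every bit matches the detection pattern; otherwise discard it."""
    pattern = DETECTION_PATTERNS[template_id]
    return position if normalized == pattern else None


print(detect([1, 1, 1, 1, 0, 1, 1, 1, 1], "small_circle_3x3", (12, 7)))  # (12, 7)
print(detect([1, 1, 1, 1, 1, 1, 1, 1, 1], "small_circle_3x3", (13, 7)))  # None
```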
[0031]
As described above, in the image of FIG. 3A the portions where circles are displayed have lower luminance than the rest. Therefore, only when the central window of the nine windows of the template in FIG. 3B coincides with the position of the small circle displayed in FIG. 3A will the normalized data of the nine regions extracted with that template generally be, as shown in FIG. 3D, 0 for the region corresponding to the central window and 1 for the other regions. At the large circle, by contrast, the surrounding windows also fall on the dark circle, so this pattern does not arise. The template position at which the normalized data take this form therefore coincides with the position of the small circle in FIG. 3A.
[0032]
After step S6, the process proceeds to step S7, where it is determined whether steps S3 to S6 have been performed for all regions of the image stored in the frame memory 1. If not, the template is moved from its current position by, for example, one pixel in line-scan order, as indicated by the arrow in FIG. 3, and the process returns to step S3, after which the same processing is repeated.
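Steps S3 to S7 can be sketched end to end as a line-scan loop. This is a minimal sketch under stated assumptions: square windows instead of the patent's circular ones, window offsets given as non-negative top-left corners relative to the template origin, and the normalization of the embodiment (mean, ε = (V_max − V_min)/2). The function and parameter names are hypothetical.

```python
def scan(image, offsets, win, pattern):
    """Slide a sparse template over image in line-scan order, one pixel per step.

    image:   2-D list of luminance values (rows of equal length)
    offsets: (dy, dx) top-left corner of each window relative to the template origin
    win:     side length of each square window (a simplifying assumption)
    pattern: detection pattern, one bit per window
    Returns the template origins (y, x) whose normalized data match the pattern.
    """
    h, w = len(image), len(image[0])
    span_y = max(dy for dy, _ in offsets) + win
    span_x = max(dx for _, dx in offsets) + win
    hits = []
    for y in range(h - span_y + 1):          # line-scan order: rows ...
        for x in range(w - span_x + 1):      # ... then columns, one pixel at a time
            # Step S3: read only the pixels inside the windows; gaps are a dead zone
            regions = [[image[y + dy + i][x + dx + j]
                        for i in range(win) for j in range(win)]
                       for dy, dx in offsets]
            # Steps S4-S5: representative values and 1-bit normalization
            reps = [sum(r) / len(r) for r in regions]
            eps = (max(reps) - min(reps)) / 2
            bits = [1 if v > eps else 0 for v in reps]
            # Step S6: keep the position only if every window matches
            if bits == pattern:
                hits.append((y, x))
    return hits


# Demo: bright 20x20 image with a 2x2 dark "small circle" at rows/cols 9-10
img = [[200] * 20 for _ in range(20)]
for r in (9, 10):
    for c in (9, 10):
        img[r][c] = 30
offsets = [(dy, dx) for dy in (0, 4, 8) for dx in (0, 4, 8)]  # 3x3 windows, 4-pixel pitch
CENTER_DARK = [1, 1, 1, 1, 0, 1, 1, 1, 1]
print(scan(img, offsets, 2, CENTER_DARK))  # [(5, 5)]
```

Only the placement whose central window fully covers the dark block produces the pattern; partial overlaps raise the central mean above ε and yield all ones.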
[0033]
If it is determined in step S7 that processing has been performed for all regions of the image stored in the frame memory 1, the processing is terminated.
[0034]
In FIG. 3, the position of the small circle is detected from an image displaying large and small circles. It is also possible, for example, to detect the position of a figure of a predetermined size from an image displaying figures of various shapes and sizes.
[0035]
For example, consider detecting only the small figures from an image displaying large and small circles, squares, octagons, and hearts, as shown in FIG. 4(A). (As with the image of FIG. 3(A), the image is assumed to be expressed in grayscale, with the portions where figures are displayed lower in luminance than the other portions.) In this case, a template such as that shown in FIG. 4(B) is used, in which a circular window slightly smaller than the small figures is placed at the center and eight windows of the same shape are sparsely arranged around it so as to form a square. In this case as well, the template of FIG. 4(B) is moved over the image of FIG. 4(A) in line-scan order, as indicated by the arrow in FIG. 4, and the pixel data of each of the plurality of regions is extracted according to the template to obtain normalized data. If the normalized data of the nine regions extracted according to the template of FIG. 4(B) match the detection pattern in which 0 is assigned to the central window and 1 to the other windows, the template position at that time coincides with the position of one of the small figures among the large and small figures displayed in the image of FIG. 4(A).
[0036]
Next, consider detecting only the relatively small, roughly circular, low-luminance portions, such as the pupils, the nostrils, and a slightly open mouth, from an image displaying a human face as shown in FIG. 5(A) (this image is also assumed to be expressed in grayscale, as in FIG. 3(A)). In this case, a template such as that shown in FIG. 5(B) is used, in which a small circular window is placed at the center and eight windows of the same shape are sparsely arranged around it so as to form a diamond. In this case as well, the template of FIG. 5(B) is moved over the image of FIG. 5(A) in line-scan order, as indicated by the arrow in FIG. 5, and the pixel data of each of the plurality of regions is extracted according to the template to obtain normalized data. If the normalized data of the nine regions extracted according to the template of FIG. 5(B) match the detection pattern shown in FIG. 5, the template position at that time coincides with the position of a relatively small, roughly circular, low-luminance portion, such as a pupil, a nostril, or a slightly open mouth, displayed in the image of FIG. 5(A).
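The square (FIG. 4(B)) and diamond (FIG. 5(B)) templates can be approximated as nine circular windows on a 3x3 grid, the diamond being the grid rotated by 45 degrees. The exact geometry of the figures is not given in the text, so the layouts, the radius `r`, the pitch `s`, and all function names below are assumptions made for illustration.

```python
def circle_offsets(r):
    """Pixel offsets (dy, dx) inside a circular window of radius r centred on (0, 0)."""
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]


def square_layout(s):
    """Window centres: a central window plus 8 on a square of pitch s (cf. FIG. 4(B))."""
    return [(i * s, j * s) for i in (-1, 0, 1) for j in (-1, 0, 1)]


def diamond_layout(s):
    """The same 3x3 grid rotated by 45 degrees (and scaled), forming a diamond (cf. FIG. 5(B))."""
    return [((i + j) * s, (i - j) * s) for i in (-1, 0, 1) for j in (-1, 0, 1)]


def make_template(layout, r):
    """Template = list of windows; each window = pixel offsets from the template centre."""
    disk = circle_offsets(r)
    return [[(cy + dy, cx + dx) for dy, dx in disk] for cy, cx in layout]


def extract(image, y, x, template):
    """Pixel data of each window with the template centred at (y, x); None if it does not fit."""
    h, w = len(image), len(image[0])
    regions = []
    for window in template:
        pts = [(y + dy, x + dx) for dy, dx in window]
        if any(not (0 <= py < h and 0 <= px < w) for py, px in pts):
            return None
        regions.append([image[py][px] for py, px in pts])
    return regions
```

In both layouts the central window is the fifth entry, so the detection pattern `[1, 1, 1, 1, 0, 1, 1, 1, 1]` applies unchanged. The pitch `s` is the "interval between windows" that the text later mentions as a way to absorb individual and posture differences.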
[0037]
Next, consider detecting only the pupil from an image displaying a human face as shown in FIG. 6(A), which is similar to FIG. 5(A) (the image of FIG. 6(A) is also assumed to be expressed in grayscale). In this case, a template such as that shown in FIG. 6(B) is used, in which circular windows of about the same size as the pupil are placed at the position of the pupil in an eye shape, at a plurality of positions on the upper contour of the eye shape, and at a plurality of positions on a plurality of vertically extending line segments lined up in parallel below the eye shape. In this case as well, the template of FIG. 6(B) is moved over the image of FIG. 6(A) in line-scan order, as indicated by the arrow in FIG. 6, and the pixel data of each of the plurality of regions is extracted according to the template to obtain normalized data. If the normalized data extracted according to the template of FIG. 6(B) match the detection pattern shown in FIG. 6(D), the template position at that time coincides with the position of the pupil displayed in the image of FIG. 6(A). This works because, in an image showing a person, the area under the eyes is in many cases relatively bright and stable (its luminance varies little). Therefore, by using the template of FIG. 6(B) and the detection pattern of FIG. 6(D), the position of the pupil can be detected relatively accurately from an image showing a person.
[0038]
Next, FIG. 7 shows a configuration example of an embodiment of a camera system to which the present invention is applied (a system here means a logical collection of a plurality of devices, regardless of whether the component devices are in the same housing). In this camera system, the pupil of the person who is the subject, for example, is always displayed at a predetermined position in the frame.
[0039]
That is, the video camera 11 (imaging means) photoelectrically converts light from the person who is the subject into an image signal showing that person. This image signal is supplied to an A/D (Analog/Digital) converter 12, where it is converted into digital image data and supplied to a position detection device 13. The position detection device 13, which is configured in the same manner as the position detection device shown in FIG. 1, detects the position of the subject's pupil from the image data supplied by the A/D converter 12 and supplies position information representing that position to a control unit 14. The control unit 14 (control means) controls a pan/tilt mechanism 15 so that the position information from the position detection device 13 indicates a predetermined position. Under the control of the control unit 14, the pan/tilt mechanism 15 pans or tilts the video camera 11 so that the pupil of the person shown in the image output from the video camera 11 coincides with the predetermined position.
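The text does not say how the control unit 14 turns position information into pan/tilt commands. One simple possibility, shown purely as a hypothetical sketch, is a proportional controller with a small dead band; the function name, the gain, the dead band, the command units, and the sign conventions are all assumptions.

```python
def pan_tilt_step(detected, target, gain=0.1, deadband=2):
    """One step of a hypothetical proportional controller for the pan/tilt mechanism.

    detected: (y, x) pupil position from the position detection device, or None
    target:   (y, x) predetermined position in the frame
    Returns (tilt, pan) commands in arbitrary units. Assumed convention: positive pan
    turns the camera toward +x and positive tilt toward +y, which moves the subject
    back toward the target in the image.
    """
    if detected is None:
        return (0.0, 0.0)  # no detection in this frame: hold the camera still
    ey = detected[0] - target[0]
    ex = detected[1] - target[1]
    # Ignore small errors so the camera does not jitter around the target
    tilt = gain * ey if abs(ey) > deadband else 0.0
    pan = gain * ex if abs(ex) > deadband else 0.0
    return (tilt, pan)


print(pan_tilt_step((110, 150), (100, 100)))  # (1.0, 5.0)
```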
[0040]
As described above, the pixel data of each of the plurality of regions is extracted from the image according to the plurality of windows formed in the template and used for position detection. The pixel data lying between the windows of the template therefore form a dead zone for position detection (they do not affect it). This reduces the adverse effects of individual differences, differences in posture, changes in lighting, and the like, and makes position detection highly robust to such factors. Furthermore, since the pixel data in the dead zone are not used in the calculation, the computation time can be shortened.
[0041]
In addition, the representative value of each region, which takes continuous values, is converted into normalized data quantized with the threshold ε, and the position is detected based on the pattern of the normalized data. This, too, reduces the adverse effects of individual differences, differences in posture, changes in lighting, and the like, and makes position detection highly robust to such factors.
[0042]
The template pattern (the pattern of windows formed in the template) is not limited to those described above. Individual differences and differences in posture can be absorbed (that is, detection can be made less susceptible to them) by adjusting the spacing between the windows formed in the template.
[0043]
In the present embodiment, the position of an object such as a pupil is detected. Once the position of the object is known, however, the region in which the object is displayed can also be extracted, for example by applying dilation (expansion) processing starting from that position. The present invention is therefore applicable not only to detecting the position of an object displayed in an image but also to detecting its region. Furthermore, besides objects displayed in an image, the invention is applicable to detecting the position of a predetermined pattern, such as a design displayed in the image.
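One way to read "expansion processing from the detected position" is conditional dilation: starting from the detected point, dilate repeatedly with a 4-neighbour structuring element but keep only pixels inside a mask, here assumed to be "luminance not greater than `level`". Iterated to convergence this equals the connected component containing the seed, so the sketch below computes it as a breadth-first flood fill. The mask rule, `level`, and the function name are assumptions, not taken from the text.

```python
from collections import deque


def grow_region(image, seed, level):
    """Conditional dilation from seed within the mask image <= level (4-connectivity).

    Returns the set of (y, x) pixels reached; empty if the seed itself is outside the mask.
    """
    h, w = len(image), len(image[0])
    sy, sx = seed
    if image[sy][sx] > level:
        return set()
    region = {seed}
    frontier = deque([seed])
    while frontier:  # each pass over the frontier is one dilation step
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region and image[ny][nx] <= level:
                region.add((ny, nx))
                frontier.append((ny, nx))
    return region
```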
[0044]
In this embodiment, the average of the pixel data in a region extracted according to the template is used as the representative value of that region. Alternatively, the central value of the distribution of the pixel data (for example, the value in the middle when the pixel data are arranged in ascending order; if there are two middle values, either of them or their average), or the maximum or minimum of the pixel data, can be used as the representative value.
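The alternative representative values can be sketched with Python's `statistics` module. Mapping "either of the two middle values" to `median_low` (the lower one) is an illustrative choice; `median_high` would serve equally. The function name and method labels are hypothetical.

```python
from statistics import mean, median, median_low


def representative(pixels, method="mean"):
    """Representative value of one region, using one of the alternatives in the text."""
    if method == "mean":          # the embodiment's choice
        return mean(pixels)
    if method == "median":        # average of the two middle values when the count is even
        return median(pixels)
    if method == "median_low":    # one of the two middle values (here the lower one)
        return median_low(pixels)
    if method == "max":
        return max(pixels)
    if method == "min":
        return min(pixels)
    raise ValueError(f"unknown method: {method}")


print(representative([10, 20, 30, 40], "median"), representative([10, 20, 30, 40], "median_low"))
```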
[0045]
Furthermore, in this embodiment, letting $V_{\max}$ and $V_{\min}$ be the maximum and minimum of the representative values of the plurality of regions, $(V_{\max}-V_{\min})/2$ is used as the predetermined threshold ε. Alternatively, a value other than $(V_{\max}-V_{\min})/2$ that is proportional to the difference between $V_{\max}$ and $V_{\min}$, or a value proportional to the average $V_{\mathrm{ave}}$ of the representative values of the plurality of regions or to the central value $V_{\mathrm{med}}$ of their distribution, can be used as the threshold. Specifically, for example, $3(V_{\max}-V_{\min})/4$, $V_{\mathrm{ave}}/2$, $V_{\mathrm{med}}/2$, $M(V_{\max}-V_{\min})/2^N$, $M\,V_{\mathrm{ave}}/2^N$, or $M\,V_{\mathrm{med}}/2^N$ can be used as the threshold (where M and N are positive integers).
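The threshold alternatives all take the form "M times a statistic of the representative values, divided by 2^N". The sketch below covers them with integer parameters `m` and `n`; the rule names are hypothetical labels, and `m=3, n=2` reproduces $3(V_{\max}-V_{\min})/4$ while `m=1, n=1` gives the halved forms.

```python
from statistics import mean, median


def threshold(reps, rule="range", m=1, n=1):
    """Threshold proportional to a statistic of the representative values: m * stat / 2**n.

    rule: "range" -> V_max - V_min, "ave" -> V_ave, "med" -> V_med (labels are hypothetical)
    The embodiment's default, (V_max - V_min) / 2, is rule="range", m=1, n=1.
    """
    if not (isinstance(m, int) and isinstance(n, int) and m > 0 and n > 0):
        raise ValueError("m and n must be positive integers")
    if rule == "range":
        stat = max(reps) - min(reps)
    elif rule == "ave":
        stat = mean(reps)
    elif rule == "med":
        stat = median(reps)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return m * stat / 2 ** n


reps = [30, 200, 200, 200]
print(threshold(reps), threshold(reps, "range", 3, 2), threshold(reps, "ave"), threshold(reps, "med"))
```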
[0046]
In the present embodiment, since the image to be processed is expressed in grayscale, luminance values are used as the pixel data. Alternatively, color-difference signals or color signals can also be used as the pixel data.
[0047]
Furthermore, in the present embodiment, the templates are moved in the line scan order. However, the way of moving the templates is not limited to the line scan order.
[0048]
In the present embodiment, a template that extracts pixel data distributed in the spatial direction within one frame is used. However, a template that also extracts pixel data distributed in the time direction can be used. For example, as shown in FIG. 8, a template can be used that extracts pixel data not only from the N-th frame, taken as the frame of interest, but also from the (N−1)-th and (N+1)-th frames before and after it.
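A spatio-temporal template like that of FIG. 8 can be sketched by giving each window offsets of the form (dt, dy, dx), where dt selects the previous, current, or next frame. The data layout and the function name are assumptions for illustration; the resulting regions would then be normalized and matched exactly as in the single-frame case.

```python
def extract_spatiotemporal(frames, n, y, x, windows):
    """Extract pixel data for windows spanning frames n-1, n and n+1.

    frames:  sequence of 2-D luminance arrays (lists of rows)
    n:       index of the frame of interest
    windows: list of windows; each window is a list of (dt, dy, dx) offsets,
             where dt in {-1, 0, 1} selects the previous, current or next frame
    Returns one list of pixel values per window.
    """
    if not 1 <= n < len(frames) - 1:
        raise IndexError("frame of interest needs both a previous and a next frame")
    return [[frames[n + dt][y + dy][x + dx] for dt, dy, dx in win] for win in windows]


# Three 3x3 frames where frames[t][r][c] = 10 * t + r
frames = [[[10 * t + r for c in range(3)] for r in range(3)] for t in range(3)]
print(extract_spatiotemporal(frames, 1, 0, 0, [[(-1, 0, 0)], [(0, 1, 1)], [(1, 2, 2)]]))
```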
[0049]
【The invention's effect】
According to the first and second aspects of the present invention, for example, the influence of individual differences, image inclination, changes in illumination, and the like can be reduced, and objects can be detected with higher accuracy. In particular, the position of the pupil can be detected relatively accurately from an image showing a person.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of an embodiment of a position detection device to which the present invention is applied.
FIG. 2 is a flowchart for explaining processing of the position detection device of FIG. 1.
FIG. 3 is a diagram for explaining processing of the position detection device of FIG. 1.
FIG. 4 is a diagram for explaining processing of the position detection device of FIG. 1.
FIG. 5 is a diagram for explaining processing of the position detection device of FIG. 1.
FIG. 6 is a diagram for explaining processing of the position detection device of FIG. 1.
FIG. 7 is a block diagram illustrating a configuration example of an embodiment of a camera system to which the present invention is applied.
FIG. 8 is a diagram showing a template distributed in both the spatial and temporal directions.
[Explanation of symbols]
1 frame memory, 2 region extraction unit (extraction means), 3 normalization unit (normalization means), 4 position detection unit (detection means), 5 template storage unit, 6 template selection unit, 7 threshold calculation unit (threshold calculation means), 8 detection pattern storage unit, 11 video camera (imaging means), 12 A/D converter, 13 position detection device, 14 control unit (control means), 15 pan/tilt mechanism

Claims

An image processing apparatus that performs processing for detecting a position of a predetermined pattern in an image,
Template storage means for storing a plurality of templates in which a plurality of windows for extracting pixel data are partially arranged;
Extracting means for extracting pixel data of a plurality of regions corresponding to the plurality of windows from the image according to a template selected from the plurality of templates;
Normalizing means for normalizing pixel data of each of the plurality of regions and outputting normalized data for each of the plurality of regions;
Detecting means for detecting the position of the predetermined pattern based on normalized data for each of the plurality of regions,
wherein the plurality of windows of the template are arranged in a shape corresponding to the pattern to be detected by that template, and
in the template for detecting the position of a pupil, the windows are arranged at:
the position of the pupil in an eye shape;
a plurality of positions on the upper contour of the eye shape; and
a plurality of positions on a plurality of vertically extending line segments lined up in parallel below the eye shape.

The image processing apparatus according to claim 1, wherein the extracting means extracts, from the image, the luminance values of the pixels constituting each of the plurality of regions.

The image processing apparatus according to claim 1, wherein each of the plurality of regions includes a plurality of pixel data.

The image processing apparatus according to claim 3, wherein the normalizing means obtains, from the plurality of pixel data of each of the plurality of regions, a representative value representing that region.

The image processing apparatus according to claim 4, wherein the normalizing means obtains, as the representative value of each of the plurality of regions, the average of the plurality of pixel data in that region, the central value of the distribution range of the plurality of pixel data in that region, or the maximum or minimum of the plurality of pixel data in that region.

The image processing apparatus according to claim 4, wherein the normalizing means converts the representative value of each of the plurality of regions into a 1-bit value by comparing it with a predetermined threshold value.

The image processing apparatus according to claim 6, further comprising threshold calculation means for obtaining the predetermined threshold value.

The image processing apparatus according to claim 7, wherein the threshold calculation means obtains, as the predetermined threshold value, a value proportional to the difference between the maximum and minimum of the representative values of the plurality of regions, to the average of the representative values of the plurality of regions, or to the central value of the distribution range of the representative values of the plurality of regions.

An image processing method for performing processing for detecting a position of a predetermined pattern in an image,
In accordance with a template selected from a plurality of templates stored in a template storage means for storing a plurality of templates in which a plurality of windows for extracting pixel data are arranged in part, from the image, the plurality of windows An extraction step of extracting pixel data of a plurality of regions corresponding to the window;
A normalizing step of normalizing the pixel data of each of the plurality of regions and outputting normalized data for each of the plurality of regions; and
A detecting step of detecting the position of the predetermined pattern based on the normalized data for each of the plurality of regions,
wherein the plurality of windows of the template are arranged in a shape corresponding to the pattern to be detected by that template, and
in the template for detecting the position of a pupil, the windows are arranged at:
the position of the pupil in an eye shape;
a plurality of positions on the upper contour of the eye shape; and
a plurality of positions on a plurality of vertically extending line segments lined up in parallel below the eye shape.

An image processing apparatus that performs processing for detecting a position of a predetermined pattern in an image,
Imaging means for photoelectrically converting the input light and outputting the image;
Template storage means for storing a plurality of templates in which a plurality of windows for extracting pixel data are partially arranged;
Extracting means for extracting pixel data of a plurality of regions corresponding to the plurality of windows from the image according to a template selected from the plurality of templates;
Normalizing means for normalizing pixel data of each of the plurality of regions and outputting normalized data for each of the plurality of regions;
Detecting means for detecting the position of the predetermined pattern based on normalized data for each of the plurality of regions,
wherein the plurality of windows of the template are arranged in a shape corresponding to the pattern to be detected by that template, and
in the template for detecting the position of a pupil, the windows are arranged at:
the position of the pupil in an eye shape;
a plurality of positions on the upper contour of the eye shape; and
a plurality of positions on a plurality of vertically extending line segments lined up in parallel below the eye shape.

The image processing apparatus according to claim 10, further comprising control means for controlling a predetermined device based on the result of detecting the position of the predetermined pattern.