JP2004038531A

JP2004038531A - Method and device for detecting position of object

Info

Publication number: JP2004038531A
Application number: JP2002194288A
Authority: JP
Inventors: Kenji Kondo; 近藤　堅司; Takeo Azuma; 吾妻　健夫; Kenya Uomori; 魚森　謙也
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-07-03
Filing date: 2002-07-03
Publication date: 2004-02-05

Abstract

PROBLEM TO BE SOLVED: To detect the position of an object more precisely than before by using a luminance image in combination with a distance image.
SOLUTION: A distance image of a person's face is obtained by a range finder or the like (S231), and the position of the nose is found in the distance image (S232). A luminance image whose coordinates are each associated with those of the distance image is obtained (S233), and the positions of the eyes are found in the luminance image (S234). The positions of the eyes are then finally determined by referring to the relative positional relationship between the eyes and the nose (S235).
COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、画像から、人間の目のような特定の物体の位置を検出する技術に属する。
【０００２】
【従来の技術】
近年、様々な技術分野において、人間の目などの顔部品の位置を精度良く検出できることへのニーズが、高まっている。その代表的なものに、生体情報を用いた個人認証（バイオメトリクス個人認証）技術がある。
【０００３】
例えば、虹彩認証では、２カメラ構成の虹彩認識システムが提案されている（特開平１０−４０３８６号公報参照）。このシステムでは、第１のカメラで顔全体を撮影し、撮影された顔画像を解析して目の位置を検出する。次に、第２のカメラを機械的に制御して、検出された目の位置に向けて、虹彩の拡大画像を得る。このシステムでは、被認証者は単に装置の前に立つだけでよいので、認証動作を楽に行うことができる。ただし、目の位置を高精度に検出できることが必要条件となる。
【０００４】
また、顔認証では、通常、顔を含む画像中から目を含む顔の特徴点を検出し、その特徴点を基に顔位置と大きさを正規化した後、顔認証を行っている。このため、顔認証を精度良く行うためには、目を含む顔の特徴点の位置を高精度に検出することが必要条件となる。
【０００５】
個人認証以外の分野でも、例えば、視線検出や居眠り検知などの用途では、目位置を検出する必要がある。
【０００６】
ところで、目位置検出方式の多くは、局所的な目のテンプレートを用いてマッチングを行っているため、目に似た形状の眉や、眼鏡のフレーム、口等を誤って目と認識してしまう可能性が高いという問題がある。
【０００７】
この問題に対して、まず鼻の位置を検出し、鼻の位置から目の探索領域を決定し、その探索領域内から目の位置を検出するという方式が提案されている。すなわち、探索領域を限定することによって、目の誤検出を減らすというものである。例えば、特開２０００−１０５８２９号公報では、鼻画像のテンプレートを用いてマッチングを行い鼻の位置を検出し、鼻の位置よりも上を目の探索領域として設定している。また特開２０００−１９３４２０号公報では、顔の斜め方向から照明を照射して顔画像を撮影し、顔画像中からエッジ情報を利用して鼻筋を検出する。そして鼻筋の上方に、目の探索領域を設定している。
【０００８】
【発明が解決しようとする課題】
ところが、これら従来の方法では、輝度画像における鼻の画像は、照明の方向によって大きく変動するため、鼻の位置を正確に検出するためには、一定の照明環境を必要としていた。このため、精度の高い検出を実現するためには、システムの使用条件が大きく制限されてしまうことになり、あまり汎用的ではない。
【０００９】
前記の問題に鑑み、本発明は、画像から、物体の位置を、従来よりも精度良く、検出できるようにすることを課題とする。
【００１０】
【発明を解決するための手段】
前記の課題を解決するために、請求項１の発明が講じた解決手段は、物体の位置検出方法として、被写体について距離画像を得るステップと、前記被写体について、前記距離画像と各座標が対応付けられた輝度画像を得るステップと、前記距離画像において第１の物体の位置を探索するとともに、前記輝度画像において第２の物体の位置を探索するステップとを備え、前記探索ステップにおいて、前記第１および第２の物体の相対的な位置関係の情報を参照して、前記第１および第２の物体の少なくともいずれか一方の２次元位置を決定するものである。
【００１１】
請求項１の発明によると、距離画像において第１の物体が探索されるとともに、輝度画像において第２の物体が探索される。そして、その両者の相対的な位置関係の情報を参照して、第１および第２の物体の少なくともいずれか一方の位置が最終的に決定される。すなわち、立体形状に特徴があり距離画像で探索が容易な第１の物体と、輝度に特徴があり輝度画像で探索が容易な第２の物体とについて、相対的な位置関係が既知であるとき、高精度に、その位置を検出することができる。
【００１２】
請求項２の発明では、前記請求項１における探索ステップにおいて、前記第１の物体の探索の結果得られた位置候補に応じて前記輝度画像における探索領域を絞り込み、この絞り込まれた探索領域において前記第２の物体の位置を探索するものとする。
【００１３】
請求項２の発明によると、第１の物体の位置候補に応じて、輝度画像における第２の物体の探索領域が絞り込まれるので、輝度画像の探索における処理量を削減することができるとともに、誤検出を減らすことができる。
【００１４】
請求項３の発明では、前記請求項１において、２次元位置が決定された前記第１および第２の物体の少なくともいずれか一方について、前記距離画像と決定された２次元位置とを用いて、３次元位置を決定するステップを備えたものとする。
【００１５】
請求項４の発明では、前記請求項１において、前記被写体は人物であり、前記第１の物体は鼻であり、前記第２の物体は目であるものとする。
【００１６】
請求項５の発明では、前記請求項４において、目の距離値を、距離画像における頬周辺の距離値を用いて決定するステップを備えたものとする。
【００１７】
請求項６の発明では、前記請求項１の探索ステップにおいて、前記第１の物体の位置の探索を、前記距離画像と前記第１の物体の距離値から構成された距離テンプレートとのマッチングによって行うものとする。
【００１８】
請求項７の発明では、前記請求項６において、前記距離画像から前記被写体の概略距離を算出し、この概略距離と、前記距離画像を撮影したカメラの画角と、前記第１の物体の推定サイズとを基にして、前記距離テンプレートのサイズを決定するステップを備えたものとする。
【００１９】
また、請求項８の発明が講じた解決手段は、物体の位置検出装置として、被写体について距離画像を取得する距離画像取得部と、前記被写体について前記距離画像と各座標が対応付けられた輝度画像を取得する輝度画像取得部と、前記距離画像において第１の物体の位置を探索するとともに、前記輝度画像において第２の物体の位置を探索する探索部とを備え、前記探索部は、前記第１および第２の物体の相対的な位置関係の情報を参照して、前記第１および第２の物体の少なくともいずれか一方の２次元位置を決定するものである。
【００２０】
請求項８の発明によると、距離画像において第１の物体が探索されるとともに、輝度画像において第２の物体が探索される。そして、その両者の相対的な位置関係の情報を参照して、第１および第２の物体の少なくともいずれか一方の位置が最終的に決定される。すなわち、立体形状に特徴があり距離画像で探索が容易な第１の物体と、輝度に特徴があり輝度画像で探索が容易な第２の物体とについて、相対的な位置関係が既知であるとき、高精度に、その位置を検出することができる。
【００２１】
請求項９の発明では、前記請求項８における探索部は、前記第１の物体の探索の結果得られた位置候補に応じて前記輝度画像における探索領域を絞り込み、この絞り込まれた探索領域において前記第２の物体の位置を探索するものとする。
【００２２】
請求項１０の発明では、前記請求項８において、前記距離画像取得部は、レンジファインダを備えており、前記レンジファインダは、複数の光源が配列された光源アレイ部と、前記光源アレイ部の各光源の発光様態を制御することによって、前記光源アレイ部から少なくとも２種類の光パタンを投射させる光源制御部と、前記光源アレイ部から投射された各光パタンに対する前記被写体からの反射光をそれぞれ原輝度画像として撮影するカメラと、前記カメラによって撮像された原輝度画像から前記距離画像を得る画像処理部とを備えたものとし、前記輝度画像取得部は、前記レンジファインダの前記カメラによって撮像された原輝度画像の少なくとも１つから前記輝度画像を得るものとする。
【００２３】
また、請求項１１の発明が講じた解決手段は、物体の位置検出方法として、被写体について距離画像を得るステップと、前記距離画像において物体の位置を探索するステップとを備え、前記探索ステップにおいて、前記物体の位置の探索を、前記距離画像と前記物体の距離値から構成された距離テンプレートとのマッチングによって行うものである。
【００２４】
請求項１１の発明によると、距離画像における物体の位置検出を、距離テンプレートを用いたマッチングによって行うので、従来よりも検出精度を向上させることができる。
【００２５】
【発明の実施の形態】
以下、本発明の実施の形態について、図面を参照して説明する。
【００２６】
図１は本発明の一実施形態に係る虹彩認証装置の正面図、図２は図１の虹彩認証装置１０によって、その前面に位置する被写体としての人物ＰＰの虹彩認証を行う様子を示す図である。本実施形態では、虹彩認証装置１０内のレンジファインダ装置によって得た人物ＰＰの顔の距離画像から、第１の物体としての鼻ＮＳの位置を探索し、この鼻ＮＳの位置を参照して第２の物体としての目ＥＹの３次元位置を求めて、虹彩認証を行うものとする。
【００２７】
図１および図２に示すように、虹彩認証装置１０は、広角カメラ用照明１０１、広角カメラ１０２および虹彩撮影ユニット１０５を備えている。虹彩撮影ユニット１０５は可動であり、内部に、望遠カメラ用照明１０３および望遠カメラ１０４が固定されている。虹彩撮影ユニット１０５は、望遠カメラ用照明１０３と望遠カメラ１０４の相対位置関係を変えることなく、照明（撮影）する方向を変化させることができる。
【００２８】
本実施形態では、広角カメラ用照明１０１と広角カメラ１０２、およびこれらの制御に係る部分が、被写体の３次元位置を測定するレンジファインダ１００として機能する。図３はこのレンジファインダ１００を含む、本実施形態に係る物体位置検出装置の構成を機能的に示すブロック図である。本実施形態に係る距離画像取得部は、照明１０１，カメラ１０２，光源制御部１０６および画像処理部１０７を有するレンジファインダ１００を備えている。また、画像処理部１０７は、カメラ１０２によって撮像された原輝度画像から、距離画像および輝度画像を得るものであり、距離画像取得部を構成するとともに、輝度画像取得部を構成する。探索部１０８は、距離画像から鼻ＮＳの位置を探索するとともに、輝度画像から目ＥＹの位置を探索し、鼻ＮＳと目ＥＹの相対的な位置関係の情報を参照して、例えば、鼻ＮＳと目ＥＹの相対的な位置関係の事前知識を利用して、目ＥＹの位置を最終的に決定する。
【００２９】
広角カメラ用照明１０１は、光源としてのＬＥＤ１１が複数個配列された光源アレイ部によって構成されている。図１において、各ＬＥＤ１１は○印で表されており、広角カメラ用照明１０１は１１×１１個のＬＥＤ１１によって構成されている。また、広角カメラ１０２はＶＧＡ画素数（横４８０画素×縦６４０画素）を有しており、縦長の視野が得られるように設置される。この広角カメラ１０２によって、図４に示すような人物ＰＰの顔を含む上半身画像が撮影される。
【００３０】
光源制御部１０６は、広角カメラ用照明１０１に対して、ＬＥＤ毎に（例えばアレイの列または行毎に）、ＰＷＭ（Ｐｕｌｓｅ　Ｗｉｄｔｈ　Ｍｏｄｕｌａｔｉｏｎ）によって発光時間を制御することができる。広角カメラ１０２の露出時間内において、発光時間が長いほどのべ光量が大きくなるので、発光時間の制御によって、任意の光パタンの生成が可能となる。
【００３１】
ここでは、行（水平方向）毎に発光時間を制御するものとする。この場合には、光パタンにおいて、図２における垂直方向に変調がかかることになる。また光源制御部１０６は、カメラの露出タイミング（露出時間）に合わせて、２種類の光パタンを切り替える。図５は発光時間の制御によって生成した２種類の光パタンを示す図である。同図中、（ａ）はＬＥＤの発光時間を行番号に従って単調に増加させる光パタンＡであり、（ｂ）はＬＥＤの発光時間を行番号に従って単調に減少させる光パタンＢである。光源制御部１０６は照明１０１から図５のような２種類のパタンを照射させる。
【００３２】
図１および図２に示す虹彩認証装置１０の動作について、図６〜図８のフローチャートに従って、説明する。
【００３３】
まずステップＳ１０において、広角カメラ用照明１０１は図５に示すような２種類の光パタンを照射し、広角カメラ１０２はこの２種類の光パタンに対する反射光画像を原輝度画像としてそれぞれ撮影する。図９は広角カメラ１０２の露出タイミングと広角カメラ用照明１０１から投射される光パタンとの関係を示す図である。画像処理部１０７は、照明１０１から光パタンＡ，Ｂがそれぞれ投射されたときの２枚の原輝度画像から、距離画像を生成する。光源アレイを用いたレンジファインダの距離計測の原理については、特願２００１−２８６６４６に示されており、ここでは説明を省略する。
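[0034]
Next, in step S20, the three-dimensional position of the eyes is calculated. FIG. 7 shows the details of step S20. First, in step S21, the depth value of the person is estimated. Specifically, the image processing unit 107 computes a histogram of the distance values of all pixels of the distance image (480 horizontal pixels × 640 vertical pixels) and, from this histogram, estimates the distance from the camera 102 to the person PP, regarding the person PP as a plane perpendicular to the optical axis of the camera 102.
[0035]
FIG. 10 is an example of a distance value histogram obtained from a distance image when the person PP is about 60 cm from the camera 102. In FIG. 10, the vertical axis is frequency (number of pixels) and the horizontal axis is distance value, binned into classes of 5 cm. In the data of FIG. 10, the frequency is highest in the 60 to 65 cm class.
[0036]
Next, the distance image is binarized with reference to the distance value histogram. Here, the mean of the lower and upper limits of the class with the highest frequency is taken as the representative value dm. In the example of FIG. 10, dm = 62.5 (cm). Each coordinate of the distance image (distance value d) is then binarized as follows:
"1" when dm - a < d < dm + a
"0" otherwise (including unmeasurable points)
Here a is a predetermined value; a = 10 cm is used here.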
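The depth estimation and binarization of paragraphs [0034] to [0036] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the distance image is assumed to be a list of rows of distances in cm, and unmeasurable pixels are assumed to be stored as `None`.

```python
def estimate_depth_cm(depth_image, bin_width=5.0):
    """Return d_m: the midpoint of the most populated histogram class."""
    counts = {}
    for row in depth_image:
        for d in row:
            if d is None:  # assumption: None marks an unmeasurable pixel
                continue
            idx = int(d // bin_width)
            counts[idx] = counts.get(idx, 0) + 1
    best = max(counts, key=counts.get)
    # Mean of the lower and upper limits of the winning class.
    return best * bin_width + bin_width / 2.0


def binarize(depth_image, dm, a=10.0):
    """Mark pixels with dm - a < d < dm + a as 1, everything else as 0."""
    return [
        [1 if (d is not None and dm - a < d < dm + a) else 0 for d in row]
        for row in depth_image
    ]
```

With 5 cm classes, any image whose most frequent class is 60 to 65 cm gives dm = 62.5, matching the example of FIG. 10.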
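[0037]
FIG. 11 is an example of a binarized distance image. In FIG. 11, regions binarized to '1' are shown in white and regions binarized to '0' in black. In low-reflectance regions such as hair, black eyeglass frames, eyes, and eyebrows, and in shadowed regions such as the neck and wrinkles in clothing, the distance cannot be measured well, so these regions are assigned to the background, that is, to '0', during binarization.
[0038]
In the present embodiment, the person region is segmented using depth values, so even if several people appear in the camera's field of view, a person can be extracted on the basis of distance values by this simple method. Of course, other methods may be adopted, for example segmenting the person region using a luminance image.
[0039]
Next, in step S22, the face region of the person is estimated. The depth-binarized image shown in FIG. 11 is used for this estimation. Here, the face region is defined as a region that includes at least both eyes and the nose.
[0040]
First, as shown in FIG. 12, the binary distance data is projected in the horizontal direction to create a histogram H1(y). The vertical extent of the face region is then determined using H1(y).
[0041]
The distance between the camera and the person was estimated from the histogram of FIG. 10 to be 62.5 (cm). Assuming that the size of a face is constant regardless of the person, and given that the angle of view of the camera is known, the size of the face in the image can be estimated. Let w1 be the estimated face width. The upper boundary yt of the face can then be determined as the smallest y satisfying H1(y) > w1 × b (for example, b = 0.1). The lower boundary yb of the face is determined as the position reached by moving downward from yt by a constant multiple of the face width w1, so that at least the entire nose falls within the range. That is,
yb = yt + w1 × c (c = 1.2)
In the present embodiment, distance is measured using reflected illumination light, so low-reflectance regions such as hair cannot be measured. Consequently, the upper boundary yt varies greatly depending on whether the forehead is wide or narrow and on whether hair covers the forehead. The constant c was therefore set to a value such that the nose is included in the face region even when the top of the head is determined to be the boundary yt. FIG. 13 shows the upper and lower positions of the face determined in this way.
[0042]
To detect the eye positions and perform iris authentication, it would suffice for the face region to include at least the eye positions. The face region is nevertheless determined in this embodiment so as to include the nose as well as the eyes because, if the nose position is detected using the distance image, the accuracy of eye position detection can be improved by using the positional relationship between the eyes and the nose. A further reason is that the nose position can be used to narrow the search region for the eye positions.
[0043]
Next, the left and right positions of the face are determined. As shown in FIG. 14, the binary image of the face region whose vertical extent has been determined is projected in the vertical direction to create a histogram H2(x). The smallest x satisfying
H2(x) > (yb - yt + 1) / 3
is taken as the left position xl of the face, and the largest x satisfying
H2(x) > (yb - yt + 1) / 3
is taken as the right position xr of the face. FIG. 15 shows the face region determined in this way.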
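The face-region estimate of paragraphs [0040] to [0043] can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `face_region` is a hypothetical name, the input is the binarized distance image as a list of rows of 0/1 values, and clamping yb to the last image row is an addition not stated in the text.

```python
def face_region(binary, w1, b=0.1, c=1.2):
    """Return (xl, xr, yt, yb) from the projections H1(y) and H2(x)."""
    # H1(y): number of '1' pixels in each row (horizontal projection).
    h1 = [sum(row) for row in binary]
    # Upper boundary: smallest y with H1(y) > w1 * b.
    yt = next(y for y, v in enumerate(h1) if v > w1 * b)
    # Lower boundary: yt + w1 * c, clamped to the image (assumption).
    yb = min(int(yt + w1 * c), len(binary) - 1)
    # H2(x): number of '1' pixels in each column between yt and yb.
    width = len(binary[0])
    h2 = [sum(binary[y][x] for y in range(yt, yb + 1)) for x in range(width)]
    threshold = (yb - yt + 1) / 3
    xs = [x for x, v in enumerate(h2) if v > threshold]
    return xs[0], xs[-1], yt, yb
```

The estimated face width w1 is passed in as a pixel count; deriving it from the depth and the camera's angle of view is a separate step.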
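[0044]
Although a simple method of detecting the face region using only the distance image has been described here, other methods may of course be used. For example, a luminance image may be created by a method described later, and the face region may be searched for in the luminance image using an average face image as a face template. In this case, the size of the face template can be determined from the depth value of the person. Moreover, only the person region calculated from a distance image such as that of FIG. 11 needs to be searched in the luminance image, so the amount of processing required for the search can be reduced.
[0045]
Next, in step S23, the three-dimensional position of the eyes is estimated from within the face region estimated in step S22. FIG. 8 shows the details of step S23.
[0046]
First, in step S231, the image processing unit 107 acquires the distance image of the face region. Here, the face region (width fw (xl to xr), height fh (yt to yb)) is cut out of the distance image shown in FIG. 15.
[0047]
Next, in step S232, the search unit 108 searches the distance image for the nose position.
[0048]
Here, a nose-shaped template is used as the distance template for the nose position search. FIG. 16 is an example of a nose-shaped template. The nose template of FIG. 16 was obtained by manually cutting out the nose region from three-dimensional face data of N (N = 20) people measured from the front, normalizing the vertical and horizontal sizes, aligning the regions, and then averaging them. In FIG. 16, distance values are represented by shades of gray: darker gray (lower luminance) indicates a point closer to the camera, and lighter gray (higher luminance) a point farther from the camera. The nose template of FIG. 16 is trapezoidal, with base Tx and height Ty; the upper-left and upper-right portions of its circumscribed rectangle are not used as part of the template, because the distance image there may be affected by eyeglasses and the like. The three-dimensional data used to create the template need not be obtained with a range finder and may be obtained by another method.
[0049]
Next, a nose search region is determined within the face region. As shown in FIG. 17, a band-shaped region is set at the horizontal center of the face region. The value of r1 is determined experimentally and is set to 0.5 here. This embodiment assumes that the person to be authenticated stands almost directly in front of the camera and faces it. In this case the nose always lies near the center line of the face region, so setting a band-shaped nose search region as in FIG. 17 shortens the search time. Of course, in applications where the face is photographed obliquely, for example, the nose may be searched for over the entire face region.
[0050]
Next, the depth value of the face is estimated from the distance image of the face region, and the nose template sizes Tx and Ty are changed according to the estimated depth value.
[0051]
First, as in step S21, a histogram of distance values is calculated, and the mean of the lower and upper limits of the most frequent class is taken as the estimated depth value of the face. In step S21 the distance value was estimated using the whole image, including regions with different distance values such as the face, neck, and body. Here, in order to estimate the depth of the face region accurately, only the distance values of the face region are used, the histogram class width is made smaller, and the distance value is estimated again. The size of the nose, as the estimated size of the first object, is assumed to be constant regardless of the person and is estimated from prior knowledge of its real-world size. Given that the angle of view of the camera that captured the distance image is known, the size of the nose in the image can be estimated from the depth value of the face, as the approximate distance of the subject, and the size of the nose template can thereby be determined. The nose template sizes Tx and Ty are changed in accordance with the estimated face width. A new nose template is then created by interpolating the distance values. Linear interpolation, the bicubic method, the nearest-neighbor method, or the like can be used for the interpolation.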
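As a rough illustration of paragraph [0051], the template size in pixels can be derived from the face depth and the camera's angle of view with a pinhole camera model, and the template can then be resampled. This is a sketch under assumptions: the pinhole model, the function names, and any numeric values used with it are hypothetical and not taken from the patent. Nearest-neighbor resampling is shown because it is one of the interpolation options the text names.

```python
import math


def size_in_pixels(real_size_cm, depth_cm, fov_deg, image_size_px):
    """Apparent size in pixels of an object of known real size at a given depth."""
    # Focal length in pixels from the angle of view along this image axis.
    focal_px = (image_size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    return real_size_cm * focal_px / depth_cm


def resize_nearest(template, new_w, new_h):
    """Resample a template (list of rows) with the nearest-neighbor method."""
    old_h, old_w = len(template), len(template[0])
    return [
        [template[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]
```

Doubling the depth halves the apparent size, which is why a single rescaled template can stand in for a set of templates of different sizes.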
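[0052]
Then, within the nose search region of FIG. 17, the resized template is matched, and positions with high matching scores are stored, together with their matching scores, as nose position candidates. Here, all positions whose matching score is at or above a predetermined threshold are kept as candidates, and their total number is denoted P. The nose position may be any predetermined coordinate in the region matched by the template, for example its upper-left point, its center point, or the most protruding point of the nose. Normalized correlation was used as the matching score, but another measure, such as Euclidean distance, may be used instead.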
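The candidate search of paragraph [0052] can be sketched as a sliding-window match scored by normalized correlation. This is a simplified sketch: the template is assumed to be rectangular (the patent's template is trapezoidal), the whole image is scanned instead of a band-shaped search region, and the function names are hypothetical.

```python
import math


def ncc(patch, template):
    """Normalized correlation between two equally sized flat lists."""
    n = len(patch)
    mean_p = sum(patch) / n
    mean_t = sum(template) / n
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, template))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in patch)
        * sum((t - mean_t) ** 2 for t in template)
    )
    return num / den if den > 0 else 0.0


def match_candidates(image, template, threshold):
    """Return (x, y, score) for every position scoring at or above threshold."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    found = []
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            score = ncc(patch, flat_t)
            if score >= threshold:
                found.append((x, y, score))
    # Best candidates first; the upper-left corner is used as the position.
    return sorted(found, key=lambda c: -c[2])
```

Because normalized correlation subtracts the mean, a patch that differs from the template only by a constant depth offset still scores 1.0.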
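[0053]
In ordinary template matching, the size of the search target in the image is unknown, so matching is often performed with several templates of different sizes and the one with the best matching score is taken as the detection result. However, using several templates increases the processing time. In this embodiment, the approximate distance to the target is obtained from the distance image, so the size of the target in the image can be estimated. Matching can therefore be performed quickly with a single template.
[0054]
Of course, even when the distance is unknown, the template size can be determined from the width of the face region detected from the binarized distance image as in FIG. 15. However, when distance is measured using the intensity of reflected light as in this embodiment, hair prevents the face width from always being detected accurately. Because the distance of low-reflectance regions such as black hair cannot be measured, the detected face width is smaller than the actual width for a person whose hair covers the cheeks, as is common among women. On the other hand, for hair with relatively high reflectance, such as blond or brown hair, the distance of the hair region can also be measured, so the detected face width is larger than the actual width. In contrast, determining the template size from the distance value allows the size to be determined stably, unaffected by hair.
[0055]
Since it is assumed here that the person to be authenticated stands almost directly in front of the camera and faces it, a nose template of a single shape was used. When the face is photographed obliquely or is tilted up or down, nose templates rotated to correspond to various poses may be prepared in advance and used for matching. Alternatively, the three-dimensional pose of the face may be estimated from the distance image and luminance image of the face, the distance data rotated to a frontal view, and the nose template matched against the corrected distance image.
[0056]
The nose position may also be searched for by a method other than nose template matching, for example by regarding the region of the face closest to the camera as the most protruding part of the nose. However, since this embodiment measures distance on the basis of reflected light intensity, the distance image contains local regions with large ranging errors caused by, for example, facial shine due to sebum or corneal reflection in the eyes. With a distance image containing such errors, determining the nose position simply by referring to the distance value of a single point is problematic in terms of accuracy. In such cases, matching with a distance template is effective.
[0057]
Next, in step S233, the image processing unit 107 acquires the luminance image of the face region. In step S10, two original luminance images corresponding to the two light patterns have already been captured in order to calculate the distance image. Here, therefore, for the face region shown in FIG. 15, the image obtained by averaging the pixel values of the two original luminance images is taken as the luminance image (width fw (xl to xr), height fh (yt to yb)).
[0058]
As shown in FIG. 5, of the two projected light patterns, one goes from dark to bright as the row number increases and the other from bright to dark. Averaging the two original luminance images therefore yields a luminance image equivalent to one captured under uniform illumination.
[0059]
Instead of the mean of the pixel values of the two original luminance images, the larger of the two pixel values may be adopted to synthesize the luminance image. This yields an image that is brighter overall and has clear contrast, which has the advantage of making the subsequent eye position detection easier.
[0060]
Alternatively, either one of the two original luminance images may be used as the luminance image. When illumination is applied from above as in this embodiment and the target (a human face) is known to lie in the upper part of the camera's field of view as in FIG. 4, the original luminance image corresponding to the illumination that goes from bright to dark from the top may be used alone as the luminance image, and the subsequent eye position detection can be performed without loss of accuracy.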
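The two ways of synthesizing the luminance image described in paragraphs [0057] to [0059] (pixel-wise mean, or pixel-wise maximum) can be sketched as below. The function name and the `mode` parameter are hypothetical; images are assumed to be equally sized lists of rows.

```python
def combine_luminance(img_a, img_b, mode="mean"):
    """Merge the two original luminance images pixel by pixel."""
    if mode == "mean":
        # Complementary ramps cancel, approximating uniform illumination.
        return [[(a + b) / 2 for a, b in zip(ra, rb)]
                for ra, rb in zip(img_a, img_b)]
    if mode == "max":
        # Brighter overall result.
        return [[max(a, b) for a, b in zip(ra, rb)]
                for ra, rb in zip(img_a, img_b)]
    raise ValueError("mode must be 'mean' or 'max'")
```

For two ideal complementary ramps on a uniform surface, the mean is constant across rows, which is the effect paragraph [0058] relies on.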
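[0061]
Next, in step S234, the search unit 108 searches the luminance image for the eye positions. First, as shown in FIG. 18, an eye search region 1 for the person's right eye (the eye on the left as viewed) and an eye search region 2 for the person's left eye (the eye on the right as viewed) are set as bands in the face region. The constants r2 and r3 are determined experimentally; here r2 = 0.35 and r3 = 0.1. The eyes are detected within each eye search region. This embodiment assumes that the person to be authenticated stands almost directly in front of the camera and faces it. In this case the y coordinates of the eyes always lie within eye search regions 1 and 2 of FIG. 18, so the search time can be shortened. Of course, in applications where the face is photographed obliquely, the eyes may be searched for over the entire face region.
[0062]
Any method may be used to search for the eye positions in the luminance image; this embodiment uses the method described in Japanese Unexamined Patent Publication No. 2002-56394. That is, an eye template as shown in FIG. 19 is created in advance. Each arrow in FIG. 19 represents a two-dimensional luminance gradient vector at a point on a luminance edge of the eye. A luminance gradient image is created from the luminance image by taking differences in the horizontal and vertical directions, correlation values between this luminance gradient image and the eye template of FIG. 19 are calculated, and positions with high correlation values are searched for as eye positions. In this embodiment, rather than deciding on the pair of eye positions with the highest correlation value, all pairs for which the correlation values of both eyes are at or above a predetermined threshold are kept, together with their correlation values, as candidates for the positions of both eyes. Their total number is denoted Q.
[0063]
The eye position may be any predetermined coordinate in the region matched by the eye template, for example its upper-left point, its center point, or the center point of the iris region in the template. As in step S232, determining the size of the eye template from the depth value of the face enables an accurate search with a single template.
[0064]
Next, in step S235, the search unit 108 determines the three-dimensional eye position. Through the processing so far, P nose position candidates have been obtained in step S232 and Q eye position candidates in step S234. Here, referring to information on the relative positional relationship between the eyes and the nose, combinations that are inappropriate in terms of relative position are excluded from the P × Q combinations of nose position candidates and eye position candidates, and among the remaining combinations, the one with the highest combined matching score is determined as the final two-dimensional positions of the eyes and nose. Here, the mean of the nose matching score and the eye matching score is used as the combined score. The combined score may also be a weighted linear sum, in which case the weights may be determined in consideration of the reliability of the nose score and of the eye score.
[0065]
The relative positional relationship between the eyes and the nose is checked, for example, as follows. As shown in FIG. 20, the perpendicular bisector of the line segment connecting the right-eye position candidate and the left-eye position candidate is drawn, and a rectangle of width D1 × r5 and height D1 × r6, centered on the point A located D1 × r4 downward along this perpendicular bisector, is set as the nose existence region. If the nose position candidate is not contained in this nose existence region, the combination of position candidates is judged inappropriate in terms of the relative positional relationship between the eyes and the nose. Here, D1 is the distance between the position candidates of the two eyes, and r4, r5, and r6 are values determined by actual measurement on many people.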
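The candidate selection of paragraphs [0064] and [0065] can be sketched as follows. This is a hedged illustration: the function names and data layout are hypothetical, image y is assumed to increase downward, the rectangle is assumed to be aligned with the eye line, and the values of r4, r5, and r6 must be supplied by the caller because the text does not give them.

```python
import math


def nose_in_region(right_eye, left_eye, nose, r4, r5, r6):
    """True if the nose candidate lies in the nose existence region."""
    (x1, y1), (x2, y2) = right_eye, left_eye
    d1 = math.hypot(x2 - x1, y2 - y1)          # distance between the eyes
    ux, uy = (x2 - x1) / d1, (y2 - y1) / d1    # unit vector along the eye line
    vx, vy = -uy, ux                           # unit vector along the bisector
    if vy < 0:                                 # make it point down in the image
        vx, vy = -vx, -vy
    ax = (x1 + x2) / 2 + vx * d1 * r4          # point A on the bisector
    ay = (y1 + y2) / 2 + vy * d1 * r4
    dx, dy = nose[0] - ax, nose[1] - ay
    return (abs(dx * ux + dy * uy) <= d1 * r5 / 2
            and abs(dx * vx + dy * vy) <= d1 * r6 / 2)


def best_pair(nose_cands, eye_cands, r4, r5, r6):
    """Pick the geometrically valid (nose, eyes) pair with the best mean score.

    nose_cands: list of ((x, y), score)
    eye_cands:  list of ((x, y) right eye, (x, y) left eye, score)
    """
    best = None
    for nose, nose_score in nose_cands:
        for right_eye, left_eye, eye_score in eye_cands:
            if not nose_in_region(right_eye, left_eye, nose, r4, r5, r6):
                continue
            combined = (nose_score + eye_score) / 2
            if best is None or combined > best[0]:
                best = (combined, nose, right_eye, left_eye)
    return best
```

In the check below, an eyebrow-like eye candidate with a higher matching score is rejected because no nose candidate falls in its nose existence region, which is the kind of false detection the geometric check is meant to remove.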
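[0066]
Alternatively, as shown in FIG. 21, rectangles of width D2 × r7 and height D2 × r8, centered on the points B1 and B2 obtained by moving D2/2 to the left and right from the point B located D2 × r4 above the nose position candidate, are set as single-eye existence regions. If the position candidate of either eye is not contained in the corresponding single-eye existence region, the combination of position candidates is judged inappropriate in terms of the relative positional relationship between the eyes and the nose. Here, D2 is the distance between the eyes, estimated from the face width, which is in turn estimated from the depth value of the face; r4, r7, and r8 are values determined by actual measurement on many people.
[0067]
For the P nose position candidates calculated in step S232, single-eye existence regions as in FIG. 21 may be set in the luminance image as eye search regions, and the eye positions searched for only within those regions. In this case the final result is the same as in the example above. Depending on the value of P, the search area becomes smaller than the band-shaped regions of FIG. 18, so an improvement in processing speed can be expected.
[0068]
Conversely, it is also possible to first search the luminance image for eye positions to obtain position candidates, then set a nose existence region as in FIG. 20 as the nose search region, and search for the nose position only within that region of the distance image. However, because there are more places in a luminance image whose luminance resembles an eye than places in a distance image whose shape resembles a nose, it is more efficient to search the distance image for the nose first.
[0069]
Alternatively, an ideal positional relationship between the nose position and the two eyes may be defined, together with an evaluation function whose value approaches 0 the further a combination departs from that relationship and approaches 1 the closer it comes to it. The combined score of the nose and eye matching scores, multiplied by the value of this evaluation function, is taken as an integrated score, and the combination with the highest integrated score is determined as the final two-dimensional positions of the eyes and nose.
[0070]
In this way, by searching the distance image for the nose position, searching the luminance image for the eye positions, and using their relative positional relationship to finally determine the eye positions, detection errors such as mistakenly detecting eyebrows, eyeglass frames, the mouth, or other features that resemble eyes in a luminance image can be greatly reduced.
[0071]
Then, in step S236, the depth value of the eyes is estimated. That is, from the two-dimensional eye positions determined in step S235, the three-dimensional eye positions are obtained using the distance image that has already been acquired.
[0072]
FIG. 22 shows the camera coordinate system C-XYZ. In FIG. 22, the Z axis is the optical axis of the camera, and a virtual image plane whose origin is the image center c is considered to lie at the focal length f from the lens center C on the Z axis. The depth value ZM of the measurement point M corresponding to the point m (xm, ym) on the image plane can be obtained as the pixel value of the distance image at (xm, ym). The three-dimensional position in camera coordinates can therefore be calculated as XM = ZM·xm/f and YM = ZM·ym/f. To convert from the coordinate system of the luminance and distance images to the coordinate system on the virtual image plane of FIG. 22, the intrinsic parameters of the camera (including f) should be obtained in advance. In addition, since there is distortion near the periphery of the lens, distortion correction is performed beforehand.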
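The back-projection in paragraph [0072] is a direct application of XM = ZM·xm/f and YM = ZM·ym/f. A minimal sketch follows; the function names and the principal point (cx, cy) are assumptions, f is expressed in the same units as xm and ym (pixels here), and lens distortion is assumed to have been corrected already.

```python
def pixel_to_plane(u, v, cx, cy):
    """Shift pixel coordinates so the origin is the image center c."""
    return u - cx, v - cy


def to_camera_coords(xm, ym, zm, f):
    """Back-project image-plane point (xm, ym) with depth zm to (X, Y, Z)."""
    return zm * xm / f, zm * ym / f, zm
```

For example, with f = 800 pixels, a point 40 pixels right of and 20 pixels above the image center at a depth of 600 mm maps to (30, -15, 600) mm when y is taken to increase downward.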
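[0073]
Thus, by using the luminance image and the distance image together, the three-dimensional positions of the eyes and nose can be easily determined.
[0074]
The LED array range finder used in this embodiment, which the present inventors actually developed as a system, achieves a relative ranging error of 1% (an error of only 5 mm even at a measurement distance of 500 mm). When the relief of the subject is gentle, smoothing the surrounding distances further increases the reliability of the measured distance. That is, when the distance of a specific pixel in a two-dimensional image is to be measured, one option is to measure it with a spot sensor such as an ultrasonic sensor; however, using a distance image acquired with a range finder as in this embodiment allows the distance values around that pixel to be used effectively as well, enabling stable distance measurement.
[0075]
Next, in step S30, the telephoto camera is pointed at the three-dimensional position of the eye and an iris image is captured. Then, in step S40, iris authentication is performed from the captured eye image. Iris authentication may be performed, for example, by the method described in Japanese National Publication of Translated PCT Application No. H8-504979.
[0076]
As described above, according to this embodiment, the position of the nose, which is characterized by its three-dimensional shape, is searched for in the distance image, and the positions of the eyes, which are characterized by their luminance, are searched for in the luminance image. The eye positions are then finally determined with reference to information on the relative positional relationship between the eyes and the nose. The eye positions can thus be detected with higher accuracy than before.
[0077]
In step S236, the distance value of the eyes may instead be determined using the distance values of the area around the cheeks in the distance image. As already mentioned, when distance is measured on the basis of light intensity ratios as in this embodiment, the measurement error becomes large in low-reflectance regions and in regions that produce specular rather than diffuse reflection. The depth values near the eyes are considered prone to large errors for the following reasons.
・When the iris is dark brown, its reflectance is low.
・Specular reflection occurs at the surface of the cornea.
・When the person wears eyeglasses, dark frames have low reflectance. In addition, specular reflections from the lenses or metal frames may lie near the eyes.
[0078]
For this reason, instead of the depth value near the eyes, the depth value around the cheeks is calculated and the depth value of the eyes is estimated indirectly. FIG. 23 shows a person viewed directly from the side; the depth of the eyes is almost the same as that of the cheek almost directly beside the nose. When the present inventors observed the profiles of several people in the same way, they found that the depth of the eyes is either almost the same as that of the cheeks or slightly farther back (by about 1 cm at most). Here, therefore, eye depth value = cheek depth value + 0.5 cm.
[0079]
The cheek depth value is obtained as follows. As shown in FIG. 24, the line segment connecting the two eye positions (length D) is translated downward by a distance D × r9, and circles of radius D × r10 centered on its two ends C1 and C2 are set as cheek regions. The mean of the pixel values of the distance image within these cheek regions is taken as the cheek depth value. r9 and r10 are determined experimentally; here r9 = 0.65 and r10 = 0.1.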
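The indirect eye-depth estimate of paragraphs [0078] and [0079] can be sketched as below. This is an illustrative sketch under assumptions: the function name is hypothetical, the distance image is a list of rows of depths in cm, "downward" is taken as increasing y (the head is assumed upright, as the text requires), and skipping unmeasurable pixels stored as `None` is an addition not stated in the text.

```python
import math


def eye_depth_from_cheeks(depth_image, right_eye, left_eye,
                          r9=0.65, r10=0.1, offset_cm=0.5):
    """Estimate eye depth as the mean cheek depth plus a fixed offset."""
    (x1, y1), (x2, y2) = right_eye, left_eye
    d = math.hypot(x2 - x1, y2 - y1)   # length D of the eye segment
    shift, radius = d * r9, d * r10
    values = []
    # C1 and C2: the segment ends translated downward by D * r9.
    for cx, cy in ((x1, y1 + shift), (x2, y2 + shift)):
        for y in range(int(cy - radius), int(cy + radius) + 1):
            for x in range(int(cx - radius), int(cx + radius) + 1):
                inside_image = (0 <= y < len(depth_image)
                                and 0 <= x < len(depth_image[0]))
                if inside_image and (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    value = depth_image[y][x]
                    if value is not None:  # assumption: None = unmeasurable
                        values.append(value)
    if not values:
        raise ValueError("no measurable pixels in the cheek regions")
    return sum(values) / len(values) + offset_cm
```

Averaging over two small discs rather than reading a single pixel is what makes the estimate tolerant of isolated ranging errors.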
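[0080]
However, this method requires that the person's head not be tilted forward, backward, left, or right. It was adopted here on the premise that, since the application is personal authentication, the person to be authenticated will cooperate with image capture. For applications that allow the subject a somewhat freer pose, the pose of the face must be estimated from the distance image and luminance image, and the cheek depth value must be converted to the eye depth value with the tilt of the face taken into account.
[0081]
In this embodiment, as shown in FIGS. 1 and 2, the light source 101 and the camera 102 of the range finder are arranged one above the other, but the arrangement is not limited to this; the camera may be above and the light source below. Of course, the camera and the light source may also be arranged side by side. Several light sources may also be provided to improve the accuracy of distance measurement.
[0082]
This embodiment has been described on the premise of 1:N authentication, in which authentication is performed by comparison with the pre-registered data of several people to determine which person is present. In this case, the authentication device uses values averaged over several people for the face width, the relative positional relationship between the nose and eyes, the shapes of the eye template and nose template, and so on. In contrast, in 1:1 authentication, in which the person to be authenticated declares his or her ID to the authentication device and the device determines whether that person matches the individual pre-registered under the declared ID, the face width, the relative positional relationship between the nose and eyes, the shapes of the eye template and nose template, and so on are prepared for each individual. More accurate detection therefore becomes possible. Key input, ID cards (including contactless ID cards), voice input, and the like can be used to declare the ID.
[0083]
In the present invention, a range finder of the type shown in this embodiment is not essential; any other type may be adopted as long as a distance image is obtained. For example, the invention can be used in combination with other distance measurement methods such as the light-section method, the moiré method, or stereo vision. Also, in this embodiment the luminance image was generated from the original luminance images obtained from the range finder, but other generation methods may be used; for example, a dedicated imaging system for acquiring the luminance image may be provided. However, the coordinates of the luminance image must be associated with those of the distance image.
[0084]
In this embodiment the three-dimensional position of the eyes is detected in order to perform iris authentication, but the scope of application of object position detection according to the present invention is not limited to this. For example, the invention may be used to detect the two-dimensional positions of facial parts such as the eyes, nose, and mouth for face authentication. That is, the nose position may be searched for in the distance image, the eyes and mouth searched for in the luminance image, and the final positions of the facial parts determined from the relative positional relationships of the candidates for each part.
[0085]
Of course, the invention can also be used to detect an object other than the nose that is characterized by its shape (distance) and an object other than the eyes that is characterized by its density pattern (luminance). For example, when an artificial object such as a robot is designed with a part characterized by its shape and a part characterized by its density pattern, the invention can be used to detect these parts.
[0086]
Detecting the positions of a part characterized by shape and a part characterized by density pattern separately, as in the present invention, has the following advantages. If a part characterized by its density pattern also had a distinctive shape, the appearance of the density pattern would change with the viewpoint, and detecting the part from the luminance image could become difficult. Likewise, if a part characterized by its shape also had a distinctive density pattern, then when shape is measured with a range finder as in this embodiment, the intensity of the reflected light would depend on the density pattern of the subject, and the shape measurement error could become large where the reflectance is low. Therefore, detecting the shape-characterized part and the density-pattern-characterized part separately and performing the final position detection on the basis of their relative positional relationship markedly improves detection accuracy.
[0087]
[Effects of the Invention]
As described above, according to the present invention, for objects such as the nose and eyes, one of which is characterized by its three-dimensional shape and is easy to search for in a distance image, the other of which is characterized by its luminance and is easy to search for in a luminance image, and whose relative positional relationship is known, the positions can be detected with high accuracy.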
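[Brief Description of the Drawings]
[FIG. 1] A front view of an iris authentication device according to an embodiment of the present invention.
[FIG. 2] A side view of the iris authentication device of FIG. 1.
[FIG. 3] A block diagram functionally showing the configuration of an object position detection device according to an embodiment of the present invention.
[FIG. 4] A schematic diagram of a person image captured by the iris authentication device of FIG. 1.
[FIG. 5] An example of the light patterns projected by the range finder.
[FIG. 6] A flowchart showing the operation of the iris authentication device of FIG. 1.
[FIG. 7] A flowchart showing the details of step S20 in FIG. 6.
[FIG. 8] A flowchart showing the details of step S23 in FIG. 7, illustrating an object position detection method according to an embodiment of the present invention.
[FIG. 9] A diagram showing the switching timing of the two light patterns.
[FIG. 10] An example of a distance value histogram obtained from a distance image.
[FIG. 11] An example of a binarized distance image.
[FIG. 12] An example of a histogram created by projecting the binarized distance image of FIG. 11 in the horizontal direction.
[FIG. 13] A diagram showing the vertical extent of the face region determined in the binarized distance image of FIG. 11.
[FIG. 14] An example of a histogram obtained by projecting, in the vertical direction, the binarized distance image cropped to the vertical extent of FIG. 13.
[FIG. 15] A diagram showing the face region determined in the binarized distance image of FIG. 11.
[FIG. 16] An example of a nose template composed of distance values.
[FIG. 17] A diagram showing the nose search region set in the face region of the distance image.
[FIG. 18] A diagram showing the eye search regions set in the face region of the luminance image.
[FIG. 19] An example of an eye template composed of luminance gradient direction vectors.
[FIG. 20] A diagram showing the positional relationship between the positions of both eyes and the nose existence region.
[FIG. 21] A diagram showing the positional relationship between the position of the nose and the existence regions of both eyes.
[FIG. 22] A diagram showing the relationship between the image coordinate system and the camera coordinate system.
[FIG. 23] A diagram showing a person's profile.
[FIG. 24] A diagram showing the cheek regions used to estimate the depth value of the eyes.
[Description of Reference Numerals]
100 Range finder
101 Wide-angle camera illumination (light source array unit)
102 Camera
106 Light source control unit
107 Image processing unit
108 Search unit
PP Person (subject)
NS Nose (first object)
EY Eye (second object)
[0001]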
[Technical Field of the Invention]
The present invention relates to a technique for detecting a position of a specific object such as a human eye from an image.
[0002]
[Prior art]
In recent years, in various technical fields, the need for accurately detecting the position of a face part such as human eyes has been increasing. A typical example is a personal authentication (biometrics personal authentication) technology using biometric information.
[0003]
For example, for iris authentication, an iris recognition system with a two-camera configuration has been proposed (see Japanese Unexamined Patent Publication No. H10-40386). In this system, a first camera photographs the entire face, and the captured face image is analyzed to detect the eye position. A second camera is then mechanically steered toward the detected eye position to obtain an enlarged image of the iris. Because the person to be authenticated only needs to stand in front of the device, authentication is easy to perform. However, it is a prerequisite that the eye position can be detected with high accuracy.
[0004]
In face authentication, feature points of the face, including the eyes, are usually detected in an image containing the face, the face position and size are normalized on the basis of those feature points, and face authentication is then performed. Therefore, to perform face authentication accurately, the positions of the feature points of the face, including the eyes, must be detected with high accuracy.
[0005]
In fields other than personal authentication, for example, in applications such as gaze detection and dozing detection, it is necessary to detect the eye position.
[0006]
Most eye position detection methods perform matching using a local eye template, and therefore have the problem that eyebrows, eyeglass frames, the mouth, and other features with shapes similar to eyes are likely to be erroneously recognized as eyes.
[0007]
To address this problem, a method has been proposed in which the nose position is detected first, an eye search region is determined from the nose position, and the eye position is detected within that search region. That is, erroneous detection of eyes is reduced by limiting the search region. For example, in Japanese Unexamined Patent Publication No. 2000-105829, matching is performed using a template of a nose image to detect the position of the nose, and the area above the nose position is set as the eye search region. In Japanese Unexamined Patent Publication No. 2000-193420, a face image is captured while the face is illuminated obliquely, and the bridge of the nose is detected in the face image using edge information. The eye search region is then set above the bridge of the nose.
[0008]
[Problems to be solved by the invention]
However, in these conventional methods, the appearance of the nose in a luminance image varies greatly with the direction of illumination, so a constant illumination environment is required to detect the nose position accurately. Achieving high detection accuracy therefore severely restricts the conditions under which the system can be used, and the methods are not very versatile.
[0009]
In view of the above-described problem, an object of the present invention is to make it possible to detect the position of an object from an image with higher accuracy than before.
[0010]
[Means for Solving the Problems]
To solve the above problem, the solution provided by the invention of claim 1 is an object position detection method comprising: a step of obtaining a distance image of a subject; a step of obtaining, for the subject, a luminance image whose coordinates are each associated with those of the distance image; and a step of searching for the position of a first object in the distance image and searching for the position of a second object in the luminance image, wherein, in the search step, the two-dimensional position of at least one of the first and second objects is determined with reference to information on the relative positional relationship between the first and second objects.
[0011]
According to the invention of claim 1, the first object is searched for in the distance image, and the second object is searched for in the luminance image. The position of at least one of the first and second objects is then finally determined with reference to information on the relative positional relationship between the two. That is, when the relative positional relationship is known between a first object that is characterized by its three-dimensional shape and is easy to search for in a distance image and a second object that is characterized by its luminance and is easy to search for in a luminance image, their positions can be detected with high accuracy.
[0012]
In the invention of claim 2, in the search step of claim 1, the search region in the luminance image is narrowed according to the position candidates obtained as a result of the search for the first object, and the position of the second object is searched for within the narrowed search region.
[0013]
According to the invention of claim 2, the search region for the second object in the luminance image is narrowed according to the position candidates of the first object, so the amount of processing in the search of the luminance image can be reduced and erroneous detections can also be reduced.
[0014]
In the invention of claim 3, the method of claim 1 further comprises a step of determining, for at least one of the first and second objects whose two-dimensional position has been determined, a three-dimensional position using the distance image and the determined two-dimensional position.
[0015]
According to a fourth aspect of the present invention, in the first aspect, the subject is a person, the first object is a nose, and the second object is an eye.
[0016]
According to a fifth aspect of the present invention, in the fourth aspect, a step of determining a distance value of the eye using a distance value around the cheek in the distance image is provided.
[0017]
In the invention of claim 6, in the search step of claim 1, the position of the first object is searched for by matching the distance image against a distance template composed of distance values of the first object.
[0018]
In the invention of claim 7, the method of claim 6 further comprises a step of calculating the approximate distance of the subject from the distance image and determining the size of the distance template on the basis of this approximate distance, the angle of view of the camera that captured the distance image, and the estimated size of the first object.
[0019]
The solution provided by the invention of claim 8 is an object position detection device comprising: a distance image acquisition unit that acquires a distance image of a subject; a luminance image acquisition unit that acquires, for the subject, a luminance image whose coordinates are each associated with those of the distance image; and a search unit that searches for the position of a first object in the distance image and searches for the position of a second object in the luminance image, wherein the search unit determines the two-dimensional position of at least one of the first and second objects with reference to information on the relative positional relationship between the first and second objects.
[0020]
According to the invention of claim 8, the first object is searched for in the distance image, and the second object is searched for in the luminance image. The position of at least one of the first and second objects is then finally determined with reference to information on the relative positional relationship between the two. That is, when the relative positional relationship is known between a first object that is characterized by its three-dimensional shape and is easy to search for in a distance image and a second object that is characterized by its luminance and is easy to search for in a luminance image, their positions can be detected with high accuracy.
[0021]
In the invention of claim 9, the search unit of claim 8 narrows the search region in the luminance image according to the position candidates obtained as a result of the search for the first object, and searches for the position of the second object within the narrowed search region.
[0022]
In the invention of claim 10, in the device of claim 8, the distance image acquisition unit includes a range finder, and the range finder includes: a light source array unit in which a plurality of light sources are arranged; a light source control unit that causes the light source array unit to project at least two types of light patterns by controlling the light emission mode of each light source of the light source array unit; a camera that captures, as original luminance images, the light reflected from the subject for each light pattern projected from the light source array unit; and an image processing unit that obtains the distance image from the original luminance images captured by the camera. The luminance image acquisition unit obtains the luminance image from at least one of the original luminance images captured by the camera of the range finder.
[0023]
The solution provided by the invention of claim 11 is an object position detection method comprising: a step of obtaining a distance image of a subject; and a step of searching for the position of an object in the distance image, wherein, in the search step, the position of the object is searched for by matching the distance image against a distance template composed of distance values of the object.
[0024]
According to the eleventh aspect, since the position detection of the object in the distance image is performed by matching using the distance template, the detection accuracy can be improved as compared with the related art.
[0025]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0026]
FIG. 1 is a front view of an iris authentication device according to an embodiment of the present invention, and FIG. 2 is a diagram showing how the iris authentication device 10 of FIG. 1 performs iris authentication of a person PP, the subject, positioned in front of it. In this embodiment, the position of the nose NS, as the first object, is searched for in a distance image of the face of the person PP obtained by the range finder in the iris authentication device 10; the three-dimensional position of the eye EY, as the second object, is then obtained with reference to the position of the nose NS, and iris authentication is performed.
[0027]
As shown in FIGS. 1 and 2, the iris authentication device 10 includes a wide-angle camera illumination 101, a wide-angle camera 102, and an iris imaging unit 105. The iris imaging unit 105 is movable, and the telephoto camera illumination 103 and the telephoto camera 104 are fixed inside. The iris imaging unit 105 can change the direction of illumination (photographing) without changing the relative positional relationship between the telephoto camera illumination 103 and the telephoto camera 104.
[0028]
In the present embodiment, the wide-angle camera illumination 101, the wide-angle camera 102, and the parts that control them function as a range finder 100 that measures the three-dimensional position of the subject. FIG. 3 is a block diagram functionally showing the configuration of the object position detection device according to the present embodiment, including the range finder 100. The distance image acquisition unit according to the present embodiment includes the range finder 100, which has the illumination 101, the camera 102, a light source control unit 106, and an image processing unit 107. The image processing unit 107 obtains a distance image and a luminance image from the original luminance images captured by the camera 102, and thus forms part of both the distance image acquisition unit and the luminance image acquisition unit. The search unit 108 searches the distance image for the position of the nose NS, searches the luminance image for the position of the eye EY, and finally determines the position of the eye EY with reference to information on the relative positional relationship between the nose NS and the eye EY, for example by using prior knowledge of that relationship.
[0029]
The wide-angle camera illumination 101 is configured as a light source array unit in which a plurality of LEDs 11 serving as light sources are arranged. In FIG. 1, each LED 11 is represented by a circle, and the wide-angle camera illumination 101 consists of 11 × 11 LEDs 11. The wide-angle camera 102 has a VGA pixel count (480 horizontal pixels × 640 vertical pixels) and is installed so as to obtain a vertically long field of view. The wide-angle camera 102 captures an upper-body image including the face of the person PP, as shown in FIG. 4.
[0030]
The light source control unit 106 can control the light emission time of the wide-angle camera illumination 101 by PWM (Pulse Width Modulation) for each LED (for example, for each column or row of the array). Within the exposure time of the wide-angle camera 102, the longer the light emission time is, the larger the total light amount becomes. Therefore, by controlling the light emission time, an arbitrary light pattern can be generated.
[0031]
Here, the light emission time is controlled for each row (horizontal direction). In this case, the light pattern is modulated in the vertical direction in FIG. Further, the light source control unit 106 switches between two types of light patterns in accordance with the exposure timing (exposure time) of the camera. FIG. 5 is a diagram showing two types of light patterns generated by controlling the light emission time. In the figure, (a) is an optical pattern A for monotonically increasing the LED emission time according to the row number, and (b) is an optical pattern B for monotonically decreasing the LED emission time according to the row number. The light source control unit 106 causes the illumination 101 to emit two types of patterns as shown in FIG.
[0032]
The operation of the iris authentication device 10 shown in FIGS. 1 and 2 will be described with reference to the flowcharts of FIGS.
[0033]
First, in step S10, the wide-angle camera illumination 101 irradiates two types of light patterns as shown in FIG. 5, and the wide-angle camera 102 captures reflected light images corresponding to these two types of light patterns as original luminance images. FIG. 9 is a diagram illustrating the relationship between the exposure timing of the wide-angle camera 102 and the light pattern projected from the wide-angle camera illumination 101. The image processing unit 107 generates a distance image from two original luminance images when the light patterns A and B are projected from the illumination 101, respectively. The principle of distance measurement of a range finder using a light source array is disclosed in Japanese Patent Application No. 2001-286646, and description thereof is omitted here.
[0034]
Next, in step S20, the three-dimensional position of the eye is calculated. FIG. 7 shows details of step S20. First, in step S21, the depth value of the person is estimated. Specifically, the image processing unit 107 obtains a histogram of the distance values of all pixels of the distance image (480 pixels wide × 640 pixels high) and, based on the histogram, estimates the distance from the camera 102 on the assumption that the person PP can be approximated by a plane perpendicular to the optical axis of the camera 102.
[0035]
FIG. 10 is an example of a distance value histogram obtained from a distance image when the person PP is at a distance of about 60 cm from the camera 102. In FIG. 10, the vertical axis represents the frequency (the number of pixels), and the horizontal axis represents the distance value, divided into classes 5 cm wide. In the data of FIG. 10, the frequency is highest in the class of 60 to 65 cm.
[0036]
Next, binarization is performed on the distance image with reference to the distance value histogram. Here, the average of the lower limit value and the upper limit value of the class with the highest frequency is defined as a representative value dm. In the example of FIG. 10, dm = 62.5 (cm). Then, each coordinate of the distance image (distance value d) is binarized as follows:
"1" when dm − a < d < dm + a
"0" otherwise (including unmeasurable points)
Here, a is a predetermined value, and a = 10 cm is assumed.
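The histogram-based depth estimate of step S21 and the binarization above can be sketched as follows. This is a minimal illustration, not the device's actual processing: the function names are hypothetical, the distance image is assumed to be a list of rows holding distances in cm, and `None` is assumed to mark an unmeasurable pixel.

```python
from collections import Counter

def representative_depth(depth, bin_cm=5.0):
    """Return dm: the midpoint of the most frequent distance class."""
    counts = Counter()
    for row in depth:
        for d in row:
            if d is not None:              # None = unmeasurable pixel (assumption)
                counts[int(d // bin_cm)] += 1
    best_bin, _ = counts.most_common(1)[0]
    # Average of the lower and upper limits of the most frequent class.
    return (best_bin + 0.5) * bin_cm

def binarize(depth, dm, a=10.0):
    """Mark a pixel 1 when dm - a < d < dm + a, and 0 otherwise."""
    return [[1 if d is not None and dm - a < d < dm + a else 0 for d in row]
            for row in depth]

depth = [[61.0, 63.0, None],
         [64.0, 120.0, 62.0]]
dm = representative_depth(depth)
mask = binarize(depth, dm)
```

With the 5 cm class width and a = 10 cm used in the text, the small example yields dm = 62.5 and excludes both the unmeasurable pixel and the background pixel at 120 cm.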
[0037]
FIG. 11 is an example of a binarized distance image. In FIG. 11, the region of "1" resulting from the binarization is shown in white, and the region of "0" is shown in black. Note that in areas where the reflectance is low, such as hair, black eyeglass frames, eyes, and eyebrows, and in areas where shadows occur, such as the neck and wrinkles in clothing, the distance cannot be measured properly, so these areas are determined to be "0".
[0038]
In the present embodiment, the person region is cut out using the depth value. Therefore, even if a plurality of people appear in the field of view of the camera, a person can be extracted on the basis of the distance value by this simple method. Of course, another method may be adopted, for example, a method of cutting out the person region using a luminance image.
[0039]
Next, in step S22, the face area of the person is estimated. For the estimation of the face area of a person, a depth binarized image as shown in FIG. 11 is used. Here, the face area is defined as an area including at least both eyes and a nose.
[0040]
First, as shown in FIG. 12, the binary distance data is projected in the horizontal direction to create a histogram H1 (y). Then, the upper and lower ranges of the face area are determined using the histogram H1 (y).
[0041]
Here, the distance between the camera and the person was estimated to be 62.5 (cm) from the histogram of FIG. 10. Assuming that the size of the face is constant irrespective of the person and that the angle of view of the camera is known, the size of the face in the image can be estimated. Let the estimated face width be w1. The upper boundary yt of the face is determined as the minimum y satisfying H1 (y) > w1 × b (for example, b = 0.1). The lower boundary yb of the face is then set at a position moved downward from the upper boundary yt by a constant multiple of the face width w1, chosen so that at least the entire nose falls within the range. That is,
yb = yt + w1 × c (c = 1.2)
In the present embodiment, since the distance is measured by using the reflected light of the illumination, areas with low reflectance, such as hair, cannot be measured. As a result, the upper boundary yt of the face differs greatly depending on whether the forehead is wide or narrow and on whether hair covers the forehead. For this reason, the constant c is set to a value such that the nose is included in the face area even when the top of the head is determined to be the boundary yt. FIG. 13 shows the determined vertical position of the face.
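The determination of the vertical range can be illustrated with a short sketch. It assumes the binarized distance image is a list of rows of 0/1 values; the function name is hypothetical, and clamping yb to the last image row is an added safeguard that the text does not mention.

```python
def face_vertical_bounds(binary, w1, b=0.1, c=1.2):
    """Return (yt, yb) from the horizontal projection H1(y), or None."""
    h1 = [sum(row) for row in binary]        # H1(y): number of "1" pixels in row y
    for y, count in enumerate(h1):
        if count > w1 * b:                   # minimum y with H1(y) > w1 * b
            yt = y
            yb = round(yt + w1 * c)          # yb = yt + w1 * c
            return yt, min(yb, len(binary) - 1)  # clamp to the image (assumption)
    return None

# 12 rows x 10 columns: two empty rows, then rows containing 6 person pixels.
binary = [[0] * 10 for _ in range(2)] + [[0, 0, 1, 1, 1, 1, 1, 1, 0, 0] for _ in range(10)]
bounds = face_vertical_bounds(binary, w1=5)
```

Here w1 is given in pixels; in the text it comes from the estimated distance and the known angle of view.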
[0042]
To perform iris authentication by detecting the position of the eyes, it would be sufficient for the face region to include at least the positions of the eyes. In the present embodiment, however, the face area is determined so as to include not only the eyes but also the nose. This is because, if the nose position is detected using the distance image, the accuracy of eye position detection can be improved by using the positional relationship between the eyes and the nose. It is also possible to narrow down the eye position search area using the nose position.
[0043]
Next, the left and right positions of the face are determined. As shown in FIG. 14, the binary image of the face area whose vertical position has been determined is projected in the vertical direction to create a histogram H2 (x). Then, the minimum x that satisfies
H2 (x) > (yb - yt + 1) / 3
is set as the left position xl of the face, and the maximum x that satisfies the same condition is set as the right position xr of the face. FIG. 15 shows the face area determined in this way.
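A minimal sketch of this left/right determination follows, under the same assumption that the binarized image is a list of rows of 0/1 values. The function name is hypothetical.

```python
def face_horizontal_bounds(binary, yt, yb):
    """Return (xl, xr) from the vertical projection H2(x), or None."""
    rows = binary[yt:yb + 1]                 # only the rows inside the face area
    width = len(binary[0])
    h2 = [sum(row[x] for row in rows) for x in range(width)]   # H2(x)
    threshold = (yb - yt + 1) / 3
    xs = [x for x, count in enumerate(h2) if count > threshold]
    if not xs:
        return None
    return xs[0], xs[-1]                     # minimum and maximum x above threshold

# 6 rows x 8 columns: a face in columns 2..5 plus one stray pixel in column 0.
binary = [[0, 0, 1, 1, 1, 1, 0, 0] for _ in range(6)]
binary[3][0] = 1
lr = face_horizontal_bounds(binary, 0, 5)
```

The threshold of one third of the face height means that a column containing only a few isolated "1" pixels, like column 0 here, does not widen the face area.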
[0044]
Here, the method of simply detecting the face area using only the distance image has been described, but other methods may of course be used. For example, a luminance image may be created by a method described later, and a face area may be searched from the luminance image using an average image of the face in the luminance image as a face template. In this case, the face template size can be determined from the depth value up to the person. Further, since it is sufficient to search only the person area calculated from the distance image as shown in FIG. 11 in the luminance image, it is possible to reduce the processing amount required for the search.
[0045]
Next, in step S23, the three-dimensional position of the eye is estimated from within the face area estimated in step S22. FIG. 8 shows the details of step S23.
[0046]
First, in step S231, the image processing unit 107 acquires a distance image of the face area. Here, it is assumed that a face area (width fw (xl to xr), height fh (yt to yb)) is cut out from the distance image shown in FIG.
[0047]
Next, in step S232, the search unit 108 searches for a nose position from the distance image.
[0048]
Here, a nose-shaped template is used as a distance template for searching for the nose position. FIG. 16 is an example of a nose-shaped template. The nose template of FIG. 16 is obtained by manually cutting out the nose region from the three-dimensional data of N (N = 20) faces measured from the front, normalizing the vertical and horizontal sizes, aligning the positions, and then averaging. In FIG. 16, the distance value is represented by shades of gray: the darker the gray (the lower the brightness), the closer to the camera, and the lighter the gray (the higher the brightness), the farther from the camera. The nose template in FIG. 16 has a trapezoidal shape with a base Tx and a height Ty. The upper left and upper right portions of the circumscribed rectangle are not used as part of the template, because the distance image there may be affected by glasses or the like. Note that the three-dimensional data for creating the template does not necessarily need to be obtained by a range finder, and may be obtained by another method.
[0049]
Then, a nose search area is determined within the face area. As shown in FIG. 17, a band-shaped area is set at the horizontal center of the face area. The constant r1 that defines this band in FIG. 17 is obtained experimentally and is set to 0.5 here. In the present embodiment, it is assumed that the person to be authenticated stands almost directly in front of the camera and faces the camera. In this case, since the nose is always located near the center line of the face area, the search time can be shortened by setting a band-shaped nose search area as shown in FIG. Needless to say, in an application in which the face is photographed from an oblique direction, for example, the nose may be searched for in the entire face area.
[0050]
Next, the depth value of the face is estimated from the distance image of the face area, and the sizes Tx and Ty of the nose template are changed from the estimated depth value.
[0051]
First, a histogram of distance values is calculated in the same manner as in step S21, and the average of the lower limit value and the upper limit value of the most frequent class is used as the estimated depth value of the face. In step S21, the distance value was estimated using the entire region, which contains parts with different distance values such as the face, neck, and body. Here, in order to estimate the depth value of the face area accurately, only the distance values of the face area are used, the class width of the histogram is set to a smaller value, and the distance value is estimated again. The size of the nose (the estimated size of the first object) is assumed to be constant irrespective of the person, and its real-world size is estimated from prior knowledge. If the angle of view of the camera that captured the distance image is known, the size of the nose in the image can then be estimated from the depth value of the face (the approximate distance of the subject), and the size of the nose template can be determined accordingly. The sizes Tx and Ty of the nose template are changed according to the estimated face width. A new nose template is then created by interpolating the distance values. As the interpolation method, linear interpolation, the bicubic method, the nearest neighbor method, or the like can be used.
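The size estimate and the resampling of the template might be sketched as follows. This assumes a simple pinhole camera model whose focal length in pixels is derived from the angle of view, and it ignores the trapezoidal mask of the template. The function names are hypothetical, and the nose width of 4 cm and the 90 degree angle of view in the usage lines are placeholder values, not figures from the text.

```python
import math

def size_in_pixels(real_size_cm, depth_cm, fov_deg, image_pixels):
    """Pinhole estimate of how many pixels an object of known size spans."""
    focal_px = (image_pixels / 2) / math.tan(math.radians(fov_deg) / 2)
    return real_size_cm * focal_px / depth_cm

def resize_bilinear(template, new_w, new_h):
    """Resample a 2-D template of distance values by linear interpolation."""
    old_h, old_w = len(template), len(template[0])
    out = []
    for j in range(new_h):
        v = j * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        j0 = int(v)
        j1 = min(j0 + 1, old_h - 1)
        fv = v - j0
        row = []
        for i in range(new_w):
            u = i * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            i0 = int(u)
            i1 = min(i0 + 1, old_w - 1)
            fu = u - i0
            top = template[j0][i0] * (1 - fu) + template[j0][i1] * fu
            bottom = template[j1][i0] * (1 - fu) + template[j1][i1] * fu
            row.append(top * (1 - fv) + bottom * fv)
        out.append(row)
    return out

tx = size_in_pixels(4.0, 60.0, 90.0, 480)           # placeholder nose width and angle of view
resized = resize_bilinear([[0.0, 2.0], [4.0, 6.0]], 3, 3)
```

Because the size follows directly from the depth value, a single resized template is enough for the subsequent matching.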
[0052]
Then, the resized template is matched within the nose search area of FIG. 17, and positions with high matching scores are stored as nose position candidates together with their scores. Here, all positions whose matching score is equal to or greater than a predetermined threshold are kept as candidates, and their total number is denoted P. The nose position may be any predetermined coordinate in the area where the template matches, for example, the upper-left point, the center point, or the most protruding point of the nose. Although normalized correlation is used here as the matching score, another measure such as the Euclidean distance may be used.
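Candidate collection by thresholded template matching can be sketched as below. The text only says "normalized correlation"; the zero-mean form used here is one common variant and is an assumption, as are the function names and the choice of the upper-left corner as the reported position.

```python
import math

def normalized_correlation(patch, template):
    """Zero-mean normalized correlation between two equally sized 2-D arrays."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0     # flat patches get a score of 0

def match_candidates(image, template, threshold):
    """Keep every position whose score is at or above the threshold."""
    th, tw = len(template), len(template[0])
    candidates = []
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = normalized_correlation(patch, template)
            if score >= threshold:
                candidates.append((x, y, score))   # upper-left corner and score
    return candidates

template = [[1, 2], [3, 4]]
image = [[50, 50, 50, 50],
         [50, 12, 14, 50],
         [50, 16, 18, 50],
         [50, 50, 50, 50]]
cands = match_candidates(image, template, threshold=0.95)
```

Keeping every position above the threshold, rather than only the best one, is what allows the later step to reject candidates that conflict with the eye positions.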
[0053]
In ordinary template matching, the size of the search target in the image is not known, so matching is often performed using a plurality of templates of different sizes, and the result with the best matching score is adopted. However, using a plurality of templates increases the processing time. In the present embodiment, the approximate distance to the target is obtained from the distance image, so the size of the target in the image can be estimated. Therefore, high-speed matching can be performed with a single template.
[0054]
Of course, even when the distance is not known, the size of the template can be determined from the width of the face area detected from the binarized distance image as shown in FIG. However, when the distance is measured using the intensity of the reflected light as in the present embodiment, an accurate face width cannot always be detected because of the influence of the hair. That is, since the distance of a region with low reflectance, such as black hair, cannot be measured, the detected face width becomes smaller than the actual face width for a person whose hair covers the cheeks, as is the case for many women. Conversely, for hair with relatively high reflectance, such as blond or brown hair, the distance of the hair area can be measured, so the detected face width becomes larger than the actual width. In contrast, by determining the template size from the distance value, the template size can be determined stably without being affected by the hair.
[0055]
Here, since the subject is assumed to stand almost directly in front of the camera and face the camera, a nose template of a single shape is used. On the other hand, when the face is photographed from an oblique direction or is tilted up or down, nose templates for various postures may be prepared in advance by rotating the nose template and then used for matching. Alternatively, the three-dimensional posture of the face may be estimated from the distance image and the luminance image of the face, the distance data may be rotated so as to be viewed from the front, and the nose template may be matched against the corrected distance image.
[0056]
Further, the nose position search may be performed by a method other than nose template matching. For example, the area closest to the camera within the face area may be regarded as the most protruding part of the nose. However, in the present embodiment, since the distance is measured based on the intensity of the reflected light, the distance image partially contains regions with large distance measurement errors, caused for example by shine on the face due to sebum or by corneal reflection in the eyes. In a distance image containing such errors, simply determining the nose position from a single distance value is problematic in terms of accuracy. In such a case, matching using the distance template can be said to be effective.
[0057]
Next, in step S233, the image processing unit 107 acquires a luminance image of the face area. In step S10, two original luminance images corresponding to the two types of light patterns have already been captured in order to calculate the distance image. Therefore, here, for the face area shown in FIG. 15, the luminance image (width fw (xl to xr), height fh (yt to yb)) is obtained by averaging the pixel values of the two original luminance images.
[0058]
As shown in FIG. 5, one of the two types of light patterns to be irradiated changes from dark to bright as the row number increases, and the other changes from light to dark. Therefore, by averaging the two original luminance images, a luminance image equivalent to a case where uniform illumination is applied is obtained.
[0059]
It should be noted that the luminance image may be synthesized by using the larger value of the pixel values instead of the average value of the pixel values of the two original luminance images. In this case, an image having a high brightness and a clear contrast is obtained as a whole, and there is an advantage that subsequent eye position detection is easily performed.
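The two ways of synthesizing the luminance image, averaging and taking the larger value, can be sketched as follows. The function name is hypothetical, and the images are assumed to be equally sized lists of rows of pixel values.

```python
def combine_luminance(img_a, img_b, mode="mean"):
    """Combine the two original luminance images pixel by pixel."""
    if mode == "mean":
        op = lambda a, b: (a + b) / 2      # approximates uniform illumination
    elif mode == "max":
        op = max                           # brighter result with clearer contrast
    else:
        raise ValueError("mode must be 'mean' or 'max'")
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

# Pattern A brightens with the row number; pattern B darkens with it.
img_a = [[10, 10], [20, 20], [30, 30]]
img_b = [[30, 30], [20, 20], [10, 10]]
uniform = combine_luminance(img_a, img_b, "mean")
bright = combine_luminance(img_a, img_b, "max")
```

In this idealized example the two opposite gradients cancel exactly under averaging, which is the effect described above.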
[0060]
Note that one of the two original luminance images may itself be used as the luminance image. When, as in the present embodiment, illumination is applied from above and the object (a human face) is known to lie in the upper part of the camera field of view as shown in FIG., subsequent eye position detection can be performed without loss of accuracy even if an original luminance image is used alone as the luminance image.
[0061]
Next, in step S234, the search unit 108 searches for the eye position in the luminance image. First, as shown in FIG. 18, an eye search area 1 for searching for the right eye (left eye) of the person and an eye search area 2 for searching for the left eye (right eye) of the person are set as bands within the face area. The constants r2 and r3 are determined experimentally; here, r2 = 0.35 and r3 = 0.1. Eye detection is performed within each eye search area. In the present embodiment, it is assumed that the person to be authenticated stands almost directly in front of the camera and faces the camera. In this case, since the y coordinate of each eye always lies within the eye search areas 1 and 2 in FIG. 18, the search time can be reduced. Of course, in an application in which the face is photographed from an oblique direction, the eyes may be searched for in the entire face area.
[0062]
The search for the eye position in the luminance image may be performed by any method. In the present embodiment, the method described in JP-A-2002-56394 is used. That is, an eye template as shown in FIG. 19 is created in advance. Each arrow in FIG. 19 represents a two-dimensional luminance gradient vector at a point on a luminance edge of the eye. Then, a luminance gradient image consisting of horizontal and vertical differences is created from the luminance image, and the eye position is searched for by calculating the correlation value between the luminance gradient image and the eye template of FIG. 19. In the present embodiment, instead of adopting only the combination of both-eye positions with the highest correlation value, all combinations whose correlation value is equal to or greater than a predetermined threshold are kept as both-eye position candidates together with their correlation values. Let their total number be Q.
[0063]
Here, the eye position may be a predetermined coordinate in an area where the eye template matches, for example, an upper left point, a center point, a center point of an iris area in the template, and the like. Also, as in step S232, by determining the size of the eye template from the depth value of the face, accurate search can be performed using a single template.
[0064]
Next, in step S235, the search unit 108 determines the eye position. By the processing up to this point, P nose position candidates have been obtained in step S232, and Q eye position candidates have been obtained in step S234. Here, by referring to the information on the relative positional relationship between the eyes and the nose, combinations that are not appropriate in terms of that relationship are excluded from the P × Q combinations of nose position candidates and eye position candidates. Then, among the remaining combinations, the one with the highest composite matching score is determined as the final two-dimensional eye and nose positions. Here, the average of the nose matching score and the eye matching score is used as the composite score. The composite score may instead be a weighted linear sum, in which case the weights may be determined in consideration of the reliability of the nose score and the reliability of the eye score.
[0065]
The relative positional relationship between the eyes and the nose is checked by, for example, the following method. As shown in FIG. 20, the perpendicular bisector of the line segment connecting the right eye position candidate and the left eye position candidate is drawn, and a rectangle of width D1 × r5 and height D1 × r6, centered on a point A located D1 × r4 below the segment on the perpendicular bisector, is set as the nose existence region. When the nose position candidate is not included in the nose existence region, the combination of position candidates is judged inappropriate in view of the relative positional relationship between the eyes and the nose. Here, D1 is the distance between the position candidates of both eyes, and r4, r5, and r6 are values determined by actually measuring a large number of persons.
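The nose existence region check and the selection of the best of the P × Q combinations can be sketched as follows. Several points are assumptions made only for illustration: image coordinates with y increasing downward, an axis-aligned rectangle around point A, the function names, and the values of r4, r5, and r6 in the usage lines (the text gives no numbers for them).

```python
import math

def nose_in_region(eye_a, eye_b, nose, r4, r5, r6):
    """True if the nose candidate lies in the nose existence region."""
    (x1, y1), (x2, y2) = sorted([eye_a, eye_b])    # order by x so the normal points down
    d1 = math.hypot(x2 - x1, y2 - y1)              # D1: distance between the eyes
    if d1 == 0:
        return False
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2          # midpoint of the eye segment
    nx, ny = -(y2 - y1) / d1, (x2 - x1) / d1       # unit normal along the bisector
    ax, ay = mx + nx * d1 * r4, my + ny * d1 * r4  # point A
    return abs(nose[0] - ax) <= d1 * r5 / 2 and abs(nose[1] - ay) <= d1 * r6 / 2

def best_combination(nose_cands, eye_cands, r4, r5, r6):
    """Return (score, eye_a, eye_b, nose) with the highest average score, or None."""
    best = None
    for nose_xy, nose_score in nose_cands:
        for eye_a, eye_b, eye_score in eye_cands:
            if not nose_in_region(eye_a, eye_b, nose_xy, r4, r5, r6):
                continue                            # inconsistent combination
            score = (nose_score + eye_score) / 2    # composite score: plain average
            if best is None or score > best[0]:
                best = (score, eye_a, eye_b, nose_xy)
    return best

nose_cands = [((130, 140), 0.7)]
eye_cands = [((100, 100), (160, 100), 0.8),         # true eyes
             ((100, 70), (160, 70), 0.9)]           # eyebrows with a higher score
result = best_combination(nose_cands, eye_cands, r4=0.6, r5=0.5, r6=0.5)
```

In this example the eyebrow pair has the higher correlation value but is rejected because the nose candidate lies too far below it, so the true eye pair is selected.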
[0066]
Alternatively, as shown in FIG. 21, a point B is taken at a position moved upward by D2 × r4 from the nose position candidate, and points B1 and B2 are taken at positions moved left and right from point B by D2 / 2. Rectangles of width D2 × r7 and height D2 × r8 centered on B1 and B2 are then set as the one-eye existence regions. If the position candidate for an eye is not included in the corresponding one-eye existence region, the combination of position candidates is judged inappropriate in view of the relative positional relationship between the eyes and the nose. Here, D2 is the distance between the eyes, estimated from the face width that is in turn estimated from the depth value of the face, and r4, r7, and r8 are values determined by actually measuring many persons.
[0067]
Note that, for each of the P nose position candidates calculated in step S232, the one-eye existence regions as shown in FIG. 21 may be set as the eye search region in the luminance image, and the eye position may be searched for only within those regions. In this case, the final result is the same as in the above example. Depending on the value of P, the search area becomes smaller than the band-shaped regions shown in FIG. 18, so an improvement in processing speed can be expected.
[0068]
Conversely, it is also possible to first search for the eye position in the luminance image to obtain position candidates, then set the nose existence region as shown in FIG. 20 as the nose search area, and search for the nose position only within that area of the distance image. However, a luminance image contains more places that resemble eyes than a distance image contains places shaped like a nose, so it is more efficient to search for the nose in the distance image first.
[0069]
In addition, an ideal positional relationship between the nose position and both eyes may be defined, together with an evaluation function whose value approaches 0 as the positions deviate from that relationship and approaches 1 as they come closer to it. The composite score of the nose matching score and the eye matching score, multiplied by the value of this evaluation function, may then be defined as an integrated score, and the combination with the highest integrated score may be determined as the final two-dimensional eye and nose positions.
[0070]
As described above, the position of the nose is searched from the distance image, the position of the eye is searched from the luminance image, and the position of the eye is finally determined using the relative positional relationship between the two. Thus, detection errors such as erroneous detection of eyebrows, eyeglass frames, mouth, etc. similar to eyes can be greatly reduced.
[0071]
Then, in step S236, the depth value of the eye is estimated. That is, from the two-dimensional position of the eye determined in step S235, the three-dimensional position of the eye is obtained using the already acquired distance image.
[0072]
FIG. 22 shows a C-XYZ camera coordinate system. In FIG. 22, the Z axis is the optical axis of the camera, and a virtual image plane with its origin at the image center c is assumed to be located on the Z axis at a distance of the focal length f from the lens center C. The depth value ZM of the measurement point M corresponding to the point m (xm, ym) on the image plane is obtained as the pixel value at (xm, ym) in the distance image. Therefore, the three-dimensional position in camera coordinates can be calculated as XM = ZM · xm / f and YM = ZM · ym / f. To convert from the coordinate system of the luminance image and the distance image to the coordinate system on the virtual image plane in FIG. 22, the internal parameters of the camera (including f) may be obtained in advance. In addition, since there is distortion in the peripheral part of the lens, distortion correction is performed in advance.
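The back-projection XM = ZM · xm / f, YM = ZM · ym / f can be written as a short sketch. It assumes that the focal length is expressed in pixel units, that pixels are square, that the principal point (cx, cy) is known from the internal parameters, and that distortion has already been corrected. The function and parameter names are hypothetical.

```python
def eye_position_3d(u, v, depth_image, f_px, cx, cy):
    """Convert pixel (u, v) and its depth value to camera coordinates (XM, YM, ZM)."""
    zm = depth_image[v][u]          # ZM: pixel value of the distance image
    xm, ym = u - cx, v - cy         # coordinates relative to the image center c
    return zm * xm / f_px, zm * ym / f_px, zm

depth_image = [[60.0] * 3 for _ in range(3)]
point = eye_position_3d(2, 0, depth_image, f_px=240.0, cx=1, cy=1)
```

The result is in the same unit as the distance image, here centimeters.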
[0073]
As described above, by using the luminance image and the distance image together, the three-dimensional positions of the eyes and the nose can be easily determined.
[0074]
The LED array range finder used in the present embodiment achieved a relative measurement error of 1% (an error of 5 mm at a measurement distance of 500 mm) when the present inventors actually developed the system. When the unevenness of the subject is gentle, smoothing with the surrounding distance values further increases the reliability of the measured distance. When the distance of a specific pixel in a two-dimensional image is to be measured, one option is to use a spot sensor such as an ultrasonic sensor. However, by using a distance image acquired by a range finder as in the present embodiment, the distance values around the specific pixel can be used effectively, so stable distance measurement can be performed.
[0075]
Next, in step S30, the iris image is photographed by pointing the telephoto camera at the three-dimensional position of the eye. Then, in step S40, iris authentication is performed from the captured eye image. The iris authentication may be performed, for example, by the method described in Japanese Patent Publication No. Hei 8-504979.
[0076]
As described above, according to the present embodiment, the position of the nose, which is characterized by its three-dimensional shape, is searched for in the distance image, and the position of the eye, which is characterized by its luminance, is searched for in the luminance image. Then, the position of the eyes is finally determined with reference to the information on the relative positional relationship between the eyes and the nose. As a result, the position of the eyes can be detected with higher accuracy than before.
[0077]
In step S236, the depth value of the eye may instead be determined using the distance values around the cheeks in the distance image. As described above, when the distance is measured based on the light intensity ratio as in the present embodiment, the distance measurement error increases in areas where the reflectance is low or where specular reflection occurs instead of diffuse reflection. The depth value near the eye is considered to have a large error for the following reasons.
・ If the iris is dark brown, the reflectance is low.
・ Specular reflection occurs on the surface of the cornea.
・ When the subject is wearing glasses, dark frames have low reflectance, and specular reflection from a lens or a metal frame may occur near the eyes.
[0078]
Therefore, instead of the depth value near the eyes, the depth value around the cheek is calculated, and the depth value of the eyes is indirectly estimated. FIG. 23 is a view of a certain person from the side, and the depth value of the eyes is almost the same as the cheek near the side of the nose. When the inventor of the present application observed the profile of a plurality of persons in the same manner, it was found that the depth value of the eyes was almost the same as that of the cheeks, or was slightly behind (about 1 cm at most). Therefore, here, the depth value of the eyes = the depth value of the cheek + 0.5 cm.
[0079]
The cheek depth value is obtained as follows. As shown in FIG. 24, the line segment (length D) connecting both eye positions is translated downward by a distance D × r9, and circles of radius D × r10 centered on its two ends C1 and C2 are set as the cheek regions. Then, the average of the pixel values of the distance image within the cheek regions is taken as the cheek depth value. Here, r9 and r10 are obtained experimentally; r9 = 0.65 and r10 = 0.1.
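The indirect estimate of the eye depth from the cheek regions can be sketched as follows. It assumes an upright head, so that "downward" is the image y direction, a distance image stored as a list of rows in cm, and `None` for unmeasurable pixels. The function name is hypothetical, and skipping unmeasurable pixels is an added assumption.

```python
import math

def eye_depth_from_cheeks(depth, eye_a, eye_b, r9=0.65, r10=0.1, offset_cm=0.5):
    """Average the distance image inside both cheek circles and add the offset."""
    (x1, y1), (x2, y2) = eye_a, eye_b
    d = math.hypot(x2 - x1, y2 - y1)              # D: distance between the eyes
    radius = d * r10
    values = []
    for cx, cy in ((x1, y1 + d * r9), (x2, y2 + d * r9)):   # C1 and C2
        y_lo = max(0, math.floor(cy - radius))
        y_hi = min(len(depth), math.ceil(cy + radius) + 1)
        x_lo = max(0, math.floor(cx - radius))
        x_hi = min(len(depth[0]), math.ceil(cx + radius) + 1)
        for y in range(y_lo, y_hi):
            for x in range(x_lo, x_hi):
                inside = math.hypot(x - cx, y - cy) <= radius
                if inside and depth[y][x] is not None:      # skip unmeasurable pixels
                    values.append(depth[y][x])
    if not values:
        return None
    # Depth value of the eyes = depth value of the cheek + 0.5 cm.
    return sum(values) / len(values) + offset_cm

depth = [[60.0] * 40 for _ in range(40)]
depth[10][10] = None                               # unreliable pixel at the eye itself
eye_z = eye_depth_from_cheeks(depth, (10, 10), (30, 10))
```

Because only cheek pixels are averaged, the unreliable measurement at the eye itself does not affect the result.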
[0080]
However, this method requires that the head of the person not be inclined forward, backward, left, or right. It was adopted here on the assumption that, since the purpose is personal authentication, the person to be authenticated will cooperate with the photographing. On the other hand, in an application that allows the subject a somewhat freer posture, the posture of the face needs to be estimated from the distance image and the luminance image, and the cheek depth value needs to be converted to the eye depth value in consideration of the inclination of the face.
[0081]
In the present embodiment, as shown in FIGS. 1 and 2, the light source 101 and the camera 102 of the range finder are arranged vertically, but the arrangement is not limited to this; the camera may be located above and the light source below. Of course, the camera and the light source may also be arranged side by side. A plurality of light sources may be provided in order to improve the distance measurement accuracy.
[0082]
Further, the present embodiment has been described on the assumption of 1:N authentication, in which the person is compared with the data of a plurality of persons registered in advance to determine which of them he or she is. In this case, the authentication device uses values averaged over a plurality of persons for the face width, the relative positional relationship between the nose and the eyes, and the shapes of the eye template and the nose template. On the other hand, in the case of 1:1 authentication, in which the person to be authenticated declares his or her own ID to the authentication device and the device determines whether that person matches the individual registered in advance under the declared ID, the face width, the relative positional relationship between the nose and the eyes, and the shapes of the eye template and the nose template may be prepared for each individual. This makes more accurate detection possible. For inputting an ID, key input, an ID card (including a non-contact ID card), voice input, and the like can be used.
[0083]
Further, in the present invention, the range finder of the method as shown in the present embodiment is not essential, and another method may be adopted as long as a range image can be obtained. For example, it can be used in combination with other distance measuring methods such as a light section method, a moiré method, and stereo vision. In the present embodiment, the luminance image is generated from the original luminance image obtained from the range finder. However, another generation method may be used. For example, a dedicated photographing system for obtaining a luminance image may be provided. However, it is necessary that each coordinate of the luminance image is associated with the distance image.
[0084]
In the present embodiment, the three-dimensional position of the eye is detected to perform iris authentication. However, the applicable range of the object position detection according to the present invention is not limited to this. For example, the present invention may be used when detecting a two-dimensional position of a face part such as an eye, a nose, and a mouth for face authentication. That is, the nose position may be searched from the distance image, the eyes and the mouth may be searched from the luminance image, and the final position of the face part may be determined from the relative positional relationship between the candidates for the face part.
[0085]
Of course, the invention can also be used to detect objects other than the nose that have a characteristic shape (distance) and objects other than the eyes that have a characteristic density pattern (luminance). For example, in the case of an artificial object such as a robot, if it is designed with one part having a characteristic shape and another part having a characteristic density pattern, the present invention can be used to detect these parts.
[0086]
As described in the present invention, performing position detection separately for a part characterized by its shape and a part characterized by its density pattern offers the following advantages. If a part characterized by its density pattern also has a characteristic shape, the appearance of the density pattern changes with the viewpoint, and the part may be difficult to detect in the luminance image. Conversely, if a part characterized by its shape also has a characteristic density pattern, and the shape is measured by a range finder as in the present embodiment, the intensity of the reflected light is affected by the density pattern of the subject. That is, where the reflectance is low, the error in shape measurement may increase. Therefore, detecting the part characterized by its shape and the part characterized by its density pattern separately, and then performing the final position detection based on the relative positional relationship between the two, significantly improves detection accuracy.
[0087]
[Effect of the invention]
As described above, according to the present invention, the positions of objects such as the nose and the eyes can be detected with high accuracy when one object is characterized by its three-dimensional shape and is easy to search for in a distance image, the other is characterized by its luminance and is easy to search for in a luminance image, and the relative positional relationship between them is known.
[Brief description of the drawings]
FIG. 1 is a front view of an iris authentication device according to an embodiment of the present invention.
FIG. 2 is a side view of the iris authentication device of FIG. 1;
FIG. 3 is a block diagram functionally showing a configuration of an object position detection device according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a person image captured by the iris authentication device of FIG. 1;
FIG. 5 is an example of an optical pattern projected by a range finder.
FIG. 6 is a flowchart illustrating an operation of the iris authentication device of FIG. 1;
FIG. 7 is a flowchart showing details of step S20 in FIG. 6;
FIG. 8 is a flowchart showing details of step S23 in FIG. 7, and is a diagram showing an object position detection method according to an embodiment of the present invention.
FIG. 9 is a diagram showing switching timing of two types of optical patterns.
FIG. 10 is an example of a distance value histogram obtained from a distance image.
FIG. 11 is an example of a binarized distance image.
FIG. 12 is an example of a histogram created by projecting the binarized distance image of FIG. 11 in the horizontal direction.
FIG. 13 is a diagram showing the upper and lower ranges of a determined face area in the binarized distance image of FIG. 11;
FIG. 14 is an example of a histogram obtained by vertically projecting a binarized distance image cut in the upper and lower ranges in FIG. 13;
FIG. 15 is a diagram showing a determined face area in the binarized distance image of FIG. 11;
FIG. 16 is an example of a nose template composed of distance values.
FIG. 17 is a diagram showing a nose search area set in a face area of a distance image.
FIG. 18 is a diagram illustrating an eye search area set in a face area of a luminance image.
FIG. 19 is an example of an eye template composed of luminance gradient direction vectors.
FIG. 20 is a diagram showing a positional relationship between the positions of both eyes and a nose existence region.
FIG. 21 is a diagram showing a positional relationship between a nose position and regions where both eyes are present.
FIG. 22 is a diagram illustrating a relationship between an image coordinate system and a camera coordinate system.
FIG. 23 is a diagram showing a profile of a person.
FIG. 24 is a diagram illustrating a cheek region used for estimating an eye depth value.
[Explanation of symbols]
100 range finder
101 lighting for wide-angle camera (light source array)
102 camera
106 light source control unit
107 image processing unit
108 search unit
PP person (subject)
NS nose (first object)
EY eyes (second object)

Claims

Obtaining a distance image of the subject;
Obtaining, for the subject, a luminance image whose coordinates are each associated with the distance image;
Searching for the position of a first object in the distance image, and searching for the position of a second object in the luminance image,
An object position detection method, wherein, in the searching step, a two-dimensional position of at least one of the first and second objects is determined with reference to information on a relative positional relationship between the first and second objects.

In claim 1,
In the searching step,
A search area in the luminance image is narrowed down according to a position candidate obtained as a result of the search for the first object,
An object position detection method, wherein the position of the second object is searched in the narrowed search area.

In claim 1,
An object position detection method, further comprising determining, for at least one of the first and second objects whose two-dimensional position has been determined, a three-dimensional position using the distance image and the determined two-dimensional position.

In claim 1,
The subject is a person,
The object position detection method according to claim 1, wherein the first object is a nose, and the second object is an eye.

In claim 4,
An object position detection method, comprising: determining an eye distance value using a distance value around a cheek in a distance image.

In claim 1,
In the searching step,
An object position detection method, wherein the search for the position of the first object is performed by matching the distance image with a distance template composed of distance values of the first object.

In claim 6,
An object position detection method, wherein an approximate distance of the subject is calculated from the distance image, and the size of the distance template is determined based on the approximate distance, the angle of view of the camera that captured the distance image, and an estimated size of the first object.

A distance image acquisition unit that acquires a distance image of a subject,
A luminance image acquisition unit that acquires, for the subject, a luminance image whose coordinates are each associated with the distance image,
A search unit that searches for the position of a first object in the distance image and searches for the position of a second object in the luminance image,
An object position detecting device, wherein the search unit determines a two-dimensional position of at least one of the first and second objects with reference to information on a relative positional relationship between the first and second objects.

In claim 8,
The search unit,
A search area in the luminance image is narrowed down according to a position candidate obtained as a result of the search for the first object,
An object position detecting device for searching the position of the second object in the narrowed search area.

In claim 8,
The distance image acquisition unit includes a range finder,
The range finder includes:
A light source array unit in which a plurality of light sources are arranged,
A light source control unit that causes the light source array unit to project at least two types of light patterns by controlling the light emission mode of each light source of the light source array unit,
A camera that captures, as an original luminance image, reflected light from the subject for each light pattern projected from the light source array unit, and
An image processing unit that obtains the distance image from the original luminance images captured by the camera,
An object position detecting device, wherein the luminance image acquisition unit obtains the luminance image from at least one of the original luminance images captured by the camera of the range finder.

Obtaining a distance image of the subject;
Searching for the position of an object in the distance image,
In the searching step,
A method for detecting the position of an object, wherein the search for the position of the object is performed by matching the distance image with a distance template composed of distance values of the object.
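
As an illustration of the distance-template claims above, the following sketch estimates the template size from the approximate subject distance, the camera's angle of view, and the estimated object size, and then matches a distance template against a distance image. It rests on several assumptions not stated in the claims: an ideal pinhole camera, a mean-subtracted sum of squared differences as the matching cost, and hypothetical function and parameter names. Resampling the template to the computed size is omitted.

```python
import math


def template_width_px(approx_distance_mm, horizontal_fov_deg,
                      image_width_px, object_width_mm):
    """Pinhole-camera estimate of how many pixels wide the object appears."""
    # Width of the scene covered by the image at the subject's distance.
    visible_width_mm = 2.0 * approx_distance_mm * math.tan(
        math.radians(horizontal_fov_deg) / 2.0)
    return max(1, round(object_width_mm * image_width_px / visible_width_mm))


def match_distance_template(distance_image, template):
    """Slide the template over the distance image and return the best (row, col).

    Both arguments are lists of rows of distance values. The mean is
    subtracted from each window and from the template, so the cost compares
    surface relief rather than absolute distance (an assumed choice).
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(distance_image), len(distance_image[0])
    t_mean = sum(map(sum, template)) / (th * tw)
    best_pos, best_cost = None, math.inf
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = [row[c:c + tw] for row in distance_image[r:r + th]]
            w_mean = sum(map(sum, window)) / (th * tw)
            cost = sum(
                ((window[i][j] - w_mean) - (template[i][j] - t_mean)) ** 2
                for i in range(th) for j in range(tw)
            )
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos
```

Because the apparent size is inversely proportional to distance in this model, doubling `approx_distance_mm` halves the estimated template width.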