JP4898026B2

JP4898026B2 - Face / Gaze Recognition Device Using Stereo Camera

Info

Publication number: JP4898026B2
Application number: JP2001197915A
Authority: JP
Inventors: 吉央松本; 亮治五十嵐; 司小笠原
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2001-06-29
Filing date: 2001-06-29
Publication date: 2012-03-14
Anticipated expiration: 2021-06-29
Also published as: JP2003015816A

Description

【０００１】
【発明の属する技術分野】
本発明は、ヒューマン・インターフェース一般に関連し、より具体的には、画像認識を利用して人間の顔向きと視線方向を認識する技術に関する。
【０００２】
【従来の技術】
人間の視線の動きは、人間の意図するものや注意するものに深い関係があり、この視線の動きをキーボードやマウスなどの入力デバイスの代わりに利用する研究が進められている。この様な次世代ヒューマン・インターフェースは、カメラによって人間の行動を撮影し、人間の意図や注意を認識する高度なインターフェースとして構築される。
【０００３】
視線認識によるインターフェースでは、視線の動きが顔の動きに追従して動く場合が多いので、視線の動きを検出するのと同時に顔の向きを検出することが好ましい。この様な顔向きと視線方向を同時に検出する顔・視線認識装置は、松本ほかによる「顔・視線計測システムの開発と動作認識への応用」（第５回ロボティクス・シンポジア、２０００／３／２６、２７）の論文に示されている。
【０００４】
【発明が解決しようとする課題】
松本ほかにより提案された顔・視線認識装置では、ステレオカメラにより撮影された画像フレームから人間の顔の向きを３次元的に検出し、その後で顔の向きに基づいて視線方向を検出している。視線方向を検出した後では、新たに撮影された画像フレームを使用して、同様の顔向き検出と視線方向検出が繰り返される。この顔向きと視線方向の検出は、ビデオカメラによる画像フレームの撮影速度に応じた速度で繰り返されて、顔向きと視線方向のリアルタイムでの追従が可能とされる。
【０００５】
この様なリアルタイムでの顔向き・視線方向の追従が高速な画像処理を必要とするので、無駄な演算時間が発生した場合、顔・視線認識装置は顔向き・視線方向をリアルタイムで追従することができなくなる。したがって、顔・視線認識装置では、誤認識を減少させ、精度を向上させることが望ましい。
【０００６】
本発明は、リアルタイムでの顔向き・視線方向の追従において、高速な画像処理を実現するために、無駄な演算時間を発生させるエラーを可能な限り低減することを目的とする。
【０００７】
【課題を解決するための手段】
上記課題を解決するために、本発明の顔・視線認識装置は、ユーザの顔を撮影する複数のカメラと、前記カメラの画像出力から顔の向きを検出する検出手段と、前記カメラの画像出力に撮影されている目周辺の画像領域から前記ユーザの目が開いているかどうかを検出する手段と、前記ユーザの目が開いていることに応答して、前記カメラの画像出力から前記ユーザの視線方向を検出する手段と、を備えるよう構成される。
【０００８】
この発明によると、ユーザの視線方向を検出する前にユーザの目の開閉を検出するので、視線方向の検出におけるエラーを回避することができる。
【０００９】
この発明の１つの形態によれば、前記目が開いているかどうかを検出する手段は、前記目周辺の画像領域に含まれる水平方向エッジを検出し、該画像領域に含まれている水平方向エッジの割合に応じて、目が開いているかどうかを検出するよう構成される。
【００１０】
この形態によると、目が開いている場合には目周辺の画像領域に縦や斜めのエッジが多く含まれているが、目が閉じている場合には水平方向のエッジが比較的多く含まれているので、水平方向のエッジを検出し、その割合を調べることにより、目が開いているかどうかを検出することができる。
【００１１】
この発明の１つの形態によれば、前記顔向き検出手段は、特徴的な顔の部分に相当する１つまたは複数の特徴点のそれぞれについて用意された複数のテンプレートから、顔向きに応じて各特徴点のために１つのテンプレートを選択する手段と、前記選択されたテンプレートをそれぞれ使用して、前記画像出力から前記特徴点に対応する１つまたは複数の画像領域を抽出する手段と、を備え、前記抽出された１つまたは複数の画像領域に基づいて、前記ユーザの顔向きを検出するよう構成される。
【００１２】
この形態によると、各特徴点のために複数のテンプレートの中から顔向きに応じて最適なテンプレートを選択し、その選択されたテンプレートを使用してテンプレート・マッチングを実行するので、テンプレート・マッチングでのエラーを低減することができる。
【００１３】
この発明の１つの形態によれば、前記テンプレートの選択手段は、前回の画像出力から検出された顔向きに基づいて、今回の画像出力のために前記複数のテンプレートから１つのテンプレートを選択するよう構成される。
【００１４】
この形態によると、前回の画像出力と今回の画像出力が連続的な画像フレームであり、前回の画像における顔向きと今回の画像における顔向きとが比較的相関が高いので、今回の画像における顔向きに比較的近い顔向きに対応したテンプレートを複数のテンプレートから選択することができる。
【００１５】
この発明の１つの形態によれば、前記顔・視線認識装置において、前記画像領域抽出手段は、前回の画像出力から検出された顔向きに基づいて今回の画像出力に撮影されていない特徴点を判断し、撮影されていない特徴点に対応する画像領域の抽出を処理しないよう構成される。
【００１６】
この発明の１つの形態によれば、今回の画像出力に撮影されていない特徴点に関するテンプレート・マッチングを回避することができるので、顔・視線認識装置のエラーを回避することができる。
【００１７】
この発明の１つの形態によれば、前記顔・視線認識装置の前記カメラの画像出力は、近赤外画像であり、前記視線方向の検出手段は、目周辺の画像の明暗から瞳孔の位置を検出し、検出された瞳孔の中心位置と眼球の中心位置から視線方向を検出するよう構成される。
【００１８】
この形態によると、近赤外画像に撮影されている瞳孔が虹彩との反射率の違いにより比較的暗く撮影されるので、目周辺の画像から最も暗い部分を検出することにより瞳孔が撮影されている領域を検出することができる。
【００１９】
【発明の実施の形態】
次に本発明の実施例を図面を参照して説明する。図１は、顔・視線認識装置におけるハードウェア構成の１つの実施形態を示す。この実施形態では、顔・視線認識装置はコンピュータで構成されるが、必ずしもこの様なハードウェア構成に限定されない。
【００２０】
図１の実施形態の顔・視線認識装置は、２個のビデオカメラ（右カメラ１１および左カメラ１３）、２個のカメラ・コントロール・ユニット（１５、１７）、画像処理ボード１９、ＩＲ投光機１４、パーソナル・コンピュータ２１、モニタ・ディスプレイ２３、キーボード２５、およびマウス２７を含む。
【００２１】
２個のビデオカメラは、撮影対象の人の前方左右に設置され、撮影対象の顔をステレオ視して撮影する。各ビデオカメラ（１１、１３）は、カメラ・コントロール・ユニット（１５、１７）を介してそれぞれ制御される。各カメラ・コントロール・ユニットは、外部同期信号線を介して相互接続されており、この同期信号によって左右のビデオカメラの同期がとられ、左右の位置で同じ時刻に撮影された２つの画像フレームが得られる。顔・視線認識装置は、左右の位置で同じ時刻に撮影された２つの画像フレームを入力画像として使用し、ステレオ法を用いて３次元的な物体認識を処理することができる。
【００２２】
赤外線投光機１４は、近赤外光を顔に照射するよう被写体の前面に設置され、車内の照明変動による画像の劣化を低減する。このため、ビデオカメラは、近赤外透過フィルタ２９などにより近赤外光以外の波長を遮断された状態で被写体を撮影する。
【００２３】
近赤外光を照明として使用する第１の理由は、照明変動に対する画像のロバスト性を向上させることにある。一般に撮影対象の周囲の明るさは、屋内外、または日中や夜間などの環境変化によって大きく変動する。また、強い可視光が１方向から顔にあたる場合には、顔面上に陰影のグラデーションが発生する。この様な照明の変動や陰影のグラデーションは、画像認識の精度を著しく悪化させる。
【００２４】
この実施例では、正面から赤外線投光機１４により近赤外光を照射して画像を撮影することによって、周囲からの可視光による顔面上の陰影のグラデーションを低減する。この様な近赤外画像は、可視光を使用して得られる画像と比較して照明変化による影響を受けにくく、画像認識の精度を向上させることができる。
【００２５】
近赤外光を使用する第２の理由は、目の瞳孔を明瞭に抽出することが可能な点にある。瞳の位置が視線方向を検出するために使用されるので、瞳を明瞭に撮影することは重要である。
【００２６】
画像処理ボード１９は、ビデオカメラで撮影された画像を様々に処理する。例えば、各ビデオカメラで撮影された画像がＮＴＳＣ方式のビデオ信号として送られてくる場合、画像処理ボード１９は、それらの画像を適当なフォーマットのクラスタ画像に変換し、内部のバッファメモリに記憶する。さらに、画像処理ボード１９は、画像処理アルゴリズムを実行するハードウェア回路を備えており、画像処理を高速に実行することができる。例えば、ハードウェア回路による画像処理アルゴリズムには、斜方投影機構、ハフ変換、２値画像マッチングフィルタ、アフィン変換（画像の回転、拡大、縮小）などの処理が含まれる。
【００２７】
画像処理ボード１９は、任意のインターフェース（例えばＰＣＩバス、シリアルバス、ＩＥＥＥ１３９４など）を介してパーソナル・コンピュータ２１に接続され、パーソナル・コンピュータ２１上のプログラムに応じて制御される。パーソナル・コンピュータ２１は、モニタ・ディスプレイ２３、キーボード２５、マウス２７などのユーザ・インターフェースなどを備え、「Ｌｉｎｕｘ」として知られるＯＳを使用して動作する。
【００２８】
図２は、図１に示すハードウェアによって実施される顔・視線認識装置の機能ブロック図を示す。図２の参照番号３１は撮影対象となる顔を示している。画像入力部３３は、図１で示す左右のビデオカメラ（１１、１３）、カメラ・コントロール・ユニット（１５、１７）、画像処理ボード１９を総合的に示している。この画像入力部３３は、撮影対象の顔３１を連続的にステレオ撮影し、それらの画像をクラスタ化して図２の画像処理部３５に提供する。
【００２９】
画像処理部３５は、顔探索部３７、顔トラッキング部３９、まばたき検出部４１、視線検出部４３を含み、提供された画像に撮影されている顔から顔向きと視線方向をリアルタイムで検出する。
【００３０】
顔探索部３７は、画像全体から顔が撮影されている領域を探索し、顔トラッキングの最初の初期化とエラー回復のために使用される。顔トラッキング部３９は、顔の特徴点を抽出し、撮影されている顔の向きをリアルタイムで検出する。まばたき検出部４１は、目周辺の画像を解析し、目が閉じているかどうかを判断する。視線検出部４３は、瞳孔を検出し、瞳孔の位置と眼球の位置から視線方向をリアルタイムで検出する。
【００３１】
図３は、画像処理部３５の全体的なフローチャートを示す。顔探索部３７、顔トラッキング部３９、まばたき検出部４１、視線検出部４３は、それぞれ関連して動作し、連続的に撮影される左右の入力画像から顔向きと視線方向をリアルタイムで検出することができる。
【００３２】
図３のフローチャートでは、顔探索部３７の処理がステップ１０１から１０３で示され、顔トラッキング部３９の処理がステップ１０５から１１３で示され、まばたき検出部４１の処理がステップ１１５から１１７で示され、視線検出部４３の処理がステップ１１９から１２３で示される。以下では、このフローチャートを参照して画像処理部３５の各機能ブロックの処理を説明する。
【００３３】
顔探索部３７
図３を参照して顔探索部３７の処理を説明する。顔探索部３７は、入力された画像から人間の顔が撮影されている画像領域をおおまかに探索する。ここでの処理は、顔トラッキング部３９のための前処理ともいえる。顔探索部３７が、顔トラッキング部３９の処理の前に、入力画像から顔が撮影されている領域をおおまかに探索することにより、顔トラッキング部３９は、入力画像中の顔の詳細な解析を高速に実行することができる。
【００３４】
最初に、ステップ１０１で画像入力部３３から左右のビデオカメラの画像が入力され、入力画像全体から人間の顔が撮影されている領域がおおまかに探索される。これは、予め記憶された探索用テンプレート５１を使用して２次元テンプレート・マッチングで実行される。
【００３５】
図４は、探索用テンプレート５１の例を示す。探索用テンプレート５１に使用される画像は、正面を向いた人間の顔を部分的に切り取った画像であり、この画像には目、鼻、口などの人間の顔の特徴的な領域が１つのテンプレートに含まれている。この探索用テンプレート５１は、テンプレート・マッチングでの処理速度を高めるために、予め低解像度化されており、さらに照明変動の影響を低減するために微分画像にされている。このテンプレートは、複数のサンプルから作成されて予め記憶されている。
【００３６】
ステップ１０１での探索は、２次元的なテンプレート・マッチングであるので、右ビデオカメラ１１の画像かまたは左ビデオカメラ１３の画像のどちらかが使用される。以下では、右ビデオカメラ１１の画像を使用したテンプレート・マッチングを例として述べる。
【００３７】
右ビデオカメラ１１の画像を使用したテンプレート・マッチングの場合、右ビデオカメラ１１の画像から探索用テンプレート５１に対応する画像領域が探索され抽出される。次に、ステップ１０３において、マッチした右画像内の画像領域をテンプレートにして、同様のテンプレート・マッチングが左画像に対して実行され、そのステレオ・マッチングの結果から顔全体の３次元位置がおおまかに求められる。この様にして得られた画像情報は、顔トラッキング部３９における各特徴点の探索範囲を設定するために使用される。
【００３８】
顔トラッキング部３９
顔トラッキング部３９は、前もって得られた画像情報に基づいて顔の特徴点を入力画像から抽出し、それらの特徴点から顔の３次元位置と顔の向きを求める。以下では、顔トラッキング部３９が入力画像から特徴点を抽出する方法に関して説明する。
【００３９】
顔トラッキング部３９は、テンプレート・マッチングにより入力画像から顔の特徴点を探索する。この探索に使用されるテンプレートは、データベース４７に予め記憶されている３次元顔特徴点モデル６９の画像を使用する。図５は、３次元顔特徴点モデル６９の例を示す。
【００４０】
本実施例における３次元顔特徴点モデル６９は、正面を向いた人間の顔の特徴的な部分を画像から局所的に切り取った部分的画像（５３〜６７）から生成される。例えば、これらの部分的画像は、図５に示すように、左の眉頭５３、右の眉頭５５、左の目尻５７、左の目頭５９、右の目尻６１、右の目頭６３、口の左端６５、口の右端６７などのように予め用意された顔画像から局所的に切り取られて生成される。これらの部分的画像のそれぞれは、その画像内で撮影されている対象物（この例では、左右の眉頭、左右の目尻と目頭、および口の両端）の３次元位置を表す３次元座標に関連付けられ、データベース４７に記憶されている。本明細書では、これらの３次元座標を有した顔特徴領域の部分的画像を顔特徴点と呼び、これらの複数の顔特徴点から生成される顔モデルを３次元顔特徴点モデル６９と呼ぶ。３次元顔特徴点モデル６９は、複数のサンプルから生成されデータベース４７に記憶されている。
【００４１】
顔トラッキング部３９は、３次元顔特徴点モデル６９の各部分的画像をテンプレートにしてそれぞれ対応する特徴点を入力画像から抽出する。このテンプレート・マッチングは、右ビデオカメラの画像と左ビデオカメラの画像のどちらを使用しても構わないが、この実施例では、右ビデオカメラの画像を使用している。このテンプレート・マッチングの結果得られる画像は、撮影された顔の左右の眉頭、左右の目頭と目尻、口の両端の計８個の画像である。
【００４２】
図３のフローチャートを参照してこの抽出処理を説明すると、最初に、ステップ１０５で各特徴点の探索範囲が設定される。この探索範囲の設定は、前もって得られた画像情報に基づいて行われる。例えば、ステップ１０３の後にステップ１０５が処理される場合、入力画像における顔全体の領域が既に分かっているので（ステップ１０１で検出されているので）、入力画像において各特徴点が存在している領域もおおまかに分かる。ステップ１１７またはステップ１２３の後にステップ１０５が処理される場合には、前回のループで検出された各特徴点（前回の入力画像における各特徴点）の情報から、今回の入力画像において各特徴点が存在している領域がおおまかに予測できる。したがって、各特徴点が存在する可能性が高い画像領域だけを各特徴点の探索範囲として設定することができ、この各特徴点の探索範囲の設定により、テンプレート・マッチングを高速に処理することが可能になる。
【００４３】
ステップ１０７で、各特徴点の探索範囲に基づいて３次元顔特徴点モデル６９に対応する画像領域が右ビデオカメラの画像から探索される。これは、３次元顔特徴点モデル６９の各特徴点の画像をテンプレートとし、右ビデオカメラ１１の画像に対してテンプレート・マッチングを行うことにより実行される。
【００４４】
ステップ１０９では、ステップ１０７の探索から得られた各特徴点の画像をテンプレートにして左ビデオカメラ１３の画像に対してステレオ・マッチングが実行される。これにより、３次元顔特徴点モデル６９の各特徴点に対応する入力画像の各特徴点の３次元座標が求められる。このステレオ・マッチングの結果、顔の左右の眉頭、左右の目尻と目頭、口の両端の３次元座標（観測点）がそれぞれ得られる。
【００４５】
ステップ１１１で、３次元顔特徴点モデル６９を使用して３次元モデル・フィッティングが実行され、顔の向きが検出される。以下ではこの３次元モデル・フィッティングを説明する。
【００４６】
先に述べたように、３次元顔特徴点モデル６９は、正面を向いた顔の特徴点から生成されている。それに対して入力画像で撮影されている顔は、必ずしも正面を向いているとは限らない。入力画像に撮影されている顔が正面を向いていない場合、ステップ１０９で得られた入力画像の各特徴点の３次元座標（観測点）は、３次元顔特徴点モデル６９の各特徴点の３次元座標から任意の角度と変位だけずれを有している。したがって、正面を向いた３次元顔特徴点モデル６９を任意に回転、変位させたときに、入力画像の各特徴点に一致する角度と変位が入力画像中の顔の向きと位置に相当する。
【００４７】
３次元顔特徴点モデル６９を任意に回転、変位させて、入力画像の各特徴点にフィッティングさせた場合、フィッティング誤差Ｅは、下記の式で表される。
【００４８】
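【数１】
$$E = \sum_{i=1}^{N} \omega_i \,\bigl\lVert R(\phi,\theta,\psi)\,x_i + t(x,y,z) - y_i \bigr\rVert^{2}$$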
【００４９】
ここで、Ｎが特徴点の数であり、ｘ_ｉがモデル内の各特徴点の３次元座標であり、ｙ_ｉが入力画像からの各特徴点の３次元座標を表す。ω_ｉは、各特徴点に関する重み付け係数であり、入力画像から特徴点の３次元位置を求めたときのステレオ・マッチングにおける相関値を利用する。この相関値を利用することによって、それぞれの特徴点の信頼度を考慮することができる。回転行列は、Ｒ（φ，θ，ψ）であり、並進ベクトルは、ｔ（ｘ，ｙ，ｚ）で表され、これらが、この式における変数となる。
【００５０】
したがって、上記の式におけるフィッティング誤差Ｅを最小にする回転行列Ｒと並進ベクトルｔを求めれば、入力画像の顔向きと顔位置が求められる。この演算は、最小二乗法または仮想バネモデルを使用したフィッティング手法などを利用することによって実行される。
【００５１】
ステップ１１３では、ステップ１１１で顔の向きが正しく検出されたかどうかが判定される。もし顔の向きが正しく検出されなかったと判定された場合、ステップ１０１に戻り、新しい入力画像を使用して一連の処理が繰り返される。
【００５２】
図６は、より詳細な顔トラッキング部３９のフローチャートを示している。このフローチャートは、基本的には図３に示す顔トラッキング部３９の処理と同一であるが、各特徴点のテンプレート・マッチング（ステップ１０７）をより詳細に示している。
【００５３】
図６のフローチャートでは、１つの特徴点に対して複数のテンプレートを使用するよう示されている。１つの特徴点に対する複数のテンプレートは、テンプレート・マッチングにおけるエラーを低減させ、顔向き検出の精度を向上させるために使用される。さらに、このフローチャートでは、カメラに撮影されていない特徴点を予測し、撮影されていない特徴点のテンプレート・マッチングを行わないよう処理している。
【００５４】
最初に、１つの特徴点に対して複数のテンプレートを使用する顔トラッキング部３９の処理を説明する。
【００５５】
３次元顔特徴点モデル６９における各特徴点の画像に撮影されている対象物（左右の眉頭、左右の目頭と目尻、口の両端など）は、平面ではなく立体である。したがって、その見え方（すなわち撮影されている対象物の状態）は、顔向きや傾きに応じて変化する。このため、単一のテンプレートだけでテンプレート・マッチングを行う場合、入力画像がそのテンプレートとは異なる見え方をしているときにテンプレート・マッチングでエラーを生じる。
【００５６】
例えば正面の顔の画像から作成された特徴点のテンプレートだけをテンプレート・マッチングに使用する場合、入力画像の顔が斜めを向いているときにエラーが生じることがある。この様な各特徴点の見え方の違いで生じるエラーを回避するために、各顔向きの画像から作成された各特徴点のテンプレートが使用される。
【００５７】
１つの顔特徴点に対して複数のテンプレートを使用する場合、前回の入力画像における顔の情報（顔の向き）に基づいて、今回のテンプレート・マッチングで使用するテンプレートが選択される。すなわち、前回の入力画像における顔の情報に基づいて、予め用意された複数のテンプレートから最適なテンプレートが選択され、選択されたテンプレートが今回のフレームにおける特徴点のテンプレート・マッチングに使用される。
【００５８】
図７は、１つの特徴点に対する複数のテンプレートを示す図であり、具体的には、右目尻の特徴点に対する複数のテンプレートの例を示す図である。図７のａは、頭部がカメラに対して左右方向を向いたときの状態を示し、図７のｂは、頭部がカメラに対して正面を向いているときの状態を示し、図７のｃは、頭部がカメラの光軸に対して回転したときの状態を示す。図７のａに対応する右目尻のテンプレートが参照番号７１で示され、図７のｂに対応する右目尻のテンプレートが参照番号６１で示され、図７のｃに対応する右目尻のテンプレートが参照番号７３で示されている。
【００５９】
図７を参照して分かるように、同じ右目尻の画像であっても、画像における見え方は、頭部の姿勢に応じて変化する。このため、顔トラッキング部３９は、頭部の姿勢に応じた複数のテンプレートを予めデータベース４７に記憶しておき、その複数のテンプレートから１つのテンプレートを選択して使用する。例えば、図７の例では、頭部が左右方向に回転した状態に対して３種類のテンプレートを用意し、頭部が光軸に対して回転した状態に対しても３種類のテンプレートを用意している。したがって、右目尻の特徴点のために、計９（３×３）個の右目尻のテンプレートの集合が使用される。
【００６０】
これらのテンプレートの集合のうち、テンプレート・マッチングで実際に使用されるテンプレートは１つだけである。このテンプレートの選択は、前回の入力画像の顔の情報に基づいて決められる。前回の入力画像と今回の入力画像が連続した画像フレームであるので、前回の頭部の姿勢と今回の頭部の姿勢は、比較的相関が高いはずである。したがって、図６のステップ２０１で、前回の入力画像に撮影されていた頭部の姿勢が取得され、ステップ２０３で、その頭部の姿勢に対応するテンプレートが選択される。対応するテンプレートが選択された後で、そのテンプレートを使用して今回の入力画像に対してテンプレート・マッチングを行うので、エラーを低減することができる。
【００６１】
上記では、計９個の右目尻のテンプレートの集合を例として述べた。しかしながら、他の特徴点に関しても複数のテンプレートが用意され、その中から前回の画像における顔の情報に応じて１つのテンプレートがそれぞれ使用される。各特徴点のためのテンプレートの数は、必要に応じていくつ用意してもよい。
【００６２】
次に、カメラに撮影されていない特徴点を予測する処理に関して詳細に説明する。これらの処理は、図６のフローチャートのステップ２０５から２０９で処理される。
【００６３】
図８および図９を参照して、カメラに撮影されていない特徴点の予測の概要を説明する。図８は、カメラに対して正面を向いた顔３１の正面図と上面図を示している。この図では便宜的に１個のカメラしか示されていないが、実際には、ステレオカメラを構成する２個のカメラが存在し、この処理は２個のカメラそれぞれに関して実行される。
【００６４】
図８では、カメラの設置位置に基づいて基準位置が定められる。この基準位置と各特徴点の３次元座標とを結ぶベクトルを各特徴点の「位置ベクトル」と呼ぶことにする。さらに、各特徴点の座標を結んで得られる曲面に対する各特徴点の法線方向のベクトルを各特徴点の「法線ベクトル」と呼ぶ。各特徴点がカメラに撮影されるかどうかは、各特徴点について「位置ベクトル」と「法線ベクトル」とがなす角度θによって判断することができる。
【００６５】
例えば、図８の上面図におけるｘ−ｚ平面に関して考察する。この場合、各特徴点について位置ベクトルと法線ベクトルとがなす角度θは、９０°より十分小さい。したがって、各特徴点全てがカメラで撮影することが可能である。しかしながら、顔が横向きである図９の上面図の場合、特徴１について位置ベクトルと法線ベクトルとがなす角度θ_１が、ほぼ９０°になる。この場合、カメラは、特徴１を撮影することが出来なくなり、この特徴１に関するテンプレート・マッチングがエラーを生じる可能性が高くなる。
【００６６】
したがって、顔トラッキング部３９は、各特徴点ごとに位置ベクトルと法線ベクトルを求め、それらのベクトルがなす角度θを求める。この各特徴点のθが予め定められたしきい値より大きい場合、それに対応する特徴点は、カメラによって撮影されていないと判断される。結果として、顔トラッキング部３９は、その特徴点に関するテンプレート・マッチングを行わない。
【００６７】
上記の例では、ｘ−ｚ平面について説明したが、同様の処理は、ｘ−ｙ平面についても処理される。さらに、ステレオカメラを構成する２個のカメラのそれぞれについて、この処理が実行される。これにより、頭部の姿勢によって撮影されていない特徴点によって生じるエラーを回避することができる。
【００６８】
図６のフローチャートを参照して説明すると、ステップ２０５で前回の入力画像の頭部位置情報から各特徴点について位置ベクトルと法線ベクトルが求められる。ステップ２０７で各特徴点の位置ベクトルと法線ベクトルとがなす角度θが求められる。次に、ステップ２０９で、求められた角度θが予め定められたしきい値と比較され、テンプレート・マッチングを実行する特徴点が選択される。ステップ２１１で、今回の画像フレームに対して、選択されたテンプレートを使用してテンプレート・マッチングが実行され、その結果に基づいて、ステップ２１３で各特徴点のステレオ・マッチングが実行される。最終的にステップ２１５で３次元顔特徴点モデル６９に対して、３次元観測値とのフィッティングが行われることによって、入力画像に撮影されている顔向きが検出される。
【００６９】
まばたき検出部４１
まばたき検出部４１は、視線検出部４３のために、入力画像から目周辺の画像を抽出して目が閉じているかどうかを判断する。もし目が閉じられている場合、視線方向を検出する意味がないので顔トラッキング部３９に戻るよう処理される。
【００７０】
図１０は、まばたき検出部４１の処理（ステップ１１５）を詳細に示すフローチャートである。最初にステップ３０１で、入力画像において目が存在している領域が左右それぞれについて求められる。これは、顔トラッキング部３９で得られた左右の目尻と目頭の特徴点の情報に基づいて行われる。例えば、左右それぞれの目について、目全体を含む画像領域が求められる。次に、ステップ３０３で、その求められた領域から画像が抽出される。
【００７１】
図１１は、入力画像から抽出された右目の領域７５を示す。顔トラッキング部３９で検出された右目頭の特徴点は参照番号６３で示されており、右目尻の特徴点は参照番号６１で示されている。この例では、入力画像から抽出される目領域の範囲は、幅方向が目頭の特徴点から目尻の特徴点までであり、高さ方向が特徴点の高さの倍の長さである。
【００７２】
ステップ３０５で、目が開いているかどうかを判断するために目領域の画像から水平な直線が検出される。すなわち、目が開いている状態で撮影された入力画像の場合、抽出された目領域の画像には、虹彩や目の輪郭により生じる縦や斜めのエッジが多く含まれている。それに対して目が閉じている状態で撮影された入力画像の場合、閉じたまぶたによって生じる水平なエッジが比較的多く含まれている。したがって、目領域の画像からエッジ検出を行い、その目領域に含まれるエッジの種類（縦、斜め、水平など）の割合から目が閉じているかどうかを判断することができる。エッジの種類は、例えばハフ変換などの線分当てはめを行い、画像中に存在する直線群を検出することによって求められる。画像中に存在する直線群において水平とみなせる直線の割合が予め定めたしきい値より多く存在している場合、まばたき検出部４１は、ステップ３０７で目を閉じていると判断する。
【００７３】
この実施例では、左右両方の目についてそれぞれ目が開いているかどうかが検出される。左右どちらかの目が閉じられていると判断された場合、視線方向の検出には進まず、新たな画像フレームを使用して顔向き検出を処理する。
【００７４】
視線検出部４３
図１２は、視線検出部４３の詳細なフローチャートを示す。視線検出部４３は、顔トラッキング部３９で得られた顔の位置と向きに基づき、入力画像から視線方向を検出する。
【００７５】
視線検出部４３では、人の眼球は、眼球の中心が回転中心と一致する３次元的な球でモデル化される。視線方向は、顔トラッキング部３９で検出された頭部の位置および姿勢、並びに瞳孔の中心位置の関係で求められる。すなわち、視線検出部４３で検出される視線方向は、眼球の中心位置と瞳孔の中心位置とを結ぶベクトルとして求められる。
【００７６】
図１２のステップ４０１で、視線検出部４３は、顔トラッキング部３９で検出された顔の位置と向きから眼球の中心位置を求める。図１３を参照してこれを詳細に説明する。
【００７７】
図１３の参照番号７７は、顔トラッキング部３９で検出された目尻の特徴点の３次元座標を示しており、参照番号７９は、顔トラッキング部３９で検出された目頭の特徴点の３次元座標を示している。最初に、この２つの座標を結ぶ直線を得て、その直線の中点から眼球の中心方向に向かう直線が求められる。この明細書では、その中点から眼球の中心方向に向かうベクトルを「オフセット・ベクトル」と呼び、顔トラッキング部３９で得られた顔の向きに基づいて定める。眼球の中心位置８１は、その中点からオフセット・ベクトルに沿って引かれた直線上に存在し、中点から眼球の半径に相当する距離上に存在する。眼球の半径は、標準的な眼球の大きさに基づいて予め定められる。
【００７８】
次に、ステップ４０３で、画像から瞳孔の中心位置８３が検出される。先に述べたように、この実施例では、近赤外光を使用して撮影された近赤外画像が使用されている。この様な近赤外画像に撮影されている瞳孔は、虹彩との反射率の違いにより比較的暗く撮影される。したがって、目周辺の画像から最も暗い部分を検出することにより瞳孔が撮影されている領域を検出することができる。
【００７９】
ステップ４０５で、眼球の中心位置８１と瞳孔の中心位置８３とを結ぶベクトルから視線方向が求められる。図１４は、この様にして求められた視線方向を水平面および垂直面に対する角度として示している。図１４では、眼球の中心位置が参照番号８１で示され、眼球表面上に存在する瞳孔の中心が参照番号８３に示されている。図１４のａは、画像上平面をｘｙ座標とした場合に眼球の中心位置８１と瞳孔の中心位置８３とを結ぶベクトル（視線ベクトル）を示している。カメラの光軸方向をｚ軸とすると、図１４のａに対応する側面図は図１４のｂで示される。垂直面（この場合ｙｚ平面）に対する視線方向は、視線ベクトルがｘｚ平面に対してなす角度８５で表される。図１４のｃは、図１４のａの上面図を示している。この場合、水平面（ｘｚ平面）に対する視線方向は、視線ベクトルがｙｚ平面に対してなす角度８７で表される。
【００８０】
この実施例では、入力画像として左右の画像を使用しているので、右画像、左画像それぞれに対して視線ベクトルを求めることができる。さらに、１つの画像につき左右両方の視線ベクトルを求めることができるので、合計４つの視線ベクトルを求めることができる。本実施例では、この４つの視線ベクトルを平均したベクトルを入力画像の視線方向として使用する。
【００８１】
図３のステップ１２３で、顔の視線方向が検出された後で、ステップ１０５に戻り、新たな入力画像を使用して一連の処理が繰り返される。この繰り返しの結果、ドライバーの顔向き、顔位置、視線方向の連続的な追従をリアルタイムで実行することが可能になる。
【００８２】
他の実施形態
上記の顔・視線認識装置の実施形態では、コンピュータで構成されたハードウェア構成が説明されたが、本発明はこの様な実施形態に限定されない。図１５は、本発明による顔・視線認識装置を備えた自動車の１つの実施形態を示す。図１５の自動車は、画像入力部３３、サイドミラー９１、ルームミラー９３、制御装置９５、赤外線投光機１４を備える。
【００８３】
図１６は、図１５に示す顔・視線認識装置を備えた自動車の機能ブロック図を示す。この機能ブロック図には、画像入力部３３、赤外線投光機１４、画像解析部３５、個人識別部９６、環境設定部９７、サイドミラー・アクチュエータ９８、ルームミラー・アクチュエータ９９、シート・アクチュエータ１００が含まれる。
【００８４】
この実施形態における自動車は、図１６に示す各機能ブロックを使用して２種類の動作モードを処理する。第１の動作モードは、画像解析部３５が実行する顔向き・視線検出モードであり、このモードは、ドライバーの存在を検出して顔向きと視線方向の状態を連続的に検出する。第２の動作モードは、個人認証モードであり、このモードは、運転席に座っているドライバーを特定して、そのドライバーに合わせてミラーやシートなどの環境設定を実行する。
【００８５】
図１５に示す自動車は、通常、顔向き・視線検出モードで動作しており、ドライバーが運転席にいるかどうかを監視している。ドライバーが運転席にいる場合、ドライバーの顔向きと視線方向が常に画像解析部３５により検出される。自動車は、監視された顔向きと視線方向に基づいてドライバーの状態を判断し、それに応じた様々な処理を実行することができる。
【００８６】
個々のドライバーの情報は、データベース９２に予め登録されている。登録されているドライバー情報は、個々のドライバーの顔のデータ、個々のドライバーに対応する環境設定情報などである。顔のデータは、画像入力部３３で撮影された入力画像との照合のために個人識別部９６によって使用される。この実施形態では、個人識別部９６による個人認証の前に、画像解析部３５がドライバーの顔の位置や向き、視線方向などの情報を取得しているので、それらの顔の情報に応じた個人認証を実行することができる。
【００８７】
例えば、通常、ドライバーが斜め方向を向いている場合、個人認証の精度が低下する。しかしながら、この実施形態では、画像解析部３５がドライバーの顔の位置や向き、視線方向を前もって検出しているので、その様な顔の情報に応じた個人認証を実行することができる。
【００８８】
個人識別部９６によって運転席にいるドライバーが特定された場合、環境設定部９７は、登録された設定値を参照して、個々のドライバーのためにサイドミラー・アクチュエータ９８、ルームミラー・アクチュエータ９９、シート・アクチュエータ１００を制御する。
【００８９】
以上この発明を特定の実施例について説明したが、この発明はこのような実施例に限定されるものではなく、当業者が容易に行うことができる種々の変形もこの発明の範囲に含まれる。
【００９０】
【発明の効果】
本発明は、リアルタイムでの顔向き・視線方向の追従において、誤認識が減少し、高精度な画像処理を実現させることができる。
【図面の簡単な説明】
【図１】コンピュータで構成される顔・視線認識装置の実施例。
【図２】顔・視線認識装置の機能ブロック図の実施例。
【図３】画像処理部の全体的なフローチャート。
【図４】探索用テンプレートの例。
【図５】３次元顔特徴点モデルの例。
【図６】顔トラッキング部のフローチャート。
【図７】複数のテンプレートを示す図。
【図８】カメラに撮影されていない特徴点の予測方法を示す図。
【図９】カメラに撮影されていない特徴点の予測方法を示す図。
【図１０】まばたき検出部のフローチャート。
【図１１】入力画像から抽出された右目の領域を示す図。
【図１２】視線検出部のフローチャート。
【図１３】視線検出部の処理を模式的に示す図。
【図１４】視線検出部により検出された視線方向を水平面および垂直面に対する角度として示した図。
【図１５】顔・視線認識装置を備えた自動車の実施例。
【図１６】図１５に示す顔・視線認識装置を備えた自動車の機能ブロック図。
【符号の説明】
１４赤外線投光機
３３画像入力部
３５画像解析部
３７顔探索部
３９顔トラッキング部
４１まばたき検出部
４３視線検出部
[0001]
[Technical Field of the Invention]
The present invention relates generally to human interfaces and, more specifically, to a technique that uses image recognition to recognize the orientation of a person's face and the direction of the person's gaze.
[0002]
[Prior art]
Human eye movements are closely related to what a person intends and is paying attention to, and research is under way on using these eye movements in place of input devices such as keyboards and mice. Such a next-generation human interface is built as an advanced interface that photographs human behavior with a camera and recognizes human intention and attention.
[0003]
In an interface based on gaze recognition, the gaze often moves together with the face, so it is preferable to detect the face orientation at the same time as the gaze movement. A face / gaze recognition device that detects the face orientation and the gaze direction simultaneously is described in the paper "Development of Face / Gaze Measurement System and Application to Motion Recognition" by Matsumoto et al. (5th Robotics Symposia, March 26-27, 2000).
[0004]
[Problems to be solved by the invention]
In the face / gaze recognition device proposed by Matsumoto et al., the orientation of a human face is detected three-dimensionally from image frames captured by a stereo camera, and the gaze direction is then detected based on the face orientation. After the gaze direction has been detected, the same face orientation detection and gaze direction detection are repeated using a newly captured image frame. This detection is repeated at a rate matching the frame rate of the video cameras, which makes it possible to track the face orientation and the gaze direction in real time.
[0005]
Because such real-time tracking of the face orientation and gaze direction requires high-speed image processing, the face / gaze recognition device can no longer track them in real time if computation time is wasted. It is therefore desirable for a face / gaze recognition device to reduce misrecognition and improve accuracy.
[0006]
An object of the present invention is to reduce as much as possible errors that generate unnecessary computation time in order to realize high-speed image processing in real-time tracking of the face direction and the line-of-sight direction.
[0007]
[Means for Solving the Problems]
In order to solve the above-described problems, a face / gaze recognition device according to the present invention includes: a plurality of cameras that photograph a user's face; detection means for detecting the face orientation from the image output of the cameras; means for detecting whether the user's eyes are open from the image region around the eyes captured in the image output of the cameras; and means for detecting the user's gaze direction from the image output of the cameras in response to the user's eyes being open.
[0008]
According to this invention, since the opening / closing of the user's eyes is detected before the user's line-of-sight direction is detected, errors in the detection of the line-of-sight direction can be avoided.
[0009]
According to one aspect of the present invention, the means for detecting whether the eyes are open is configured to detect the horizontal edges contained in the image region around the eye and to determine whether the eye is open according to the proportion of horizontal edges contained in that image region.
[0010]
According to this form, when the eyes are open, the image area around the eyes contains many vertical and diagonal edges, but when the eyes are closed, there are relatively many horizontal edges. Therefore, it is possible to detect whether or not the eyes are open by detecting the edge in the horizontal direction and examining the ratio.
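The edge-ratio test described above can be illustrated with a minimal sketch. It is not the patented implementation: the embodiment classifies edges by line fitting (for example with a Hough transform), whereas this sketch substitutes a simple per-pixel gradient-orientation test. The function names, the magnitude threshold, the 2:1 dominance rule, and the ratio threshold are all assumptions chosen for illustration.

```python
import numpy as np

def horizontal_edge_ratio(eye, mag_thresh=10.0):
    """Fraction of edge pixels in the eye region whose edge is roughly horizontal."""
    eye = np.asarray(eye, dtype=float)
    gy, gx = np.gradient(eye)          # gy: change down the rows, gx: change along the columns
    edges = np.hypot(gx, gy) > mag_thresh
    if not edges.any():
        return 0.0                     # no edges at all: report no horizontal edges
    # A horizontal edge changes intensity vertically, so |gy| dominates |gx|.
    # The factor 2.0 is an assumed dominance rule, not a value from the patent.
    horizontal = edges & (np.abs(gy) > 2.0 * np.abs(gx))
    return float(horizontal.sum() / edges.sum())

def eye_is_open(eye, ratio_thresh=0.6):
    """Treat the eye as closed when horizontal edges exceed the (assumed) threshold."""
    return horizontal_edge_ratio(eye) <= ratio_thresh
```

With a synthetic region containing only a dark horizontal line (a closed lid), every edge pixel is horizontal and the eye is judged closed; a dark disc (iris and pupil) produces edges in all orientations and is judged open.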
[0011]
According to one aspect of the present invention, the face orientation detection means includes: means for selecting, according to the face orientation, one template for each feature point from a plurality of templates prepared for each of one or more feature points corresponding to characteristic parts of the face; and means for extracting, using each selected template, one or more image regions corresponding to the feature points from the image output. The face orientation detection means is configured to detect the user's face orientation based on the extracted image region or regions.
[0012]
According to this form, the optimal template for each feature point is selected from the plurality of templates according to the face orientation, and template matching is executed using the selected template, so errors in template matching can be reduced.
[0013]
According to one aspect of the present invention, the template selecting means is configured to select one template from the plurality of templates for the current image output based on the face orientation detected from the previous image output.
[0014]
According to this form, the previous image output and the current image output are consecutive image frames, so the face orientation in the previous image and the face orientation in the current image are relatively highly correlated. A template corresponding to a face orientation relatively close to that in the current image can therefore be selected from the plurality of templates.
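As a rough illustration of selecting a template from the previous frame's pose, the sketch below indexes a 3 x 3 bank of templates (three left-right rotation bins by three optical-axis rotation bins, as in the right-eye-corner example of the embodiment). The bin boundary of 15 degrees, the dictionary layout, and the function names are hypothetical; the patent does not specify them.

```python
def pose_bin(angle_deg, limit_deg=15.0):
    """Quantize an angle into -1, 0, or +1 (the bin limit is an assumed value)."""
    if angle_deg < -limit_deg:
        return -1
    if angle_deg > limit_deg:
        return 1
    return 0

def select_template(bank, prev_yaw_deg, prev_roll_deg):
    """Pick the one template whose pose bin matches the pose found in the previous frame.

    `bank` maps (yaw_bin, roll_bin) to a template image, nine entries per feature point.
    """
    return bank[(pose_bin(prev_yaw_deg), pose_bin(prev_roll_deg))]
```

Because consecutive frames are highly correlated, looking up the bin of the previous pose is a cheap stand-in for knowing the current pose before matching.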
[0015]
According to one aspect of the present invention, in the face / gaze recognition device, the image region extraction means is configured to determine, based on the face orientation detected from the previous image output, which feature points are not captured in the current image output, and not to perform extraction of the image regions corresponding to those feature points.
[0016]
According to this aspect, template matching for feature points that are not captured in the current image output can be avoided, so errors in the face / gaze recognition device can be avoided.
[0017]
According to one aspect of the present invention, the image output of the cameras of the face / gaze recognition device is a near-infrared image, and the gaze direction detection means is configured to detect the position of the pupil from the light and dark areas of the image around the eye and to detect the gaze direction from the center position of the detected pupil and the center position of the eyeball.
[0018]
According to this form, the pupil appears relatively dark in a near-infrared image because its reflectance differs from that of the iris, so the region where the pupil is captured can be found by detecting the darkest part of the image around the eye.
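The two ideas in this aspect, locating the pupil as the darkest region and taking the gaze as the direction from the eyeball center to the pupil center, can be sketched as follows. This is an illustrative simplification: the intensity tolerance, the centroid rule, and the function names are assumptions, and the 3D coordinates in the usage are made-up values.

```python
import numpy as np

def pupil_center(eye, tolerance=20.0):
    """Centroid (row, col) of the darkest pixels in a near-infrared eye region.

    Pixels within `tolerance` gray levels of the minimum count as pupil
    (the tolerance is an assumed parameter).
    """
    eye = np.asarray(eye, dtype=float)
    rows, cols = np.nonzero(eye <= eye.min() + tolerance)
    return rows.mean(), cols.mean()

def gaze_direction(eyeball_center, pupil_center_3d):
    """Unit vector pointing from the eyeball center to the pupil center (both 3D points)."""
    v = np.asarray(pupil_center_3d, dtype=float) - np.asarray(eyeball_center, dtype=float)
    return v / np.linalg.norm(v)
```

For example, an eyeball center at (0, 0, 600) and a pupil center at (0, 0, 588) in camera coordinates give a gaze vector pointing straight back along the optical axis toward the camera.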
[0019]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described with reference to the drawings. FIG. 1 shows an embodiment of a hardware configuration in a face / line-of-sight recognition apparatus. In this embodiment, the face / line-of-sight recognition device is configured by a computer, but is not necessarily limited to such a hardware configuration.
[0020]
The face / gaze recognition device of the embodiment of FIG. 1 includes two video cameras (right camera 11 and left camera 13), two camera control units (15, 17), an image processing board 19, an IR projector 14, a personal computer 21, a monitor display 23, a keyboard 25, and a mouse 27.
[0021]
The two video cameras are installed in front of the person to be photographed, to the left and right, and photograph the person's face in stereo. Each video camera (11, 13) is controlled via its own camera control unit (15, 17). The camera control units are interconnected via an external synchronization signal line; this signal synchronizes the left and right video cameras, so that two image frames captured at the same time from the left and right positions are obtained. The face / gaze recognition device uses these two simultaneously captured image frames as input images and can perform three-dimensional object recognition by the stereo method.
[0022]
The infrared projector 14 is installed in front of the subject so as to irradiate the face with near-infrared light, and reduces image degradation due to illumination fluctuations in the vehicle. For this reason, the video camera captures a subject in a state where wavelengths other than near-infrared light are blocked by the near-infrared transmission filter 29 or the like.
[0023]
The first reason for using near-infrared light as illumination is to improve the robustness of the image against illumination variations. In general, the brightness around a subject to be photographed varies greatly depending on environmental changes such as indoors or outdoors or during the day or at night. Further, when strong visible light strikes the face from one direction, a shaded gradation is generated on the face. Such illumination fluctuations and shade gradations significantly deteriorate the accuracy of image recognition.
[0024]
In this embodiment, the infrared projector 14 irradiates near-infrared light from the front to capture an image, thereby reducing the gradation of shading on the face due to visible light from the surroundings. Such a near-infrared image is less susceptible to illumination changes than an image obtained using visible light, and the accuracy of image recognition can be improved.
[0025]
The second reason for using near-infrared light is that the pupil of the eye can be extracted clearly. Since the position of the pupil is used to detect the gaze direction, it is important to photograph the pupil clearly.
[0026]
The image processing board 19 processes the images captured by the video cameras in various ways. For example, when the images captured by each video camera are sent as NTSC video signals, the image processing board 19 converts them into cluster images of an appropriate format and stores them in an internal buffer memory. Furthermore, the image processing board 19 includes hardware circuits that execute image processing algorithms, so image processing can be executed at high speed. For example, the image processing algorithms implemented in hardware include an oblique projection mechanism, the Hough transform, a binary image matching filter, and affine transforms (image rotation, enlargement, reduction).
[0027]
The image processing board 19 is connected to the personal computer 21 via an arbitrary interface (for example, PCI bus, serial bus, IEEE1394, etc.), and is controlled according to a program on the personal computer 21. The personal computer 21 includes a user interface such as a monitor display 23, a keyboard 25, and a mouse 27, and operates using an OS known as “Linux”.
[0028]
FIG. 2 shows a functional block diagram of the face / gaze recognition apparatus implemented by the hardware shown in FIG. Reference numeral 31 in FIG. 2 indicates a face to be photographed. The image input unit 33 comprehensively shows the left and right video cameras (11, 13), the camera control unit (15, 17), and the image processing board 19 shown in FIG. The image input unit 33 continuously shoots stereo images of the face 31 to be imaged, clusters the images, and provides the images to the image processing unit 35 in FIG.
[0029]
The image processing unit 35 includes a face search unit 37, a face tracking unit 39, a blink detection unit 41, and a line-of-sight detection unit 43, and detects the face direction and the line-of-sight direction in real time from the face photographed in the provided image.
[0030]
The face search unit 37 searches the entire image for the region where a face is captured, and is used for the initialization of face tracking and for error recovery. The face tracking unit 39 extracts facial feature points and detects the orientation of the photographed face in real time. The blink detection unit 41 analyzes the image around the eyes and determines whether the eyes are closed. The line-of-sight detection unit 43 detects the pupil and detects the line-of-sight direction in real time from the position of the pupil and the position of the eyeball.
[0031]
FIG. 3 shows an overall flowchart of the image processing unit 35. The face search unit 37, the face tracking unit 39, the blink detection unit 41, and the line-of-sight detection unit 43 operate in coordination with one another and can detect the face direction and the line-of-sight direction in real time from the continuously captured left and right input images.
[0032]
In the flowchart of FIG. 3, the process of the face search unit 37 is indicated by steps 101 to 103, the process of the face tracking unit 39 is indicated by steps 105 to 113, and the process of the blink detection unit 41 is indicated by steps 115 to 117. The processing of the line-of-sight detection unit 43 is shown in steps 119 to 123. Hereinafter, processing of each functional block of the image processing unit 35 will be described with reference to this flowchart.
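The control flow of the four units can be summarized in a short sketch. It is only an outline under stated assumptions: the four callables stand in for the units and are hypothetical, each returning `None` on failure, and the step numbers in the comments follow the flowchart description in the text.

```python
def run_pipeline(frames, search_face, track_face, eyes_open, detect_gaze):
    """Outline of the FIG. 3 loop: search, track, blink check, gaze detection."""
    results = []
    prior = None                           # information used to set the feature search ranges
    for frame in frames:
        if prior is None:                  # initialization or error recovery
            prior = search_face(frame)     # steps 101-103
            if prior is None:
                continue
        pose = track_face(frame, prior)    # steps 105-111
        if pose is None:                   # step 113: orientation not detected, search again
            prior = None
            continue
        prior = pose                       # the next frame searches near this result
        # Steps 115-117: skip gaze detection while the eyes are closed.
        gaze = detect_gaze(frame, pose) if eyes_open(frame, pose) else None  # steps 119-123
        results.append((pose, gaze))
    return results
```

The point of the structure is that the expensive whole-image search runs only when tracking has no usable prior, and gaze detection runs only when the blink check passes.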
[0033]
Face search unit 37
Processing of the face search unit 37 will be described with reference to FIG. 3. The face search unit 37 roughly searches the input image for the image region where a human face is captured. This processing can be regarded as preprocessing for the face tracking unit 39. Because the face search unit 37 roughly locates the region where the face is captured before the face tracking unit 39 runs, the face tracking unit 39 can perform its detailed analysis of the face in the input image at high speed.
[0034]
First, in step 101, images of the left and right video cameras are input from the image input unit 33, and a region where a human face is captured is roughly searched from the entire input image. This is performed by two-dimensional template matching using a search template 51 stored in advance.
[0035]
FIG. 4 shows an example of the search template 51. The image used for the search template 51 is cut out from part of a human face facing forward, and it contains the characteristic regions of the human face, such as the eyes, nose, and mouth, in a single template. The search template 51 is reduced in resolution in advance to speed up template matching, and is further converted into a differential image to reduce the influence of illumination variation. The template is created from a plurality of samples and stored in advance.
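The two preprocessing steps named here, resolution reduction and conversion to a differential image, might look like the following sketch. The block-averaging factor and the use of gradient magnitude as the "differential image" are assumptions; the text does not say which derivative operator is used.

```python
import numpy as np

def make_search_template(face, factor=4):
    """Downsample a face image by block averaging, then take its gradient magnitude."""
    face = np.asarray(face, dtype=float)
    h = face.shape[0] // factor * factor          # crop so the image tiles evenly
    w = face.shape[1] // factor * factor
    low = face[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    gy, gx = np.gradient(low)                     # finite-difference derivatives
    return np.hypot(gx, gy)                       # assumed form of the differential image
```

One visible benefit of the derivative step is that a uniform brightness offset cancels out, so the same face under brighter ambient light yields the same template.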
[0036]
Since the search in step 101 is a two-dimensional template matching, either the image of the right video camera 11 or the image of the left video camera 13 is used. Hereinafter, template matching using the image of the right video camera 11 will be described as an example.
[0037]
In the case of template matching using the image of the right video camera 11, the image region corresponding to the search template 51 is searched for in the image of the right video camera 11 and extracted. Next, in step 103, the same template matching is performed on the left image using the matched image region of the right image as a template, and the three-dimensional position of the entire face is roughly determined from the result of this stereo matching. The image information obtained in this way is used to set the search range of each feature point in the face tracking unit 39.
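Steps 101 and 103 can be sketched as two template matches followed by triangulation. The sketch assumes a rectified, parallel stereo pair so that depth follows from horizontal disparity as Z = f * B / d; the patent does not state the camera geometry, and the zero-mean normalized cross-correlation score, the function names, and the focal length and baseline parameters are illustrative choices.

```python
import numpy as np

def match_template(image, template):
    """Exhaustive zero-mean normalized cross-correlation; returns (row, col, score)."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best_score, best_rc = -1.0, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * t_norm
            score = float((w * t).sum() / denom) if denom > 0 else 0.0
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc[0], best_rc[1], best_score

def rough_face_depth(right_img, left_img, search_template, focal_px, baseline):
    """Step 101: match in the right image. Step 103: re-match that region in the left image."""
    right_img = np.asarray(right_img, dtype=float)
    r, c_right, _ = match_template(right_img, search_template)
    th, tw = np.asarray(search_template).shape
    patch = right_img[r:r + th, c_right:c_right + tw]   # matched region becomes the template
    _, c_left, _ = match_template(left_img, patch)
    disparity = c_left - c_right                        # positive for a point in front of the rig
    if disparity <= 0:
        return None                                     # no valid depth under this geometry
    return focal_px * baseline / disparity              # depth in the units of `baseline`
```

The exhaustive double loop is written for clarity, not speed; the embodiment delegates matching to dedicated hardware on the image processing board.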
[0038]
Face tracking unit 39
The face tracking unit 39 extracts facial feature points from the input image based on the image information obtained in advance, and obtains the three-dimensional position of the face and the orientation of the face from these feature points. Hereinafter, a method in which the face tracking unit 39 extracts feature points from the input image will be described.
[0039]
The face tracking unit 39 searches for facial feature points from the input image by template matching. The template used for this search uses an image of the three-dimensional face feature point model 69 stored in advance in the database 47. FIG. 5 shows an example of the three-dimensional face feature point model 69.
[0040]
The three-dimensional face feature point model 69 in the present embodiment is generated from partial images (53 to 67) obtained by locally cutting out characteristic portions of a front-facing human face. For example, as shown in FIG. 5, these partial images are generated by locally cutting out, from a face image prepared in advance, the inner end of the left eyebrow 53, the inner end of the right eyebrow 55, the outer corner of the left eye 57, the inner corner of the left eye 59, the outer corner of the right eye 61, the inner corner of the right eye 63, the left end of the mouth 65, and the right end of the mouth 67. Each of these partial images is stored in the database 47 in association with three-dimensional coordinates representing the three-dimensional position of the object photographed in the image (in this example, the inner ends of the left and right eyebrows, the outer and inner corners of the left and right eyes, and both ends of the mouth). In this specification, a partial image of a facial feature region having such three-dimensional coordinates is referred to as a face feature point, and a face model generated from the plurality of face feature points is referred to as a three-dimensional face feature point model 69. The three-dimensional face feature point model 69 is generated from a plurality of samples and stored in the database 47.
[0041]
The face tracking unit 39 extracts each corresponding feature point from the input image using each partial image of the three-dimensional face feature point model 69 as a template. The template matching may use either the right video camera image or the left video camera image; in this embodiment, the right video camera image is used. This template matching yields eight images in total: the inner ends of the left and right eyebrows, the outer and inner corners of the left and right eyes, and both ends of the mouth.
[0042]
This extraction process will be described with reference to the flowchart of FIG. 3. First, in step 105, a search range is set for each feature point. The search range is set based on image information obtained in advance. For example, when step 105 is processed after step 103, the region of the entire face in the input image is already known (it was detected in step 101), so the region in which each feature point exists in the input image is also roughly known. When step 105 is processed after step 117 or step 123, the region in which each feature point exists in the current input image can be roughly predicted from the information on each feature point detected in the previous loop (that is, each feature point in the previous input image). Consequently, only the image area in which each feature point is likely to exist needs to be set as the search range for that feature point, and limiting the search range in this way allows template matching to be processed at high speed.
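As a rough illustration of step 105, the search range can be treated as a window of candidate positions centered on the location predicted from earlier information and clamped to the image. The function name `search_window` and the `margin` parameter are hypothetical; the text does not state how large the range is.

```python
def search_window(center, margin, image_size, template_size):
    """Clamp a window of top-left match positions around a predicted location.

    center        -- (row, col) predicted top-left position of the feature
    margin        -- assumed half-width of the window in pixels
    image_size    -- (height, width) of the input image
    template_size -- (height, width) of the feature template
    Returns (row0, col0, row1, col1), end-exclusive.
    """
    (r, c), (ih, iw), (th, tw) = center, image_size, template_size
    row0 = max(r - margin, 0)
    col0 = max(c - margin, 0)
    # The template must fit inside the image, so cap the last position.
    row1 = min(r + margin + 1, ih - th + 1)
    col1 = min(c + margin + 1, iw - tw + 1)
    return row0, col0, row1, col1
```

With a margin of a few pixels, each feature point is matched at a few dozen positions instead of at every position in the frame, which is where the speed-up described above comes from.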
[0043]
In step 107, an image region corresponding to the three-dimensional face feature point model 69 is searched from the image of the right video camera based on the search range of each feature point. This is executed by performing template matching on the image of the right video camera 11 using the image of each feature point of the three-dimensional face feature point model 69 as a template.
[0044]
In step 109, stereo matching is executed on the image of the left video camera 13 using the image of each feature point obtained from the search in step 107 as a template. This yields the three-dimensional coordinates (observation points) of each feature point of the input image corresponding to each feature point of the three-dimensional face feature point model 69, namely the inner ends of the left and right eyebrows, the outer and inner corners of the left and right eyes, and both ends of the mouth.
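The text does not say how a matched pair of image positions is converted into three-dimensional coordinates. One common way, shown here purely as an assumption, is triangulation for a rectified, parallel stereo pair of identical pinhole cameras. The parameters `f` (focal length in pixels), `baseline`, `cx`, and `cy` are hypothetical calibration values, and the result is expressed in the right camera's coordinate frame.

```python
def triangulate(u_right, v_right, u_left, f, baseline, cx, cy):
    """3-D point from one matched feature in a rectified stereo pair.

    Assumes both images are rectified so the match lies on the same row,
    and that a nearer point gives a larger u_left - u_right.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: match is invalid or at infinity")
    z = f * baseline / disparity      # depth along the optical axis
    x = (u_right - cx) * z / f        # lateral offset in the right camera frame
    y = (v_right - cy) * z / f        # vertical offset in the right camera frame
    return (x, y, z)
```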
[0045]
In step 111, a 3D model fitting is performed using the 3D face feature point model 69 to detect the face orientation. Hereinafter, this three-dimensional model fitting will be described.
[0046]
As described above, the three-dimensional face feature point model 69 is generated from the feature points of a face facing the front. The face photographed in the input image, on the other hand, does not necessarily face the front. When the face photographed in the input image is not facing the front, the three-dimensional coordinates (observation points) of the feature points of the input image obtained in step 109 deviate from the three-dimensional coordinates of the corresponding feature points of the three-dimensional face feature point model 69 by some rotation angle and displacement. Therefore, when the front-facing three-dimensional face feature point model 69 is rotated and displaced so that it coincides with the feature points of the input image, the resulting angle and displacement correspond to the face orientation and position in the input image.
[0047]
When the three-dimensional face feature point model 69 is arbitrarily rotated and displaced to fit each feature point of the input image, the fitting error E is expressed by the following equation.
[0048]
[Expression 1]
$$E = \sum_{i=1}^{N} \omega_i \,(R x_i + t - y_i)^{T} (R x_i + t - y_i)$$
[0049]
Here, N is the number of feature points, x_i is the three-dimensional coordinates of each feature point in the model, and y_i is the three-dimensional coordinates of the corresponding feature point obtained from the input image. ω_i is a weighting coefficient for each feature point; the correlation value from the stereo matching performed when the three-dimensional position of the feature point was obtained from the input image is used for it. Using this correlation value allows the reliability of each feature point to be taken into account. R(φ, θ, ψ) is the rotation matrix and t(x, y, z) is the translation vector; these are the variables in this equation.
[0050]
Therefore, if the rotation matrix R and the translation vector t that minimize the fitting error E in the above equation are obtained, the face orientation and face position in the input image are obtained. This calculation is performed using the least-squares method or a fitting method based on a virtual spring model.
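To make the role of E concrete, the sketch below evaluates a fitting error for a candidate pose. It assumes that E is the weighted sum of squared distances between the transformed model points R x_i + t and the observed points y_i, which is the form the symbol definitions suggest, and it assumes a Z-Y-X order for building R from (φ, θ, ψ), which the text does not specify. It only evaluates E; the minimization itself (least squares or the virtual spring model) is not shown.

```python
import math

def rotation_matrix(phi, theta, psi):
    """R = Rz(psi) * Ry(theta) * Rx(phi); this axis order is an assumption."""
    cx, sx = math.cos(phi), math.sin(phi)
    cy, sy = math.cos(theta), math.sin(theta)
    cz, sz = math.cos(psi), math.sin(psi)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def transform(R, t, x):
    """Apply the rigid motion R x + t to one 3-D point."""
    return tuple(sum(R[r][k] * x[k] for k in range(3)) + t[r] for r in range(3))

def fitting_error(model, observed, weights, R, t):
    """Weighted sum of squared distances between R x_i + t and y_i."""
    total = 0.0
    for x, y, w in zip(model, observed, weights):
        p = transform(R, t, x)
        total += w * sum((p[k] - y[k]) ** 2 for k in range(3))
    return total
```

A pose estimator would search over (φ, θ, ψ) and t for the values that make `fitting_error` smallest; feature points with a low stereo correlation value then pull on the solution less than reliable ones.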
[0051]
In step 113, it is determined whether or not the face orientation is correctly detected in step 111. If it is determined that the face orientation has not been correctly detected, the process returns to step 101, and a series of processing is repeated using the new input image.
[0052]
FIG. 6 shows a more detailed flowchart of the face tracking unit 39. This flowchart is basically the same as the processing of the face tracking unit 39 shown in FIG. 3, but shows template matching (step 107) of each feature point in more detail.
[0053]
In the flowchart of FIG. 6, it is shown that a plurality of templates are used for one feature point. Multiple templates for one feature point are used to reduce errors in template matching and improve the accuracy of face orientation detection. Further, in this flowchart, feature points that are not photographed by the camera are predicted, and processing is performed so as not to perform template matching of feature points that are not photographed.
[0054]
First, processing of the face tracking unit 39 that uses a plurality of templates for one feature point will be described.
[0055]
The objects captured in the image of each feature point in the three-dimensional face feature point model 69 (the inner ends of the left and right eyebrows, the outer and inner corners of the left and right eyes, both ends of the mouth, and so on) are solid rather than planar. Their appearance (that is, how the object looks when photographed) therefore changes according to the orientation and inclination of the face. For this reason, when template matching is performed using only a single template, an error occurs in template matching whenever the input image looks different from the template.
[0056]
For example, when only feature point templates created from a front-facing face image are used for template matching, an error may occur when the face in the input image is turned diagonally. To avoid such errors caused by differences in the appearance of each feature point, templates for each feature point created from images of several face orientations are used.
[0057]
When a plurality of templates are used for one face feature point, a template to be used for this template matching is selected based on face information (face orientation) in the previous input image. That is, based on face information in the previous input image, an optimum template is selected from a plurality of templates prepared in advance, and the selected template is used for template matching of feature points in the current frame.
[0058]
FIG. 7 is a diagram showing a plurality of templates for one feature point; specifically, it shows an example of a plurality of templates for the feature point of the right eye corner (the outer corner of the right eye). FIG. 7a shows the state in which the head is turned to the left or right with respect to the camera, FIG. 7b shows the state in which the head faces the front with respect to the camera, and FIG. 7c shows the state in which the head is rotated about the optical axis of the camera. The right eye corner template corresponding to FIG. 7a is indicated by reference numeral 71, the one corresponding to FIG. 7b by reference numeral 61, and the one corresponding to FIG. 7c by reference numeral 73.
[0059]
As can be seen from FIG. 7, even for the same right eye corner, the appearance of the image changes according to the posture of the head. For this reason, the face tracking unit 39 stores a plurality of templates corresponding to head postures in the database 47 in advance, and selects and uses one template from among them. In the example of FIG. 7, three types of templates are prepared for the states in which the head is rotated in the horizontal direction, and three types of templates are prepared for the states in which the head is rotated about the optical axis. A total of 9 (3 × 3) right eye corner templates are therefore available for the right eye corner feature point.
[0060]
Of these sets of templates, only one template is actually used in template matching. The selection of this template is determined based on the face information of the previous input image. Since the previous input image and the current input image are continuous image frames, the previous head posture and the current head posture should have a relatively high correlation. Therefore, in step 201 in FIG. 6, the posture of the head captured in the previous input image is acquired, and in step 203, a template corresponding to the posture of the head is selected. After a corresponding template is selected, template matching is performed on the current input image using the template, so that errors can be reduced.
[0061]
In the above description, a set of nine right eye corner templates was used as an example. A plurality of templates are likewise prepared for the other feature points, and for each of them one template is selected according to the face information in the previous image. Any number of templates may be prepared for each feature point as necessary.
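The selection in steps 201 and 203 amounts to a table lookup keyed by the previous head posture. The sketch below assumes a 3 × 3 table indexed by horizontal rotation and rotation about the optical axis, as in the right eye corner example; the 15-degree bin boundary and the names `pose_bin` and `select_template` are hypothetical, since the text gives no numeric limits.

```python
def pose_bin(angle_deg, threshold_deg=15.0):
    """Map an angle to one of three bins: 0 (negative), 1 (near zero), 2 (positive)."""
    if angle_deg < -threshold_deg:
        return 0
    if angle_deg > threshold_deg:
        return 2
    return 1

def select_template(templates, prev_yaw_deg, prev_roll_deg):
    """Pick one template from a 3 x 3 table using the previous frame's posture.

    templates[i][j] holds the template for horizontal-rotation bin i and
    optical-axis-rotation bin j (assumed layout).
    """
    return templates[pose_bin(prev_yaw_deg)][pose_bin(prev_roll_deg)]
```

Because consecutive frames are highly correlated, the template chosen from the previous posture is usually close to how the feature looks in the current frame.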
[0062]
Next, the process for predicting feature points that are not photographed by the camera will be described in detail. This process is performed in steps 205 to 209 in the flowchart of FIG. 6.
[0063]
With reference to FIG. 8 and FIG. 9, an outline of the prediction of feature points that are not photographed by the camera will be described. FIG. 8 shows a front view and a top view of the face 31 facing the camera. Only one camera is shown in this figure for convenience, but there are actually two cameras constituting the stereo camera, and this process is executed for each of the two cameras.
[0064]
In FIG. 8, the reference position is determined based on the installation position of the camera. A vector connecting the reference position and the three-dimensional coordinates of each feature point is referred to as a “position vector” of each feature point. Furthermore, a vector in the normal direction of each feature point with respect to the curved surface obtained by connecting the coordinates of each feature point is referred to as a “normal vector” of each feature point. Whether or not each feature point is photographed by the camera can be determined by an angle θ formed by the “position vector” and the “normal vector” for each feature point.
[0065]
For example, consider the xz plane in the top view of FIG. 8. In this case, the angle θ formed by the position vector and the normal vector is sufficiently smaller than 90° for every feature point, so all the feature points can be photographed by the camera. However, in the top view of FIG. 9, where the face is turned sideways, the angle θ₁ between the position vector and the normal vector for feature point 1 is approximately 90°. In this case, the camera cannot capture feature point 1, and template matching for feature point 1 is likely to produce an error.
[0066]
Therefore, the face tracking unit 39 obtains a position vector and a normal vector for each feature point, and obtains an angle θ formed by these vectors. When θ of each feature point is larger than a predetermined threshold value, it is determined that the corresponding feature point is not photographed by the camera. As a result, the face tracking unit 39 does not perform template matching regarding the feature point.
[0067]
In the above example, the xz plane has been described, but the same processing is also performed for the xy plane. Further, this process is executed for each of the two cameras constituting the stereo camera. Thereby, it is possible to avoid an error caused by a feature point that is not photographed depending on the posture of the head.
[0068]
Referring to the flowchart of FIG. 6, in step 205, a position vector and a normal vector are obtained for each feature point from the head position information of the previous input image. In step 207, the angle θ formed by the position vector and the normal vector of each feature point is obtained. Next, in step 209, the obtained angle θ is compared with a predetermined threshold value, and the feature points on which template matching is to be performed are selected. In step 211, template matching is performed on the current image frame using the selected templates. Based on the result, stereo matching of each feature point is performed in step 213. Finally, in step 215, the three-dimensional face feature point model 69 is fitted to the three-dimensional observed values, thereby detecting the orientation of the face captured in the input image.
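The visibility test of steps 205 to 209 can be sketched as an angle comparison. Two points are assumptions here: the position vector is taken to point from the feature point toward the reference position, so that a feature facing the camera gives a small θ, and the 70-degree threshold is a hypothetical value standing in for the unspecified predetermined threshold.

```python
import math

def angle_deg(u, v):
    """Angle between two non-zero 3-D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / (nu * nv)))  # guard against rounding
    return math.degrees(math.acos(cos_t))

def is_visible(feature_pos, normal, reference_pos, threshold_deg=70.0):
    """True when the feature point is judged to be photographed by the camera."""
    # Position vector: from the feature point toward the reference position.
    position_vec = tuple(r - p for r, p in zip(reference_pos, feature_pos))
    return angle_deg(position_vec, normal) <= threshold_deg
```

Feature points for which `is_visible` is false would simply be left out of template matching for that camera in the current frame.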
[0069]
Blink detection unit 41
The blink detection unit 41 extracts an image around the eyes from the input image for the line-of-sight detection unit 43 and determines whether the eyes are closed. If the eyes are closed, there is no point in detecting the line-of-sight direction, and the process returns to the face tracking unit 39.
[0070]
FIG. 10 is a flowchart showing in detail the process (step 115) of the blink detection unit 41. First, in step 301, the areas in which the eyes exist in the input image are obtained for the left and right eyes respectively. This is performed based on the information about the feature points of the outer and inner corners of the left and right eyes obtained by the face tracking unit 39. For example, for each of the left and right eyes, an image area including the entire eye is obtained. Next, in step 303, an image is extracted from the obtained area.
[0071]
FIG. 11 shows a region 75 for the right eye extracted from the input image. The feature point of the inner corner of the right eye detected by the face tracking unit 39 is indicated by reference numeral 63, and the feature point of the outer corner of the right eye is indicated by reference numeral 61. In this example, the eye region extracted from the input image extends from the outer-corner feature point to the inner-corner feature point in the width direction, and is twice the height of the feature point in the height direction.
[0072]
In step 305, horizontal straight lines are detected from the image of the eye area to determine whether the eye is open. In an input image shot with the eyes open, the extracted image of the eye region includes many vertical and oblique edges caused by the iris and the eye contour. In an input image taken with the eyes closed, on the other hand, a relatively large number of horizontal edges caused by the closed eyelids are included. It is therefore possible to detect edges in the image of the eye region and determine whether the eye is closed from the proportions of the types of edges (vertical, diagonal, horizontal, and so on) included in the eye region. The type of edge is obtained by performing line segment fitting, such as the Hough transform, and detecting the group of straight lines existing in the image. If the proportion of straight lines that can be regarded as horizontal in the group of straight lines existing in the image is greater than a predetermined threshold value, the blink detection unit 41 determines in step 307 that the eyes are closed.
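The decision in steps 305 and 307 can be sketched as a ratio test over the orientations of the detected lines. The line detection itself (for example a Hough transform) is assumed to have been done already and to supply one angle per line. The 10-degree tolerance, the 0.6 ratio threshold, and the function names are hypothetical values chosen for illustration.

```python
def is_horizontal(angle_deg, tol_deg=10.0):
    """True when a line's orientation is within tol_deg of horizontal."""
    a = angle_deg % 180.0          # a line at 178 degrees is nearly horizontal too
    return min(a, 180.0 - a) <= tol_deg

def is_eye_closed(line_angles_deg, tol_deg=10.0, ratio_threshold=0.6):
    """Judge the eye closed when horizontal lines dominate the eye region."""
    if not line_angles_deg:
        # No lines found: treated here as "not closed", an arbitrary choice.
        return False
    horizontal = sum(1 for a in line_angles_deg if is_horizontal(a, tol_deg))
    return horizontal / len(line_angles_deg) > ratio_threshold
```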
[0073]
In this embodiment, it is detected whether the eyes are open for both the left and right eyes. When it is determined that either the left or right eye is closed, the detection of the face direction is processed using a new image frame without proceeding to the detection of the gaze direction.
[0074]
Line-of-sight detection unit 43
FIG. 12 shows a detailed flowchart of the line-of-sight detection unit 43. The line-of-sight detection unit 43 detects the line-of-sight direction from the input image based on the position and orientation of the face obtained by the face tracking unit 39.
[0075]
In the line-of-sight detection unit 43, the human eyeball is modeled as a three-dimensional sphere in which the center of the eyeball coincides with the center of rotation. The line-of-sight direction is obtained from the relationship between the position and posture of the head detected by the face tracking unit 39 and the center position of the pupil. That is, the line-of-sight direction detected by the line-of-sight detection unit 43 is obtained as a vector connecting the center position of the eyeball and the center position of the pupil.
[0076]
In step 401 of FIG. 12, the line-of-sight detection unit 43 obtains the center position of the eyeball from the face position and orientation detected by the face tracking unit 39. This will be described in detail with reference to FIG.
[0077]
Reference numeral 77 in FIG. 13 indicates the three-dimensional coordinates of the feature point of the outer corner of the eye detected by the face tracking unit 39, and reference numeral 79 indicates the three-dimensional coordinates of the feature point of the inner corner of the eye, likewise detected by the face tracking unit 39. First, the line segment connecting these two coordinates is obtained, and then a straight line from the midpoint of that segment toward the center of the eyeball is obtained. In this specification, the vector from the midpoint toward the center of the eyeball is referred to as an "offset vector" and is determined based on the face orientation obtained by the face tracking unit 39. The center position 81 of the eyeball lies on the straight line drawn from the midpoint along the offset vector, at a distance from the midpoint corresponding to the radius of the eyeball. The radius of the eyeball is predetermined based on a standard eyeball size.
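The construction in step 401 can be written out directly: take the midpoint of the two eye-corner feature points and move from it along the offset vector by one eyeball radius. In the sketch below the offset direction is passed in as an argument, since deriving it from the face orientation is not detailed in the text, and the default radius of 12 (in millimeters) is a hypothetical stand-in for the standard eyeball size.

```python
import math

def eyeball_center(outer_corner, inner_corner, offset_dir, radius=12.0):
    """Eyeball center from the two eye-corner points and an offset direction.

    outer_corner, inner_corner -- 3-D coordinates of the eye-corner feature points
    offset_dir                 -- non-zero vector from the midpoint toward the eyeball center
    radius                     -- assumed eyeball radius, in the same units as the points
    """
    mid = tuple((a + b) / 2.0 for a, b in zip(outer_corner, inner_corner))
    norm = math.sqrt(sum(c * c for c in offset_dir))
    if norm == 0.0:
        raise ValueError("offset direction must be non-zero")
    return tuple(m + radius * c / norm for m, c in zip(mid, offset_dir))
```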
[0078]
Next, in step 403, the center position 83 of the pupil is detected from the image. As described above, in this embodiment, a near-infrared image photographed using near-infrared light is used. The pupil imaged in such a near-infrared image is imaged relatively dark due to the difference in reflectance from the iris. Therefore, the region where the pupil is photographed can be detected by detecting the darkest part from the image around the eyes.
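A minimal sketch of step 403, assuming the eye image is a 2-D list of gray levels: find the darkest gray level, collect every pixel within a small tolerance of it, and take the centroid of those pixels as the pupil center. The `tolerance` parameter and the centroid step are assumptions; the text says only that the darkest part is detected.

```python
def pupil_center(eye_image, tolerance=10):
    """Centroid (row, col) of the darkest pixels in a gray-level eye image.

    eye_image -- 2-D list of gray levels (smaller values are darker)
    tolerance -- assumed margin above the minimum still counted as pupil
    """
    darkest = min(min(row) for row in eye_image)
    points = [
        (r, c)
        for r, row in enumerate(eye_image)
        for c, value in enumerate(row)
        if value <= darkest + tolerance
    ]
    n = len(points)  # at least 1, since the minimum pixel always qualifies
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)
```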
[0079]
In step 405, the line-of-sight direction is obtained from the vector connecting the center position 81 of the eyeball and the center position 83 of the pupil. FIG. 14 shows the line-of-sight direction obtained in this way as angles with respect to the horizontal plane and the vertical plane. In FIG. 14, the center position of the eyeball is indicated by reference numeral 81, and the center of the pupil, which lies on the eyeball surface, is indicated by reference numeral 83. FIG. 14a shows the vector (line-of-sight vector) connecting the center position 81 of the eyeball and the center position 83 of the pupil when the image plane is taken as the xy plane. Taking the optical axis direction of the camera as the z-axis, FIG. 14b shows a side view corresponding to FIG. 14a. The line-of-sight direction with respect to the vertical plane (in this case, the yz plane) is represented by the angle 85 that the line-of-sight vector forms with the xz plane. FIG. 14c shows a top view corresponding to FIG. 14a. In this case, the line-of-sight direction with respect to the horizontal plane (the xz plane) is represented by the angle 87 that the line-of-sight vector forms with the yz plane.
[0080]
In this embodiment, since the left and right images are used as the input image, a line-of-sight vector can be obtained from each of the right image and the left image. Furthermore, since line-of-sight vectors for both the left and right eyes can be obtained from one image, a total of four line-of-sight vectors can be obtained. In this embodiment, the vector obtained by averaging these four line-of-sight vectors is used as the line-of-sight direction of the input image.
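The last step can be sketched as simple vector arithmetic. The angle formulas below are one reading of the description (the angle a 3-D vector makes with the xz plane and with the yz plane), and normalizing each vector before averaging is an assumption; all function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def gaze_vector(eyeball_center, pupil_center):
    """Unit vector from the eyeball center to the pupil center."""
    return normalize(tuple(p - e for p, e in zip(pupil_center, eyeball_center)))

def gaze_angles_deg(v):
    """(horizontal, vertical) angles: to the yz plane and to the xz plane."""
    x, y, z = v
    vertical = math.degrees(math.atan2(y, math.hypot(x, z)))
    horizontal = math.degrees(math.atan2(x, math.hypot(y, z)))
    return horizontal, vertical

def average_gaze(vectors):
    """Average several line-of-sight vectors into one unit direction."""
    units = [normalize(v) for v in vectors]
    mean = tuple(sum(u[k] for u in units) / len(units) for k in range(3))
    return normalize(mean)
```

With two eyes seen by two cameras, `average_gaze` would receive four vectors, one from each eye in each image.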
[0081]
After the face line-of-sight direction is detected in step 123 of FIG. 3, the process returns to step 105, and a series of processing is repeated using a new input image. As a result of this repetition, it becomes possible to execute continuous tracking of the driver's face direction, face position, and line-of-sight direction in real time.
[0082]
Other embodiments
In the embodiment of the face / line-of-sight recognition apparatus described above, a hardware configuration built around a computer has been described, but the present invention is not limited to such an embodiment. FIG. 15 shows one embodiment of an automobile provided with a face / line-of-sight recognition device according to the present invention. The automobile of FIG. 15 includes an image input unit 33, a side mirror 91, a room mirror 93, a control device 95, and an infrared projector 14.
[0083]
FIG. 16 is a functional block diagram of the automobile provided with the face / gaze recognition device shown in FIG. 15. This functional block diagram includes an image input unit 33, an infrared projector 14, an image analysis unit 35, a personal identification unit 96, an environment setting unit 97, a side mirror actuator 98, a room mirror actuator 99, and a seat actuator 100.
[0084]
The automobile in this embodiment processes two types of operation modes using each functional block shown in FIG. The first operation mode is a face direction / line-of-sight detection mode executed by the image analysis unit 35. This mode detects the presence of a driver and continuously detects the state of the face direction and the line-of-sight direction. The second operation mode is a personal authentication mode. In this mode, a driver sitting in the driver's seat is specified, and environment settings such as a mirror and a seat are executed in accordance with the driver.
[0085]
The vehicle shown in FIG. 15 normally operates in the face orientation / gaze detection mode, and monitors whether the driver is in the driver's seat. When the driver is in the driver's seat, the image analysis unit 35 always detects the driver's face direction and line-of-sight direction. The automobile can determine the state of the driver based on the monitored face direction and line-of-sight direction, and can execute various processes according to the judgment.
[0086]
Information on individual drivers is registered in the database 92 in advance. The registered driver information includes face data of each driver, environment setting information corresponding to each driver, and the like. The face data is used by the personal identification unit 96 for collation with the input image taken by the image input unit 33. In this embodiment, the image analysis unit 35 acquires information such as the position and orientation of the driver's face and the line-of-sight direction before personal authentication by the personal identification unit 96, so personal authentication can be performed in accordance with that information.
[0087]
For example, usually, when the driver is facing obliquely, the accuracy of personal authentication is lowered. However, in this embodiment, since the image analysis unit 35 detects the position and orientation of the driver's face and the line-of-sight direction in advance, personal authentication according to such face information can be executed.
[0088]
When the driver in the driver's seat is identified by the personal identification unit 96, the environment setting unit 97 refers to the registered setting values and controls the side mirror actuator 98, the room mirror actuator 99, and the seat actuator 100 for that driver.
[0089]
Although the present invention has been described above with reference to specific embodiments, the present invention is not limited to such embodiments, and various modifications that can be easily made by those skilled in the art are also included in the scope of the present invention.
[0090]
[Effect of the invention]
According to the present invention, in the real-time tracking of the face direction and the line-of-sight direction, erroneous recognition is reduced, and high-accuracy image processing can be realized.
[Brief description of the drawings]
FIG. 1 shows an embodiment of a face / line-of-sight recognition apparatus constituted by a computer.
FIG. 2 is an example of a functional block diagram of the face / gaze recognition apparatus.
FIG. 3 is an overall flowchart of an image processing unit.
FIG. 4 shows an example of a search template.
FIG. 5 shows an example of a three-dimensional face feature point model.
FIG. 6 is a flowchart of a face tracking unit.
FIG. 7 is a diagram showing a plurality of templates.
FIG. 8 is a diagram illustrating a method for predicting feature points that are not captured by a camera.
FIG. 9 is a diagram illustrating a method for predicting feature points that are not photographed by a camera.
FIG. 10 is a flowchart of a blink detection unit.
FIG. 11 is a diagram showing a region of the right eye extracted from an input image.
FIG. 12 is a flowchart of a gaze detection unit.
FIG. 13 is a diagram schematically illustrating processing of a line-of-sight detection unit.
FIG. 14 is a diagram illustrating a line-of-sight direction detected by a line-of-sight detection unit as an angle with respect to a horizontal plane and a vertical plane.
FIG. 15 shows an example of an automobile equipped with a face / line-of-sight recognition device.
FIG. 16 is a functional block diagram of an automobile provided with the face / gaze recognition device shown in FIG. 15.
[Explanation of symbols]
14 Infrared projector
33 Image input unit
35 Image analysis unit
37 Face search unit
39 Face tracking unit
41 Blink detection unit
43 Line-of-sight detection unit

Claims

[Claim 1]
A face / line-of-sight recognition device comprising: a plurality of cameras that photograph the face of a user; face orientation detection means for detecting the orientation of the face from the image output of the cameras; eye opening / closing detection means for detecting, from an image area around the eyes captured in the image output of the cameras, whether the user's eyes are open; and line-of-sight direction detection means for detecting the user's line-of-sight direction from the image output of the cameras in response to the user's eyes being open,
wherein the eye opening / closing detection means detects horizontal edges included in the image area around the eyes and detects whether the eyes are open according to the proportion of horizontal edges included in the image area, the image output of the cameras is a near-infrared image, and the line-of-sight direction detection means detects the position of the pupil from the brightness of the image area around the eyes and detects the line-of-sight direction from the detected center position of the pupil and the center position of the eyeball.

[Claim 2]
A face / line-of-sight recognition device comprising: a plurality of cameras that photograph the face of a user; face orientation detection means for detecting the orientation of the face from the image output of the cameras; eye opening / closing detection means for detecting, from an image area around the eyes captured in the image output of the cameras, whether the user's eyes are open; and line-of-sight direction detection means for detecting the user's line-of-sight direction from the image output of the cameras in response to the user's eyes being open,
wherein the face orientation detection means comprises template selection means for selecting, according to the face orientation, one template for each feature point from a plurality of templates prepared for each of one or more feature points corresponding to characteristic portions of the face, and image area extraction means for extracting, using each of the selected templates, one or more image areas corresponding to the feature points from the image output, and detects the user's face orientation based on the extracted one or more image areas, and
wherein the template selection means selects one template from the plurality of templates for the current image output based on the face orientation detected from the previous image output.

[Claim 3]
The face / line-of-sight recognition device according to claim 2, wherein the image area extraction means determines, based on the face orientation detected from the previous image output, feature points that are not captured in the current image output, and does not perform the process of extracting image areas corresponding to the feature points that are not captured.