JP3451832B2 - Face image feature recognition device

Info

Publication number: JP3451832B2
Application number: JP10190496A
Authority: JP
Inventors: 雅之金田; 勉那須
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 1996-04-01
Filing date: 1996-04-01
Publication date: 2003-09-29
Anticipated expiration: 2016-04-01
Also published as: JPH09270010A

Description

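[Detailed Description of the Invention]

[0001] [Technical Field of the Invention] The present invention relates to a face image feature quantity recognition device that processes face image data to form facial feature quantities.

[0002] [Prior Art] Various devices have been developed that process face image data to extract facial feature quantities and thereby identify a person, detect the direction of the line of sight, or distinguish dozing from wakefulness. Because image data is a huge volume of data expressing the brightness of each pixel in many gradation levels (density value, luminance), such devices narrow down the image data at an early stage to reduce the amount of data and the number of operations, and so secure processing speed. For example, the doze warning device disclosed in Japanese Unexamined Patent Publication No. H7-181012 obtains an eye-opening index, one of the facial feature quantities, from face image data. In the course of that processing it reduces the amount of data by binarizing the density values into black and white, setting black mask regions over unneeded parts of the frame to reduce the number of black regions, and labeling each island-shaped black region as a single unit.

[0003] [Problem to Be Solved by the Invention] The processing disclosed in Japanese Unexamined Patent Publication No. H7-181012 requires the following three steps to detect the eye position, so image processing and computation take considerable time.
(1) Preprocessing of the image data, mainly binarization of density values and noise removal
(2) Face-width detection to locate the eyes horizontally (labeling, particle removal, checking the continuity of white pixels, etc.)
(3) Vertical eye-position detection (evaluating each label's centroid coordinates, area and Feret diameter, etc.)
In other words, although the density values are binarized, each island-shaped black region contains from several hundred to several tens of thousands of pixels, so each of these operations takes considerable time. It is therefore difficult to shorten the processing cycle to improve the responsiveness and reliability of the determination, and also difficult to downsize the arithmetic unit or to run the processing as an interrupt task alongside other computation.

[0004] In addition, because labeling is performed with the black regions on the face individually separated, when another black region is contiguous with the black region corresponding to an eye, for example when long hair falls over the outer corner of the eye, the eye's black region merges with the other black region in the binarized image and can no longer be identified.

[0005] An object of the present invention is to provide a face image feature quantity recognition device that reduces the amount of data to be processed and so raises processing speed, and that can accurately identify the feature quantity corresponding to an eye even when long hair falls over the outer corner of the eye.

[0006] [Means for Solving the Problem] The present invention is a face image feature quantity recognition device that processes face image data to extract facial feature quantities. It detects pixel density along pixel columns running in the vertical direction of the face, defines one pixel as an extraction point for each local rise in density along the column, and connects extraction points in adjacent pixel columns that lie close to each other in the column direction, thereby extracting a group of curves extending in the horizontal direction of the face.

[0007] Specifically, the device has image input means that captures the face and digitizes the density of each pixel on the frame, and storage means that stores the digitized density value of each pixel. It further comprises extraction means that reads the density values from the storage means along a pixel column in the vertical direction of the face (Y direction) and identifies, as an extraction point, a pixel representing a one-directional peak of the density values on that column; and forming means that compares the extraction points of adjacent vertical (Y-direction) pixel columns and groups together extraction points that neighbor each other in the direction in which the pixel columns are arranged (X direction).

[0008] The extraction means averages the density values of a plurality of consecutive pixels including a given pixel to obtain an arithmetic mean that serves as that pixel's density value. It forms a routine that takes as an extraction point a pixel at which the derivative of the arithmetic mean density value with respect to position along the vertical (Y-direction) pixel column reverses sign in one direction, and it repeats this routine, storing only the Y coordinate values of extraction points that correspond to darkness peaks.

[0009] The forming means connects in the horizontal direction (X direction) those extraction points, obtained for each vertical (Y-direction) pixel column, whose vertical difference is within a predetermined range, groups them and assigns a group number. Starting from the pixel column at the horizontal end, it checks in order whether a pixel with a group number exists and, if so, stores the group number, its vertical and horizontal coordinates, and how many times the group has appeared counting from the end.

[0010] The forming means further checks whether adjacent pixel columns share a common group and, if they do, stores in memory, as connection data, the number of connected extraction points and their Y coordinate values. It selects only those groups whose number of connected extraction points exceeds a predetermined value, and sets the position of a candidate point from the mean of the vertical coordinates and the horizontal coordinate of the midpoint between the pixel columns at the left and right ends of the group.

[0011] [Operation] In the device of the invention, the density value of each pixel of the captured face is accumulated in the storage means, and the extraction means reads the required pixel columns from the storage means to find extraction points. One extraction point is identified for each local rise in density value (a local dip in an inverted image), such as an eye, eyebrow, mouth or mole. Grouping, which connects extraction points horizontally, then eliminates extraction points that are not continuous in the horizontal direction of the face, such as moles (points) and the ridge of the nose (a vertical line).

[0012] More specifically, the center point of a local rise in density value is selected as the extraction point. Averaging smooths out small variations in density along the pixel column and leaves the large variations. Differentiating these large variations identifies their peaks. A pixel (position) at which the derivative reverses in one direction (from positive to negative, or from negative to positive) represents a darkness peak and corresponds to approximately the center of an eye, eyebrow, mouth, etc. A pixel at which the derivative reverses in the opposite direction corresponds to a brightness peak lying between dark parts such as the eyes, eyebrows and mouth, and is discarded without being made an extraction point.

[0013] When adjacent pixel columns share a common group, the forming means links them as connection data and selects only groups whose number of connected extraction points exceeds a predetermined value. Groups with a short continuous length, such as those corresponding to the nostrils, are thereby eliminated, leaving only groups that run long and continuously in the horizontal direction of the face, such as the eyes, eyebrows and mouth.

[0014] For the set of extraction points forming each group, the mean of the vertical coordinates and the horizontal coordinate corresponding to the midpoint between the pixel column where connection started and the column where it ended are obtained, so that each group of extraction points is represented by a single candidate point on the frame. From the relative positions of the candidate points on the frame, it can be determined whether each group corresponds to an eye, an eyebrow or the mouth.

[0015] In this way, the device of the invention expresses the feature quantities of a face image as a plurality of curves crossing the face horizontally. Each curve represents a dark part that crosses a bright part of the image horizontally and corresponds to an eye, eyebrow, mouth, etc. Dark parts that cross the face image vertically, such as the shadow of the nose, hair on the forehead and cheek lines, rarely form local peaks on a pixel column; even if a peak forms and becomes an extraction point, it is eliminated because there are no other extraction points to connect with it horizontally. The outlines on both sides of the face and hair hanging over the cheeks are likewise unlikely to become extraction points and are eliminated because no horizontally connected extraction points exist.

[0016] [Embodiments of the Invention] An embodiment applied to driver's seat control is described with reference to FIGS. 1 to 13. In this embodiment, a plurality of feature quantities such as the eyes, eyebrows and mouth are extracted from an image of the driver's face, and the feature quantity of one eye is identified from among them. As shown in FIG. 1, a camera 11 using a CCD image sensor is installed in the instrument panel of the automobile and continuously images the face of the driver while driving from a fixed direction and at a fixed angle. Of the images captured by camera 11, one frame every 0.1 second is sent to AD converter 12.

[0017] AD converter 12 converts the analog voltage of each pixel into a 256-level digital value and stores it in image memory 13. As shown in FIG. 6, image memory 13 holds an 8-bit density value for each pixel of a frame 480 pixels high and 512 pixels wide. Image data arithmetic circuit 14 extracts facial feature quantities from the one-frame image stored in image memory 13. Facial feature quantity recognition circuit 15 identifies the feature quantity of one eye from the extracted facial feature quantities. Mirror and seat control circuit 16 determines the eye height from the identified feature quantity of one eye and adjusts the mirror angle and seat height.

[0018] Image processing in the embodiment follows the flowchart of FIG. 2. In step 110, camera 11 captures an image of the vehicle interior including the driver's face, and AD converter 12 converts the pixel densities into digital values. In step 120, image data corresponding to one frame is stored in image memory 13.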
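In step 130, vertical pixel columns of the face are selected from the huge volume of image data stored in image memory 13, as shown in FIG. 7. Several tens of pixel columns are set at equal intervals in the central part of the frame, one column every 10 pixels in the X direction. Each pixel column has 480 pixels in the Y direction, and the density values of all its pixels can be sent to image data arithmetic circuit 14 column by column.

[0019] In step 140, the density values of each pixel column are processed to identify the pixels (Y coordinate values) that become extraction points. Large variations in density along the column are found and their local peaks are taken as extraction points. The procedure is shown in FIG. 3.

[0020] In step 141, the density values of all pixels are read column by column, in order from the left side of FIG. 7. In step 142, the density values along the column are arithmetically averaged to obtain an arithmetic mean density value from which the high-frequency components of the density profile along the column have been removed. For example, pixel column Xa in FIG. 7 yields the arithmetic mean density shown in FIG. 8(a), and pixel column Xb yields that shown in FIG. 9(a). In step 143, the arithmetic mean density value is differentiated. The arithmetic mean density of FIG. 8(a) gives the derivative shown in FIG. 8(b), and that of FIG. 9(a) gives the derivative shown in FIG. 9(b).

[0021] In step 144, the Y coordinate values corresponding to darkness peaks of the arithmetic mean density value are obtained from the derivative.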
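The column read-out, averaging and differentiation of steps 141 to 143 can be sketched roughly as follows. This is only an illustrative sketch: the function names, the representation of the frame as a list of rows, the averaging window of 5 pixels and the use of a simple forward difference are all assumptions, since the text does not state how many pixels are averaged or how the derivative is computed.

```python
def column_values(frame, x):
    """Read the density values of one vertical pixel column (step 141).

    `frame` is assumed to be a list of rows, each row a list of 8-bit values.
    """
    return [row[x] for row in frame]


def moving_average(column, window=5):
    """Arithmetic mean of `window` consecutive pixels around each pixel (step 142).

    The window length is an assumed value; near the ends of the column the
    window is simply clipped to the pixels that exist.
    """
    half = window // 2
    n = len(column)
    smoothed = []
    for y in range(n):
        lo = max(0, y - half)
        hi = min(n, y + half + 1)
        smoothed.append(sum(column[lo:hi]) / (hi - lo))
    return smoothed


def derivative(values):
    """Forward difference along Y (step 143): d[y] = values[y + 1] - values[y]."""
    return [values[y + 1] - values[y] for y in range(len(values) - 1)]


# Usage on a tiny synthetic frame (3 rows, 2 columns).
frame = [[10, 200], [12, 40], [11, 190]]
profile = moving_average(column_values(frame, 1), window=3)
slope = derivative(profile)
```

A wider window removes more of the fine variation along the column, at the cost of blurring narrow features; the choice would have to be tuned against real images.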
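A Y coordinate value at which the derivative reverses from negative to positive corresponds to a darkness peak and is a single Y coordinate value representing an eye, eyebrow, mouth, etc. A Y coordinate value at which the derivative reverses from positive to negative is a brightness peak lying midway between the eyes, eyebrows, etc., and is not used in the embodiment. The arithmetic mean density of FIG. 8(a) gives the derivative of FIG. 8(b), from which the Y coordinate values p1, p2, p3, p4 and p5, where the derivative reverses from negative to positive, are extracted. Of these, the Y coordinate values p1, p2 and p5, for which the preceding derivative peak is at or below the threshold (broken line), become extraction points. Checking the preceding derivative peak prevents facial wrinkles and surface irregularities from becoming extraction points. For the arithmetic mean density of FIG. 9(a), the Y coordinate values p1, p2 and p3, where the derivative of FIG. 9(b) reverses from negative to positive, are extracted, but in no case is the preceding derivative peak at or below the threshold (broken line), so no extraction point is formed.

[0022] In step 145, it is determined whether the current column is the last pixel column, that is, the rightmost column in FIG. 7. If it is, extraction points have been identified for all pixel columns and processing proceeds to step 150. If not, processing returns to step 141 and extraction-point identification begins with the density data of the next column. In this way, about zero to four extraction points A1, A2, A3, A4 are identified per pixel column, as shown in FIG. 10. Extraction points A1 and A2 are identified in pixel column Xc, and A1, A2, A3 and A4 in pixel column Xd.

[0023] In step 150, the extraction points found for each pixel column are connected in the X direction and grouped. The procedure is shown in FIG. 4. In step 151, the extraction-point data of the pixel columns are read in order. In steps 152 and 153, the Y coordinate values of the extraction points of adjacent pixel columns are compared.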
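The extraction-point test of step 144 described above (a negative-to-positive reversal of the derivative, accepted only when the preceding derivative peak is at or below a threshold) might be sketched as follows. The function name, the input convention `deriv[y] = v[y + 1] - v[y]` and the threshold value used in the usage lines are assumptions for illustration; the text gives no numerical threshold.

```python
def extraction_points(deriv, threshold):
    """Return Y indices where the derivative reverses from negative to positive
    and the steepest negative value in the preceding falling run is <= threshold.

    `deriv[y]` is assumed to be v[y + 1] - v[y], so a reversal between
    deriv[y - 1] < 0 and deriv[y] >= 0 marks a local minimum of v at index y.
    `threshold` is a negative number; shallow dips (wrinkles, surface
    irregularities) never reach it and are ignored.
    """
    points = []
    steepest = 0.0  # most negative derivative seen in the current falling run
    for y in range(1, len(deriv)):
        prev, cur = deriv[y - 1], deriv[y]
        if prev < 0:
            steepest = min(steepest, prev)
            if cur >= 0:  # negative-to-positive reversal
                if steepest <= threshold:
                    points.append(y)
                steepest = 0.0  # start afresh for the next falling run
    return points


# Usage: a shallow dip at index 2 and a deep dip at index 6.
deriv = [0, -1, 1, 0, -6, -3, 3, 6, 0]
deep_only = extraction_points(deriv, threshold=-5)
all_dips = extraction_points(deriv, threshold=-1)
```

Only the Y indices are returned, which matches the statement that just the Y coordinate values of the darkness peaks are stored for each column.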
隣接する画素列の抽出点のＹ座標値の差が１０画素以下
であれば、ステップ１５４へ進んで連結データが形成さ
れる。差が所定範囲を越える（グループ化できない）と
判断された場合はステップ１５５へ進んで連結データは
形成されない。連結データは、発生したグループごとに
順番に付与されるグループ番号、グループの左端の抽出
点の属する画素列の番号（Ｘ座標値に対応）で分類し
て、連結された抽出点の個数（画素列数）とＹ座標値を
メモリに記憶し、画素列を処理するごとに記憶内容を書
き替えて形成される。ステップ１５５では、最後の画素
列であるか否かを判定する。最後の画素列であれば、全
部の画素列でグループ化を判定済みであるから、ステッ
プ１６０へ進む。最後の画素列でなければ、ステップ１
５１へ進み、次の画素列の抽出点によるグループ化の判
定を開始させる。【００２４】ステップ１５０における処理を図１１を用
いて簡単に説明する。ここでは、画素列を１１本として
説明する。図１１に示すように画素列１〜１１でそれぞ
れ特定された抽出点は、抽出点ごとにグループ番号Ｇ、
連結個数Ｎ、Ｙ座標値を付与されて連結データを形成す
る。図１１に示される３４個の抽出点は、１０個のグル
ープにまとめられている。【００２５】ステップ１６０では、図１に示す画像デー
タ演算回路１４が、図５に示す手順に従って不必要な連
結データを除去し、最終的に必要な連結データについて
だけ候補点データを形成する。ステップ１６１では、メ
モリから連結データをグループ番号の順に１個づつ呼び
出す。ステップ１６２、１６３では、連結データの連結
データ数を評価する。連結データ数が５個以上であれ
ば、次のステップ１６４へ進んで候補点データを形成す
る。連結データ数が５個未満であれば（鼻の穴等）、ス
テップ１６５へ進んで候補点データを形成しない。例え
ば、図１０に示す抽出点群からは、図１２に示すよう
に、両方の眼および眉、鼻の影、口にそれぞれ相当する
６個の抽出点のグループＧ１〜Ｇ６が残される。ここで
は、残された６個のグループについて左から右、上から
下方向に順番に番号を付け直している。【００２６】候補点データは、残された抽出点のグルー
プをそれぞれ画面上の１個づつの候補点で代表させる１
組の座標値、すなわち、そのグループに属する抽出点の
Ｙ座標値の平均値（高さ）と、グループの左右の両端の
画素列の中央のＸ座標値で構成される。例えば、図１２
に示す抽出点のグループＧ１〜Ｇ６は、図１３に示す画
面上の６個の候補点Ｋ１〜Ｋ６で代表される。ステップ
１６５では、最後の連結データであるか否かを判定す
る。最後の連結データであれば、全部の連結データを評
価済みであるから、ステップ１７０へ進む。最後の連結
データでなければステップ１６１へ進み、次の連結デー
タの評価を開始させる。【００２７】ステップ１７０では、図１に示す顔の特徴
量認識回路１５が、候補点データを評価して複数の候補
点の中から片方の眼の候補点を特定する。例えば、図１
３に示す６個の候補点Ｋ１〜Ｋ６の場合、Ｙ座標値の大
きい（低い）候補点Ｋ５、Ｋ６を中心にして中央の検出
ゾーンＺＣを形成する。検出ゾーンＺＣの外側には、Ｙ
座標値の小さい（高い）候補点Ｋ１、Ｋ３と候補点Ｋ
２、Ｋ４をそれぞれ中心にして左右の検出ゾーンＺＬ、
ＺＲを設定する。そして、左側の検出ゾーンＺＬに含ま
れる２つの候補点Ｋ１、Ｋ３と中央の検出ゾーンＺＣに
含まれる候補点Ｋ６の高さ差、候補点Ｋ１、Ｋ３の位置
関係に基づいて片方の眼に対応する候補点Ｋ３が特定さ
れる。ステップ１８０では、図１に示すミラー、シート
制御回路１６が、特定された片方の眼の特徴量、すなわ
ち運転者の眼の高さに適合させてミラーの角度やシート
高さを調整する。【００２８】実施例の画像処理によれば、画素列ごとに
画素の濃度値を呼び出して、画面の左側から右側へ１方
向に同じ演算処理を繰り返し、また、画素列ごとの限ら
れた大きさのデータしか扱わないから、特開平７−１８
１０１２号公報に示される処理に比較して処理プログラ
ムが単純になる。そして、画素列が増えても同じ処理の
繰り返し回数が増すだけで、処理内容自体は変化しな
い。また、演算された抽出点から連結グループを形成し
て、連結グループの数を絞り込み、候補点データという
小さなデータを形成して眼の特徴量を特定する過程は、
さらに小さなデータと単純な演算／比較で足りる。ま
た、濃度値の二値化やノイズ除去を主とした画像データ
の前処理、眼の横方向の位置を特定するための顔幅検
出、眼の縦方向の位置検出のいずれの処理も行う必要が
無い。従って、画像処理に必要な演算数やメモリ容量が
少なくて済み、演算素子の性能が多少低くても足り、装
置を小型軽量に構成したり、他の用途の演算装置で割り
込み処理させて画像処理することも可能となる。【００２９】また、顔の縦方向に画素列を選択して、暗
さの局所的なピークとなるＹ座標値を抽出点とするか
ら、生え際、顔の両端、顔に縦方向にかかる髪の毛等は
抽出点となる可能性が低い。仮に抽出点とされた場合で
も、水平方向に連結される抽出点が存在して連結される
個数が５以上という条件に満たない抽出点は自動的に排
除されるから、生え際、顔の両端、皺や凹凸、直射日光
による影等によって不必要な候補点が形成されて眼の特
定を誤る心配が無い。従って、眼に髪の毛がかかってい
る場合でも、少なくとも髪の毛にかかるまでの眼の線の
連結データや候補点データが問題無く形成され、髪の毛
がかかっていない場合と同様の手順で眼の開度指標を演
算できる。【００３０】なお、図２のフローチャート中、ステップ
１１０が発明の画像入力手段、ステップ１２０が発明の
記憶手段、ステップ１３０、１４０が発明の抽出手段、
ステップ１５０、１６０が発明の形成手段にそれぞれ対
応する。また、実施例では、眼の検出高さに基づいてシ
ート高さ等を調整する用途を説明したが、本発明は、人
物を特定したり、視線の方向を検知したり、特開平７−
１８１０１２号公報に示されるように居眠り／覚睡の区
別を判定する用途にも応用可能である。また、実施例で
は、眼、眉等が高い濃度値（暗い階調段階）となるよう
な画像で処理を行ったが、階調を反転した画像（眼、眉
等が低い濃度値となる）を用いてもよい。【００３１】また、実施例では、ＡＤ変換器１２以降の
処理をすべてデジタル演算で行うこととしたが、画素列
に沿った濃度値の相加平均を求めるまでの処理をアナロ
グ信号回路で行うこととしてもよい。例えば、撮像管を
用いたカメラで縦方向に走査線を設定し、顔を縦方向に
横切る走査線に沿った輝度信号をローパスフィルターに
通じて高周波成分を除去した後、１０本の走査線ごとに
１本の走査線分の輝度信号を抜き出してサンプリングに
よりＡＤ変換する。また、実施例では図７に示すように
Ｘ方向の１０画素ごとに１本の画素列を選択したが、さ
らに細かい刻みで多数の画素列を選択してもよい。画素
列が増えても同じ処理の繰り返し回数が増すだけで、基
本的な処理内容は変化しない。画素列を増すことで、特
徴量の曲線の湾曲状態をより精密に写し取れ、より小さ
な連結許容幅を設定して抽出点の連結間違いを回避でき
る。図１３に示すように検出ゾーンＺＬが設定された
後、検出ゾーンＺＬのみで高密度に画素列を設定して、
左側の眼の曲線を構成する抽出点の数を増し、その弓型
形状をさらに詳しく検出することとしてもよい。【００３２】なお、図１４に示すように運転者が眼鏡を
かけている場合、図１３に示される検出ゾーンＺＬに
は、図１５に示すように、眼と眉に対応する抽出点のグ
ループＧ３、Ｇ５に加えて、眼鏡の上下の縁に対応する
抽出点のグループＧ１、Ｇ７が残ってしまう。図１５で
は、残された９個のグループについて左から右、上から
下方向に順番に番号を付け直しており、眼、眉と符号の
関係が図１２とは一致しない。しかし、このような場合
でも、本実施例では、上述したように、検出ゾーンＺＬ
における候補点の上下関係に加えて、検出ゾーンＺＬの
複数の候補点と検出ゾーンＺＣの口の候補点の高さ差も
評価しているため、眼に対応するグループＧ５を誤り無
く特定できる。検出ゾーンＺＬに４個の候補点が形成さ
れる場合、下から２番目の候補点が眼と判定されて眼鏡
の縁に対応する候補点データが振り落とされ、運転者が
眼鏡をかけていない場合と同様な手順で眼の特徴量が特
定される。【００３３】【発明の効果】本発明によれば、例えば、眼にかかる画
素列の数と同じ個数の座標値で眼の弓形形状を特定でき
るから、眼の画像に含まれる画素全部を扱う場合に比較
してデータ数が大幅に削減される。眼の画像に含まれる
画素全部を扱って重心を求める処理に比較して、複数の
曲線から眼の曲線を特定する処理は簡単で済む。従っ
て、演算速度が低く、メモリの割り当ても小さい小型の
演算装置でも顔の特徴量を求める画像処理を高速に実行
できる。【００３４】そして、顔の横方向に現れる濃度値のピー
クや、顔の横方向に連結されない濃度値のピークが排除
されるから、顔を縦方向に横切る影や目尻にかかる髪の
毛によって、眼等の必要な特徴量の特定が妨げられな
い。【００３５】また、縦方向画素列においては連続した複
数の画素の濃度値を平均化して１つの濃度値とするか
ら、隣接する画素列における間違った抽出点をグループ
化する心配が無い。また、平均化によって形成された濃
度値のピークを抽出点とするから、眼や眉の形に正確に
沿った特徴量の曲線を形成でき、眼の開度指標等を求め
る際の誤差が抑制される。【００３６】グループの選択においては連結数の少ない
抽出点のグループが振り落とされるから、必要最小限に
絞り込んだ抽出点とグループの中で、眼や口の特徴量を
特定すればよく、間違った特徴量を選択しないで済む。【００３７】そして、絞り込まれたグループをそれぞれ
１個の候補点で代表させるから、特徴量の曲線をそのま
ま扱う場合に比較して相対位置の判定を効率的に実行で
きる。Description: BACKGROUND OF THE INVENTION [0001] 1. Field of the Invention [0002] The present invention relates to a face image feature quantity recognition apparatus for forming a face feature quantity by performing arithmetic processing on face image data. 2. Description of the Related Art Face image data is subjected to arithmetic processing to extract facial features, thereby identifying a person, detecting the direction of a line of sight, and determining whether to fall asleep or asleep. Various devices have been developed. Since image data is a huge amount of data expressing the brightness of each pixel in a number of gradation levels (density values, luminances), such a device narrows down the image data early to reduce the number of data, The processing speed is secured by saving the number of operations. For example, Japanese Patent Application Laid-Open No. 7-1
In the dozing alarm device disclosed in Japanese Patent Application Laid-Open No. 81012, an eye opening index, which is one of the facial features, is obtained from the face image data. The number of data is reduced by converting to a value, setting a black mask area in an unnecessary portion of the screen to reduce the number of black portions, and treating each island-shaped black portion as a lump and performing labeling. [0003] Japanese Patent Application Laid-Open No. 7-18101
In the processing disclosed in Japanese Patent Laid-Open Publication No. 2 (1993) -1995, the following three steps need to be performed in order to detect the position of the eye, so that considerable time is required for image processing and calculation. (1) Pre-processing of image data mainly on binarization of density values and noise removal (2) Face width detection (labeling, particle removal, continuity of white pixels) for specifying the horizontal position of eyes (Determination etc.) (3) Vertical position detection of eyes (judgment of barycentric coordinates, area value and fillet diameter of each label etc.) In other words, although density values are binarized, individual island-shaped black Since each part includes several hundreds to several tens of thousands of pixels, each of these processes takes a considerable amount of time to calculate. Therefore, it is difficult to shorten the processing cycle to increase the follow-up performance and reliability of the determination, and it is also difficult to reduce the size of the arithmetic unit and interrupt processing of other arithmetic processing. In addition, since labeling is performed in a state where the black portions on the face are individually separated, when a black portion corresponding to the eye is connected to another black portion, for example, when long hair is applied to the outer corner of the eye, On the black and white binarized image, the black portion corresponding to the eye is integrated with other black portions, and the black portion corresponding to the eye cannot be specified. The present invention provides an apparatus for recognizing a feature quantity of a face image, which can reduce the amount of data to be calculated, increase the processing speed, and can accurately specify the feature quantity corresponding to the eyes even when the long hair touches the outside of the eye. It is intended to provide. 
[0006] Means for Solving the Problems The present invention provides a feature value recognition device of the face image to extract a feature amount of the face by processing image data of a face, in the longitudinal direction of the pixel rows of the face A pixel density is detected along the pixel row, one pixel is determined for each local increase in density in the pixel row, and an extraction point is formed. The extraction points adjacent to each other in the pixel row direction of the adjacent pixel row are connected. A group of curves extending in the horizontal direction of the face is extracted. Namely, have a storage means for storing an image input means for digitizing the concentrations for each pixel on the screen by picking up the face, the digitized density value for each said pixel, the vertical direction of the face The density value is fetched from the storage means along a pixel row in the (Y direction), and a pixel representing a one-way peak of the density value on the pixel row in the vertical direction (Y direction) is specified and extracted. provided with extraction means for, by comparing the extracted points in the vertical direction (Y-direction) pixel column adjacent, and formation means for grouping into a single extraction point adjacent to the pixel column direction (X direction)
You . [0008] extraction means obtains the arithmetic mean value as the density value of one pixel by averaging the density values of a plurality of pixels continuous comprise one pixel, prior to averaged arithmetic mean density value
The routine fine <br/> content value for the position along the pixel columns of Kitate direction (Y direction) and extraction point pixels reversed in one direction
And repeat the routine to create a dark peak.
Only the Y coordinate value of the extraction point corresponding to the mark is stored. [0009] The forming means is provided with an image in a vertical direction (Y direction).
Extraction in which the vertical difference between the extraction points obtained for each
Connect the output points in the horizontal direction (X direction) to group
Group numbers in order from the pixel row at the end in the horizontal direction.
Check if there is a numbered pixel,
If present, the group number, its vertical and horizontal coordinates, and the end
It remembers how many times it is from the beginning. [0010] Further, the forming means may further comprise :
Check if a common group exists between columns and
If it exists, the extraction point
The number and the Y coordinate value are stored in the memory as linked data,
The number of linked extraction points in the linked data exceeds a specified value
Select only the group, the average value of the ordinate and the left of the group
The position of the candidate point is determined from the abscissa value at the center of the pixel row at the right
To set . In the facial image feature quantity recognition apparatus of the present invention , an image
The density value of each pixel of the face is stored in the storage
The stage retrieves the required pixel array from the storage means and selects the extraction point.
Ask. The extraction point is the density value station such as eye, eyebrow, mouth, mole, etc.
For localized heightening (local dent in reverse image)
It is specified one by one. Then, connect the extraction points in the horizontal direction.
Moles (dots) and nose streaks (vertical lines)
Extraction points that are not continuous in the horizontal direction, such as
You. More specifically, the local increase of the density value
The center point is selected as the extraction point. Pixel by averaging
Small changes in density values along the column are smoothed to
Changes remain. Differential calculation of large changes in density value
Are identified. If the derivative is in one direction (positive to negative or
Pixels (positions) that reverse from negative to positive)
In the center of the eyes, eyebrows, mouth, etc.
Respond. The pixels whose differential values are reversed in the opposite direction are the eyes, eyebrows, and mouth.
Corresponding to the brightness peak located in the middle of the dark part
And is discarded without being considered an extraction point. [0013] In the forming means, adjacent pixel columns are common to each other.
If the group of exists, concatenate as consolidated data,
Only the groups where the number of connected extraction points exceeds the specified value
, The continuous length corresponding to the nostrils etc. is short.
Group is shaken down and beside the face like eyes, eyebrows and mouth
Only groups that are long and continuous in the direction remain. Then, for each extracted point group forming the group, the average value of the ordinate and the abscissa value corresponding to the center of the pixel row at the start of connection and the pixel row at the end of the connection are obtained. Are represented by one candidate point on the screen. From the relative positional relationship of the candidate points on the screen, it can be determined whether each group corresponds to the eye, eyebrows, or mouth. Thus, the facial image feature quantity recognition apparatus of the present invention is provided.
Of the face image by multiple curves that cross the face in the horizontal direction.
The charge is expressed. Each curve represents a bright part of the image
Represents the dark part that traverses in the horizontal direction, such as eyes, eyebrows, mouth
And so on. Nose shadows, forehead hair, cheek lines
Dark areas that traverse the face image in the vertical direction, such as
Local peaks are hardly formed, and if peaks are formed
Other extraction points connected in the horizontal direction
It is shaken off because it does not exist. On the lines and cheeks on both sides of the face
Similarly, it is difficult for the hair to become an extraction point,
Since there is no connected extraction point, it is shaken off. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Referring to FIGS. 1 to 13, an embodiment applied to driver's seat control will be described. In the present embodiment, a plurality of features such as eyes, eyebrows, and mouth are extracted from the image of the driver's face, and the feature of one eye is specified from these features. FIG.
As shown in FIG. 1, a camera 11 using a CCD image element is installed on an instrument panel of an automobile, and a driver's face during driving is continuously imaged in a certain direction and at a certain angle. An image of one frame every 0.1 second among the images captured by the camera 11 is sent to the AD converter 12. The AD converter 12 converts an analog voltage for each pixel into a digital value of 256 levels and converts the analog voltage into an image memory 13.
To accumulate. As shown in FIG. 6, the image memory 13 stores and holds a density value of 8 bits for each of 480 pixels vertically and 512 pixels horizontally. Image data operation circuit 1
Reference numeral 4 extracts the feature amount of the face using the image of one frame stored in the image memory 13. Facial feature recognition circuit 15
Specifies the feature value of one eye from the extracted feature values of the plurality of faces. The mirror / sheet control circuit 16 identifies the eye height from the specified characteristic amount of one eye, and adjusts the angle of the mirror and the height of the sheet. The image processing in the embodiment is executed according to the flowchart of FIG. In step 110, an image of the vehicle interior including the driver's face is captured by the camera 11, and the AD converter 12 converts the pixel density into a digital value. In step 120, image data corresponding to one frame image is stored in the image memory 13.
In step 130, a vertical pixel row of the face is selected from the huge amount of image data stored in the image memory 13 as shown in FIG. Several tens at equal intervals in the center of the screen
One pixel row is set for every 10 pixels in the row and X direction. One pixel row has 480 pixels in the Y direction, and the density values of all pixels can be sent to the image data calculation circuit 14 for each pixel row. In step 140, the density value of each pixel row is arithmetically processed to specify a pixel (Y coordinate location) to be an extraction point. A large change in the density value on the pixel row is obtained, and a local peak thereof is set as an extraction point. FIG. 3 shows the processing procedure. In step 141, the density values of all the pixels in each pixel column are called in order from the left side of FIG. In step 142, the arithmetic values of the density values on the pixel row are arithmetically averaged, and an arithmetic average density value obtained by removing high frequency components from a density value change curve on the pixel row is obtained. For example, the pixel column Xa shown in FIG.
8A, and an arithmetic average density value as shown in FIG. 9A is extracted from the pixel row Xb. Step 14
In 3, the arithmetic mean density value is differentiated. In the arithmetic mean density value shown in FIG. 8A, the differential value shown in FIG.
In the arithmetic mean density value shown in (a), the differential value shown in (b) is obtained. In step 144, the Y coordinate value corresponding to the dark peak of the arithmetic average density value is obtained from the differential value.
The Y coordinate value at which the differential value is inverted from negative to positive becomes one Y coordinate value representing each of the eyes, eyebrows, mouth, etc., corresponding to the peak of darkness. The Y coordinate value at which the differential value is inverted from positive to negative is a peak position of brightness corresponding to an intermediate position such as an eye or an eyebrow, and is not used in the embodiment. In the arithmetic mean density value shown in FIG. 8A, the differential value shown in FIG. 8B is obtained, and the Y coordinate values p1, p2, p3, and p at which the differential value is inverted from negative to positive.
4. p5 is extracted. Among these, the Y coordinate values p1 and p1 in which the peak of the preceding differential value is equal to or less than the threshold value (broken line)
2, p5 is the extraction point. By determining the peak of the differential value at the near side, wrinkles and irregularities on the face do not become extraction points. In the arithmetic mean density value shown in FIG. 9A, Y coordinate values p1, p2, and p3 in which the differential value shown in FIG. 9B is inverted from negative to positive are extracted, but all are extracted from the preceding differential value. The peak does not fall below the threshold (broken line), and no extraction point is formed. In step 145, the last pixel row, FIG.
It is determined whether or not the pixel row is at the right end. If it is the last pixel row, the extraction point has been specified in all the pixel rows, so the process proceeds to step 150. If it is not the last pixel row, the process proceeds to step 141, and the extraction point is started to be specified by the density value data of the next pixel row. In this way, as shown in FIG. 10, about 0 to 4 extraction points A1,
A2, A3, and A4 are specified. In the pixel row Xc, the extraction points A1, A2, and in the pixel row Xd, the extraction points A1, A2, A
3. A4 is specified. In step 150, the extracted points obtained for each pixel column are connected in the X direction and grouped. FIG. 4 shows the processing procedure. In step 151, the extraction point data of the pixel row is called in order. In steps 152 and 153, the Y coordinate values are compared for the extraction points of the adjacent pixel rows.
If the difference between the Y coordinate values of the extraction points of the adjacent pixel columns is 10 pixels or less, the process proceeds to step 154, where connected data is formed. If it is determined that the difference exceeds the predetermined range (grouping cannot be performed), the process proceeds to step 155, and no linked data is formed. The linked data is classified by the group number sequentially assigned to each generated group and the number of the pixel column (corresponding to the X coordinate value) to which the extracted point at the left end of the group belongs, and the number of linked extracted points (pixels) (The number of columns) and the Y coordinate value are stored in a memory, and each time a pixel column is processed, the stored contents are rewritten to form. In step 155, it is determined whether the pixel row is the last pixel row. If it is the last pixel column, the grouping has been determined for all the pixel columns, and the process proceeds to step 160. If it is not the last pixel row, step 1
Proceeding to 51, determination of grouping based on the extraction point of the next pixel row is started. The processing in step 150 will be briefly described with reference to FIG. Here, the description is made on the assumption that the number of pixel columns is 11. As shown in FIG. 11, the extraction points specified in the pixel columns 1 to 11 are group numbers G,
The connection data is formed by adding the connection numbers N and Y coordinate values. The 34 extraction points shown in FIG. 11 are grouped into 10 groups. In step 160, the image data arithmetic circuit 14 shown in FIG. 1 removes unnecessary connection data according to the procedure shown in FIG. 5, and finally forms candidate point data only for the necessary connection data. In step 161, the connected data is called one by one from the memory in order of the group number. In steps 162 and 163, the number of linked data of the linked data is evaluated. If the number of connected data is 5 or more, the process proceeds to the next step 164 to form candidate point data. If the number of connected data is less than 5 (nostrils, etc.), the process proceeds to step 165, and no candidate point data is formed. For example, from the extraction point group shown in FIG. 10, as shown in FIG. 12, groups G1 to G6 of six extraction points corresponding to both eyes and eyebrows, a shadow of a nose, and a mouth are left. Here, the remaining six groups are renumbered in order from left to right and from top to bottom. The candidate point data represents the group of the remaining extraction points by one candidate point on the screen.
A set of coordinate values, that is, an average value (height) of Y coordinate values of the extraction points belonging to the group, and an X coordinate value at the center of the pixel rows at both left and right ends of the group. For example, FIG.
Are represented by six candidate points K1 to K6 on the screen shown in FIG. In step 165, it is determined whether or not the data is the last linked data. If it is the last linked data, the process proceeds to step 170 because all the linked data have been evaluated. If it is not the last linked data, the process proceeds to step 161 to start evaluation of the next linked data. In step 170, the facial feature quantity recognition circuit 15 shown in FIG. 1 evaluates the candidate point data and specifies one eye candidate point from a plurality of candidate points. For example, FIG.
In the case of six candidate points K1 to K6 shown in FIG. 3, a central detection zone ZC is formed around the candidate points K5 and K6 having large (low) Y coordinate values. Outside the detection zone ZC, Y
Candidate points K1, K3 with small (high) coordinate values and candidate point K
2, K4 as the center, left and right detection zones ZL,
Set ZR. Then, it corresponds to one eye based on the height difference between the two candidate points K1 and K3 included in the left detection zone ZL and the candidate point K6 included in the central detection zone ZC, and the positional relationship between the candidate points K1 and K3. Is specified. In step 180, the mirror and seat control circuit 16 shown in FIG. 1 adjusts the mirror angle and the seat height in conformity with the specified characteristic amount of one eye, that is, the height of the driver's eye. According to the image processing of the embodiment, the density value of the pixel is called for each pixel column, the same arithmetic processing is repeated in one direction from the left side to the right side of the screen, and the limited size of each pixel column is obtained. Japanese Patent Laid-Open No. 7-18 / 1994
The processing program becomes simpler than the processing disclosed in Japanese Patent No. 1012. Then, even if the number of pixel rows increases, the number of repetitions of the same processing only increases, and the processing itself does not change. In addition, the process of forming a connected group from the calculated extraction points, narrowing down the number of connected groups, forming small data called candidate point data, and identifying the feature amount of the eye,
Even smaller data and simple calculations / comparisons are sufficient. In addition, it is necessary to perform any of the pre-processing of image data mainly on binarization of density values and noise removal, face width detection for specifying a horizontal position of an eye, and vertical position detection of an eye. There is no. Therefore, the number of operations and the memory capacity required for image processing are small, and the performance of the processing element may be slightly lower. It is also possible to do. Further, since a pixel row is selected in the vertical direction of the face, and the Y coordinate value which is a local peak of darkness is used as an extraction point, the hairline, both ends of the face, the hair vertically applied to the face, etc. Is unlikely to be an extraction point. Even if it is set as an extraction point, extraction points that do not satisfy the condition that there are extraction points connected in the horizontal direction and the number of connected points is 5 or more are automatically excluded. Unwanted candidate points are not formed due to wrinkles, irregularities, shadows due to direct sunlight, and the like, and there is no fear of erroneous eye identification. Therefore, even when the eye is covered with hair, at least the connection data of the eye line and the candidate point data up to the point where the hair is covered are formed without any problem, and the eye opening index is calculated in the same procedure as when the hair is not covered. Can be calculated. In the flowchart of FIG. 2, step 110 is the image input means of the invention, step 120 is the storage means of the invention, steps 130 and 140 are the extraction means of the invention,
and steps 150 and 160 correspond to the forming means of the invention. Further, the embodiment has described an application in which the seat height or the like is adjusted based on the detected height of the eyes. However, the present invention can also be applied to identifying a person, detecting the direction of the line of sight, and, as shown in JP-A-7-181012, distinguishing between dozing and wakefulness.
In the embodiment, processing is performed on an image in which the eyes, eyebrows, and the like have high density values (dark gradation levels). However, an image with inverted gradation (in which the eyes, eyebrows, and the like have low density values) may be used instead. In the embodiment, all the processes after the A/D converter 12 are performed by digital operation. However, the processes up to obtaining the arithmetic mean of the density values along the pixel row may be performed by an analog signal circuit. For example, with a camera using an image pickup tube, the scanning lines are set in the vertical direction, the luminance signal along each scanning line crossing the face vertically is passed through a low-pass filter to remove high-frequency components, and the luminance signal of one scanning line out of every 10 is then extracted, sampled, and A/D converted.
Further, in the embodiment, one pixel row is selected for every 10 pixels in the X direction as shown in FIG. 7, but pixel rows may be selected at finer intervals. Even if the number of pixel rows increases, only the number of repetitions of the same processing increases; the basic processing content does not change. Increasing the number of pixel rows allows the curvature of the feature curve to be captured more accurately, and allows a smaller allowable connection width to be set so that incorrect connection of extraction points is avoided. After the detection zone ZL is set as shown in FIG. 13, pixel rows may be set at high density only within the detection zone ZL,
so that the number of extraction points forming the curve of the left eye can be increased and its bow shape detected in more detail.
When the driver wears glasses as shown in FIG. 14, the detection zone ZL shown in FIG. 13 contains, in addition to the extraction point groups G3 and G5 corresponding to the eyebrow and the eye, the extraction point groups G1 and G7 corresponding to the upper and lower edges of the glasses. In FIG. 15, the remaining nine groups are renumbered in order from left to right and from top to bottom, so the correspondence between the eyes, eyebrows, and group symbols does not match that in the earlier figure. However, even in such a case, in the present embodiment, as described above, the height difference between each of the candidate points in the detection zone ZL and the candidate point of the mouth in the detection zone ZC is evaluated in addition to the vertical relationship of the candidate points within the detection zone ZL, so the group G5 corresponding to the eye can be specified without error. When four candidate points are formed in the detection zone ZL, the second candidate point from the bottom is determined to be the eye, the candidate point data corresponding to the edges of the glasses are discarded, and the feature amount of the eye is specified by the same procedure as when the driver does not wear glasses.
According to the present invention, for example, the bow shape of the eye can be specified by as many coordinate values as there are pixel rows crossing the eye, so the amount of data is greatly reduced compared with handling all the pixels included in the image of the eye. Specifying the eye curve from among a plurality of curves is also simpler than calculating a center of gravity from all the pixels included in the eye image. Therefore, even a small arithmetic device with a low arithmetic speed and a small memory allocation can execute the image processing for obtaining a face feature at high speed.
[0034] Further, peaks of the density value that appear along the lateral direction of the face, and peaks that are not linked in the lateral direction of the face, are eliminated, so shadows crossing the face in the vertical direction and hair hanging over the eye region do not hinder the specification of required feature amounts such as the eye. Because the density values of a plurality of consecutive pixels in a vertical pixel row are averaged into one density value, there is little risk of grouping the wrong extraction points in adjacent pixel columns. In addition, since the peak of the averaged density value is used as an extraction point, a feature curve that accurately follows the shape of an eye or an eyebrow can be formed, and errors in obtaining the eye opening index or the like are suppressed. In selecting groups, groups with a small number of connected extraction points are discarded. The feature amounts of the eyes and the mouth therefore need only be specified among extraction points and groups narrowed down to the minimum necessary, and no extra selection of feature amounts is required.
[0037] Further, since each of the narrowed-down groups is represented by a single candidate point, relative positions can be judged more efficiently than when the feature curves are handled as they are.
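The processing summarized above (averaging density values along a vertical pixel row, taking each dark peak as an extraction point, linking nearby extraction points of adjacent pixel rows, discarding groups with few links, and reducing each remaining group to one candidate point) can be illustrated by the following minimal Python sketch. It is not the implementation of the embodiment. The function names, the averaging window of 5 pixels, and the allowable connection width of 3 pixels are assumptions made only for illustration; the minimum of five connected points is the one figure taken from the description. The X value used here is the index of the pixel row, not the pixel coordinate.

```python
from statistics import mean


def extraction_points(column, window=5):
    """Return Y indices of dark peaks along one vertical pixel row.

    `column` holds density values from top to bottom (larger = darker).
    `window` is an assumed averaging length, not a value from the source.
    """
    n = len(column)
    half = window // 2
    # Arithmetic mean of consecutive pixels centered on each pixel.
    smooth = [mean(column[max(0, y - half):min(n, y + half + 1)])
              for y in range(n)]
    points = []
    rise_at = None  # first index of the top reached by the latest rise
    for y in range(1, n):
        diff = smooth[y] - smooth[y - 1]  # discrete derivative along Y
        if diff > 0:
            rise_at = y
        elif diff < 0 and rise_at is not None:
            # Derivative reversed from positive to negative: a dark peak.
            # A flat top is represented by its middle pixel.
            points.append((rise_at + y - 1) // 2)
            rise_at = None
    return points


def group_points(points_by_row, max_dy=3):
    """Link extraction points of adjacent pixel rows into groups.

    `max_dy` is an assumed allowable connection width in pixels.
    Each group is a list of (row_index, y) pairs, left to right.
    """
    groups = []
    open_groups = []  # groups extended in the previous pixel row
    for x, ys in enumerate(points_by_row):
        extended = []
        for y in ys:
            match = next((g for g in open_groups
                          if abs(g[-1][1] - y) <= max_dy), None)
            if match is None:
                match = []  # start a new group with the next number
                groups.append(match)
            else:
                open_groups = [g for g in open_groups if g is not match]
            match.append((x, y))
            extended.append(match)
        open_groups = extended
    return groups


def candidate_points(groups, min_links=5):
    """Represent each sufficiently long group by one (x, y) candidate point."""
    result = []
    for g in groups:
        if len(g) >= min_links:
            x_center = (g[0][0] + g[-1][0]) / 2  # middle of left and right ends
            y_mean = mean(y for _, y in g)       # average ordinate
            result.append((x_center, y_mean))
    return result
```

In this sketch an isolated dark spot that appears in only one pixel row forms a group of a single point and is dropped by `candidate_points`, which corresponds to the automatic exclusion of unconnected extraction points described above.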

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an explanatory diagram of an alarm device according to an embodiment.
FIG. 2 is a flowchart of the operation of the alarm device.
FIG. 3 is a flowchart of a process for obtaining extraction points.
FIG. 4 is a flowchart of a process for forming linked data.
FIG. 5 is a flowchart of a process for selecting candidate points.
FIG. 6 is an explanatory diagram of a face image.
FIG. 7 is an explanatory diagram of the setting of pixel columns.
FIG. 8 is an explanatory diagram of density values along a pixel column.
FIG. 9 is an explanatory diagram of density values along a pixel column.
FIG. 10 is an explanatory diagram of a distribution of extraction points.
FIG. 11 is an explanatory diagram of a process of grouping extraction points.
FIG. 12 is an explanatory diagram of a distribution of extraction point groups.
FIG. 13 is an explanatory diagram of a distribution of candidate points.
FIG. 14 is an explanatory diagram of a face image.
FIG. 15 is an explanatory diagram of a distribution of extraction point groups.
[Description of Reference Numerals]
11 Camera
12 A/D converter
13 Image memory
14 Image data calculation circuit
15 Face feature quantity recognition circuit
16 Mirror and seat control circuit

Continuation of the front page
(56) References: JP-A-4-1863 (JP, A); JP-A-59-140589 (JP, A); JP-A-7-311833 (JP, A); 山本新 and two others, "Driver's blink measurement in a car driving environment," Transactions of the Institute of Electrical Engineers of Japan C, November 20, 1995, Vol. 115-C, No. 12, pp. 1411-1416
(58) Fields searched (Int. Cl.⁷, DB name): G06T 7/00 - 7/60; G06T 1/00; H04N 7/18

Claims

(57) [Claims] [Claim 1] A face image feature quantity recognition apparatus having image input means for capturing an image of a face and digitizing the density of each pixel on the screen, and storage means for storing the density value digitized for each pixel, the apparatus comprising:
extraction means for retrieving the density values from the storage means along a pixel row in the vertical direction (Y direction) of the face, and identifying, as an extraction point, a pixel representing a one-directional peak of the density value on the vertical (Y-direction) pixel row; and
forming means for comparing the extraction points of adjacent vertical (Y-direction) pixel rows and grouping into one the extraction points that are close to each other in the pixel row direction (X direction),
wherein the extraction means averages the density values of a plurality of consecutive pixels including one pixel to obtain an arithmetic mean as the density value of that pixel, provides a routine that takes as an extraction point a pixel at which the differential value of the averaged arithmetic mean density value with respect to position along the vertical (Y-direction) pixel row reverses in one direction, and repeats the routine to store only the Y coordinate values of the extraction points corresponding to dark peaks, and
wherein the forming means connects in the horizontal direction (X direction), groups, and numbers those extraction points, among the extraction points obtained for each vertical (Y-direction) pixel row, whose difference in the vertical direction is within a specified range; assigns group numbers sequentially starting from the pixel row at the horizontal end; checks whether an extraction point is present and, if so, stores the group number, its vertical and horizontal coordinate values, and the order in which it appeared; checks whether a common group exists between adjacent pixel rows and, if so, stores the number of connected extraction points and the Y coordinate values in a memory as linked data; selects from the linked data the groups in which the number of connected extraction points exceeds a predetermined value; and sets the position of a candidate point from the average value of the ordinates and the abscissa value at the center of the pixel rows at the left and right ends of the group.