JP3577908B2

JP3577908B2 - Face image recognition system

Info

Publication number: JP3577908B2
Application number: JP23930797A
Authority: JP
Inventors: Naoki Sashida (指田直毅); Daiki Masumoto (増本大器); Shigemi Nagata (長田茂美)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1997-09-04
Filing date: 1997-09-04
Publication date: 2004-10-20
Anticipated expiration: 2017-09-04
Also published as: JPH1185988A

Description

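[0001]
[Technical Field of the Invention]
The present invention relates to a face image recognition system that recognizes face images as follows. Information obtained by converting face image information of persons is stored in a database. At recognition time, input face image information is subjected to a predetermined conversion and then matched against the information stored in the database.
[0002]
[Prior Art]
In recent years, attempts to recognize face images by computer have been actively pursued. The motivation is that if computers could read the information a face conveys, this would help realize a natural interface between humans and machines. Moreover, if computers could replace or assist in identifying individuals from face images, applications such as face image database recognition systems for criminal investigations and access control systems for security could be expected. Face image recognition technology is thus one of the important issues in advancing man-machine interfaces, and its importance is expected to keep growing.
[0003]
In face image recognition by computer, the appearance of the input image pattern changes with the imaging conditions, typified by the pose of the face and the illumination at the imaging location. The system must nevertheless stably output identification information, such as the person's name, for such patterns.
[0004]
One approach to this problem is the face-part-based method. It assumes that the face in the image faces forward and accurately extracts parts such as the eyes and mouth (hereinafter "face parts") from the input image. It then performs identification based on individual characteristics in the shapes and arrangement of these face parts.
[0005]
The face-part-based method has two limitations. First, accurately extracting the line-drawing-like shapes and positions of face parts from a face image taken in a real environment is itself a very difficult problem. Second, line-drawing shapes alone can describe only so much of the subtle differences between the face patterns of structurally similar individuals.
[0006]
An approach that has recently attracted attention is pattern matching. It represents the face image pattern as a two-dimensional array of the gray values of its pixels and performs identification by matching these arrays. A representative example is the eigenspace method, which performs pattern matching in a low-dimensional space, called an eigenspace, obtained by compressing high-dimensional image data. The method is disclosed, for example, in Hiroshi Murase and Shree K. Nayar, "Visual Learning and Recognition of 3-D Objects from Appearance", International Journal of Computer Vision, Vol. 14, No. 1, pp. 5-24 (1995).
[0007]
A face image recognition system based on this eigenspace method is disclosed, for example, in Japanese Patent Application Laid-Open No. H5-20442. Such a system has been shown to be effective for identifying frontal faces under two conditions: the faces were captured under nearly constant lighting, and they have already been accurately aligned and normalized in size.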
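[0008]
[Problems to Be Solved by the Invention]
The conventional eigenspace method requires accurate positioning and normalization of the face pattern in the input image. As this requirement suggests, the method is easily affected by displacement of the face pattern within the image, by occlusion, by background noise, and the like.
[0009]
Face image recognition systems using the conventional eigenspace method give good results under two constraints: the face image is frontal, and the face pattern has been accurately positioned and normalized. Images taken in realistic environments, however, are expected to change in appearance with face position and pose (or observation viewpoint), lighting conditions, background conditions, the passage of time, and so on. Face image recognition that is robust to these changes remains an unsolved problem.
[0010]
An object of the present invention is to provide a face image identification system that can handle the following cases in the image to be recognized:
- the position of the recognition target is shifted relative to its position at learning time;
- part of the recognition target is hidden;
- background noise is present;
- a plurality of recognition targets are present.
[0011]
[Means for Solving the Problems]
To achieve the above object, the face image recognition system of the present invention comprises four means:
- face image information input means for capturing face image information;
- face image information conversion means for converting the face image information captured by the face image information input means;
- model image storage means for storing the information converted by the face image information conversion means;
- matching means for matching the information converted by the face image information conversion means against the information stored in the model image storage means.

The face image information conversion means further has extraction means that uses HSI conversion to extract a region corresponding to the skin color of a human face from the face image information. The conversion means then performs the following steps:
- it generates a match-degree map from the region extracted by the extraction means;
- it obtains edge strength based on the gray values of the image;
- it selects characteristic small regions based on the match-degree map and the edge strength;
- it applies a gray-scale conversion to the RGB values of each selected small region and generates a vector whose elements are the gray-scale values of the pixels;
- it normalizes brightness by dividing the vector by the square root of the sum of squares of its elements;
- it converts the information contained in each selected small region into a set of points projected onto an eigenspace.

The matching means performs matching in these steps:
- it searches the information stored in the model image storage unit for the information at minimum distance from each selected small region;
- it obtains the relative position within the image between the selected small region and that minimum-distance information;
- it votes in the voting space corresponding to each piece of information stored in the model image storage unit;
- it detects the peak positions of the vote counts over all voting spaces.

Processing face image information generally involves a large amount of data and often takes a long time. Providing the extraction means reduces the amount of processing.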
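[0012]
Preferably, the model image storage means stores converted information for a plurality of face images of a person's face photographed from a plurality of viewpoints, each converted by the face image information conversion means. Matching the input face image of the recognition target against this stored information allows the face image to be recognized even when it was not taken from the front.
[0013]
Preferably, the model image storage means also stores the converted information for a plurality of persons. Matching the input face image of the recognition target against this stored information allows the person to be recognized to be identified.
[0014]
A video camera can be used as the face image information input means. In this case, a face image to be recognized can easily be photographed and captured into the face image recognition system of the present invention.
[0015]
An image scanner can also be used as the face image information input means, which allows face images that have already been photographed to be recognized. An external storage device can also be used as the face image information input means. In recent years, devices that handle face image information as digital information, such as digital cameras, have come into wide use. Using an external storage device as the input means allows such digitized information to be captured into the face image recognition system of the present invention as well.
[0016]
Preferably, the face image information conversion means further has extraction means for extracting the information corresponding to the human face portion from the face image information. Characteristic small regions are then selected only from the information extracted by the extraction means. Processing face image information generally involves a large amount of data and often takes a long time, and the extraction means reduces the amount of processing.
[0017]
Preferably, after selecting the characteristic small regions, the face image information conversion means normalizes the brightness of each window image vector, whose elements are the pixel values of a selected small region. This makes recognition robust to changes in brightness caused by lighting conditions and the like at the time of imaging.
[0018]
As described above, the face image recognition system of the present invention provides a face image identification system that handles the following cases: the recognition target in the image is shifted relative to its position at learning time, part of it is hidden, background noise is present, or a plurality of recognition targets are present.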
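[0019]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
[0020]
FIG. 1 is a block diagram showing the overall configuration of the face image recognition system of the present invention. As shown in the figure, the system comprises:
- an image input unit 101, serving as the face image information input means;
- a control unit 102, serving as both the face image information conversion means and the matching means;
- a model image storage unit 103, serving as the model image storage means;
- an output unit 104.
[0021]
A video camera, an image scanner, or the like is used as the image input unit 101. Face image information serving as learning images and as recognition target images is input through it. An auxiliary storage device can also be connected and used as the image input unit 101.
[0022]
The control unit 102 comprises an input processing unit 1021, a recognition processing unit 1022, and a display processing unit 1023. The input processing unit 1021 accepts input from the image input unit 101.
[0023]
In the learning phase described later, the recognition processing unit 1022 extracts feature data from the image information accepted by the input processing unit 1021. It stores this data in the model image storage unit 103 as model image information. In the recognition phase described later, the recognition processing unit 1022 extracts feature data from the recognition-target face image information accepted by the input processing unit 1021. It then recognizes the face image by matching this feature data against the information stored in the model image storage unit 103 during the learning phase, and sends the recognition result to the display processing unit 1023. Feature data extraction is described in detail later.
[0024]
The display processing unit 1023 outputs the result of recognition by the recognition processing unit 1022 to the output unit 104.
[0025]
The model image storage unit 103 stores the model image information generated by the recognition processing unit 1022.
[0026]
The output unit 104 outputs face image recognition results and the like.
[0027]
This paragraph describes the general concept of the Eigen-Window method used in the face image recognition system according to the present invention. The Eigen-Window method improves on the conventional eigenspace method described above. The eigenspace method treats the entire image containing the recognition target as a single model. The Eigen-Window method instead divides the recognition target into characteristic small regions (hereinafter "window images") and models the target as a set of these regions. Because it treats the target as a set of partial elements, three-dimensional object recognition with the Eigen-Window method can handle cases the eigenspace method cannot: displacement or occlusion of the recognition target, background noise, and images containing a plurality of recognition targets.
[0028]
The procedure for recognizing a general three-dimensional object with the Eigen-Window method is described below.
[0029]
The Eigen-Window method consists of two phases:
- a learning phase, which generates model image information from input images;
- a recognition phase, which recognizes a recognition target image by matching it against the model image information created in the learning phase.
[0030]
FIG. 2 outlines the processing in the learning phase. As shown in the figure, the learning phase first takes in a series of M learning images of one recognition target photographed from multiple viewpoints. It then performs window selection based on edge strength.
[0031]
Window selection based on edge strength first calculates the edge strength of each pixel from the gray values of the pixels. It then selects, as a window image, the N × N pixel rectangular region centered on each pixel whose edge strength is at or above a certain threshold.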
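The selection rule in [0031] can be illustrated with a minimal pure-Python sketch. It assumes a central-difference gradient magnitude as the edge strength, because the patent does not name the edge operator. The function names and the optional `mask` parameter are hypothetical. The mask is one way to restrict the candidate centers to particular pixels, such as the skin-colored pixels described in [0066].

```python
def edge_strength(img):
    """Gradient magnitude from central differences; border pixels get 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def select_windows(img, n, threshold, mask=None):
    """Return (y, x, window) for every N x N window centered on a pixel whose
    edge strength is >= threshold. For even N the chosen pixel sits just below
    and to the right of the geometric center. Windows that would cross the
    image border are skipped; if mask is given, only pixels with mask[y][x]
    true are used as centers (hypothetical parameter)."""
    mag = edge_strength(img)
    h, w = len(img), len(img[0])
    half = n // 2
    windows = []
    for y in range(h):
        for x in range(w):
            if mask is not None and not mask[y][x]:
                continue
            if mag[y][x] < threshold:
                continue
            top, left = y - half, x - half
            if top < 0 or left < 0 or top + n > h or left + n > w:
                continue
            windows.append((y, x, [row[left:left + n] for row in img[top:top + n]]))
    return windows
```

For example, an 8 × 8 image whose left half is 0 and right half is 100 has a vertical edge at columns 3 and 4. With N = 3, this yields twelve windows, one for each interior row of those two columns.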
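[0032]
FIG. 3 illustrates window selection. Suppose I window images are selected from the learning images. In the following description, the i-th window image selected from the m-th learning image is denoted Z_m^i.
[0033]
When window selection based on edge strength is complete, a matrix Z_m is formed for each learning image, as shown in Equation 1 below. Its columns are the vectors z_m^i, whose elements are the gray values of the i-th window image in the m-th learning image. The covariance matrix Q shown in Equation 2 below is then calculated from these matrices.
[0034]
[Equation 1]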
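The image of Equation 1 did not survive in the source text. Based only on the definition in [0033], it presumably places the column vectors of the m-th learning image side by side. The form below is a reconstruction, not the patent's verbatim formula:

```latex
Z_m = \left[\, z_m^{1},\; z_m^{2},\; \dots,\; z_m^{I} \,\right],
\qquad z_m^{i} \in \mathbb{R}^{N^{2}}, \quad m = 1, \dots, M
```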

[0035]
[Equation 2]
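Equation 2 is likewise missing. The following is a plausible reconstruction, assuming the usual eigenspace formulation, where c is the mean of all selected window vectors (defined in [0036]). The original may include a normalizing factor such as 1/(MI); such a factor would not change the eigenvectors.

```latex
Q = \sum_{m=1}^{M} \sum_{i=1}^{I} \left( z_m^{i} - c \right)\left( z_m^{i} - c \right)^{\top},
\qquad c = \frac{1}{MI} \sum_{m=1}^{M} \sum_{i=1}^{I} z_m^{i}
```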

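[0036]
Here, c denotes the mean gray-value vector of all selected window images. The eigenvectors e_i of the matrix Q are sorted in descending order of their eigenvalues λ_i. The eigenvectors whose eigenvalues are at or above a certain threshold are then used to construct a transformation matrix R. R maps into an eigenspace of dimension k, which is much smaller than the original window dimension N^2. Specifically, R is given by Equation 3 below.
[0037]
[Equation 3]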
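Equation 3 is missing as well. R presumably has the k leading eigenvectors as its rows. This is a reconstruction; the patent's exact notation is unknown:

```latex
R = \left[\, e_{1},\; e_{2},\; \dots,\; e_{k} \,\right]^{\top} \in \mathbb{R}^{k \times N^{2}},
\qquad \lambda_{1} \ge \lambda_{2} \ge \dots \ge \lambda_{k}, \quad k \ll N^{2}
```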

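[0038]
The learning window images Z_m are projected into the k-dimensional eigenspace spanned by the transformation matrix R. This yields the corresponding point-set matrices G_m. This series of transformations is the Karhunen-Loeve expansion, hereinafter abbreviated "KL expansion". Specifically, G_m is given by Equation 4 below.
[0039]
[Equation 4]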
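The following is a plausible form of the missing Equation 4. It assumes that the mean vector c is subtracted before projection, as is usual for the KL expansion. Here **1** is a vector of ones, so c**1**^T subtracts c from every column:

```latex
G_m = R\left( Z_m - c\,\mathbf{1}^{\top} \right) = \left[\, g_m^{1},\; \dots,\; g_m^{I} \,\right],
\qquad g_m^{i} = R\left( z_m^{i} - c \right) \in \mathbb{R}^{k}
```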

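[0040]
Information representing the derived point-set matrices G_m is stored in the model image storage unit 103, and the learning phase ends. Hereinafter, the points of G_m are called "learning window image corresponding points". FIG. 4 shows the learning window image corresponding points G_m in the eigenspace for k = 3.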
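The learning phase up to [0040] can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:
- window vectors are flat lists of floats;
- the covariance is left unscaled, since scaling does not change the eigenvectors;
- eigenvectors are obtained with the cyclic Jacobi rotation method, a swapped-in numerical technique, because the source does not say how the eigen-decomposition is computed;
- a fixed number k of eigenvectors is kept instead of thresholding the eigenvalues.

All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Mean vector c of all window vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def covariance(vectors, c):
    """Unscaled covariance Q = sum over windows of (z - c)(z - c)^T."""
    dim = len(c)
    q = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        d = [v[a] - c[a] for a in range(dim)]
        for a in range(dim):
            for b in range(dim):
                q[a][b] += d[a] * d[b]
    return q


def jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (returned as rows) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = cs * akp - sn * akq
                    a[k][q] = sn * akp + cs * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = cs * apk - sn * aqk
                    a[q][k] = sn * apk + cs * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = cs * vkp - sn * vkq
                    v[k][q] = sn * vkp + cs * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[r][i] for r in range(n)] for i in range(n)]
    return values, vectors


def kl_expansion(vectors, k):
    """Return (R, c): the k leading eigenvectors of Q as rows, and the mean c."""
    c = mean_vector(vectors)
    values, vecs = jacobi_eigen(covariance(vectors, c))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [vecs[i] for i in order[:k]], c


def project(r, c, z):
    """g = R (z - c): coordinates of window vector z in the k-dim eigenspace."""
    return [sum(e[d] * (z[d] - c[d]) for d in range(len(c))) for e in r]


def build_model(windows_per_image, k):
    """windows_per_image[m] lists the window vectors of learning image m.
    Returns R, c and G, where G[m] holds the eigenspace points of image m."""
    all_vectors = [z for ws in windows_per_image for z in ws]
    r, c = kl_expansion(all_vectors, k)
    g = [[project(r, c, z) for z in ws] for ws in windows_per_image]
    return r, c, g
```

In the face-recognition embodiment, the input vectors would be the brightness-normalized window vectors Z' described later. The same R and c are reused to project the input windows in the recognition phase.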
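[0041]
Next, processing in the recognition phase is described. FIG. 5 outlines the processing in the recognition phase. As shown in the figure, the recognition phase receives an image to be recognized. As in the learning phase, it performs window selection based on edge strength and KL expansion on that image. The KL expansion in the recognition phase yields the point-set matrix G_in in the eigenspace, which corresponds to the selected input window image matrix Z_in. Specifically, G_in is given by Equation 5 below. Hereinafter, the points of G_in are called "input window image corresponding points". Equation 5 expresses for the recognition target image what Equation 4 expresses for the learning images.
[0042]
[Equation 5]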
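Equation 5 presumably has the same form as Equation 4, applied to the input windows with the R and c obtained in the learning phase. This is a reconstruction, not the verbatim formula:

```latex
G_{in} = R\left( Z_{in} - c\,\mathbf{1}^{\top} \right) = \left[\, g_{in}^{1},\; g_{in}^{2},\; \dots \,\right],
\qquad g_{in}^{i} = R\left( z_{in}^{i} - c \right)
```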

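[0043]
Each input window image corresponding point g_in^i in G_in is matched against the information representing G_m stored in the model image storage unit 103 during the learning phase. This identifies the object shown in the image to be identified, as well as its pose.
[0044]
The matching process in the recognition phase is described below. It consists of a nearest-neighbor search followed by voting.
[0045]
The nearest-neighbor search uses vector distance in the eigenspace as its criterion. For each input window image corresponding point g_in^i, it searches the point-set matrices G_m for the learning window image corresponding point g_m^j at minimum distance. This finds the learning window image z_m^j whose appearance is most similar to each input window image z_in^i. FIG. 6 shows the eigenspace in this situation. In the figure, the i-th input window image corresponding point g_in^i is closest to one of the learning window image corresponding points, g_m^j. The learning window image z_m^j that corresponds to g_m^j is therefore the learning window image whose appearance is most similar to the input window image z_in^i.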
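The nearest-neighbor search in [0045] can be sketched as an exhaustive linear scan with Euclidean distance. This is a minimal illustration, assuming the stored model is a dictionary from learning-image index m to a list of (j, g_m^j) pairs. Both that layout and the function name are hypothetical.

```python
import math


def nearest_learning_point(g_in, model):
    """Exhaustive nearest-neighbor search in the eigenspace.

    model maps a learning-image index m to a list of (j, g_m^j) pairs.
    Returns (m, j, distance) for the learning window point closest to g_in.
    """
    best = None
    for m, points in model.items():
        for j, g in points:
            d = math.dist(g_in, g)  # Euclidean distance (Python 3.8+)
            if best is None or d < best[2]:
                best = (m, j, d)
    return best
```

The search is run once for every input point g_in^i. The source does not specify a search structure. A spatial index could replace the linear scan for large models, but the result would be the same.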
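[0046]
Voting works on each input window image z_in^i and the most similar learning window image z_m^j found above. From their relative position within the image, Δx(z_in^i, z_m^j), it casts a vote in the voting space that corresponds to the m-th learning image. Specifically, this relative position is given by Equation 6 below.
[0047]
[Equation 6]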
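Equation 6 is missing from the source. The operator p[·] returns a window's position within its whole image (see [0048]), so a natural reading is the difference of the two positions. The sign convention below is an assumption:

```latex
\Delta x\left( z_{in}^{i}, z_m^{j} \right) = p\!\left[ z_{in}^{i} \right] - p\!\left[ z_m^{j} \right]
```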

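[0048]
In Equation 6, p[·] is the operation that computes the position of a window image within the whole image. When voting is complete, the peak positions of the vote counts are detected in the M voting spaces, one for each of the M learning images. This identifies the learning image most similar to the target in the input image. FIG. 7 shows the voting spaces. If multiple vote peaks are detected, the image can be taken to contain multiple recognition targets that match the model images.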
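The voting and peak detection in [0046] to [0048] can be sketched as follows. This minimal example assumes several things:
- positions are integer (x, y) pairs;
- each matched window pair casts one vote at the exact offset p[z_in^i] - p[z_m^j], with no binning or smoothing;
- only the single highest peak is reported.

Reporting every local maximum above a threshold would be the natural extension for detecting multiple faces. The function names are hypothetical.

```python
from collections import Counter


def vote(matches, num_images):
    """Cast one vote per matched window pair.

    matches: iterable of (m, p_in, p_learn), where p_in = p[z_in^i] and
    p_learn = p[z_m^j] are (x, y) window positions in their own images.
    The vote goes to offset p_in - p_learn in voting space m.
    """
    spaces = [Counter() for _ in range(num_images)]
    for m, (xi, yi), (xl, yl) in matches:
        spaces[m][(xi - xl, yi - yl)] += 1
    return spaces


def find_peak(spaces):
    """Return (m, offset, votes) for the highest count over all voting spaces."""
    best = (None, None, 0)
    for m, space in enumerate(spaces):
        for offset, count in space.items():
            if count > best[2]:
                best = (m, offset, count)
    return best
```

Windows from a consistently shifted face all vote for the same offset, so a displaced face still produces a sharp peak. Isolated mismatches caused by occlusion or background clutter scatter their votes instead.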
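[0049]
Based on the identified learning image, the type, position, pose, and so on of the recognition target object in the input image can be estimated.
[0050]
Next, this section describes how the Eigen-Window method, implemented by the processing described above, is applied to a face image recognition system.
[0051]
In the face image recognition system according to the present invention, the recognition processing unit 1022 performs the various calculations that implement the Eigen-Window method described above. Because the recognition target images are human face images, several improvements have been made.
[0052]
The first improvement is an initial setting process performed before the learning phase. This process teaches the face image recognition system of this embodiment the skin color of face images. Its purpose is both to raise recognition accuracy and to make recognition processing more efficient.
[0053]
FIG. 8 is a flowchart showing the processing of the recognition processing unit 1022 in the initial setting process. As shown in the figure, the recognition processing unit 1022 first reads a sample image via the input unit 101 (S801). FIG. 9 shows an example of a sample image. A sample image is an image containing skin color. As shown in FIG. 9, the user has marked its skin-colored parts in advance as designated regions. The recognition processing unit 1022 applies HSI conversion to the pixels of the marked skin-colored regions to obtain their HSI values (S802).
[0054]
HSI conversion converts an RGB image into an HSI image. In the RGB image, each pixel holds Red, Green, and Blue values (usually 256 levels each). In the HSI image, each pixel is expressed by the three attributes Hue, Saturation, and Intensity. This representation separates the chromatic attributes (hue and saturation) from intensity. As a result, it is relatively insensitive to brightness changes caused by varying lighting conditions and the like, which makes face image recognition robust to such changes.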
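The patent does not state which RGB-to-HSI formulas it uses. The following is a minimal sketch using one common textbook definition:
- I is the mean of R, G, and B;
- S = 1 - 3 min(R, G, B) / (R + G + B);
- H is an angle in degrees, obtained from the arccos formula.

The function name is hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (H in degrees [0, 360), S in [0, 1], I in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    total = r + g + b
    i = total / 3.0
    s = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:  # achromatic pixel (R = G = B): hue undefined, report 0
        h = 0.0
    else:
        # Clamp against rounding before acos; theta lies in [0, 180] degrees.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if b <= g else 360.0 - theta
    return h, s, i
```

Doubling all three RGB values leaves H and S unchanged and only scales I. This is why the later skin-color selection relies mainly on hue.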
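[0055]
The recognition processing unit 1022 then extracts the skin-color region (S803).
[0056]
Skin-color region extraction plots each pixel of the HSI-converted skin-colored portions as a point in HSI space, using its HSI values. It then determines the skin-color region parameters Hmin, Hmax, Smin, Smax, Imin, and Imax shown in Equation 7 below so that the region appropriately contains these points.
[0057]
[Equation 7]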
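Equation 7 is missing. Given the parameters named in [0056] and the cuboid of FIG. 10, it presumably defines the skin-color region as an axis-aligned box in HSI space. This is a reconstruction:

```latex
H_{\min} \le H \le H_{\max}, \qquad S_{\min} \le S \le S_{\max}, \qquad I_{\min} \le I \le I_{\max}
```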

[0058]
FIG. 10 illustrates the extraction of the skin-color region in HSI space. The rectangular parallelepiped shown in the figure is the skin-color region in HSI space.
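The source says only that the six parameters are chosen so that the marked skin pixels are "appropriately contained". A minimal sketch, assuming the tightest enclosing box with optional per-attribute margins, follows. Hue is treated as a linear quantity, so a box straddling 0/360 degrees is not handled. The function names and the `margins` parameter are hypothetical.

```python
def fit_skin_box(hsi_samples, margins=(0.0, 0.0, 0.0)):
    """Box in HSI space (Equation 7) enclosing the marked skin samples (H, S, I).

    margins = (dH, dS, dI) widens each side. Hue is treated as linear here,
    so boxes that wrap around 0/360 degrees are not supported.
    """
    hs, ss, is_ = zip(*hsi_samples)
    dh, ds, di = margins
    return {
        "Hmin": min(hs) - dh, "Hmax": max(hs) + dh,
        "Smin": min(ss) - ds, "Smax": max(ss) + ds,
        "Imin": min(is_) - di, "Imax": max(is_) + di,
    }


def in_skin_box(hsi, box):
    """True if an (H, S, I) value satisfies all three inequalities of Equation 7."""
    h, s, i = hsi
    return (box["Hmin"] <= h <= box["Hmax"]
            and box["Smin"] <= s <= box["Smax"]
            and box["Imin"] <= i <= box["Imax"])
```

A small margin keeps skin pixels that were slightly outside the marked sample from being rejected. Too large a margin lets background colors into the box.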
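[0059]
The following paragraphs describe, with reference to the drawings, the processing of the recognition processing unit 1022 in the learning and recognition phases. This processing takes place after the initial setting process above has been completed.
[0060]
FIG. 11 is a flowchart showing the processing of the recognition processing unit 1022 in the learning phase. As shown in the figure, the face image recognition system according to the present invention first performs window selection based on skin-color information (S1101). This step precedes the edge-strength-based window selection of the Eigen-Window method described earlier. It is one of the improvements to the Eigen-Window method that exploit the fact that the recognition target is a human face image.
[0061]
Window selection based on skin-color information uses the skin-color information learned in the initial setting process. It selects windows only from the regions of each learning image that correspond to skin color. This makes subsequent processing, such as edge-strength-based window selection, more efficient.
[0062]
FIG. 12 is a flowchart showing the details of window selection based on skin-color information. As shown in the figure, each learning image is first HSI-converted. The regions corresponding to the skin color of a human face are then extracted based on the HSI values of the converted pixels (S1201).
[0063]
The range of HSI values corresponding to the skin-color region was determined in the initial setting process. Here, therefore, only pixels whose hue value h falls within that skin-color region are selected.
[0064]
Next, a match-degree map is generated (S1202). For each pixel selected in step S1201, a match degree expresses how close its HSI value lies, in HSI space, to the center of the skin-color region. That region is the one determined from the pixels marked in the initial setting process. The map records this match degree at the pixel's position in the image. Concretely, a point plotted near the center of the skin-color cuboid in HSI space is close to the center and receives a high match degree. A point plotted near a face of the cuboid is far from the center and receives a low match degree. Points outside the skin-color region receive a match degree of 0. FIG. 13(a) shows the contents of a match-degree map.
[0065]
Threshold processing is then applied to the match-degree map. Pixels whose match degree exceeds a preset threshold are selected as window image candidate points (S1203). FIG. 13(b) shows the selected window image candidate points.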
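The patent describes the match degree only qualitatively: high near the center of the skin-color cuboid, low near its faces, and 0 outside. The sketch below assumes a specific fall-off: the degree decreases linearly with the largest normalized distance from the center over H, S, and I, reaching 0 on the faces of the box. That formula, the function names, and the threshold value are illustrative assumptions. Because pixels outside the box get 0, the sketch also covers the selection of step S1201.

```python
def match_degree(hsi, box):
    """Match degree of one (H, S, I) value against the skin box.

    Illustrative choice (the source gives no formula): 1.0 at the box center,
    decreasing linearly with the largest normalized distance from the center,
    reaching 0.0 on the box faces; 0.0 everywhere outside the box.
    """
    worst = 0.0
    for v, lo, hi in zip(hsi, (box["Hmin"], box["Smin"], box["Imin"]),
                         (box["Hmax"], box["Smax"], box["Imax"])):
        half = (hi - lo) / 2.0
        if half <= 0.0:  # degenerate axis: only the exact value matches
            d = 0.0 if v == lo else float("inf")
        else:
            d = abs(v - (lo + hi) / 2.0) / half
        worst = max(worst, d)
    return max(0.0, 1.0 - worst)


def candidate_points(hsi_image, box, threshold):
    """Build the match-degree map (S1202) and return it together with the
    (y, x) positions whose degree exceeds threshold (S1203)."""
    degree_map = [[match_degree(p, box) for p in row] for row in hsi_image]
    points = [(y, x) for y, row in enumerate(degree_map)
              for x, v in enumerate(row) if v > threshold]
    return degree_map, points
```

A higher threshold keeps only pixels whose color is close to the typical skin color. This gives fewer but more reliable window centers.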
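[0066]
When window selection based on skin-color information is complete, processing returns to the flowchart of FIG. 11 and performs window selection based on edge strength (S1102). Edge-strength-based window selection was already explained in the description of the Eigen-Window method and is not repeated here. In the face image recognition system of this embodiment, however, edge strength is calculated from the image gray values only for the skin-colored pixels chosen by the skin-color-based selection. Windows are likewise selected only at those pixels.
[0067]
Next, the face image recognition system of this embodiment performs brightness normalization (S1103). This is another improvement to the Eigen-Window method. Its purpose is to make recognition reasonably robust to changes in the target's appearance caused by variations in lighting and environment when face images are taken.
[0068]
FIG. 14 is a flowchart showing the details of brightness normalization. As shown in the figure, the RGB values of each selected window image are first gray-scale converted. This produces a gray-scale window image with a single gray-scale value per pixel, usually a gray value with 256 levels (S1401).
[0069]
Next, a window image vector Z is generated whose elements are the gray-scale values z1, z2, ..., zN of the pixels of this window image (S1402). Here N is the total number of pixels in the window image, which is N^2 in the notation of [0031]. The square root of the sum of squares of the elements z_i is then calculated (S1403). Hereinafter this value is called the "norm" of the vector and is denoted ‖Z‖.
[0070]
The window image vector Z is then divided by its norm ‖Z‖ to normalize brightness. The resulting normalized window image vector Z' is output as the result of brightness normalization (S1404).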
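Steps S1401 to S1404 can be sketched as follows. The patent says only that the RGB values are converted to a single gray-scale value. The ITU-R BT.601 luma weights used here are an assumption, and the function names are hypothetical.

```python
import math


def to_gray(rgb):
    """Gray-scale value of one RGB pixel (assumed ITU-R BT.601 luma weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def normalize_window(rgb_window):
    """S1401-S1404: gray-scale conversion, vectorization, and division by the
    Euclidean norm ||Z||. Returns the normalized window vector Z'."""
    z = [to_gray(p) for row in rgb_window for p in row]  # Z = (z1, ..., zN)
    norm = math.sqrt(sum(v * v for v in z))               # ||Z||
    if norm == 0.0:
        return z  # all-black window: nothing to normalize
    return [v / norm for v in z]                           # Z' = Z / ||Z||
```

Dividing by the norm cancels any uniform multiplicative change in a window's brightness, such as dimmer lighting that scales all of its pixel values. Because each window is normalized separately, this also holds for local changes like shadows. An additive offset in brightness is not removed, since the patent's procedure does not subtract the window mean.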
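[0071]
KL expansion is applied to the output normalized window image vectors Z' (S1105). The information on the learning window image corresponding points is then stored in the model image storage unit 103 (S1106). KL expansion was described in detail in the explanation of the Eigen-Window method and is not repeated here.
[0072]
KL expansion requires that all prepared learning images have been read. Therefore, all learning images are read before KL expansion is performed (S1104).
[0073]
The learning phase ends when all prepared learning images have been KL-expanded and the results stored in the model image storage unit 103.
[0074]
The following paragraphs describe the recognition phase of the face image recognition system of this embodiment. In this phase, a recognition target image is recognized using the learning window image corresponding point information stored in the model image storage unit 103.
[0075]
FIG. 15 is a flowchart showing the processing of the recognition processing unit 1022 in the recognition phase. As shown in the figure, the face image information to be recognized undergoes the same steps as in the learning phase: window selection based on skin-color information (S1501), window selection based on edge strength (S1502), and brightness normalization (S1503). These steps are identical to those described for the learning phase, so their details are omitted.
[0076]
In the recognition phase, brightness normalization is followed by KL expansion (S1504), nearest-neighbor search (S1505), and voting (S1506).
[0077]
These steps were described in detail in the initial explanation of the Eigen-Window method and are not repeated here.
[0078]
The recognition-phase processing described above identifies the name, pose, and so on of the person corresponding to the input face image.
[0079]
[Examples]
FIG. 16 shows the configuration of a face image recognition system in one example of the present invention.
[0080]
As shown in the figure, the face image recognition system in this example comprises a chair 1601 fixed to the floor, a gauge 1602 for measuring the angle of the chair, a camera 1603 for photographing face images, a computer 1604 that performs the recognition processing, and so on.
[0081]
A person sits in the chair, turned to an arbitrary angle within the learned range, and the camera captures a face image. The system matches this input face image against the model created in advance from the learning images, which both identifies the face and determines its pose.
[0082]
In the learning phase, 35 images were used as learning images. Five persons were seated in the chair 1250 mm from the camera. Each was photographed at 30-degree intervals from 0 degrees (right profile) to 180 degrees (left profile). FIG. 17 shows the learning images used in this example.
[0083]
No special lighting was used; images were taken under ordinary room lighting only.
[0084]
The captured images were 120 × 160 pixels. Their RGB values were converted to gray-value images. Window images (N = 10) were then selected based on edge strength and projected into a k = 10 dimensional eigenspace.
[0085]
The 35 learning images yielded 3,485 window images in total.
[0086]
The recognition phase used three evaluation cases, described below. For each case, the five persons were photographed at 15-degree intervals from 0 to 180 degrees. The resulting 195 images, the same size as the learning images, were used as evaluation input images (recognition target images). FIG. 18 shows the input images used in the recognition phase of this example.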
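The image counts in [0082], [0085] and [0086] follow from the angular sampling. The arithmetic below only checks them. The average of about 99.6 windows per learning image is derived here and is not stated in the source.

```latex
\begin{aligned}
\text{learning images:}\quad & 5 \times \left( \tfrac{180}{30} + 1 \right) = 5 \times 7 = 35 \\
\text{input images per case:}\quad & 5 \times \left( \tfrac{180}{15} + 1 \right) = 5 \times 13 = 65 \\
\text{all input images:}\quad & 3 \times 65 = 195 \\
\text{windows per learning image:}\quad & 3485 / 35 \approx 99.6 \\
\text{dimension reduction:}\quad & N^{2} = 10^{2} = 100 \;\to\; k = 10
\end{aligned}
```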
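[0087]
FIG. 19 and FIG. 20 show the results for the three cases:
- Case 1: the camera position was moved by 120 mm, so the face position in the image was shifted accordingly (65 images).
- Case 2: the subject covered the mouth with a hand (65 images).
- Case 3: the images contained a complex background different from that of the learning images (65 images).
[0088]
The figures in FIG. 19 are personal identification rates. A case counts as correct if the person in the input image was correctly identified, regardless of face pose. The figures in FIG. 20 are personal identification/pose determination rates. A case counts as correct only if both the face pose and the person in the input image were correctly identified.
[0089]
In each case, the identification rate obtained with the eigenspace method is shown in the lower row for comparison.
[0090]
These results confirmed that the face image recognition system according to the present invention achieves higher identification rates than a similar system based on the conventional eigenspace method. This held even when the target was displaced or occluded, or when background noise was present.
[0091]
[Effects of the Invention]
As described above, the face image recognition system according to the present invention uses a face image recognition technique based on the Eigen-Window method. This method overcomes weaknesses of the conventional eigenspace method, such as sensitivity to displacement. The system can therefore cope with recognition target images in which the target is displaced or occluded, images that contain background noise, and images that contain multiple faces.
[0092]
In addition, window images are selected only from the skin-color regions corresponding to the face, using the hue value obtained by HSI conversion. Processing of window images located outside the face is therefore largely eliminated. As a result, the recognition rate improves even for images containing disturbances, and the recognition phase runs faster.
[0093]
Furthermore, brightness normalization is applied to each window image individually. Face image recognition therefore remains stable not only when the brightness of the whole image changes uniformly, but also when brightness changes locally, for example because of shadows.
[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the overall configuration of a face image recognition system according to an embodiment of the present invention.
[FIG. 2] A diagram outlining the processing in the learning phase of the Eigen-Window method.
[FIG. 3] A diagram explaining window selection.
[FIG. 4] A diagram showing learning window image corresponding points.
[FIG. 5] A diagram outlining the processing in the recognition phase of the Eigen-Window method.
[FIG. 6] A diagram showing the eigenspace during the nearest-neighbor search.
[FIG. 7] A diagram showing the voting spaces.
[FIG. 8] A flowchart showing the processing of the recognition processing unit in the initial setting process.
[FIG. 9] A diagram showing an example of a sample image used in the initial setting process.
[FIG. 10] A diagram explaining the extraction of the skin-color region in HSI space.
[FIG. 11] A flowchart showing the processing of the recognition processing unit in the learning phase.
[FIG. 12] A flowchart showing the details of window selection based on skin-color information.
[FIG. 13] (a) A diagram explaining the contents of the match-degree map.
(b) A diagram showing the selected window image candidate points.
[FIG. 14] A flowchart showing the details of brightness normalization.
[FIG. 15] A flowchart showing the processing of the recognition processing unit in the recognition phase.
[FIG. 16] A diagram showing the configuration of a face image recognition system in one example of the present invention.
[FIG. 17] A diagram showing the learning images used in the learning phase of this example.
[FIG. 18] A diagram showing the input images used in the recognition phase of this example.
[FIG. 19] A diagram showing the personal identification rates in this example, where a case counts as correct if the person in the input image was correctly identified regardless of face pose.
[FIG. 20] A diagram showing the personal identification/pose determination rates in this example, where a case counts as correct only if both the face pose and the person in the input image were correctly identified.
[Description of Reference Numerals]
101 Input unit
102 Control unit
1021 Input processing unit
1022 Recognition processing unit
1023 Display processing unit
103 Model image storage unit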
１０４出力部[0001]
TECHNICAL FIELD OF THE INVENTION
According to the present invention, information obtained by converting face image information of a person is stored in a database, and at the time of recognition, input face image information is subjected to a predetermined conversion, and then is compared with information stored in the database. And a face image recognition system for recognizing a face image.
[0002]
[Prior art]
In recent years, attempts to recognize facial images with computers have been vigorously made, since it would be useful if computers could read the information represented by faces, which would help realize a natural interface between humans and machines. ing. Also, if personal computers can replace and support personal identification based on facial images, applications to facial image database recognition systems for criminal investigations and access control systems for security are expected. it can. As described above, the face image recognition technology is one of the important issues in the advancement of the man-machine interface, and it is expected that the importance will be further increased in the future.
[0003]
In face image recognition by a computer, a pattern of an input image whose appearance changes according to a photographing condition represented by a posture of a face or illumination of a photographing position is identified by, for example, the name of the person. It is required to output the result information stably.
[0004]
To deal with such a problem, components such as eyes and mouth (hereinafter, referred to as “face components”) are accurately extracted from the input image on the assumption that the face in the image is facing forward. There is a face part-based method for performing identification based on individuality in the shape and arrangement of these face parts.
[0005]
However, in the face part-based method, it is very difficult to accurately extract the diagrammatic shape and the position of the face part from a face image photographed in a real environment in the first place. There is also a limit in describing subtle differences in facial patterns of individuals who are similar in shape only by a diagrammatic shape.
[0006]
Therefore, as an approach that has recently attracted attention, a pattern matching method has been proposed in which a pattern of a face image is expressed as a two-dimensional array of grayscale values at each pixel, and identification is performed by matching. A representative example of this method is an eigenspace method that performs pattern matching on a low-dimensional space called an eigenspace obtained by compressing high-dimensional image data (for example, Hiroshi Murase and Shree K. Nayer, “Visual Learning and Recognition”). of 3-D Objects from Appearance ", International Journal of Computer Vision, Vol. 14, No. 1, pp. 5-24 (1995)).
[0007]
A face image recognition system based on this eigenspace method (for example, disclosed in Japanese Patent Application Laid-Open No. H5-20442) is used for photographing under almost constant lighting conditions, accurate alignment, and face size. Is effective for the identification of a frontal face that has already been subjected to the normalization processing.
[0008]
[Problems to be solved by the invention]
However, in the above-mentioned conventional eigenspace method, as can be seen from the fact that accurate positioning and normalization processing of the face pattern in the input image is required, the position of the face pattern included in the image in the image is determined. It has a problem that it is easily affected by displacement, occlusion, background noise, and the like.
[0009]
Further, in the face image recognition system using the conventional eigenspace method, the face image is a face image facing the front, and a favorable condition is obtained under the constraint that accurate positioning and normalization of the face pattern are performed. Despite the results, the change in appearance due to the face position and orientation (or observation viewpoint), lighting conditions, background conditions, time lapse, etc. that is expected to occur when shooting in a realistic environment Achieving robust face image recognition remains an unsolved problem.
[0010]
In the present invention, when the position of the recognition target included in the image to be recognized is shifted from that at the time of learning, when part of the recognition target is hidden, when background noise exists, a plurality of recognition targets are It is an object of the present invention to provide a face image identification system that can cope with a case where the face image exists.
[0011]
[Means for Solving the Problems]
In order to achieve the object, a face image recognition system of the present invention includes a face image information input unit that captures face image information, a face image information conversion unit that converts the face image information captured by the face image information input unit, A model image storage unit for storing information converted by the face image information conversion unit; a matching unit for checking information converted by the face image information conversion unit with information stored in the model image storage unit In the face image recognition system comprising: the face image information conversion means,And extracting means for extracting a region corresponding to the skin color of a human face from the face image information using HSI conversion, generating a coincidence map from the region extracted by the extracting means, Is determined based on the coincidence map and the edge strength, a characteristic small region is selected, and a RGB value of the selected small region is subjected to a Gray-Scale conversion, and each pixel is subjected to a Gray-Scale conversion. A vector having a Gray-Scale value as an element is generated, and the vector is divided by the root of the sum of squares of each element of the vector to perform a brightness normalization process.Converts the information contained in each selected small area to a point set projected onto the eigenspaceThe matching unit searches information stored in the model image storage unit for information indicating the selected small area and the minimum distance, and searches the information indicating the selected small area and the minimum distance. 
And the relative position in the image is obtained, the voting is performed in the voting space corresponding to each information stored in the model image storage unit, and the matching is performed by detecting the peak position of the number of votes in the entire voting space.It is characterized by doing.By providing the extraction means, in the processing of face image information, the amount of information to be generally handled is large and the processing often takes a long time, but the processing amount can be reduced.
[0012]
Further, it is preferable that the model image storage unit stores information obtained by converting the plurality of face image information obtained by photographing a person's face from a plurality of viewpoints by the face image information conversion unit. By storing the information and collating it with the input face image of the recognition target, the face image can be recognized even if the recognition target face image is not taken from the front. It is.
[0013]
It is preferable that the model image storage unit stores the converted information for a plurality of persons. This is because it is possible to specify the person to be recognized by storing the information and collating it with the input facial image of the recognition target.
[0014]
Also, a video camera can be used as the face image information input means. In this case, the face image to be recognized can be easily photographed and taken into the face image recognition system of the present invention.
[0015]
Further, an image scanner can be used as the face image information input means. In this case, it is possible to recognize a face image that has already been captured. Further, an external storage device can be used as the face image information input means. In recent years, devices for processing face image information as digital information, such as digital cameras, have also been used in many cases. By using an external storage device as the face image information input means, digitalized information can be obtained. Can be taken into the face image recognition system of the present invention.
[0016]
The face image information converting means further includes an extracting means for extracting information corresponding to a human face portion from the face image information, and a characteristic small area from the face image information captured by the face image information input means. It is preferable to select the small area from the information extracted by the extracting means when selecting. In the processing of face image information, the amount of information to be handled is generally large, and the processing often takes a long time. However, the provision of the extraction means can reduce the processing amount.
[0017]
Further, the face image information converting means performs a brightness normalization process on a window image vector having a pixel value of the selected small region as an element after selecting the characteristic small region. Is preferred. By performing such processing, it is possible to perform robust recognition even for a change in brightness due to lighting conditions or the like at the time of shooting.
[0018]
As described above, by using the face image recognition system of the present invention, when the position of the recognition target included in the image to be recognized is shifted from that at the time of learning, when the part of the recognition target is hidden In addition, it is possible to provide a face image identification system that can cope with a case where background noise is present or a case where a plurality of recognition targets are present.
[0019]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0020]
FIG. 1 is a block diagram showing the overall configuration of the face image recognition system of the present invention. As shown in the figure, the face image recognition system of the present invention includes an image input unit 101 as face image information input means, a face image information conversion means, a control unit 102 as collation means, and a model image storage means. A model image storage unit 103 and an output unit 104 are provided.
[0021]
As the image input unit 101, for example, a video camera, an image scanner, or the like is used, and through the image input unit 101, face image information to be a learning image and a recognition target image is input. In addition, as the image input unit 101, an auxiliary storage device can be connected and used.
[0022]
The control unit 102 includes an input processing unit 1021, a recognition processing unit 1022, and a display processing unit 1023. The input processing unit 1021 receives an input from the image input unit 101.
[0023]
In a learning phase described later, the recognition processing unit 1022 extracts feature data from the image information received by the input processing unit 1021, and stores the extracted feature data in the model image storage unit 103 as model image information. Also, in a recognition phase described later, the recognition processing unit 1022 extracts feature data from the face image information of the recognition target received by the input processing unit 1021, and compares the extracted feature data with information stored in the model image storage unit 103 in the learning phase. Thus, the face image is recognized, and the recognition result is sent to the display processing unit 1023. The feature data extraction process will be described later in detail.
[0024]
The display processing unit 1023 outputs the result of recognition by the recognition processing unit 1022 to the output unit 104.
[0025]
The model image storage unit 103 stores the model image information generated by the recognition processing unit 1022.
[0026]
The output unit 104 outputs a face image recognition result and the like.
[0027]
Here, a general concept of the Eigen-Window method used in the face image recognition system according to the present invention will be described. The Eigen-Window method is an improvement of the above-mentioned conventional eigenspace method. The eigenspace method treats the entire image including the recognition target as one model, whereas the eigenspace method distinguishes the recognition target in the image. This is a method of dividing into small regions (hereinafter, referred to as “window images”), and modeling a recognition target with these sets. In the three-dimensional object recognition using the Eigen-Window method, the recognition target is regarded as a set of partial elements, so that the position shift, occlusion, and background of the recognition target included in the image, which cannot be realized by the eigenspace method, are realized. Recognition corresponding to noise, a case where a plurality of recognition targets exist in an image, or the like can be performed.
[0028]
Hereinafter, a process of recognizing a general three-dimensional object based on the Eigen-Window method will be described.
[0029]
The processing steps of the Eigen-Window method include a learning phase for generating model image information from an input image, and a recognition phase for recognizing a recognition target image by comparing model image information created in the learning phase with a recognition target image. It consists of two phases.
[0030]
FIG. 2 is a diagram showing an outline of a process in the learning phase. As shown in the drawing, in the learning phase, first, a series of learning images (M images) obtained by photographing one recognition target from multiple viewpoints are taken, and first, window selection based on edge strength is performed.
[0031]
The window selection based on the edge strength means that the edge strength is calculated based on the gray value of the pixel, and a rectangular small area of N pixels × N pixels centering on a pixel having an edge strength not less than a certain threshold value is selected as the window image. Processing.
[0032]
FIG. 3 is a diagram for explaining window selection. Assuming that I window images are selected from the learning images, in the following description, the i-th window image selected from the m-th learning image is represented by Z_m ⁱWill be expressed as
[0033]
When the window selection process based on the edge strength is completed, the column vector z having the grayscale value of the i-th window image included in the m-th learning image as an element_m ⁱAnd a matrix Z shown in the following (Equation 1)_m Then, a covariance matrix Q shown in the following (Equation 2) is calculated.
[0034]
(Equation 1)

[0035]
(Equation 2)

[0036]
Here, c represents an average gray value vector of all the selected window images. The eigenvector e of this matrix Q_i The corresponding eigenvalue λ_i , And based on the eigenvectors having eigenvalues equal to or greater than a certain threshold, the original dimension N of the window image² , A transformation matrix R into an eigenspace composed of a sufficiently small dimension k is constructed. The transformation matrix R is specifically represented by the following (Equation 3).
[0037]
(Equation 3)

[0038]
To the k-dimensional eigenspace spanned by the transformation matrix R, the learning window image Z_m And the corresponding point cloud matrix G_m (A series of these conversion processes is referred to as Karhunen-Loeve expansion, and hereinafter, is abbreviated as “KL expansion”). Point group matrix G_m Is specifically represented by the following (Equation 4).
[0039]
(Equation 4)

[0040]
Derived point group matrix G_m (Hereinafter G_m Is referred to as a “learning window image corresponding point”. ) Is stored in the model image storage unit 103, and the learning phase ends. FIG. 4 shows a learning window image corresponding point G in the eigenspace when k = 3._m FIG.
[0041]
Next, processing in the recognition phase will be described. FIG. 5 is a diagram showing an outline of the processing contents in the recognition phase. As shown in the drawing, in the recognition phase, an image to be recognized is input, and the input recognition target image is subjected to window selection and KL expansion based on edge strength in the same manner as in the learning phase. Do. In the KL expansion in the recognition phase, the selected input window image matrix Z_inPoint matrix G on the eigenspace corresponding to_inIs required. G_inIs specifically represented by the following (Equation 5) (hereinafter, G_inAre referred to as “input window image corresponding points”). This represents the same content as the above (Equation 4) regarding the recognition target image.
[0042]
(Equation 5)

[0043]
Input window image corresponding point G obtained in the above processing_inG corresponding to each input window image included in_in ⁱ  For G stored in the model image storage unit 103 in the learning phase._m  By performing the matching process with the information representing the object, identification of the object indicated by the identification target image and identification of the posture of the object are performed.
[0044]
Hereinafter, the content of the matching process in the recognition phase will be described. The matching process includes searching for the nearest point and voting.
[0045]
In the search process of the nearest neighbor point, each input window image corresponding point g_in ⁱ  For the corresponding point cloud matrix G_m  Learning window image corresponding point g indicating the minimum distance from_m ^jAnd input window images z_in ⁱ  Learning window image z with the most similar appearance_m ^jAsk for. FIG. 6 is a diagram specifically showing a state of the eigenspace at this time. In the figure, the i-th input window image corresponding point g_in ⁱ  , One of the learning window image corresponding points g_m ^jIndicate that they are the most similar. In this case, the learning window image corresponding point g_m ^jLearning window image z corresponding to_m ^jIs the input window image z_in ⁱ  Is obtained as a learning window image having the most similar appearance.
[0046]
In the voting process, each input window image z_in^i casts a vote in the voting space corresponding to the m-th learning image. The vote is placed at the relative position Δx(z_in^i, z_m^j) between z_in^i and the most similar learning window image z_m^j found above. The relative position in the image is determined specifically by the following (Equation 6).
[0047]
(Equation 6)

[0048]
In (Equation 6), p[ ] denotes the calculation of the position of a window image within the whole image. When voting is complete, the peak of the vote counts over the M voting spaces corresponding to the M learning images is detected. This identifies the learning image most similar to the target contained in the input image. FIG. 7 shows a specific example of the state of the voting space. If several vote peaks are detected, the input image can be recognized as containing several recognition targets that match the model images.
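Voting and peak detection can be illustrated with a minimal sketch that keeps each voting space as a sparse counter of relative positions. The exact form of (Equation 6) is not reproduced in this text. The sketch therefore assumes Δx = p[z_in^i] − p[z_m^j] (input window position minus matched learning window position), so that all windows of one correctly matched face vote into the same bin. `vote`, `peaks`, and `min_votes` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pos = Tuple[int, int]
# One match per input window: (m, p[z_in^i], p[z_m^j])
Match = Tuple[int, Pos, Pos]


def vote(matches: Iterable[Match]) -> Dict[int, Counter]:
    """Accumulate votes for relative positions, one voting space per learning image m."""
    spaces: Dict[int, Counter] = {}
    for m, (x_in, y_in), (x_m, y_m) in matches:
        delta = (x_in - x_m, y_in - y_m)  # assumed sign convention for the relative position
        spaces.setdefault(m, Counter())[delta] += 1
    return spaces


def peaks(spaces: Dict[int, Counter], min_votes: int) -> List[Tuple[int, Pos, int]]:
    """All bins with at least min_votes votes, strongest first.

    Several strong peaks suggest several targets matching the model images.
    """
    found = [(m, delta, n)
             for m, counts in spaces.items()
             for delta, n in counts.items() if n >= min_votes]
    return sorted(found, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    matches = [(2, (50, 60), (40, 40)), (2, (55, 70), (45, 50)),
               (1, (10, 10), (0, 0)), (2, (30, 30), (20, 10))]
    print(peaks(vote(matches), min_votes=2))  # [(2, (10, 20), 3)]
```

The winning learning image index m identifies the person and pose, and the winning Δx locates the face within the input image.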
[0049]
Based on the identified learning image, it is possible to estimate the type, position, posture, and the like of the recognition target object included in the input image.
[0050]
Next, a method of applying the Eigen-Window method realized by the processing based on the concept described above to a face image recognition system will be described.
[0051]
In the face image recognition system according to the present invention, the recognition processing unit 1022 performs the calculations that realize the Eigen-Window method described above. Because the recognition target is a human face image, however, several improvements have been made to the method.
[0052]
The first of these improvements is an initial setting process performed before the learning phase. In this process, the face image recognition system of the present embodiment learns the skin color of face images. This raises the recognition accuracy for face images and makes the recognition process more efficient.
[0053]
FIG. 8 is a flowchart illustrating the processing of the recognition processing unit 1022 in the initial setting process. As shown in the drawing, the recognition processing unit 1022 first reads a sample image via the input unit 101 (S801). FIG. 9 shows an example of a sample image, which is an image containing skin color. As shown in FIG. 9, the user marks the skin-colored portion of the image in advance as a designated area. The recognition processing unit 1022 then applies HSI conversion to the marked skin-colored pixels to obtain their HSI values (S802).
[0054]
HSI conversion converts an RGB image into an HSI image. In the RGB image, each pixel holds Red, Green, and Blue values (normally 256 levels each). In the HSI image, each pixel is represented by three attributes: Hue, Saturation, and Intensity. Because these HSI attributes are hardly affected by the brightness changes caused by changes in lighting conditions, face image recognition that is robust to brightness changes becomes possible.
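The text does not give the conversion formulas. As a sketch, the common textbook geometric definition of RGB-to-HSI can be written as follows, with hue in degrees and saturation and intensity in [0, 1]. The 8-bit input scaling and the handling of black and gray pixels are assumptions. Under this definition, scaling R, G, and B by a common factor leaves H and S unchanged while I scales with it. That behavior is why hue is a useful cue for selecting skin pixels under brightness changes.

```python
import math
from typing import Tuple


def rgb_to_hsi(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert 8-bit RGB to (H in degrees [0, 360), S in [0, 1], I in [0, 1])."""
    R, G, B = r / 255.0, g / 255.0, b / 255.0
    total = R + G + B
    i = total / 3.0
    if total == 0.0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined
    s = 1.0 - 3.0 * min(R, G, B) / total
    num = 0.5 * ((R - G) + (R - B))
    den = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if den == 0.0:
        h = 0.0  # gray (R == G == B): hue is undefined
    else:
        # clamp guards acos against tiny rounding excursions outside [-1, 1]
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if B <= G else 360.0 - theta
    return h, s, i


if __name__ == "__main__":
    print(rgb_to_hsi(200, 120, 80))  # a skin-like tone
    print(rgb_to_hsi(100, 60, 40))   # same tone at half brightness: same H and S
```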
[0055]
Further, the recognition processing unit 1022 extracts a skin color area (S803).
[0056]
Skin color region extraction plots each pixel of the HSI-converted skin-colored portion as a point in HSI space according to its HSI value. From these points it determines the parameters Hmin, Hmax, Smin, Smax, Imin, and Imax of the skin color region shown in (Equation 7).
[0057]
(Equation 7)
$$H_{min} \le h \le H_{max},\quad S_{min} \le s \le S_{max},\quad I_{min} \le i \le I_{max}$$
[0058]
FIG. 10 illustrates the extraction of the skin color region in HSI space. The rectangular parallelepiped shown in the figure is the skin color region in HSI space.
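The rectangular parallelepiped can be determined with a minimal sketch that takes the per-axis minimum and maximum of the marked pixels' HSI values. The text does not say how the six parameters are derived from the plotted points, so the min/max rule is an assumption; a variant might trim outliers first. The sketch also ignores hue wrap-around at 0°/360°. `SkinBox`, `fit_skin_box`, and `in_skin_box` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Tuple

HSI = Tuple[float, float, float]


class SkinBox(NamedTuple):
    """Axis-aligned box in HSI space: the skin color region of (Equation 7)."""
    h_min: float
    h_max: float
    s_min: float
    s_max: float
    i_min: float
    i_max: float


def fit_skin_box(marked: Iterable[HSI]) -> SkinBox:
    """Fit the box to the HSI values of the user-marked skin pixels (min/max per axis)."""
    values = list(marked)
    if not values:
        raise ValueError("no marked skin pixels")
    hs, ss, is_ = zip(*values)
    return SkinBox(min(hs), max(hs), min(ss), max(ss), min(is_), max(is_))


def in_skin_box(box: SkinBox, hsi: HSI) -> bool:
    """True if an HSI value lies inside the skin color region (bounds inclusive)."""
    h, s, i = hsi
    return (box.h_min <= h <= box.h_max and box.s_min <= s <= box.s_max
            and box.i_min <= i <= box.i_max)


if __name__ == "__main__":
    box = fit_skin_box([(10.0, 0.3, 0.5), (20.0, 0.4, 0.6), (15.0, 0.35, 0.7)])
    print(box)
    print(in_skin_box(box, (12.0, 0.33, 0.55)), in_skin_box(box, (25.0, 0.33, 0.55)))
```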
[0059]
The processing of the recognition processing unit 1022 in the learning phase and the recognition phase, performed after the above initial setting process has been completed, is described below with reference to the drawings.
[0060]
FIG. 11 is a flowchart showing the processing of the recognition processing unit 1022 in the learning phase. As shown in the figure, the face image recognition system according to the present invention performs window selection based on skin color information (S1101) before the edge-strength-based window selection of the Eigen-Window method described earlier. This is one of the improvements to the Eigen-Window method that exploit the fact that the recognition target is a human face image.
[0061]
Window selection based on skin color information selects windows only from the region of the learning image that corresponds to skin color. It uses the skin color information that the face image recognition system of the present embodiment learned in the initial setting process. This makes subsequent processing, such as window selection based on edge strength, more efficient.
[0062]
FIG. 12 is a flowchart showing the detailed processing of window selection based on skin color information. As shown in the figure, HSI conversion is first applied to each learning image. The area corresponding to the skin color of a human face is then extracted based on the HSI value of each converted pixel (S1201).
[0063]
Since the range of HSI values corresponding to the skin color region has already been determined in the initial setting process, only the pixels whose Hue value h falls within the skin color region are selected here.
[0064]
Next, a coincidence-degree map is generated (S1202). For each pixel selected in step S1201, the map records at that pixel's position how close its HSI value lies, in HSI space, to the center of the skin color region. The skin color region is the region containing the HSI values of the pixels marked in the initial setting process. The degree of coincidence is calculated as follows. A point plotted near the center of the rectangular parallelepiped representing the skin color region is close to the center and is therefore given a high degree of coincidence. A point plotted near the faces of the parallelepiped is far from the center and is given a low degree of coincidence. A point lying outside the skin color region is given a degree of coincidence of 0. FIG. 13A illustrates the contents of the coincidence-degree map.
[0065]
Threshold processing is then applied to the coincidence-degree map, and pixels whose degree of coincidence exceeds a preset threshold are selected as window image candidate points (S1203). FIG. 13B shows the selected window image candidate points.
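Steps S1202 and S1203 can be sketched as follows. The text fixes only the qualitative behavior: high at the center of the box, low near its faces, and 0 outside. The linear per-axis falloff used here, which takes the worst axis, is therefore an assumed choice. The box is passed as a plain tuple of (min, max) pairs so the example stands alone. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

HSI = Tuple[float, float, float]
# Skin color box as ((h_min, h_max), (s_min, s_max), (i_min, i_max))
Box = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def coincidence(box: Box, hsi: HSI) -> float:
    """1.0 at the box center, falling linearly to 0.0 at its faces, 0.0 outside.

    The linear, worst-axis falloff is an assumed choice.
    """
    score = 1.0
    for v, (lo, hi) in zip(hsi, box):
        if not lo <= v <= hi:
            return 0.0  # outside the skin color region
        half = (hi - lo) / 2.0
        if half > 0.0:
            score = min(score, 1.0 - abs(v - (lo + hi) / 2.0) / half)
    return score


def coincidence_map(hsi_image: Sequence[Sequence[HSI]], box: Box) -> List[List[float]]:
    """Write the degree of coincidence at every pixel position (step S1202)."""
    return [[coincidence(box, px) for px in row] for row in hsi_image]


def candidate_points(cmap: Sequence[Sequence[float]],
                     threshold: float) -> List[Tuple[int, int]]:
    """(x, y) of pixels whose coincidence exceeds the threshold (step S1203)."""
    return [(x, y) for y, row in enumerate(cmap)
            for x, c in enumerate(row) if c > threshold]


if __name__ == "__main__":
    box = ((0.0, 20.0), (0.2, 0.6), (0.3, 0.9))
    image = [[(10.0, 0.4, 0.6), (20.0, 0.4, 0.6), (25.0, 0.4, 0.6)]]
    cmap = coincidence_map(image, box)
    print(cmap, candidate_points(cmap, 0.5))
```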
[0066]
When window selection based on skin color information is complete, the process returns to the flowchart of FIG. 11, and window selection based on edge strength is performed (S1102). This step was already explained in the description of the Eigen-Window method, so a detailed description is omitted here. In the face image recognition system of the present embodiment, however, edge strength is computed from the grayscale values of the image only for the skin-colored pixels selected by the window selection based on skin color information, and windows are selected from those pixels.
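The edge-strength step, restricted to skin-colored candidates, can be illustrated with a minimal sketch. The text names neither the edge operator nor the window size. The central-difference gradient magnitude and the square window of side 2*half+1 are therefore assumptions, as is skipping candidates too close to the image border. Only candidate points are examined, which reflects the efficiency gain described above.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Gray = Sequence[Sequence[float]]


def edge_strength(gray: Gray, x: int, y: int) -> float:
    """Central-difference gradient magnitude at an interior pixel (x, y)."""
    gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
    gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
    return math.hypot(gx, gy)


def select_windows(gray: Gray, candidates: Iterable[Tuple[int, int]],
                   threshold: float,
                   half: int) -> List[Tuple[Tuple[int, int], List[float]]]:
    """Keep skin-colored candidates whose edge strength >= threshold.

    Each kept pixel yields a square (2*half+1) x (2*half+1) window, flattened
    row by row into a vector. Candidates too close to the border are skipped.
    """
    if half < 1:
        raise ValueError("half must be at least 1")
    height, width = len(gray), len(gray[0])
    selected = []
    for x, y in candidates:
        if not (half <= x < width - half and half <= y < height - half):
            continue  # window (and gradient stencil) would leave the image
        if edge_strength(gray, x, y) >= threshold:
            window = [gray[yy][xx]
                      for yy in range(y - half, y + half + 1)
                      for xx in range(x - half, x + half + 1)]
            selected.append(((x, y), window))
    return selected


if __name__ == "__main__":
    gray = [[0 if x < 2 else 100 for x in range(5)] for _ in range(5)]  # vertical step edge
    for center, window in select_windows(gray, [(1, 2), (2, 2), (3, 2)], 10.0, 1):
        print(center, window)
```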
[0067]
Next, the face image recognition system of the present embodiment performs brightness normalization (S1103). This is another improvement to the Eigen-Window method. It makes recognition reasonably robust to changes in object appearance caused by the illumination and environmental changes that can occur when face images are captured.
[0068]
FIG. 14 is a flowchart showing the details of the brightness normalization process. As shown in the figure, Gray-Scale conversion is first applied to the RGB values of the selected window image. This converts it into a Gray-Scale window image in which each pixel is represented by a single Gray-Scale value (normally 256 levels) (S1401).
[0069]
Next, a window image vector Z = (z_1, z_2, ..., z_N) is generated whose elements are the Gray-Scale values of the pixels of the window image, where N is the total number of pixels in the window image (S1402). The square root of the sum of the squares of the elements z_i is then calculated (S1403). This value is hereinafter referred to as the "norm" of the vector and is written ‖Z‖:

$$\|Z\| = \sqrt{\sum_{i=1}^{N} z_i^2}$$
[0070]
Brightness normalization is then performed by dividing the window image vector Z by its norm ‖Z‖. The resulting normalized window image vector Z′ = Z / ‖Z‖ is output as the result of the brightness normalization process (S1404).
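Steps S1401 to S1404 can be sketched as below. The text does not specify the Gray-Scale weights, so the ITU-R BT.601 luma coefficients are an assumption, as is returning a zero vector for an all-black window. The example shows the key property: a window whose brightness is uniformly scaled by a positive factor normalizes to the same vector Z′.

```python
import math
from typing import List, Sequence, Tuple


def to_gray(rgb_window: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Gray-Scale value per pixel (S1401), using ITU-R BT.601 luma weights (assumed)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_window]


def normalize_brightness(z: Sequence[float]) -> List[float]:
    """Divide the window vector Z by its norm ||Z|| (S1403-S1404)."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return [0.0] * len(z)  # all-black window: nothing to normalize (assumed handling)
    return [v / norm for v in z]


if __name__ == "__main__":
    window = [(30, 30, 30), (40, 40, 40)]
    darker = [(15, 15, 15), (20, 20, 20)]  # same window at half the brightness
    print(normalize_brightness(to_gray(window)))
    print(normalize_brightness(to_gray(darker)))  # same values, up to rounding
```

Because the division is applied per window, a shadow that darkens only part of the face rescales only the windows it covers. This is consistent with the local-brightness robustness claimed for the method.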
[0071]
KL expansion is applied to the output normalized window image vectors Z′ (S1105), and the information on the learning window image corresponding points is stored in the model image storage unit 103 (S1106). These steps were described in detail in the description of the Eigen-Window method, so the description is omitted here.
[0072]
KL expansion requires all the prepared learning images, so all learning images are read before KL expansion is performed (S1104).
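All learning images must be read first because KL expansion (principal component analysis) needs the mean and covariance of the whole set of learning window vectors. The sketch below is a dependency-free illustration using power iteration with Gram-Schmidt orthogonalization. That numerical method is a swapped-in choice to keep the example self-contained; a practical implementation would more likely call a symmetric eigensolver such as `numpy.linalg.eigh`. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average of all learning window vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def covariance(vectors: Sequence[Sequence[float]], mean: Sequence[float]) -> Matrix:
    """Covariance matrix of the window vectors (needs every learning window)."""
    d, n = len(mean), len(vectors)
    c = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [a - b for a, b in zip(v, mean)]
        for r in range(d):
            for s in range(d):
                c[r][s] += diff[r] * diff[s]
    return [[x / n for x in row] for row in c]


def top_eigenvectors(c: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Leading eigenvectors of a symmetric PSD matrix by power iteration.

    Each new vector is kept orthogonal to those already found (Gram-Schmidt),
    so the result is an orthonormal basis. Stops early if no variance remains.
    """
    d = len(c)
    rng = random.Random(seed)
    basis: Matrix = []
    for _ in range(min(k, d)):
        v = [rng.uniform(0.5, 1.5) for _ in range(d)]
        found = False
        for _ in range(iters):
            w = [sum(c[r][s] * v[s] for s in range(d)) for r in range(d)]
            for b in basis:  # remove components along eigenvectors already found
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining directions carry no variance
            v = [x / norm for x in w]
            found = True
        if not found:
            break
        basis.append(v)
    return basis


def kl_expansion(windows: Sequence[Sequence[float]], k: int) -> Tuple[List[float], Matrix]:
    """Return (mean window, up to k eigenvectors) computed from all learning windows."""
    mean = mean_vector(windows)
    return mean, top_eigenvectors(covariance(windows, mean), k)
```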
[0073]
When all the prepared learning images have been subjected to KL expansion and the results have been stored in the model image storage unit 103, the learning phase ends.
[0074]
Next, the processing contents of the face image recognition system of the present embodiment in the recognition phase of recognizing the recognition target image using the information on the learning window image corresponding points stored in the model image storage unit 103 will be described.
[0075]
FIG. 15 is a flowchart illustrating the processing of the recognition processing unit 1022 in the recognition phase. As shown in the drawing, window selection based on skin color information (S1501), window selection based on edge strength (S1502), and brightness normalization (S1503) are applied to the face image information to be recognized, as in the learning phase. Each process is the same as described for the learning phase, so a detailed description is omitted here.
[0076]
In the recognition phase, after normalizing the brightness, KL expansion (S1504), search for the nearest point (S1505), and voting processing (S1506) are performed.
[0077]
This processing was described in detail earlier in the description of the Eigen-Window method, so its description is omitted here.
[0078]
By performing the processing of the recognition phase as described above, the name, posture, and the like of the person corresponding to the input face image can be specified.
[0079]
【Example】
FIG. 16 shows a configuration of a face image recognition system according to an embodiment of the present invention.
[0080]
As shown in the figure, the face image recognition system of this embodiment comprises a chair 1601 fixed to the floor, a gauge 1602 for measuring the angle of the chair, a camera 1603 for capturing face images, and a computer 1604 that performs the recognition processing.
[0081]
The camera captures the face image of a person sitting in the chair, turned to an arbitrary angle within the learned range. The input face image is compared with a model created in advance from the learning images, so that both the person and the face posture can be identified.
[0082]
First, in the learning phase, 35 learning images were used in total. Each of five persons was seated in the chair 1250 mm from the camera and photographed every 30 degrees from 0 degrees (facing directly right) to 180 degrees (facing directly left), giving seven images per person. FIG. 17 shows the learning images used in the present embodiment.
[0083]
No special lighting was used at the time of shooting, and the shooting was performed in an environment with only room lights.
[0084]
The acquired images are 120 × 160 pixels. After the RGB values were converted into a gray-scale image, window images (N = 10) were selected based on edge intensity and projected onto a k = 10-dimensional eigenspace.
[0085]
The total number of window images for the 35 learning images at this time was 3485.
[0086]
Next, for the recognition phase, the five persons were photographed every 15 degrees from 0 to 180 degrees under each of the three conditions below. The resulting 195 images (the same size as the learning images) were used as input images for evaluation (recognition target images). FIG. 18 shows input images used in the recognition phase of the present embodiment.
[0087]
Three sets of input images were used:
(1) images in which the camera installation position was moved by 120 mm, so that the face position within the image was shifted (65 images);
(2) images in which the subject's face was partly covered by a hand (65 images);
(3) images containing a complex background different from that of the learning images (65 images).
The results are shown in FIG. 19 and FIG. 20.
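The image counts quoted for the experiment follow directly from the angle steps and the number of persons. The short check below reproduces them; the variable names are illustrative only, and the window total of 3485 is taken from the text.

```python
persons = 5
learning_angles = list(range(0, 181, 30))  # 0, 30, ..., 180 degrees: 7 poses
test_angles = list(range(0, 181, 15))      # 0, 15, ..., 180 degrees: 13 poses
cases = ["camera shifted 120 mm", "face partly covered by a hand", "complex background"]

n_learning = persons * len(learning_angles)
n_per_case = persons * len(test_angles)
n_test = n_per_case * len(cases)
print(n_learning, n_per_case, n_test)  # 35 65 195

# Average number of learning windows per learning image, from the reported total of 3485
windows_per_image = 3485 / n_learning  # about 99.6
```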
[0088]
The numbers in FIG. 19 are personal identification rates. An answer counts as correct if the person in the input image is correctly identified, regardless of face posture. The numbers in FIG. 20 are personal identification / posture identification rates. Here an answer counts as correct only if both the face posture and the person in the input image are correctly identified.
[0089]
In each case, the lower row shows, for comparison, the identification rate obtained with the eigenspace method.
[0090]
These results confirmed that the face image recognition system according to the present invention achieves a higher recognition rate than a comparable system based on the conventional eigenspace method. This held even when the target was shifted in position, partly occluded, or surrounded by background noise.
[0091]
【The invention's effect】
As described above, the face image recognition system according to the present invention uses a face image recognition method based on the Eigen-Window method. This method overcomes drawbacks of the conventional eigenspace method, such as its sensitivity to positional displacement. The system can therefore handle recognition target images that contain a positional shift or occlusion of the target, background noise, or multiple faces.
[0092]
In addition, window images are selected only from the skin color region corresponding to the face, using the Hue value after HSI conversion. This largely eliminates processing of window images located outside the face. As a result, the recognition rate improves even for images containing disturbances, and processing in the recognition phase becomes faster.
[0093]
Furthermore, brightness normalization is performed for each window image. Stable face image recognition is therefore possible not only when the brightness of the whole image changes uniformly but also when brightness changes locally, for example because of shadows.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a face image recognition system according to an embodiment of the present invention.
FIG. 2 is a diagram schematically illustrating a processing process in a learning phase of the Eigen-Window method.
FIG. 3 is a diagram for explaining window selection.
FIG. 4 is a diagram illustrating a state of a learning window image corresponding point.
FIG. 5 is a diagram schematically illustrating a processing step in a recognition phase of the Eigen-Window method.
FIG. 6 is a diagram specifically showing a state of an eigenspace in a search process of a nearest point.
FIG. 7 is a diagram specifically showing a state of a voting space.
FIG. 8 is a flowchart showing the processing contents of a recognition processing unit in the initial setting processing.
FIG. 9 is a diagram illustrating an example of a sample image used in an initial setting process.
FIG. 10 is a diagram for specifically describing extraction of a flesh color area in the HSI space.
FIG. 11 is a flowchart showing processing contents of a recognition processing unit in a learning phase.
FIG. 12 is a flowchart showing detailed processing contents of window selection based on skin color information.
FIG. 13A is a diagram for specifically explaining the content of a coincidence degree map.
(B) is a diagram specifically showing the selected window image candidate points.
FIG. 14 is a flowchart showing detailed processing contents of brightness normalization processing.
FIG. 15 is a flowchart illustrating processing performed by a recognition processing unit in a recognition phase.
FIG. 16 is a diagram showing a configuration of a face image recognition system according to one embodiment of the present invention.
FIG. 17 is a diagram showing a learning image used in the learning phase of the embodiment.
FIG. 18 is a diagram illustrating an input image used in the recognition phase of the present embodiment.
FIG. 19 is a diagram showing the personal identification rate in the present embodiment, counting as correct the cases in which the person in the input image is correctly identified regardless of face posture.
FIG. 20 is a diagram showing the personal identification / posture identification rate in the present embodiment, counting as correct the cases in which both the face posture and the person in the input image are correctly identified.
[Explanation of symbols]
101 Input unit
102 control unit
1021 Input processing unit
1022 Recognition processing unit
1023 Display processing unit
103 Model image storage
104 Output unit

Claims

1. A face image recognition system comprising:
face image information input means for capturing face image information;
face image information conversion means for converting the face image information captured by the face image information input means;
a model image storage unit for storing the information converted by the face image information conversion means;
a matching unit for matching the information converted by the face image information conversion means against the information stored in the model image storage unit; and
initial setting means that reads a sample image containing skin color, obtains HSI values by applying HSI conversion to each pixel in the skin color area in which the user has marked a skin-colored portion of the sample image, plots each pixel according to its HSI value in an HSI space whose axes are the three HSI attributes h (hue), s (saturation), and i (intensity), and determines a range representing the skin color region as a rectangular parallelepiped;
wherein the face image information conversion means further applies HSI conversion to the face image information, has extraction means for extracting from the face image information a region corresponding to the skin color of a human face based on whether the HSI value of each converted pixel lies within said range of HSI values, calculates in the HSI space a degree of coincidence indicating how close the HSI value of each pixel in the region extracted by the extraction means lies to the center of the rectangular parallelepiped representing the skin color region determined by the initial setting means, writes the value of the degree of coincidence at the corresponding pixel position in the image to generate a coincidence-degree map, selects as window image candidate points the pixels in the coincidence-degree map whose degree of coincidence is higher than a predetermined threshold, obtains edge strength based on the grayscale values of the image, selects based on the edge strength a plurality of small regions each being an image around a pixel whose edge strength is equal to or greater than a predetermined threshold, applies Gray-Scale conversion to the RGB values of the selected small regions, generates a vector whose elements are the Gray-Scale values of the respective pixels, performs brightness normalization by dividing the vector by the square root of the sum of the squares of its elements, and converts the information contained in each selected small region into a set of points projected onto an eigenspace;
and wherein the matching unit searches the information stored in the model image storage unit for the information at the minimum distance from each selected small region, obtains the relative position in the image between the selected small region and the information at the minimum distance, casts a vote in the voting space corresponding to each piece of information stored in the model image storage unit, and performs matching by detecting the peak position of the number of votes over the entire voting space.

2. The face image recognition system according to claim 1, wherein the model image storage unit stores information obtained by converting the plurality of face image information obtained by photographing a person's face from a plurality of viewpoints by the face image information conversion unit.

3. The face image recognition system according to claim 2, wherein the model image storage unit stores the converted information for a plurality of persons.