JP4076777B2

JP4076777B2 - Face area extraction device

Info

Publication number: JP4076777B2
Application number: JP2002060043A
Authority: JP
Inventors: 昭二田中
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2002-03-06
Filing date: 2002-03-06
Publication date: 2008-04-16
Anticipated expiration: 2022-03-06
Also published as: JP2003256834A

Description

【０００１】
【発明の属する技術分野】
この発明は、入力された人物画像から、毛髪領域を含む顔領域を抽出し、更に顔の構成要素である両目、口、鼻位置を画像の座標値として求める装置に関するものである。
【０００２】
【従来の技術】
図１６は第１の従来例として、例えば、特開平８−１５３１９７に示された従来の画像処理装置がある。この装置によれば、入力画像データを色の属性である輝度データ、色相データ、彩度データに変換する座標変換器と、これらの変換した３属性データに基づく肌色画素のヒストグラムを生成するヒストグラム生成器と、抽出した肌色画素に基づいて顔領域を判定する顔領域判定器からなる。
【０００３】
次に動作について説明する。
まず、座標変換器において、入力された画像データから輝度データ、色相データ、彩度データの３属性データに変換する。そして、ヒストグラム生成器において変換した３属性データに基づき、画像の水平位置毎に垂直方向の肌色画素の累積ヒストグラムを生成する。最後に顔領域判定器において、肌色画素が密集している領域を顔領域として抽出する。
【０００４】
また、第２の従来例として、特開平９−１０１５７９に示された顔領域抽出方法および複写条件決定方法がある。この方法によれば、画像から肌色領域を抽出後、エッジ抽出結果を用いて肌色領域を再度分割することにより、肌色と類似した背景の前で撮影した画像に対して、背景と顔が同時に抽出されることを防いでいる。
【０００５】
【発明が解決しようとする課題】
上記した第１の従来例である画像処理装置では、肌色に類似した領域が顔の周辺に存在する場合、その領域が顔領域と統合されて抽出されるという課題がある。
一方、第２の従来例である顔領域抽出方法および複写条件決定方法では、肌色領域抽出後、エッジ抽出結果を用いて過少分割部分を補正しているが、一般にエッジ抽出によって得られた輪郭線は欠損を生じている場合が多く、顔領域が輪郭線で囲まれた領域とならず、結局背景領域を除去できないという課題がある。
また、この方法を眼鏡をかけた人物に適用すると、顔領域が眼鏡のフレームで分断されて領域を誤ってしまう。
【０００６】
この発明は上記のような課題を解決するためになされたもので、任意の背景の前で撮影された人物画像から頭部領域を正確に抽出し、目、鼻、口の位置を定める。
【０００７】
【課題を解決するための手段】
この発明に係る顔領域抽出及び顔構成要素位置判定装置は、入力された人物画像データを解析して顔領域を抽出し、目や鼻等の構成要素の位置を求める装置において、
人物画像データを各画素の色情報に基づいて複数の抽出単位領域に分割する候補領域特定部と、
人物画像データから顔領域を特定する顔領域特定部と、
人物画像データから毛髪部を特定する毛髪領域特定部とを備えて、得られた顔領域を記得られた抽出単位領域と特定した毛髪部で表現し、かつ顔の構成要素の位置を座標値として求めるようにした。
【０００８】
また更に、人物画像データをフィルタリングして、その後、標準色をもとに肌色領域候補を抽出し、この抽出した肌色領域候補中の最大領域を抽出する肌色領域抽出部を備えた。
【０００９】
また更に、候補領域特定部は、複数の領域特定手段の１つとして輪郭を抽出するエッジ抽出手段を用いた。
【００１０】
また更に、顔領域特定部は、肌色領域抽出部の出力を参照して顔領域を推定し、かつこの推定した顔領域中の画像データの欠落を膨張と収縮処理に基づいて補正して、顔領域を特定するようにした。
【００１１】
また更に、毛髪領域特定部は、毛髪色データベースを設け、顔領域特定部の出力を参照して毛髪領域探索範囲を設定し、この毛髪領域探索範囲から毛髪色データベースに基づいて毛髪部を特定するようにした。
【００１２】
また更に、顔領域特定部の出力に基づいて目、鼻、口位置の探索範囲を設定し、入力された人物画像データの輝度を正規化して上記目、鼻、口位置を定めるようにした。
【００１３】
【発明の実施の形態】
実施の形態１．
本発明の主旨は、毛髪部を含めた顔領域の正確な特定にあり、そのために複数の候補領域を組み合わせて確からしい領域を推定抽出する。即ち先ず抽出単位となる領域を細分化する候補領域特定部と、更に眼鏡や、顔の中にある髪の毛や装身具等の影響を除いて顔の最大輪郭を特定する顔領域特定部と、範囲を推定して毛髪部を特定する毛髪特定部を備えて、これらを総合して顔領域を確定する。以下、全体の構成、各部の構成と動作を順に説明する。
図１は、本実施の形態における特徴領域抽出頭部位置判定装置の構成を示す図である。図において、１は画像を入力するための画像入力部、２は入力された画像から頭部領域を抽出するための頭部領域抽出部、３は抽出された頭部領域から両目、口、鼻などの部位の位置を検出するための部位検出部である。
頭部領域抽出部２はさらに、画像から肌色領域を抽出するための肌色領域抽出部５、顔領域および毛髪領域の候補を特定するための候補領域特定部６、肌色領域抽出部５および候補領域特定部６の結果に基づき顔領域を特定するための顔領域特定部７、顔領域特定部７と候補領域特定部６の結果に基づき髪の毛領域を特定するための毛髪領域特定部８から構成される。
【００１４】
肌色領域抽出部５はさらに、画像をぼかすためのフィルタリング手段９、肌色画素を抽出するための肌色抽出手段１０、肌色抽出手段１０で抽出した肌色画素を連結した領域のうち最大の領域を抽出するための最大領域抽出手段１１から構成される。
本実施の形態において第１の重要構成要素である、候補領域特定部６はさらに、画像のエッジを抽出するためのエッジ抽出手段１２、画像を領域分割するための領域分割手段１３、領域分割手段１３の結果をエッジ抽出手段１２の結果を用いて補正するための領域分割補正手段１４から構成される。
同じく第２の重要要素である、顔領域特定部７はさらに、肌色領域抽出部５の結果と候補領域特定部６の結果から顔領域を特定するための顔領域判定手段１５、顔領域判定手段１５の結果得られた顔領域に発生している穴や亀裂などの欠落を補正するための顔領域補正手段１６から構成される。
【００１５】
毛髪領域特定部８はさらに、顔領域特定部７の結果に基づいて毛髪領域が含まれる範囲を特定するための毛髪候補領域選定手段１７、髪毛色を格納する毛髪色データベース１９、候補領域特定部６で得られた領域のうち、毛髪候補領域選定手段１７で設定した範囲に含まれる領域の中で、毛髪色データベース１９に格納されている毛髪色に類似した領域を毛髪領域として判定するための毛髪領域判定手段１８から構成される。
【００１６】
部位検出部３はさらに、頭部領域抽出部２の結果から両目、口、鼻領域の探索範囲を設定するための部位領域マスク設定手段２０、部位領域マスク設定手段２０で設定した各探索範囲の中から部位領域を特定するための部位領域特定手段２１から構成される。
【００１７】
次に本実施の形態における特徴領域抽出頭部位置判定合成装置の動作をその全体動作フローチャートである図２を用いて説明する。
まず、画像入力手段１において画像を入力する（ステップＳ１−１）。次に、頭部領域抽出部２において入力画像から人物の頭部領域を抽出する（ステップＳ１−２）。そして、部位検出部３において、頭部領域抽出部２で抽出した頭部領域から両目、口、鼻の位置を検出する（ステップＳ１−３）。画像出力を編集する場合は、図示していない画像合成手段において抽出した頭部領域を、任意の背景画像と、背景画像に付随した合成パラメータを用いて合成する（ステップＳ１−４）。
【００１８】
次に頭部領域抽出部２の動作をその動作フローチャートである図３を用いて説明する。
頭部領域抽出部２は、画像入力部１で入力した画像からまず肌色領域抽出部５において肌色領域を抽出する（ステップＳ２−１）。また、候補領域特定部６において画像を領域分割する（ステップＳ２−２）。次に、顔領域特定部７において、候補領域特定部６で得られた領域分割結果から、肌色領域抽出部５で抽出した肌色領域と重なる領域を顔領域として抽出し、内部に発生している穴や亀裂等の欠落を補正する（ステップＳ２−３）。最後に、毛髪領域特定部８において、顔領域特定部７で抽出した顔領域から、髪の毛領域が含まれる範囲を設定し、その範囲に含まれる領域に対して毛髪色ＤＢに格納された毛髪色と類似している領域を毛髪領域として特定する（ステップＳ２−４）。
【００１９】
更に、個別に各要素の動作を説明する。まず肌色領域抽出部５の動作をフローチャート図４および説明用の図５を用いて詳細に説明する。図５は、フィルタリング手段９でぼかし処理を行ったときの処理前および処理後の画像例である。
肌色領域抽出部５は、フィルタリング手段９において画像入力部１で入力された画像をぼかす処理を行う（ステップＳ３−１）。これは、例えば眼鏡をかけた人物から後述の肌色画素を抽出して顔領域を特定する際、眼鏡で顔領域が分断される場合がある。顔領域を抽出する際には、顔領域が１領域として抽出されることが望ましい。もし、顔領域が分断された場合、どの領域までが顔領域かを判定することが困難となる問題がある。よって、肌色画素を抽出する前にぼかし処理を施すことにより、眼鏡などの画素を肌色と同化させることにより、顔領域が分断されず、一領域として抽出される効果がある。
【００２０】
ぼかし処理には、例えば「画像ハンドブック」（高木幹雄，下田陽久監修、東京大学出版会）（文献１）に記載のメディアンフィルタを用いてもよい。メディアンフィルタのフィルタリングサイズを画像の大きさに合わせて調整すると、例えば入力画像が図５（ａ）の形状２６であった場合、その結果は図５（ｂ）の形状２７のようになり、眼鏡のエッジ部分がかすれるようになる。よって、後の肌色抽出処理を行った場合、顔領域が眼鏡等によって分断されること無く１領域として抽出される。
フィルタリング手段９で入力画像のぼかし処理を行った後、肌色抽出手段１０において肌色画素を抽出する（ステップＳ３−２）。
【００２１】
肌色画素の抽出は、例えば“ＰｉｃＴｏＳｅｅｋ：ＣｏｍｂｉｎｉｎｇＣｏｌｏｒａｎｄＳｈａｐｅＩｎｖａｒｉａｎｔＦｅａｔｕｒｅｓｆｏｒｉｍａｇｅＲｅｔｒｉｅｖａｌ，”ＴｈｅｏＧｅｒｅｒｓａｎｄＡｒｎｏｌｄＷ．Ｍ．Ｓｍｅｕｌｄｅｒｓ，ＩＥＥＥＴｒａｎｓａｃｔｉｏｎｏｎＩｍａｇｅＰｒｏｃｅｓｓｉｎｇ，Ｖｏｌ．９，Ｎｏ．１，（文献２）に記載の肌色モデルを用いて行う。つまり、入力画像の画素値（Ｒ，Ｇ，Ｂ）を肌色抽出のための別の色空間に写像し、肌色が属する範囲に入る画素を抽出する。
【００２２】
この方法は具体的には、色空間として、Ｒ、Ｇ，Ｂを画素の色データとしたとき、まず次式により色を正規化する。
【００２３】
【数１】

【００２４】
上記式で正規化した色をさらに次式で変換する。
Ｃ１＝ｃ２／ｃ１（式４）
Ｃ２＝ｃ３／ｃ２（式５）
肌色領域抽出手段８では、（式４）および（式５）でＲＧＢ空間からＣ１−Ｃ２空間に写像した色が、次式で定義した肌色範囲の範疇に入っているか否かを判断して、入力画像から肌色領域を抽出する。
ｔｈ＜Ｃ１＜ｔｈ２（式６）
ｔｈ３＜Ｃ２＜ｔｈ４（式７）
最後に、最大領域抽出手段１１において、肌色抽出手段１０で抽出した肌色画素を４連結あるいは８連結で連結し領域ごとに分類し、分類した肌色領域の中で最大のものを顔領域として抽出する（ステップＳ３−４）。
【００２５】
次に、第１の重要構成要素である候補領域特定部６の動作をフローチャート図６および説明図の図７を用いて詳細に説明する。図７は、領域分割補正手段１４において、領域分割手段１３の処理結果をエッジ抽出手段１２の結果を用いて補正した結果を示した図である。
まず、領域分割手段１３において入力画像を領域分割する（ステップＳ４−１）。領域分割の手法には様々なものがあるが、例えば、“ＣｏｌｏｒＱｕａｎｔｉｚａｔｉｏｎｂｙＤｙｎａｍｉｃＰｒｏｇｒａｍｍｉｎｇａｎｄＰｒｉｎｃｉｐａｌＡｎａｌｙｓｉｓ，”ＸｉａｏｌｉｎＷｕ，ＴｒａｎｓａｃｔｉｏｎｏｎＧｒａｐｈｉｃｓ，Ｖｏｌ．１１，Ｎｏ．４，１９９２．（文献３）に記載の色による領域分割を用いる。この手法は、画像の画素値（ＲＧＢ値）を主成分分析し、第一主成分に直行する平面で色を指定された色数で分類するものである。この手法では、原画像の画素値と分類後の各クラスターの代表色（平均色）との色差が大きくならないようにＤｙｎａｍｉｃＰｒｏｇｒａｍｍｉｎｇを用いて最適解を求めている。
領域分割手段１３では、以上でクラスタリングした各画素を、同じクラスタＩＤを持つものどおしで４連結あるいは８連結で連結した領域を生成し、生成した領域にユニークなＩＤを振りなおすことで画像を領域分割する（ステップＳ４−１）。
なお、領域分割の手法は、別の手法を用いてもよく、いずれにせよ、ひとまず第１の細分化領域を得る。
【００２６】
次に、エッジ抽出手段１２において、画像からエッジを抽出する（ステップＳ４−２）。エッジ抽出の手法は、例えば先に引用した（文献１）に記載のＣａｎｎｙエッジ抽出法を用いる。
ここで、領域分割手段１３の処理結果が図７の分割２８でエッジ抽出手段１２の結果が図７の分割２９となったと仮定する。
この例では、本来の領域数は４である。しかしながら領域分割結果では領域数は３となっている。このように、領域分割手法による領域分割結果は、画像を大まかに分割するようにパラメータを設定すると過少分割（本来別領域の領域を１領域として分割する）を起こすことがある。そこで、領域分割補正手段１４において、領域分割手段１３で得られた領域分割結果２８に、エッジ抽出手段１２で得られたエッジ２９に沿って例えばどの領域にも属さないという意味のＩＤ＝０を設定する。そして過少分割を起こした領域を分断し、領域ＩＤを再設定した後、エッジ画素のＩＤを、画素に接した領域ＩＤでおき直すことにより図７の分割３０に示すように、領域分割手段１３で得られた結果を補正する。即ちＯＲ（細分化）効果が得られる。
【００２７】
次に、第２の重要構成要素である顔領域特定部７の動作をフローチャートの図８から説明図の図１１を用いて詳細に説明する。図９は顔領域判定手段１５で得られた顔領域の一例を示す図である。図１０は、顔領域補正手段１６において、顔領域に発生した亀裂を補正した画像例である。図１１は、顔領域補正手段１６において、顔領域に発生した穴領域を埋める処理を説明するための図である。
まず、顔領域判定手段１５において、候補領域特定部６で得られた領域のうち、肌色領域抽出部５で抽出した肌色領域と重なる領域を求める（ステップＳ５−１）。このとき、（式８）に示す重なり率を求める。
ＯＲ＝ｏｐ／ｒｐ（式８）
ここで、ＯＲは重なり率、ｏｐは１領域中の画素のうち肌色領域と重なる画素数、ｒｐは領域の画素数である。
【００２８】
顔領域判定手段１５では、（式８）で求めた重なり率が、ある閾値以上のものを顔領域と判定する。この重なり率を導入することにより肌色領域抽出手段５において、顔領域の周囲にある背景画素を誤って抽出したとしても、その画素が属する領域の重なり率は低くなることから、顔領域判定手段１５でそのような背景画素を削除することが可能となる効果がある。
顔領域判定手段１５で求めた顔領域には、図９に示すように、穴や亀裂が生じている場合がある。そこで、顔領域補正手段１６において、図９に示すような穴や亀裂等の欠落を補正することにより顔領域全体を抽出する（ステップＳ５−２）。
【００２９】
裂け目部分の修復は、肌色領域抽出手段５で抽出した肌色領域の画素を１、それ以外を０とした２値画像に対して図８（ｂ）に示す近傍パターンを用いて、（文献１）に示される膨張と収縮処理を行って修復ができる。即ち膨張処理で近傍パターンに基いて領域をいったん拡大し、その後、収縮処理では別の近傍パターン図５（ｂ）を用いる膨張収縮処理により、図１０の２９示すような裂け目が修復され３０のようになる。また、この処理により微小の穴も埋めることが可能である。
膨張収縮処理により頭部領域に発生した裂け目が修復された後は、頭部領域内の全ての穴を埋めることにより頭部全体を一領域として抽出できる。この穴埋め処理は、図１１に示す論理演算処理により行う。
まず図１１（ａ）では、裂け目修復処理（領域の欠落補正）により得られた顔領域３１と、画素値が全て１のマスク３２との排他的論理和（ｘｏｒ）を求める。その結果、背景領域と頭部領域内の穴が得られる。次に、図１１（ｂ）に示すように、得られた画像３３から、画像の外辺に接している領域（背景領域）除去し、除去した画像３４と元の顔領域画像３１と論理和を求めることにより、顔領域全体３５を抽出することができる。
【００３０】
次に、第３の重要構成要素である、髪の毛領域特定部８の動作をフローチャート図１２および説明図図１３を用いて説明する。図１３は髪の毛領域特定手段８の処理過程を示す図である。
まず、髪の毛候補領域選定手段１７において、顔領域特定部７で抽出した顔領域の大きさに基づき、髪の毛領域の探索範囲３６を設定する（ステップＳ６−１）。次に、顔領域を現在の頭部領域として設定し（ステップＳ６−２）、探索範囲内に処理対象領域が無くなるまで以下の処理を繰り返す（ステップＳ６−３）。
【００３１】
まず、頭部領域に接している全ての領域を対象とする（ステップＳ６−４）。次に、全ての対象領域に対して以下の処理を繰り返す（ステップＳ６−５）。
即ち、毛髪色データベース１９内に格納されている毛髪色サンプルと、領域の平均色との式差を求め（ステップＳ６−６）、色差が閾値以下の領域を髪の毛領域とする（ステップＳ６−７、ステップＳ６−８）。
ここで、色差とは、例えば（文献１）に記載のＬ＊ａ＊ｂ＊色空間におけるユークリッド距離として求めることができる。
以上で求めた髪の毛領域を含めた領域を新たな頭部領域として設定し（ステップＳ６−１０）、ステップＳ６−３から処理を繰り返すことにより髪の毛領域を抽出することができる。
【００３２】
例えば、図１３（ａ）での領域３６のように髪の毛領域探索範囲を設定した場合、１回目の処理で図１３（ｂ）の領域３７に示す領域が抽出され、２回目の処理で図１３（ｃ）の領域３８に示す領域が抽出される。３回目の処理では、探索範囲内に処理対象となる領域が残らなくなるため、領域３８が毛髪領域抽出結果となる。
ここで、単純に毛髪色に類似した領域を抽出した場合、頭部領域に接しない単独な領域が抽出される可能性があるが、顔領域に接する領域から徐々に処理することにより、そのような単独領域をノイズとして除去することが可能となる効果がある。
また、毛髪色ＤＢを用いて髪の毛であるか否かを判定すれば、茶髪や金髪などの様々な髪の毛に対応して適切に髪の毛領域を抽出できる。
【００３３】
次に、部位検出部３の動作をフローチャート図１４および説明図図１５を用いて説明する。図１５は、頭部領域抽出部２で抽出した頭部領域に基づき、両目、口、鼻の探索範囲を設定したときの結果を示した説明図である。
まず、頭部領域抽出部２で抽出した頭部領域の幅、高さから、両目、鼻、口の探索範囲３９、４０、４１、４２を設定する（ステップＳ７−１）。
【００３４】
次に、“ＧＲＡＰＨＩＣＳＧＥＭＳＩＶ，”ＰａｕｌＳ．Ｈｅｃｋｂｅｒｔ，ＭｏｒｇａｎＫａｕｆｍａｎｎ（文献４）に記載の適応型ヒストグラム平均化法を用いて各探索範囲内の画素の輝度を正規化し、閾値以下（暗い）の画素を抽出し、４連結あるいは８連結で連結し領域とする（ステップＳ７−２）。次に、鼻探索範囲内４１の領域に着目し、探索範囲内の領域のうち、探索範囲の中心と領域の中心との距離が最も近い領域を鼻領域として求め、求めた領域の中心座標を鼻位置とする（ステップＳ７−３）。
次に、左目探索範囲内３９の領域のうち、領域の中心と、上記鼻位置との距離が最も近い領域を左目領域とし、その中心を左目位置とする（ステップＳ７−４）。同様に右目探索範囲内４０の領域のうち、領域の中心と鼻位置との距離が最も近い領域を右目領域とし、その中心を右目位置とする（ステップＳ７−４）。
最後に、口探索範囲内４２の領域のうち、領域の中心と、鼻位置との距離が最も近い領域を口領域とし、その中心を口位置とする（ステップＳ７−５）。
以上のように、抽出した頭部領域の幅、高さから両目、口、鼻が存在するおおよその範囲を限定することにより、同部位位置を的確に求めることが可能となる効果がある。
【００３５】
【発明の効果】
以上のようにこの発明によれば、複数の細分化領域を全て抽出単位とする候補領域特定部と、顔領域特定部と、毛髪領域特定部とを備えたので、正確に顔、毛髪領域を抽出できる効果がある。
【００３６】
また更に、候補領域中の最大領域を抽出する肌色領域抽出部を備えたので、顔領域の推定が更に正確に行える効果がある。
【００３７】
また更に、画像データの欠落を膨張と収縮処理で補正する顔領域特定部としたので、顔領域の推定が更に正確に行える効果がある。
【図面の簡単な説明】
【図１】この発明の実施の形態１における特徴領域抽出頭部位置判定装置の構成を示す図である。
【図２】特徴領域抽出頭部位置判定装置が行う全体としての動作フロー図である。
【図３】頭部領域抽出部の動作フロー図である。
【図４】肌色領域抽出部の動作フロー図である。
【図５】肌色領域抽出部が行う動作を説明する図である。
【図６】候補領域特定部の動作フロー図である。
【図７】候補領域特定部が行う動作を説明する図である。
【図８】顔領域特定部の動作フロー図である。
【図９】顔領域特定部が行う動作を説明する図である。
【図１０】顔領域特定部が行う動作を説明する図である。
【図１１】顔領域特定部が行う動作を説明する図である。
【図１２】毛髪領域特定部の動作フロー図である。
【図１３】毛髪領域特定部が行う動作を説明する図である。
【図１４】部位検出部の動作フロー図である。
【図１５】部位検出部が行う動作を説明する図である。
【図１６】第１の従来例における画像処理装置の構成を示す図である。
【符号の説明】
１画像入力部、２頭部領域抽出部、３部位検出部、５肌色領域抽出部、６候補領域特定部、７顔領域特定部、８毛髪領域特定部、９フィルタリング手段、１０肌色抽出手段、１１最大領域抽出手段、１２エッジ抽出手段、１３領域分割手段、１４領域分割補正手段、１５顔領域判定手段、１６顔領域補正手段、１７毛髪候補領域選定手段、１８毛髪領域判定手段、１９毛髪色データベース、２０部位領域マスク設定手段、２１部位領域特定手段。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an apparatus for extracting a face area including a hair area from an inputted person image and further obtaining positions of both eyes, a mouth and a nose which are constituent elements of the face as coordinate values of the image.
[0002]
[Prior art]
FIG. 16 shows, as a first conventional example, the image processing apparatus disclosed in JP-A-8-153197. This apparatus comprises a coordinate converter that converts input image data into luminance data, hue data, and saturation data, which are color attributes; a histogram generator that generates a histogram of skin color pixels based on these three converted attribute data; and a face area determination unit that determines a face area based on the extracted skin color pixels.
[0003]
Next, the operation will be described.
First, the coordinate converter converts the input image data into three attribute data: luminance data, hue data, and saturation data. Next, based on the three converted attribute data, the histogram generator generates a cumulative histogram of skin color pixels in the vertical direction for each horizontal position of the image. Finally, the face area determination unit extracts an area where skin color pixels are dense as the face area.
[0004]
As a second conventional example, there is the face area extraction method and copying condition determination method disclosed in Japanese Patent Laid-Open No. 9-101579. In this method, after the skin color area is extracted from the image, the skin color area is divided again using the edge extraction result. For an image photographed in front of a background similar to the skin color, this prevents the background and the face from being extracted together.
[0005]
[Problems to be solved by the invention]
In the image processing apparatus according to the first conventional example described above, when a region similar to the skin color exists around the face, there is a problem that the region is extracted by being integrated with the face region.
On the other hand, the face area extraction method and copying condition determination method of the second conventional example corrects under-divided portions using the edge extraction result after the skin color area is extracted. However, contour lines obtained by edge extraction generally contain gaps, so the face area is often not fully enclosed by a contour line, and the background area cannot be removed after all.
Moreover, when this method is applied to a person wearing glasses, the face area is split by the frame of the glasses and the area is extracted incorrectly.
[0006]
The present invention has been made to solve the above problems. Its object is to accurately extract the head region from a person image photographed in front of an arbitrary background and to determine the positions of the eyes, nose, and mouth.
[0007]
[Means for Solving the Problems]
The face area extraction and face component position determination apparatus according to the present invention analyzes input person image data to extract a face area and obtains the positions of components such as the eyes and nose. The apparatus comprises:
a candidate area specifying unit that divides the person image data into a plurality of extraction unit areas based on the color information of each pixel;
a face area specifying unit that specifies a face area from the person image data; and
a hair area specifying unit that specifies a hair portion from the person image data. The obtained face area is expressed by the extraction unit areas obtained above and the specified hair portion, and the positions of the face components are obtained as coordinate values.
[0008]
Furthermore, a skin color area extraction unit is provided that filters the person image data, then extracts skin color area candidates based on a standard color, and extracts the largest area among the extracted skin color area candidates.
[0009]
Still further, the candidate area specifying unit uses edge extracting means for extracting a contour as one of a plurality of area specifying means.
[0010]
Furthermore, the face area specifying unit estimates the face area with reference to the output of the skin color area extraction unit, and specifies the face area by correcting missing image data in the estimated face area through dilation and erosion processing.
[0011]
Furthermore, the hair region specifying unit is provided with a hair color database, sets a hair region search range with reference to the output of the face region specifying unit, and specifies the hair portion within this search range based on the hair color database.
[0012]
Further, the eye, nose, and mouth position search range is set based on the output of the face area specifying unit, and the brightness of the input human image data is normalized to determine the eye, nose, and mouth position.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Embodiment 1.
The gist of the present invention lies in accurately identifying the face area including the hair portion; to that end, a plausible area is estimated and extracted by combining a plurality of candidate areas. Specifically, the apparatus comprises a candidate area specifying unit that subdivides the image into areas serving as extraction units, a face area specifying unit that specifies the maximum contour of the face while excluding the influence of glasses, hair lying over the face, accessories, and the like, and a hair area specifying unit that estimates a range and specifies the hair portion within it. The face area is determined by combining their results. The overall configuration, and then the configuration and operation of each unit, are described in order below.
FIG. 1 is a diagram showing the configuration of the feature region extraction and head position determination device according to the present embodiment. In the figure, 1 is an image input unit for inputting an image, 2 is a head region extraction unit for extracting a head region from the input image, and 3 is a part detection unit for detecting the positions of parts such as both eyes, the mouth, and the nose from the extracted head region.
The head region extraction unit 2 further comprises a skin color region extraction unit 5 for extracting a skin color region from the image, a candidate region specifying unit 6 for specifying candidates for the face region and the hair region, a face region specifying unit 7 for specifying the face region based on the results of the skin color region extraction unit 5 and the candidate region specifying unit 6, and a hair region specifying unit 8 for specifying the hair region based on the results of the face region specifying unit 7 and the candidate region specifying unit 6.
[0014]
The skin color region extraction unit 5 further comprises filtering means 9 for blurring the image, skin color extraction means 10 for extracting skin color pixels, and maximum region extraction means 11 for extracting the largest of the regions formed by connecting the skin color pixels extracted by the skin color extraction means 10.
The candidate region specifying unit 6, which is the first important component of the present embodiment, further comprises edge extraction means 12 for extracting edges from the image, region dividing means 13 for dividing the image into regions, and region division correction means 14 for correcting the result of the region dividing means 13 using the result of the edge extraction means 12.
The face region specifying unit 7, which is the second important component, further comprises face region determination means 15 for specifying the face region from the results of the skin color region extraction unit 5 and the candidate region specifying unit 6, and face region correction means 16 for correcting defects such as holes and cracks in the face region obtained by the face region determination means 15.
[0015]
The hair region specifying unit 8 further comprises hair candidate region selecting means 17 for specifying the range that contains the hair region based on the result of the face region specifying unit 7, a hair color database 19 that stores hair colors, and hair region determination means 18 that, among the regions obtained by the candidate region specifying unit 6 that fall within the range set by the hair candidate region selecting means 17, determines as hair regions those whose color is similar to a hair color stored in the hair color database 19.
[0016]
The part detection unit 3 further comprises part region mask setting means 20 for setting search ranges for the eye, mouth, and nose regions from the result of the head region extraction unit 2, and part region specifying means 21 for specifying the part regions within each search range set by the part region mask setting means 20.
[0017]
Next, the operation of the feature region extraction and head position determination/synthesis device according to the present embodiment will be described with reference to FIG. 2, which is a flowchart of its overall operation.
First, an image is input by the image input unit 1 (step S1-1). Next, the head region extraction unit 2 extracts the person's head region from the input image (step S1-2). Then, the part detection unit 3 detects the positions of both eyes, the mouth, and the nose from the head region extracted by the head region extraction unit 2 (step S1-3). When the output image is to be edited, image synthesizing means (not shown) composites the extracted head region with an arbitrary background image, using the synthesis parameters attached to that background image (step S1-4).
[0018]
Next, the operation of the head region extraction unit 2 will be described with reference to FIG. 3, which is a flowchart of its operation.
The head region extraction unit 2 first extracts a skin color region from the image input by the image input unit 1, using the skin color region extraction unit 5 (step S2-1). The candidate region specifying unit 6 also divides the image into regions (step S2-2). Next, the face region specifying unit 7 extracts as the face region those regions in the division result obtained by the candidate region specifying unit 6 that overlap the skin color region extracted by the skin color region extraction unit 5, and corrects defects such as holes and cracks inside it (step S2-3). Finally, the hair region specifying unit 8 sets, from the face region extracted by the face region specifying unit 7, a range that contains the hair region, and identifies as hair regions those regions within the range whose color is similar to a hair color stored in the hair color DB (step S2-4).
[0019]
Further, the operation of each element will be described individually. First, the operation of the skin color area extracting unit 5 will be described in detail with reference to the flowchart of FIG. 4 and FIG. 5 for explanation. FIG. 5 is an example of an image before and after the processing when the filtering unit 9 performs the blurring process.
In the skin color region extraction unit 5, the filtering means 9 blurs the image input by the image input unit 1 (step S3-1). The reason is that when the face region is specified by extracting the skin color pixels described later from, for example, a person wearing glasses, the face region may be split by the glasses. The face region should be extracted as a single region; if it is split, it becomes difficult to determine how far the face region extends. Blurring the image before extracting skin color pixels blends pixels such as those of the glasses into the skin color, so that the face region is extracted as one region without being split.
[0020]
For the blurring process, the median filter described in “Image Handbook” (supervised by Mikio Takagi and Yoji Shimoda, University of Tokyo Press) (Reference 1) may be used, for example. When the filter size of the median filter is adjusted to the size of the image, an input image such as shape 26 in FIG. 5(a) yields a result such as shape 27 in FIG. 5(b), in which the edges of the glasses are faded. Consequently, when the subsequent skin color extraction is performed, the face region is extracted as one region without being split by glasses or the like.
After performing the blurring process of the input image by the filtering unit 9, the skin color pixel is extracted by the skin color extracting unit 10 (step S3-2).
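The blurring step can be illustrated with a minimal sketch of a median filter. This is not the implementation from Reference 1; it is a plain-Python illustration that assumes a single-channel image stored as a list of rows (a color image would be filtered channel by channel), and the function name `median_blur` and the window size `k` are hypothetical.

```python
def median_blur(gray, k=3):
    """Apply a k x k median filter (k odd) to a 2-D list of intensities.

    Windows are clipped at the image border, so border pixels use fewer samples.
    """
    h, w = len(gray), len(gray[0])
    r = k // 2
    out = [row[:] for row in gray]
    for y in range(h):
        for x in range(w):
            window = sorted(
                gray[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            out[y][x] = window[len(window) // 2]
    return out


# A one-pixel-wide dark line (standing in for a glasses frame) on bright skin.
image = [[200] * 5 for _ in range(5)]
image[2] = [50] * 5
blurred = median_blur(image, k=3)
```

In this toy case the thin dark line is outvoted by the brighter pixels in every window, which is the effect the text relies on to keep the face from being split.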
[0021]
Skin color pixels are extracted using, for example, the skin color model described in “PicToSeek: Combining Color and Shape Invariant Features for Image Retrieval,” Theo Gevers and Arnold W. M. Smeulders, IEEE Transactions on Image Processing, Vol. 9, No. 1 (Reference 2). That is, the pixel values (R, G, B) of the input image are mapped to another color space suited to skin color extraction, and the pixels that fall within the range to which skin color belongs are extracted.
[0022]
Specifically, with R, G, and B denoting the color data of a pixel, the color is first normalized by the following equations.
[0023]
[Expression 1]

[0024]
The color normalized by the above equation is further converted by the following equation.
C1 = c2 / c1 (Formula 4)
C2 = c3 / c2 (Formula 5)
The skin color extraction means 10 determines whether the color mapped from the RGB space to the C1-C2 space by (Formula 4) and (Formula 5) falls within the skin color range defined by the following formulas, and thereby extracts the skin color region from the input image.
th1 < C1 < th2 (Formula 6)
th3 <C2 <th4 (Formula 7)
Finally, in the maximum area extraction unit 11, the skin color pixels extracted by the skin color extraction unit 10 are connected by 4-connection or 8-connection and classified for each area, and the largest of the classified skin color areas is extracted as a face area. (Step S3-4).
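The thresholding of (Formula 4) to (Formula 7) and the selection of the largest connected region can be sketched as follows. The normalization equations are not reproduced in this text, so the sketch ASSUMES c1, c2, c3 are R, G, B each divided by R+G+B; the threshold values used in the example are likewise hypothetical, as are the function names.

```python
from collections import deque


def skin_mask(rgb, th1, th2, th3, th4):
    """Return a 0/1 mask of pixels with th1 < C1 < th2 and th3 < C2 < th4."""
    mask = []
    for row in rgb:
        out = []
        for r, g, b in row:
            s = r + g + b
            if s == 0 or r == 0 or g == 0:  # avoid division by zero
                out.append(0)
                continue
            c1, c2, c3 = r / s, g / s, b / s  # ASSUMED normalization
            C1, C2 = c2 / c1, c3 / c2         # Formulas 4 and 5
            out.append(1 if th1 < C1 < th2 and th3 < C2 < th4 else 0)
        mask.append(out)
    return mask


def largest_component(mask):
    """Keep only the largest 4-connected component of a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


S, B = (200, 150, 120), (128, 128, 128)  # a skin-like color and a gray background
image = [[S, S, B, S],
         [S, S, B, B],
         [B, B, B, S]]
mask = skin_mask(image, 0.6, 0.9, 0.6, 0.95)  # hypothetical thresholds
face = largest_component(mask)
```

The two isolated skin-colored pixels on the right are dropped because only the largest connected region is kept.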
[0025]
Next, the operation of the candidate area specifying unit 6 that is the first important component will be described in detail with reference to the flowchart of FIG. 6 and FIG. 7 of the explanatory diagram. FIG. 7 is a diagram illustrating a result of correcting the processing result of the region dividing unit 13 using the result of the edge extracting unit 12 in the region dividing correcting unit 14.
First, the region dividing means 13 divides the input image into regions (step S4-1). Various region division methods exist; for example, the color-based region division described in “Color Quantization by Dynamic Programming and Principal Analysis,” Xiaolin Wu, Transactions on Graphics, Vol. 11, No. 4, 1992 (Reference 3) is used. This method performs principal component analysis on the pixel values (RGB values) of the image and classifies the colors into a specified number of colors using planes orthogonal to the first principal component. Dynamic programming is used to find the optimal solution so that the color difference between the pixel values of the original image and the representative color (average color) of each cluster after classification does not become large.
The region dividing means 13 then connects the clustered pixels that share the same cluster ID, using 4-connectivity or 8-connectivity, to generate regions, and divides the image into regions by reassigning a unique ID to each generated region (step S4-1).
A different region division method may be used; in any case, a first set of subdivided regions is obtained at this point.
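The relabeling step, in which pixels with the same cluster ID are connected and each connected region receives its own ID, can be sketched as below. The color clustering itself (Reference 3) is not shown; the input is assumed to be a grid of cluster IDs already produced by some quantizer, and `label_regions` is a hypothetical name.

```python
from collections import deque


def label_regions(cluster_ids):
    """Give a unique region ID (1, 2, ...) to each 4-connected group of
    pixels that share the same cluster ID. Returns (labels, region_count)."""
    h, w = len(cluster_ids), len(cluster_ids[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                continue
            next_id += 1
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and cluster_ids[ny][nx] == cluster_ids[cy][cx]):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, next_id


clusters = [[0, 0, 1],
            [1, 0, 1],
            [1, 1, 0]]
labels, count = label_regions(clusters)
```

Two colors give four regions here because pixels of the same cluster that are not connected end up with different region IDs.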
[0026]
Next, the edge extraction means 12 extracts an edge from the image (step S4-2). As an edge extraction method, for example, the Canny edge extraction method described in (Reference 1) cited above is used.
Here, it is assumed that the processing result of the region dividing unit 13 is the division 28 in FIG. 7 and the result of the edge extracting unit 12 is the division 29 in FIG.
In this example, the true number of regions is four, but the region division result contains only three. As this shows, when the parameters of a region division method are set to divide the image coarsely, under-division (regions that should be separate being merged into one) can occur. The region division correction means 14 therefore sets, in the region division result 28 obtained by the region dividing means 13, an ID meaning that the pixel belongs to no region (for example, ID = 0) along the edges 29 obtained by the edge extraction means 12. This splits the under-divided region. After the region IDs are reassigned, the ID of each edge pixel is replaced with the ID of a region adjacent to that pixel, so that the result obtained by the region dividing means 13 is corrected as shown in division 30 of FIG. 7. In other words, an OR (subdivision) effect is obtained.
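The correction described above (mark edge pixels with ID 0, relabel, then hand each edge pixel to a neighboring region) can be sketched as follows. The edge map is assumed to be a 0/1 grid supplied by some edge detector; which neighbor an edge pixel adopts is not specified in the text, so taking the first labeled 4-neighbor found is an assumption of this sketch, and `split_by_edges` is a hypothetical name.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def split_by_edges(labels, edges):
    """Split under-divided regions along edge pixels and return new labels."""
    h, w = len(labels), len(labels[0])
    # Step 1: edge pixels get ID 0, meaning "belongs to no region".
    cut = [[0 if edges[y][x] else labels[y][x] for x in range(w)] for y in range(h)]
    # Step 2: relabel, so a region cut in two by an edge gets two IDs.
    out = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if cut[y][x] == 0 or out[y][x]:
                continue
            next_id += 1
            out[y][x] = next_id
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and not out[ny][nx]
                            and cut[ny][nx] == cut[cy][cx]):
                        out[ny][nx] = next_id
                        queue.append((ny, nx))
    # Step 3: each edge pixel adopts the ID of an adjacent labeled pixel.
    changed = True
    while changed:
        changed = False
        for y in range(h):
            for x in range(w):
                if out[y][x]:
                    continue
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx]:
                        out[y][x] = out[ny][nx]
                        changed = True
                        break
    return out


labels = [[1] * 5 for _ in range(3)]          # one under-divided region
edges = [[0, 0, 1, 0, 0] for _ in range(3)]   # a vertical edge through it
corrected = split_by_edges(labels, edges)
```

The single region is split into two along the edge, and the edge column itself is absorbed into one of the neighboring regions.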
[0027]
Next, the operation of the face region specifying unit 7, which is the second important component, will be described in detail with reference to the flowchart of FIG. 8 and the explanatory diagrams of FIGS. 9 to 11. FIG. 9 is a diagram showing an example of a face region obtained by the face region determination means 15. FIG. 10 is an example of an image in which a crack in the face region has been corrected by the face region correction means 16. FIG. 11 is a diagram explaining the process by which the face region correction means 16 fills hole regions in the face region.
First, the face region determination means 15 finds, among the regions obtained by the candidate region specifying unit 6, those that overlap the skin color region extracted by the skin color region extraction unit 5 (step S5-1). At this time, the overlap ratio given by (Formula 8) is computed.
OR = op / rp (Formula 8)
Here, OR is the overlapping rate, op is the number of pixels that overlap the skin color area among the pixels in one area, and rp is the number of pixels in the area.
[0028]
The face region determination means 15 determines that a region whose overlap ratio obtained by (Formula 8) is equal to or greater than a certain threshold is a face region. With this overlap ratio, even if the skin color region extraction unit 5 erroneously extracts background pixels around the face region, the overlap ratio of the region to which those pixels belong is low, so the face region determination means 15 can discard such background pixels.
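The overlap-ratio test of (Formula 8) can be sketched in a few lines. The threshold value is not given in the text, so the 0.5 used here is only an illustrative assumption, and `face_regions` is a hypothetical name.

```python
def face_regions(labels, skin, threshold=0.5):
    """Return the IDs of regions whose overlap ratio OR = op / rp with the
    skin mask is at least `threshold` (Formula 8)."""
    rp, op = {}, {}
    for label_row, skin_row in zip(labels, skin):
        for region_id, is_skin in zip(label_row, skin_row):
            rp[region_id] = rp.get(region_id, 0) + 1      # pixels in region
            if is_skin:
                op[region_id] = op.get(region_id, 0) + 1  # pixels that overlap skin
    return {rid for rid in rp if op.get(rid, 0) / rp[rid] >= threshold}


labels = [[1, 1, 2, 2],
          [1, 1, 2, 2]]
skin = [[1, 1, 1, 0],
        [1, 0, 0, 0]]
selected = face_regions(labels, skin)
```

Region 1 overlaps the skin mask at 3/4 and is kept; region 2 overlaps at only 1/4, so its single stray skin pixel is discarded along with the region.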
The face region obtained by the face region determination means 15 may contain holes or cracks, as shown in FIG. 9. The face region correction means 16 therefore corrects defects such as the holes and cracks shown in FIG. 9 and thereby extracts the entire face region (step S5-2).
[0029]
Cracks can be repaired by applying the dilation and erosion processing described in (Reference 1), using the neighborhood pattern shown in FIG. 8(b), to a binary image in which the pixels of the skin color region extracted by the skin color region extraction unit 5 are set to 1 and all other pixels are set to 0. That is, the dilation step first enlarges the region based on the neighborhood pattern, and the erosion step then uses another neighborhood pattern, shown in FIG. 5(b). Through this dilation and erosion, a crack such as the one shown at 29 in FIG. 10 is repaired as shown at 30. This processing can also fill very small holes.
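Dilation followed by erosion (morphological closing) can be sketched on a 0/1 mask as below. The neighborhood patterns in the figures are not available in this text, so the patterns here are hypothetical lists of (dy, dx) offsets; pixels outside the image are treated as 0, which is one common convention rather than something the text specifies.

```python
def dilate(mask, pattern):
    """A pixel becomes 1 if any pixel under the pattern is 1."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in pattern) else 0
             for x in range(w)] for y in range(h)]


def erode(mask, pattern):
    """A pixel stays 1 only if every pixel under the pattern is 1
    (positions outside the image count as 0)."""
    h, w = len(mask), len(mask[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in pattern) else 0
             for x in range(w)] for y in range(h)]


def close_cracks(mask, dilate_pattern, erode_pattern):
    """Dilate first, then erode, optionally with a different pattern."""
    return erode(dilate(mask, dilate_pattern), erode_pattern)


HORIZONTAL = [(0, -1), (0, 0), (0, 1)]       # hypothetical 1 x 3 pattern
cracked = [[0, 1, 1, 0, 1, 1, 0]]            # one-pixel crack at index 3
repaired = close_cracks(cracked, HORIZONTAL, HORIZONTAL)
```

The crack in the middle is bridged while the outer extent of the region is unchanged, which is the behavior the paragraph describes.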
After the cracks in the head region have been repaired by the dilation and erosion processing, the entire head can be extracted as a single region by filling all the holes inside the head region. This hole filling is performed by the logical operations shown in FIG. 11.
First, in FIG. 11(a), the exclusive OR (xor) of the face region 31 obtained by the crack repair processing (correction of missing parts of the region) and a mask 32 whose pixel values are all 1 is computed. This yields the background region and the holes inside the head region. Next, as shown in FIG. 11(b), the regions touching the outer border of the image (the background region) are removed from the resulting image 33, and the logical OR of the image 34 after removal and the original face region image 31 is computed, which yields the entire face region 35.
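The hole-filling sequence (XOR with an all-ones mask, removal of border-connected regions, OR with the original) can be sketched as follows. The use of 4-connectivity for finding the border-connected background is an assumption, and `fill_holes` is a hypothetical name.

```python
from collections import deque


def fill_holes(face):
    """Fill enclosed holes in a 0/1 face mask using XOR, border removal, and OR."""
    h, w = len(face), len(face[0])
    # XOR with an all-ones mask: background and holes become 1.
    inv = [[v ^ 1 for v in row] for row in face]
    # Remove every inverted component that touches the image border (background).
    border = [(y, x) for y in range(h) for x in range(w)
              if inv[y][x] and (y in (0, h - 1) or x in (0, w - 1))]
    for y, x in border:
        inv[y][x] = 0
    queue = deque(border)
    while queue:
        cy, cx = queue.popleft()
        for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
            if 0 <= ny < h and 0 <= nx < w and inv[ny][nx]:
                inv[ny][nx] = 0
                queue.append((ny, nx))
    # Only the holes remain in inv; OR them into the original mask.
    return [[f | v for f, v in zip(face_row, inv_row)]
            for face_row, inv_row in zip(face, inv)]


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
filled = fill_holes(ring)
```

Only the enclosed center pixel is filled; the surrounding background stays 0 because it is connected to the image border.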
[0030]
Next, the operation of the hair region specifying unit 8, which is the third important component, will be described with reference to the flowchart of FIG. 12 and the explanatory diagram of FIG. 13. FIG. 13 is a diagram showing the processing steps of the hair region specifying unit 8.
First, the hair candidate region selection means 17 sets a hair region search range 36 based on the size of the face region extracted by the face region specifying unit 7 (step S6-1). Next, the face region is set as the current head region (step S6-2), and the following processing is repeated until there is no processing target region in the search range (step S6-3).
[0031]
First, all the areas in contact with the head area are targeted (step S6-4). Next, the following processing is repeated for all target regions (step S6-5).
That is, the color difference between the hair color samples stored in the hair color database 19 and the average color of the region is computed (step S6-6), and a region whose color difference is equal to or less than a threshold is taken to be a hair region (steps S6-7 and S6-8).
Here, the color difference can be obtained as, for example, the Euclidean distance in the L * a * b * color space described in (Reference 1).
The region including the hair region obtained as described above is set as a new head region (step S6-10), and the hair region can be extracted by repeating the processing from step S6-3.
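The iterative growth of the head region can be sketched on a region adjacency graph. Everything about the data layout is an assumption of this sketch: regions are identified by integer IDs, `adjacency` maps each region to the set of regions it touches, `mean_lab` holds each region's average color already converted to L*a*b*, and the hair samples and threshold are made-up values.

```python
import math


def grow_hair(adjacency, mean_lab, face_ids, search_ids, hair_samples, threshold):
    """Grow the head region outward from the face, adding adjacent regions
    whose mean color is within `threshold` of any hair color sample."""
    head = set(face_ids)
    hair = set()
    remaining = set(search_ids) - head
    while remaining:
        # Regions in the search range that touch the current head region.
        targets = {r for r in remaining if adjacency[r] & head}
        if not targets:
            break
        # Color difference = Euclidean distance in L*a*b*.
        added = {r for r in targets
                 if min(math.dist(mean_lab[r], s) for s in hair_samples) <= threshold}
        remaining -= targets  # each region is examined once
        hair |= added
        head |= added
    return hair


adjacency = {1: {2, 4}, 2: {1, 3}, 3: {2}, 4: {1, 5}, 5: {4}}
mean_lab = {1: (70, 15, 20), 2: (20, 0, 0), 3: (22, 1, 0), 4: (80, 0, 0), 5: (20, 0, 0)}
hair = grow_hair(adjacency, mean_lab, face_ids={1}, search_ids={2, 3, 4, 5},
                 hair_samples=[(20, 0, 0)], threshold=5.0)
```

Region 5 has a hair-like color but is reachable only through the rejected background region 4, so it is never added, which mirrors the way the procedure drops regions that do not touch the head.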
[0032]
For example, when the hair region search range is set as region 36 in FIG. 13(a), the region shown as region 37 in FIG. 13(b) is extracted in the first pass, and the region shown as region 38 in FIG. 13(c) is extracted in the second pass. In the third pass, no regions remain to be processed within the search range, so region 38 is the hair region extraction result.
If regions similar in color to hair were simply extracted, isolated regions that do not touch the head region could be extracted as well. Processing outward step by step from the regions that touch the face region makes it possible to discard such isolated regions as noise.
Moreover, determining whether a region is hair by using the hair color DB makes it possible to extract the hair region appropriately for various hair colors, such as brown or blond hair.
[0033]
Next, the operation of the part detection unit 3 will be described with reference to the flowchart of FIG. 14 and to FIG. 15. FIG. 15 is an explanatory diagram showing the result of setting the search ranges for both eyes, the mouth, and the nose based on the head region extracted by the head region extraction unit 2.
First, from the width and height of the head region extracted by the head region extraction unit 2, search ranges 39, 40, 41, and 42 for both eyes, nose, and mouth are set (step S7-1).
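As an illustration of step S7-1, the sketch below derives the four search ranges from the bounding box of the head region. The description does not disclose how the ranges are proportioned, so the fractions in `PART_FRACTIONS` are purely hypothetical placeholders, as are the names `Box` and `part_search_ranges`; "left" and "right" here mean left and right in the image.

```python
from typing import Dict, NamedTuple, Tuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


# Hypothetical layout: each search range as fractions (x, y, w, h) of the head box.
# These values are NOT taken from the patent; they only illustrate the idea.
PART_FRACTIONS: Dict[str, Tuple[float, float, float, float]] = {
    "left_eye":  (0.10, 0.30, 0.40, 0.20),   # search range 39
    "right_eye": (0.50, 0.30, 0.40, 0.20),   # search range 40
    "nose":      (0.30, 0.45, 0.40, 0.25),   # search range 41
    "mouth":     (0.25, 0.70, 0.50, 0.20),   # search range 42
}


def part_search_ranges(head: Box) -> Dict[str, Box]:
    """Scale the fractional layout to the width and height of the head box."""
    ranges = {}
    for name, (fx, fy, fw, fh) in PART_FRACTIONS.items():
        ranges[name] = Box(
            left=head.left + round(fx * head.width),
            top=head.top + round(fy * head.height),
            width=round(fw * head.width),
            height=round(fh * head.height),
        )
    return ranges
```

Expressing the ranges as fractions of the head box keeps them independent of image resolution, which matches the description's point that the ranges follow from the width and height of the extracted head region.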
[0034]
Next, the adaptive histogram equalization method described in "Graphics Gems IV," Paul S. Heckbert, Morgan Kaufmann (Reference 4) is used to normalize the brightness of the pixels within each search range, pixels below a threshold (dark pixels) are extracted, and they are grouped into regions by 4-connectivity or 8-connectivity (step S7-2). Next, among the regions within the nose search range 41, the region whose center is closest to the center of the search range is taken as the nose region, and the center coordinates of that region are set as the nose position (step S7-3).
Next, among the regions in the left eye search range 39, the region having the closest distance between the center of the region and the nose position is set as the left eye region, and the center is set as the left eye position (step S7-4). Similarly, of the regions in the right eye search range 40, the region having the closest distance between the center of the region and the nose position is set as the right eye region, and the center is set as the right eye position (step S7-4).
Finally, among the regions in the mouth search range 42, the region having the closest distance between the center of the region and the nose position is set as the mouth region, and the center is set as the mouth position (step S7-5).
As described above, limiting the approximate ranges in which the eyes, mouth, and nose can lie, based on the width and height of the extracted head region, allows the position of each part to be obtained accurately.
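Steps S7-2 to S7-5 can be sketched as below. This is a simplified illustration under several assumptions: the grayscale image is a list of rows that has already been brightness-normalized (the adaptive histogram equalization step is omitted), regions are formed with 4-connectivity, and the "center" of a region is taken to be the centroid of its pixels. The function names and the `ranges` dictionary layout are hypothetical.

```python
import math
from collections import deque


def dark_region_centers(gray, box, threshold):
    """Centroids (x, y) of 4-connected dark regions inside box = (left, top, width, height)."""
    left, top, width, height = box
    seen = set()
    centers = []
    for y in range(top, top + height):
        for x in range(left, left + width):
            if (x, y) in seen or gray[y][x] >= threshold:
                continue
            # Flood-fill one connected region of dark pixels
            queue, pixels = deque([(x, y)]), []
            seen.add((x, y))
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = left <= nx < left + width and top <= ny < top + height
                    if inside and (nx, ny) not in seen and gray[ny][nx] < threshold:
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            centers.append((
                sum(p[0] for p in pixels) / len(pixels),
                sum(p[1] for p in pixels) / len(pixels),
            ))
    return centers


def nearest(centers, point):
    """Center closest to point, or None if there are no candidates."""
    return min(centers, key=lambda c: math.dist(c, point), default=None)


def locate_parts(gray, ranges, threshold):
    """ranges: dict with keys 'nose', 'left_eye', 'right_eye', 'mouth' -> box."""
    l, t, w, h = ranges["nose"]
    range_center = (l + (w - 1) / 2, t + (h - 1) / 2)
    # Nose: dark region closest to the center of its search range (S7-3)
    nose = nearest(dark_region_centers(gray, ranges["nose"], threshold), range_center)
    parts = {"nose": nose}
    # Eyes and mouth: dark region closest to the nose position (S7-4, S7-5)
    for name in ("left_eye", "right_eye", "mouth"):
        centers = dark_region_centers(gray, ranges[name], threshold)
        parts[name] = nearest(centers, nose) if nose is not None else None
    return parts
```

Anchoring the eyes and mouth to the nose position, rather than to their own search range centers, means a stray dark blob at the edge of an eye range loses to a candidate nearer the nose.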
[0035]
[Effects of the invention]
As described above, according to the present invention, since a candidate region specifying unit, a face region specifying unit, and a hair region specifying unit that use a plurality of subdivided regions as extraction units are provided, the face and hair regions can be extracted accurately.
[0036]
Furthermore, since the skin color area extracting unit for extracting the maximum area in the candidate area is provided, the face area can be estimated more accurately.
[0037]
Furthermore, since the face area specifying unit that corrects the lack of image data by expansion and contraction processing is used, there is an effect that the face area can be estimated more accurately.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of a feature region extraction head position determination device according to a first embodiment of the present invention.
FIG. 2 is an overall operation flow diagram performed by a feature region extraction head position determination device.
FIG. 3 is an operation flow diagram of a head region extraction unit.
FIG. 4 is an operation flow diagram of a skin color area extraction unit.
FIG. 5 is a diagram illustrating an operation performed by a skin color area extracting unit.
FIG. 6 is an operation flowchart of a candidate area specifying unit.
FIG. 7 is a diagram illustrating an operation performed by a candidate area specifying unit.
FIG. 8 is an operation flowchart of the face area specifying unit.
FIG. 9 is a diagram illustrating an operation performed by a face area specifying unit.
FIG. 10 is a diagram illustrating an operation performed by a face area specifying unit.
FIG. 11 is a diagram illustrating an operation performed by a face area specifying unit.
FIG. 12 is an operation flowchart of the hair region specifying unit.
FIG. 13 is a diagram illustrating an operation performed by a hair region specifying unit.
FIG. 14 is an operation flowchart of the part detection unit.
FIG. 15 is a diagram illustrating an operation performed by a part detection unit.
FIG. 16 is a diagram showing a configuration of an image processing apparatus in a first conventional example.
[Explanation of symbols]
1 Image input unit, 2 Head region extraction unit, 3 Part detection unit, 5 Skin color area extraction unit, 6 Candidate area specifying unit, 7 Face area specifying unit, 8 Hair region specifying unit, 9 Filtering means, 10 Skin color extraction means, 11 Maximum area extraction means, 12 Edge extraction means, 13 Area division means, 14 Area division correction means, 15 Face area determination means, 16 Face area correction means, 17 Hair candidate area selection means, 18 Hair area determination means, 19 Hair color database, 20 Part region mask setting means, 21 Part region specifying means.

Claims

In the device for analyzing the input image of the person and extracting the face region and obtaining the positions of the components such as eyes, nose and mouth,
A skin color region extraction unit that extracts a skin color region by a predetermined method based on color information of each pixel constituting the person image;
A candidate area specifying unit that divides the person image into regions, based on the color information of each pixel constituting the person image, by a method different from the predetermined method, extracts edges where the image changes, and further divides the divided regions by the edges to obtain a plurality of extraction unit areas; and
A face area specifying unit that specifies, as a face area, each extraction unit area that, among the extraction unit areas obtained by the candidate area specifying unit in the person image, overlaps the skin color area by a predetermined ratio or more,
A face area extraction device characterized in that the position of each face component in the specified face area is obtained.

The face region extraction device according to claim 1, further comprising a hair region specifying unit for specifying a hair part from the image of the person, wherein the hair region specifying unit is provided with a hair color database in which colors used to search for hair are recorded, sets a hair region search range with reference to the output of the face region specifying unit, and specifies the hair part within the hair region search range based on the hair color database.

The face area extraction device according to claim 1, wherein search ranges for the positions of the eyes, nose, and mouth are set based on the output of the face area specifying unit, and the brightness of the input person image is normalized to determine the positions of the eyes, nose, and mouth.