JP3272584B2

JP3272584B2 - Region extraction device and direction detection device using the same

Info

Publication number: JP3272584B2
Application number: JP26640795A
Authority: JP
Inventors: 竜士船山; 直和横矢; 治雄竹村; 英彦岩佐
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1995-09-19
Filing date: 1995-09-19
Publication date: 2002-04-08
Anticipated expiration: 2015-09-19
Also published as: JPH0981732A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a region extraction device that detects, from an input image, the position of a face, the contour of the face, facial parts, and the like, and to a direction detection device that uses the region extraction device to detect the direction of the head and the direction of the line of sight.

[0002]

[Prior Art] Image processing of faces has been studied extensively for a very long time, more than twenty years. It covers a very wide range of topics, including personal identification by extraction of facial feature points, extraction of facial parts by template matching and similar methods, data compression using color information, animation applications such as morphing, and the application of phase information. Among these, locating the face in an input image and extracting facial parts such as the eyes, nose, and mouth from the face image is the most basic problem, and a useful technique with a very wide range of applications. It is the first key step in face image processing, and various methods have been proposed for it.

[0003] Methods for detecting the face position from an input image include template matching on a grayscale image at reduced resolution and methods that use color information and segmentation. All of these methods are vulnerable to rotation about the optical axis, noise, and the like, and are prone to erroneous extraction.

[0004] For the extraction of facial parts, the projection method has the simplest algorithm and is the fastest. However, it is very vulnerable to variations in the intensity and direction of the illumination and in the position, size, and rotation of the face in the image, and it is known to be effective only for images captured in an environment in which the distance from the lighting to the subject, the image quality, and so on are considerably constrained. Extraction of facial parts by template matching has also been proposed, but like the projection method it is very vulnerable to shadows caused by changes in illumination, to changes in the direction and rotation of the face, and to variation between individuals. Projection and template matching are described in detail below.

[0005] Projection converts an image into an image viewed from a certain direction. The direction here is not a three-dimensional direction. For example, take the input face image shown in FIG. 8, which is used in the description of the present embodiment. Differentiating this image in the horizontal direction and binarizing it gives the result shown in FIG. 47(a). Differentiating in the horizontal direction means, as shown in FIG. 48, scanning the pixel brightness values in order along the horizontal direction and taking the difference between the brightness of a pixel and the brightness of the preceding pixel as the brightness of the new pixel. When an image whose brightness changes sharply is differentiated, the differential value is large where the change occurs, so edges can be extracted.
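The horizontal differentiation and binarization described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure of any particular implementation: the function names, the threshold value, the use of the absolute difference, and the choice to set the first pixel of each row to 0 are all assumptions made for the example.

```python
def horizontal_derivative(image):
    """Replace each pixel with its brightness minus that of the pixel to its left."""
    result = []
    for row in image:
        # Assumption: the first pixel in a row has no predecessor, so it is set to 0.
        diffs = [0] + [row[x] - row[x - 1] for x in range(1, len(row))]
        result.append(diffs)
    return result


def binarize(image, threshold):
    """Mark a pixel ON (1) when the magnitude of its value reaches the threshold."""
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in image]


# A tiny grayscale image with one sharp brightness step in each row.
image = [
    [10, 10, 200, 200],
    [10, 10, 10, 200],
]
edges = binarize(horizontal_derivative(image), threshold=50)
print(edges)  # [[0, 0, 1, 0], [0, 0, 0, 1]]
```

Only the pixels where the brightness jumps are ON in the result, which is the edge-extraction effect the paragraph describes.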

[0006] Projecting this binarized horizontal differential image in the horizontal direction gives FIG. 47(b), and projecting it in the vertical direction gives FIG. 47(c). Projecting in the horizontal direction means scanning the pixels of the binarized differential image along the horizontal direction, counting the pixels that are ON (the black pixels in FIG. 47(a)), and plotting the counts as a graph. Projecting in the vertical direction means scanning the pixels of the binarized differential image vertically, counting the pixels that are ON, and plotting the counts as a graph. By examining these projection images, the face region and facial part regions can be extracted. For example, the peaks of the horizontal projection in FIG. 47(b) give the vertical position of the eyes, and the left and right peaks of the vertical projection in FIG. 47(c) give the left and right edges of the face region.
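The two projections amount to counting ON pixels per row and per column. The following Python sketch illustrates this on a small binary image; the function names and the simple "largest count" peak search are assumptions made for the example.

```python
def project_horizontal(binary):
    """Scan each row horizontally and count its ON pixels (one count per row)."""
    return [sum(row) for row in binary]


def project_vertical(binary):
    """Scan each column vertically and count its ON pixels (one count per column)."""
    return [sum(column) for column in zip(*binary)]


def peak_index(profile):
    """Index of the largest count; ties resolve to the first occurrence."""
    return max(range(len(profile)), key=lambda k: profile[k])


binary = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
rows = project_horizontal(binary)
cols = project_vertical(binary)
print(rows)              # [0, 3, 1]
print(cols)              # [1, 2, 0, 1]
print(peak_index(rows))  # 1
```

In the example, the peak of the horizontal projection falls on row 1, in the same way that the eye row is located from FIG. 47(b).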

[0007] However, with this method the differential image changes greatly when the lighting conditions change, because the shadows fall differently, and the projection image also changes with noise, the background, rotation of the face, and so on. Consequently, the projection method can extract the face region and facial parts only from images captured under very limited conditions. That is, the projection method is effective only for images captured in an environment in which the lighting, the image size, the position and size of the face in the image, the orientation of the face, and so on are strictly controlled.

[0008] Next, template matching is described. A template is a prestored image of the region to be extracted, such as the eye image shown in FIG. 49. Template matching compares the template with the corresponding region of the input image and extracts the region with the highest score as the region that matches the template.

[0009] The template is an image of suitable size, smaller than the input image. As shown in FIG. 50, if the template is 5 × 5 pixels, the 5 × 5 region at the upper left of the input image is first compared with the template and given an evaluation value. As the comparison method, for example, the score is taken to be the difference between the sums of squares of the brightness of corresponding pixels in the template and in the corresponding region of the input image. This is expressed by the following equation.

[0010]

(Equation 1)

Here, V(p) is the brightness of a pixel p, T_ij is the pixel at coordinates (i, j) of the template, and I_ij is the pixel at coordinates (i, j) of the region corresponding to the template.

[0011] As shown in FIG. 51, once the evaluation value has been calculated for the 5 × 5 region at the upper left of the input image, the evaluation value is calculated in the same way for the 5 × 5 region shifted one pixel in the horizontal direction. In the same manner, evaluation values are calculated between the template and every 5 × 5 region of the input image, up to the 5 × 5 region at the lower right. The region with the highest score among all the evaluation values is then extracted as the region most similar to the template.
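The sliding comparison described above can be sketched in Python as follows. Because Equation 1 is not reproduced in this text, the scoring function here is an assumption: it uses the negated sum of squared brightness differences, so that the highest score corresponds to the most similar region, as the description requires. The function names are likewise hypothetical.

```python
def match_score(image, template, top, left):
    """Assumed score: negated sum of squared brightness differences (higher is better)."""
    total = 0
    for i, template_row in enumerate(template):
        for j, t in enumerate(template_row):
            d = t - image[top + i][left + j]
            total += d * d
    return -total


def best_match(image, template):
    """Slide the template one pixel at a time and return (score, top, left) of the best region."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = match_score(image, template, top, left)
            if best is None or score > best[0]:
                best = (score, top, left)
    return best


image = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
]
template = [
    [9, 8],
    [7, 9],
]
print(best_match(image, template))  # (0, 1, 1)
```

The exhaustive scan visits every window position, which is why the cost of this method grows with both the image size and the template size.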

[0012] However, this method is prone to erroneous extraction when the region to be extracted differs from the template, for example when it is larger or smaller than the template, or when noise, the background, or the like produces a region that resembles the template. Like the projection method, this method is effective only for images in which the lighting, the image size, and the position, size, and orientation of the face in the image are strictly controlled.

[0013] Extraction of facial parts using a method called the deformable template has been proposed to relax, to some extent, the precondition that is the drawback of projection and template matching, namely that the image must be captured in an environment in which the lighting, the image size, the position and size of the face in the image, the orientation, and so on are strictly controlled.

[0014] In this method, as shown in FIG. 52, an energy is calculated between a template (the dotted line in the figure), which expresses the features of the region to be extracted as a combination of simple mathematical shape functions, and the image corresponding to the template. The region is extracted by deforming and moving the template so that the energy is minimized.

[0015] The template is set up, for example, as follows. As shown in FIG. 53, a circle with center x_c and radius r corresponds to the pupil. The energy is defined so that the circumference is attracted to the edge between the pupil and the white of the eye, while the interior of the circle is attracted to a region of low brightness.

[0016] The boundary of the eye is formed by joining parts of two parabolas, one convex upward and one convex downward, that share the focal point x_e. Let b' be the width from the focal point x_e to one end of the eye (so that the width of the eye is 2b'), a' the height of the upward-convex parabola, c' the height of the downward-convex parabola, and θ the rotation angle between the horizontal line through x_e and the center line of the eye (the dotted line through x_e in the figure). The energy is defined so that the eye boundary is attracted to edges.

[0017] The centers of the two white-of-the-eye regions separated by the pupil are given by x_e + p_1(cos θ, sin θ) and x_e + p_2(cos θ, sin θ), where p_1 ≥ 0 and p_2 ≤ 0. The energy is defined so that the centers of the two white regions are attracted to the center of the brightest region among the neighboring pixels.
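As a small worked illustration of the expression above, the following Python sketch computes the two white-region centers from the focal point x_e, the signed offsets p_1 and p_2, and the rotation angle θ. The function name and the tuple representation of points are assumptions made for the example.

```python
import math


def white_region_centers(x_e, p1, p2, theta):
    """Return x_e + p1*(cos t, sin t) and x_e + p2*(cos t, sin t) as (x, y) tuples."""
    if p1 < 0 or p2 > 0:
        raise ValueError("expected p1 >= 0 and p2 <= 0")
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector along the eye center line
    first = (x_e[0] + p1 * ux, x_e[1] + p1 * uy)
    second = (x_e[0] + p2 * ux, x_e[1] + p2 * uy)
    return first, second


# With no rotation, the centers lie on the horizontal line through x_e.
print(white_region_centers((10.0, 5.0), 3.0, -3.0, 0.0))  # ((13.0, 5.0), (7.0, 5.0))
```

Because p_1 and p_2 have opposite signs, the two centers always fall on opposite sides of x_e along the center line.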

[0018] The area between the eye boundary and the pupil corresponds to the white of the eye, and the energy is defined so that it is attracted to regions of high brightness.

[0019] Forces act on these elements such that (1) the center x_c and the focal point x_e attract each other, with the energy at a minimum when they coincide; (2) the eye width 2b' is approximately four times the radius r; and (3) the two center points of the white regions lie on the center line of the eye.

[0020] The above parameters are varied little by little, and the template is matched to the eye by deforming so that the energy function E_c(x_c, x_e, p_1, p_2, r, a', b', c', θ) is minimized. The energy function E_c is expressed by the following equation.

[0021]

(Equation 2)

Here, E_v is the valley energy, E_e is the edge energy, E_i is the image energy, E_p is the peak energy, and E_int is the internal energy. They are defined as follows.

[0022]

(Equation 3)

[0023]

(Equation 4)

[0024]

(Equation 5)

[0025]

(Equation 6)

[0026]

(Equation 7)

[0027]

(Equation 8)

The above equations are defined so that each energy is minimized when the geometric configuration is optimal. This method has the feature of being relatively insensitive to the image size, rotation parallel to the image plane, rotation of the head, and illumination.

[0028] Besides detecting the face region and facial parts as described above, techniques for detecting the head direction and the line-of-sight direction have also been disclosed.

[0029] One method of detecting the head direction uses magnetic sensors or the like. As shown in FIG. 54, the user wears three or more electromagnetic wave emitters on the head, and electromagnetic wave receivers that receive the waves from the emitters are installed at three or more locations, such as on the floor. With this configuration, the three-dimensional position of each emitter is detected from the phase differences between the signals received by the three or more receivers for that emitter, and the orientation of the head to which the emitters are attached can thereby be detected.

[0030] There is also a method of detecting the head position without dedicated hardware. As shown in FIG. 55, the user attaches conspicuously colored stickers at three or more suitable locations on the head, or wears headgear or the like bearing such stickers, and two or more imaging devices are provided. With this configuration, the three-dimensional positions of the stickers are measured from the two or more images of the stickers, and the direction of the head can be detected.

[0031] To detect the line of sight, a wearable gaze tracker is normally used. In a gaze tracker, an imaging device such as a CCD is placed very close to the eyeball so that the eyeball is always captured and imaged.

[0032]

[Problems to Be Solved by the Invention] However, the deformable template method has the problem that the initial position of the template representing the part to be extracted must be set using another method. This is because, unless the template is initially placed near the region to be extracted, erroneous extraction becomes frequent and the amount of computation becomes enormous. The other method may be the projection or template matching described above, but it is a serious contradiction that a method intended to overcome the drawbacks of conventional methods such as template matching must rely on those same conventional techniques as a prerequisite.

[0033] As a result, extraction of the face contour and facial parts by the deformable template method suffers from the same problems as the projection and template matching methods. That is, the extraction accuracy of the deformable template method is vulnerable to changes in the lighting conditions, the image size, the position and size of the face in the image, rotation, the background, and so on, and the method is effective only for images captured in a strictly controlled environment. In other words, when a person is photographed in an ordinary environment, it cannot be expected that the same lighting is always used and that the same background, direction, and orientation are always set. Applying these techniques to images captured under normal conditions may therefore lead to extraction of the wrong region. There is also the problem that the energy must be defined separately for each part.

[0034] The method of detecting the head direction using electromagnetic wave emitters requires that the emitters be fixed firmly to the head so that they do not move, and also requires a means of supplying power to them. In addition, the electromagnetic wave receivers must be fixed on the floor or elsewhere at least a certain distance away from the emitters. As a result, this method has the drawbacks of the inconvenience of wearing hardware on the head and of high cost. The method of attaching markers such as stickers to the head can be realized at low cost, but it cannot eliminate the inconvenience of attaching stickers or wearing headgear.

[0035] Furthermore, the method of detecting the line-of-sight direction requires wearing a gaze tracker with a rigid structure that is fixed to the face so as not to move. It is therefore very uncomfortable to wear and is also costly.

[0036]

[Means for Solving the Problems] To achieve the above object, the region extraction device according to claim 1 of the present invention comprises: imaging means (for example, an imaging device) that images an object to produce a color digital image; probability density function storage means (for example, a probability density function storage unit) that stores in advance a probability density function representing a desired region having a uniformly similar color in the color digital image; region probability calculation means (for example, a region probability calculation device) that calculates the likelihood of the color of each pixel by evaluating the probability density function on the basis of the color of each pixel constituting the desired region; net information storage means (for example, a net information storage unit for face region extraction) that stores information on a virtual net formed by connecting any three or more points (for example, grid points) on the color digital image and provided so as to cover the entire image; and energy calculation means (for example, an energy calculation device) that calculates the internal energy of the net, defined on the basis of the positional relationship of the points, and the image energy of the net, defined on the basis of the likelihood of the color of the pixel at which each point is located. The device is characterized in that the sum of the internal energy and the image energy of the net when one point is moved is compared with the sum before the point was moved; while the sum changes, the point is moved in the direction in which the net contracts, and the movement of the point is stopped when the sum no longer changes.

[0037] With the above configuration, the object is first photographed by the imaging means, and a color digital image of the object is obtained. The region probability calculation means then refers to the probability density function storage means and substitutes the color value of each pixel constituting the region to be extracted in the color digital image into the probability density function, thereby calculating the likelihood of the color of each pixel. Because the desired region has a uniformly similar color and the probability density function is a function that represents the desired region, the likelihood of the color is high in the desired region.
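The likelihood calculation can be illustrated with a short Python sketch. The form of the probability density function is not specified in this passage, so the example assumes an axis-aligned two-dimensional normal distribution over two color components; the function names, the choice of components, and the mean and variance values are all hypothetical.

```python
import math


def color_likelihood(color, mean, var):
    """Density of an axis-aligned 2D normal distribution (an assumed form of the stored function)."""
    density = 1.0
    for c, m, v in zip(color, mean, var):
        density *= math.exp(-((c - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return density


def likelihood_map(color_image, mean, var):
    """Substitute the color of every pixel into the density function."""
    return [[color_likelihood(pixel, mean, var) for pixel in row] for row in color_image]


# Hypothetical two-component colors; the first pixel equals the stored mean.
color_image = [[(0.30, 0.50), (0.90, 0.10)]]
result = likelihood_map(color_image, mean=(0.30, 0.50), var=(0.01, 0.01))
print(result[0][0] > result[0][1])  # True
```

A pixel whose color is close to the stored mean receives a high likelihood, which is the property the net relies on when it contracts toward the desired region.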

[0038] Next, the points constituting the virtual net provided so as to cover the entire input color digital image are moved one at a time, and it is checked whether the energy of the net changes between before and after each move. Here, the energy of the net is the sum, obtained by the energy calculation means, of the internal energy, which reflects the shape of the net, and the image energy, which reflects the color of the pixels at which the points constituting the net are located.

[0039] If the total energy changes, the point is moved in the direction in which the net contracts; when the total no longer changes, the movement of the point is stopped. That is, the net that initially covered the entire image contracts and deforms toward the desired region, where the likelihood of the color is high, and finally stops surrounding that region. The desired region can thus be extracted.
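The point-by-point contraction can be illustrated with a heavily simplified Python sketch. It is not the energy definition of the invention, which is not given in this passage. The sketch assumes a single closed loop of points instead of a full net, an internal energy equal to the sum of squared distances between neighboring points, an image energy that penalizes points lying on high-likelihood pixels, and a greedy rule that moves each point to the neighboring pixel with the lowest total energy. All names and the weight value are hypothetical.

```python
def internal_energy(points):
    """Assumed form: sum of squared distances between neighboring points of a closed loop."""
    total = 0
    for k, (x, y) in enumerate(points):
        nx, ny = points[(k + 1) % len(points)]
        total += (x - nx) ** 2 + (y - ny) ** 2
    return total


def image_energy(points, likelihood, weight):
    """Assumed form: penalty for points that sit on pixels with high color likelihood."""
    return weight * sum(likelihood[y][x] for x, y in points)


def total_energy(points, likelihood, weight):
    return internal_energy(points) + image_energy(points, likelihood, weight)


def contract(points, likelihood, weight=1000, max_sweeps=100):
    """Move one point at a time to the neighboring pixel that lowers the total energy most."""
    points = list(points)
    height, width = len(likelihood), len(likelihood[0])
    for _ in range(max_sweeps):
        moved = False
        for k in range(len(points)):
            x0, y0 = points[k]
            best_pos = (x0, y0)
            best_e = total_energy(points, likelihood, weight)
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    x, y = x0 + dx, y0 + dy
                    if not (0 <= x < width and 0 <= y < height):
                        continue
                    points[k] = (x, y)  # try the candidate position
                    e = total_energy(points, likelihood, weight)
                    if e < best_e:
                        best_pos, best_e = (x, y), e
            points[k] = best_pos
            if best_pos != (x0, y0):
                moved = True
        if not moved:
            break  # the total energy no longer changes, so the loop has stopped
    return points


# 7 x 7 likelihood map with a 3 x 3 high-likelihood block in the center.
likelihood = [[1 if 2 <= x <= 4 and 2 <= y <= 4 else 0 for x in range(7)] for y in range(7)]
frame = [(0, 0), (3, 0), (6, 0), (6, 3), (6, 6), (3, 6), (0, 6), (0, 3)]
result = contract(frame, likelihood)
print(total_energy(frame, likelihood, 1000), "->", total_energy(result, likelihood, 1000))
print(result)
```

With a sufficiently large weight, no point steps onto a high-likelihood pixel, so in this toy setting the loop shrinks from the image border and its points come to rest outside the block. A practical implementation would use the full net and the energy definitions given in the embodiment.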

[0040] Because the region extraction device can thus extract the desired region on the basis of color likelihood, which is robust to changes in lighting conditions, it is not affected by the place or time of photographing, the presence or absence of auxiliary lighting, its direction, and so on. In addition, because the net can take any shape, the device is not restricted by the size of the input image or by the position and size of the face in the image. As a result, the freedom of photographing can be increased.

[0041] The region extraction device according to claim 2 is characterized in that, in addition to the configuration of claim 1, the object is a person and the desired region is the region of the person's face.

[0042] With the above configuration, a person image is first obtained by photographing a person with the imaging means. When the virtual net is applied to this image, the net contracts onto the face region, and the face region can be extracted.

[0043] The device can therefore be used in a variety of equipment such as home appliances. That is, when the region extraction device provided in a piece of equipment extracts the face region, the positional relationship between the equipment and the user can be determined. Information can be obtained on whether a user is present in front of the equipment and, if so, roughly where the user is. Using this information, the equipment can be controlled according to the user's position.

[0044] The region extraction device according to claim 3 is characterized in that, in addition to the configuration of claim 1 or 2, a probability density function representing a partial region having a color different from that of the desired region is stored in advance in the probability density function storage means; information on the net that has stopped surrounding the desired region is stored in the net information storage means as the initial value of a partial-region extraction net for extracting the partial region; and, when there is one partial region within the desired region, the region probability calculation means calculates the likelihood of the color of each pixel by evaluating the probability density function on the basis of the color of each pixel constituting the partial region, and the energy calculation means calculates the internal energy and image energy of the partial-region extraction net together with a movement energy defined so that the net converges on the partial region, whereby the partial region is extracted.

[0045] With the above configuration, information on the net that stopped surrounding the predetermined region (the face region) in claim 1 or 2 is stored in the net information storage means as the initial value of the partial-region extraction net for extracting the partial region. A probability density function representing a partial region whose color differs from that of the desired region is stored in advance in the probability density function storage means. When there is one partial region within the desired region, the likelihood of the color of each pixel of the partial region is obtained, and the partial region is then extracted by moving a net that has a movement energy defined so as to converge on the partial region. That is, the net that initially covered the predetermined region contracts, deforms, and moves toward the partial region, where the color likelihood is high, and finally stops surrounding the partial region.

[0046] Because the net used for the first extracted region can thus be reused to extract another region inside it, the net settings need not be changed for each region. In claim 2, once the face region has been extracted, the lip region and the eye regions, whose colors differ from that of the face region, can be extracted.

[0047] The region extraction device according to claim 4 is characterized in that, in addition to the configuration of claim 3, it has centroid calculation means (for example, a centroid calculation device) that obtains a frame centroid from the points forming the outer periphery of the partial-region extraction net and obtains the centroid of the region enclosed by the partial-region extraction net, weighted by the brightness of the image; and the movement energy is defined so that the frame centroid is attracted toward the region centroid.

[0048] With the above configuration, because the centroid of the region enclosed by the partial-region extraction net is weighted by the brightness of the image, the region centroid lies closer to the region to be extracted than the frame centroid does. Since the movement energy is set so that the frame centroid is attracted toward the region centroid, the net moves toward the partial region. Consequently, even if the predetermined region contains an area whose color resembles that of the partial region, the net is not drawn to that area and the partial region can be extracted. As a result, partial regions can be extracted quickly and with high accuracy.
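The two centroids described above can be sketched in Python as follows. The sketch assumes that the frame centroid is the plain average of the outer points and that the region centroid is the brightness-weighted average of the enclosed pixel coordinates; how the enclosed pixels are determined is left out, and all function names are hypothetical.

```python
def frame_centroid(frame_points):
    """Plain average of the points forming the outer periphery of the net."""
    n = len(frame_points)
    return (sum(x for x, _ in frame_points) / n, sum(y for _, y in frame_points) / n)


def region_centroid(brightness, region_pixels):
    """Average of the enclosed pixel coordinates, weighted by image brightness."""
    total = sum(brightness[y][x] for x, y in region_pixels)
    if total == 0:
        return None  # no bright pixel inside the net, so the centroid is undefined
    cx = sum(x * brightness[y][x] for x, y in region_pixels) / total
    cy = sum(y * brightness[y][x] for x, y in region_pixels) / total
    return (cx, cy)


# 5 x 5 brightness image with two bright pixels to the right of the frame center.
brightness = [[0] * 5 for _ in range(5)]
brightness[1][3] = 5
brightness[3][3] = 5

frame = [(0, 0), (4, 0), (4, 4), (0, 4)]
enclosed = [(x, y) for y in range(5) for x in range(5)]  # assumed to be computed elsewhere

fc = frame_centroid(frame)
rc = region_centroid(brightness, enclosed)
print(fc)                              # (2.0, 2.0)
print(rc)                              # (3.0, 2.0)
print((rc[0] - fc[0], rc[1] - fc[1]))  # (1.0, 0.0)
```

The difference between the two centroids gives the direction in which the movement energy would pull the frame, here one pixel to the right.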

[0049] The region extraction device according to claim 5 is characterized in that, in addition to the configuration of claim 2, it has centroid calculation means that obtains a frame centroid from the points forming the outer periphery of a net and obtains the centroid of the region enclosed by the net, weighted by image brightness; a probability density function representing the lip region is stored in advance in the probability density function storage means; the information of the net that has stopped around the face region is stored in the net information storage means as the initial value of the nets for extracting the lip region and the left and right eye regions; the region probability calculation means calculates the color likelihood of each pixel in the lip region by evaluating the above probability density function on the color of that pixel, and calculates the color likelihood of each pixel in the left and right eye regions by evaluating the probability density function of the face region or the lip region on the color of that pixel; and the energy calculation means extracts the lip region and the left and right eye regions by calculating the internal energy and image energy of each region together with a balance energy, which is defined so that each frame centroid is drawn toward the corresponding region centroid and so that the positional relationship among the centroids of the three regions becomes equal to a previously measured and stored positional relationship among the lips and the left and right eyes.

[0050] With this configuration, the information of the net that stopped around the face region in claim 2 is stored in the net information storage means as the initial value of the nets for extracting the lip region and the left and right eye regions. A probability density function representing the likelihood of the lip region is stored in advance in the probability density function storage means. After the color likelihood of each pixel in the lip region and of each pixel in the left and right eye regions has been obtained, the lip region and the left and right eye regions are extracted by moving three nets that have internal energy, image energy, and balance energy. That is, the nets, which initially cover the face region, each contract, deform, and move toward one of the three regions of high color likelihood, and finally stop around their respective regions.

[0051] The face-region extraction net can thus serve as the lip-region extraction net and the left and right eye-region extraction nets, so the net settings need not be changed for each region.

[0052] In addition, because the positions of the facial parts relative to the face region are known, the rotation angle (tilt) of the face can be recognized. Once the rotation angle of the face is known, a photograph read with a scanner or the like, for example when building a person database, can be input without regard to its orientation, which greatly reduces labor. As a result, the degree of freedom in both photographing and processing increases.

[0053] Furthermore, the approximate direction of the face can be estimated from the balance between the face region and the lip region. That is, if the center of the lip region lies close to the principal axis of the face region, the user is facing the imaging means; if it is shifted to the left or right, the user is facing in that direction.

[0054] The direction detection device according to claim 6 uses the region extraction device according to claim 5 and is characterized in that a plurality of the imaging means are provided, the lip region and the left and right eye regions are extracted from each of a plurality of images obtained simultaneously from the plurality of imaging means, and three-dimensional position information of the lips and eyes is derived by comparing the extracted regions with one another.

[0055] With this configuration, when the same person is photographed simultaneously by a plurality of imaging means, a plurality of images are obtained in which the person's position on the screen differs. The region extraction device configured as in claim 5 then extracts the lip region and the eye regions in each image. When corresponding facial parts in the images are compared with one another, the displacement between the images determines three points in three-dimensional space, one each for the lips, the left eye, and the right eye. This makes it possible to detect the direction in which the head is facing.
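The text does not say how the head direction is computed from the three points. One plausible reading, shown here purely as an assumption, is to take the unit normal of the plane through the lips and the two eyes. The function name and the sign convention are hypothetical, and the three-dimensional points are assumed to have been recovered already from the image displacement.

```python
import math


def head_direction(lips, left_eye, right_eye):
    """Unit normal of the plane through three 3D points (x, y, z).

    The sign of the normal depends on the coordinate system and on which
    eye is labeled "left"; a real system would have to fix that convention.
    """
    u = [r - l for r, l in zip(right_eye, left_eye)]   # left eye -> right eye
    v = [p - l for p, l in zip(lips, left_eye)]        # left eye -> lips
    n = (u[1] * v[2] - u[2] * v[1],                    # cross product u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("the three points are collinear")
    return tuple(c / length for c in n)
```

For a face lying in the plane z = 0, the returned vector points along the z axis, so a change in the normal between frames would indicate that the head has turned.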

[0056] In this way, the head direction can be detected easily without the dedicated hardware used conventionally. As a result, the inconvenience of wearing such hardware is eliminated and the cost of the device can be reduced.

[0057] The direction detection device according to claim 7 is characterized in that, in addition to the configuration of claim 6, an estimated gaze direction is calculated by comparing the extracted eye image with eye images that were measured in advance and stored with gaze direction information attached, and the true gaze direction is detected by further comparing the estimated gaze direction with the detected head orientation.

[0058] With this configuration, the relationship between the position of the iris and the gaze direction is known to some extent from prior measurement, so an estimated gaze direction can be calculated using the eye image extracted with the configuration of claim 6. Once obtained, the estimated gaze direction is corrected by comparing it with the head orientation, and the true gaze direction can be detected.

[0059] The gaze direction can thus be detected easily without the dedicated hardware used conventionally. As a result, the inconvenience of wearing such hardware is eliminated and the cost of the device can be reduced.

【００６０】[0060]

BEST MODE FOR CARRYING OUT THE INVENTION

[Embodiment 1] An embodiment of the present invention is described below with reference to FIGS. 1 to 9 and FIGS. 11 to 19.

[0061] The region extraction device according to this embodiment extracts a face region by using a model of a virtual net formed by connecting arbitrary points (pixels) on a color digital image.

[0062] As shown in FIG. 1, the region extraction device comprises an input device 1, which is an imaging device (imaging means) such as a CCD that captures an object such as a person as a color digital image; an arithmetic device 2 that performs various calculations; a storage device 3 that stores data in advance or temporarily; and an output device 4 that outputs the result once the net has converged on the target region.

[0063] The arithmetic device 2 has a region probability calculation device (region probability calculation means) 5 and an energy calculation device (energy calculation means) 6. As a step preceding extraction of the target region from the input image, the region probability calculation device 5 refers to the probability density function storage unit 7, described later, to derive the probability density function of the region as color information, and generates the region probability image described later. The energy calculation device 6 calculates the internal energy (E_int) of the net by referring to the face-region extraction net information storage unit 9, described later, and calculates the image energy (E_image) by referring to the face-region extraction net information storage unit 9 and the region probability image storage unit 8.

[0064] The storage device 3 has a probability density function storage unit (probability density function storage means) 7, a region probability image storage unit 8, and a face-region extraction net information storage unit (net information storage means) 9.

[0065] The probability density function storage unit 7 stores the probability density function of the skin color of the face region, obtained in advance by sampling. The region probability image storage unit 8 stores the region probability image generated by the region probability calculation device 5.

[0066] The face-region extraction net information storage unit 9 stores information on a virtual net that is formed by connecting a plurality of points on the input image and is laid so as to cover the entire image. Specifically, the initial coordinates of the grid points constituting the virtual net are stored in advance. These initial coordinates are calculated and set automatically so that the grid points are evenly distributed over the input image. In theory three grid points suffice to form a region, but the number of grid points is set according to the required extraction accuracy, the shape to be extracted, and the image size. The storage unit 9 also stores the grid-point coordinates after they change as the net moves, as well as the internal energy (E_int) and image energy (E_image) calculated by the energy calculation device 6.

[0067] The operation of the region extraction device configured as above, and the way the net is defined, are described concretely below.

[0068] First, to calculate the probability density function representing the face region, the color distribution of human facial skin is examined. This procedure is described with reference to the flowchart of FIG. 2. To gather statistics, images of as many human faces as possible are captured, and images containing only the facial skin are created by hand (S1). For example, from the input image of a person shown in FIG. 3(a), only the skin-colored part of the face is extracted, as indicated by A in FIG. 3(b). A suitable number of pixels are chosen at random from the digital image from which only the face region has been cut out. The occurrences of H (hue) and S (saturation) of those pixels, expressed in the HSV color system, are counted to obtain their frequency distributions (S2).
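Steps S1 and S2 can be sketched as follows, assuming the hand-cut skin pixels are available as a list of 8-bit RGB tuples. The bin widths, the fixed random seed, and the function name are illustrative assumptions. The hue is mapped to the -180° to +180° range with red at 0°, and the saturation to 0 to 100, to match the axes described for FIGS. 4 and 5.

```python
import colorsys
import random
from collections import Counter


def hue_saturation_counts(skin_pixels, n_samples, hue_bin=10, sat_bin=10, seed=0):
    """Randomly sample skin pixels and count hue and saturation occurrences.

    skin_pixels: list of (r, g, b) tuples with 8-bit channels.
    Returns two Counters keyed by the lower edge of each bin.
    """
    rng = random.Random(seed)
    sample = rng.sample(skin_pixels, min(n_samples, len(skin_pixels)))
    hue_counts, sat_counts = Counter(), Counter()
    for r, g, b in sample:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hue_deg = round(h * 360.0, 6)
        if hue_deg > 180.0:            # magenta/blue side becomes negative
            hue_deg -= 360.0
        sat = round(s * 100.0, 6)
        hue_counts[int(hue_deg // hue_bin) * hue_bin] += 1
        sat_counts[int(sat // sat_bin) * sat_bin] += 1
    return hue_counts, sat_counts
```

Dividing each count by the number of samples turns the result into a relative frequency distribution of the kind plotted per attribute.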

[0069] Hue is the attribute that distinguishes colors such as red, blue, and yellow, and it is relatively insensitive to reflections and shading caused by lighting. Saturation is an index of the vividness of a color. Human faces have relatively high saturation, whereas indoor scenes are often composed of objects with relatively low saturation.

[0070] FIG. 4 shows the frequency distribution of hue in the face region obtained by sampling 1000 pixels per image from image data of 24 persons. The vertical axis is the probability density (in the range 0 to 1) indicating the likelihood of being a face region. The horizontal axis is hue, with red at 0°, positive values up to +180° in the direction of yellow and green, and negative values down to -180° in the direction of magenta and blue. FIG. 5 shows the frequency distribution of saturation in the face region for the same sampling. The vertical axis is again the probability density (0 to 1) indicating the likelihood of being a face region, and the horizontal axis is saturation, with 0 for an achromatic color and 100 for a pure color.

[0071] Once the frequency distributions of hue and saturation in the face region have been obtained, a two-dimensional normal probability density function representing the face region is derived on the assumption that the variation in hue and saturation follows a normal distribution (S3). Let μ₁ be the mean hue, μ₂ the mean saturation, and σ_ij the variance-covariance matrix of the hue and saturation distribution; these values are first calculated from the distribution of colors cut out by hand and counted.
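As a minimal sketch of this estimation step, the means and the variance-covariance matrix can be computed directly from sampled (hue, saturation) pairs. Dividing by the sample count n rather than n - 1 is an assumption, since the text does not say which estimator was used, and the function name is hypothetical.

```python
def mean_and_covariance(samples):
    """Estimate (mu1, mu2) and the 2x2 variance-covariance matrix.

    samples: non-empty list of (hue, saturation) pairs.
    Returns ((mu1, mu2), ((s11, s12), (s12, s22))).
    """
    n = len(samples)
    mu1 = sum(h for h, _ in samples) / n
    mu2 = sum(s for _, s in samples) / n
    s11 = sum((h - mu1) ** 2 for h, _ in samples) / n
    s22 = sum((s - mu2) ** 2 for _, s in samples) / n
    s12 = sum((h - mu1) * (s - mu2) for h, s in samples) / n
    return (mu1, mu2), ((s11, s12), (s12, s22))
```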

[0072] The two-dimensional normal probability density function is expressed as

[0073]

[Equation 9]

Substituting the obtained μ₁, μ₂, and σ_ij into equation (1) therefore yields the probability density function representing the face region. This function is stored in advance in the probability density function storage unit 7. FIG. 6 plots the probability density function obtained from the frequency distributions of FIGS. 4 and 5; the height axis is the probability density, the left axis is hue, and the right axis is saturation.
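The image for [Equation 9] is not reproduced in this text. For reference, the standard form of a two-dimensional normal density with mean vector μ = (μ₁, μ₂)ᵀ and variance-covariance matrix Σ = (σ_ij) is given below; that the patent's equation (1) has exactly this form is an assumption.

$$
f(x_1, x_2) = \frac{1}{2\pi\sqrt{|\Sigma|}}
\exp\!\left(-\frac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right),
\qquad
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},\;
\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\;
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}
$$

The peak value of this density is $1/(2\pi\sqrt{|\Sigma|})$, which is not 1 in general. Since the text describes values in the range 0 to 1, the density may have been rescaled, for example by using the exponential factor alone, but the text does not state this.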

[0074] The region probability calculation device 5 evaluates the obtained probability density function with the hue value substituted for x₁ and the saturation for x₂, which gives, for every pixel of the input digital image, the likelihood of its being a face color in the range 0 to 1. An image is then generated in which the likelihood value (0 to 1) obtained from each pixel's hue and saturation becomes the new value of that pixel. The image obtained in this way is called a region probability image.

[0075] The process of generating the region probability image is described with reference to the flowchart of FIG. 7. When an image is input (S11), pixels are taken one at a time, starting from the upper left of the input image (S12). The hue and saturation of the pixel are obtained (S13), and the likelihood of the region is calculated by using them as arguments of the probability density function obtained earlier. That likelihood becomes the new value of the pixel (S14). It is then determined whether new values have been obtained for all pixels (S15). If not, steps S12 to S14 are repeated; once all pixels have been processed, the process ends. A region probability image is thereby generated. FIG. 9 shows the face-region probability image generated from the input image of FIG. 8; the brighter a pixel, the more likely it is to belong to the face region.
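The loop of steps S11 to S15 can be sketched as below. To keep every value in the range 0 to 1, this sketch evaluates only the exponential factor of a two-dimensional normal density. That scaling, the function names, and the nested-list image representation are assumptions made for illustration.

```python
import colorsys
import math


def color_likelihood(hue, sat, mu, cov):
    """exp(-0.5 * squared Mahalanobis distance); equals 1.0 at the mean color."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = hue - mu[0], sat - mu[1]
    m = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * m)


def region_probability_image(rgb_image, mu, cov):
    """Replace every pixel by its face-color likelihood (S12 to S15).

    rgb_image: rows of (r, g, b) tuples with 8-bit channels, scanned from
    the upper left. Hue is in degrees (-180 to 180), saturation in 0 to 100.
    """
    result = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hue = h * 360.0 if h <= 0.5 else h * 360.0 - 360.0
            out_row.append(color_likelihood(hue, s * 100.0, mu, cov))
        result.append(out_row)
    return result
```

Rendering the returned values as gray levels gives an image in which brighter pixels are more likely to be skin.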

[0076] Next, the net model is described. As shown in FIG. 11, the net is formed by connecting black dots, each corresponding to a pixel of the digital image. The black dots forming the net are called grid points. The energy of the net is defined by the positional relationship of the grid points. The net has two kinds of energy: internal energy, which arises from the shape of the net itself, and image energy, which is determined by the net and the image it covers. The target region is extracted by moving the grid points, and thus deforming the net, so that the sum of these two energies decreases.

[0077] Each grid point is connected to its adjacent grid points, and the internal energy of the net is defined by the positional relationship between these points. The internal energy is defined so that it decreases as the distance between grid points decreases, decreases as the lines of the net connecting the grid points become smoother, and decreases as opposite sides of each cell formed by connecting grid points approach parallel. The image energy is defined by the brightness of the pixel at each grid point. In other words, the region is extracted by reflecting the state of the pixels under the grid points in the energy.

[0078] The region is extracted by applying the net model described above to the region probability image generated by the region probability calculation device 5. FIGS. 12(a) to 12(d) show how a region is extracted with the net model. A region probability image is calculated from the input image shown in FIG. 12(a), and a net covering the entire image is laid over it, as shown in FIG. 12(b). The net then contracts toward the target region (FIG. 12(c)) and finally stops around it (FIG. 12(d)). Extraction of a face region proceeds as shown in FIGS. 13(a) to 13(d). A net covering the entire image is first laid over the face-region probability image. The grid points are then moved so that the energy of the net decreases, and the movement stops when the energy no longer changes. At that point the net has deformed into a shape that surrounds the face region.

[0079] (1) Shape of the net

The shape of the net is defined as shown in FIG. 14. A plurality of concentric circles of progressively larger radius are drawn around a point. Line segments are then extended radially from the center point to the outermost circle. The intersections of the circles with the radial segments are the grid points. With T concentric circles and S radial segments, the number of grid points is therefore T × S + 1. The concentric circles are called layers and the radial segments spokes. The outermost layer is layer 0, followed inward by layer 1, layer 2, and so on. The intersections of the outermost layer 0 with the spokes are called outermost grid points, and all other grid points are called inner grid points. Although the net is circular here, it can be deformed into a rectangle as shown in FIG. 15 without changing the connectivity of the grid points, and can be handled in the same way.
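The layer-and-spoke construction can be sketched as follows. The equal radial spacing of the layers, the dictionary keyed by `(t, s)`, and the function name are illustrative assumptions. The center is returned separately and accounts for the "+ 1" in the T × S + 1 count.

```python
import math


def make_net(cx, cy, radius, T, S):
    """Build a circular net with T layers and S spokes around (cx, cy).

    Returns (points, center), where points maps (t, s) to (x, y) and
    layer 0 is the outermost circle. In image coordinates (y pointing
    down), increasing s runs clockwise on screen.
    """
    points = {}
    for t in range(T):
        r = radius * (T - t) / T          # layer 0 has the full radius
        for s in range(S):
            angle = 2.0 * math.pi * s / S
            points[(t, s)] = (cx + r * math.cos(angle),
                              cy + r * math.sin(angle))
    return points, (cx, cy)
```

With T = 3 and S = 8, the net has 24 grid points on the circles plus the center, 25 in total.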

[0080] (2) Internal energy of the net

The grid point formed by layer t and spoke s is denoted p(t,s). The layer immediately inside layer t is layer (t+1), and the layer immediately outside is layer (t-1). For the outermost grid points, of course, layer (t-1) does not exist. Similarly, the next spoke clockwise from spoke s is spoke (s+1), and the next spoke counterclockwise is spoke (s-1).

[0081] The internal energy E_int(t,s) of grid point p(t,s) is expressed as follows. In the text and in FIG. 16, p denotes a vector.

[0082]

[Equation 10]

[0083]

[Equation 11]

[0084]

[Equation 12]

[0085]

[Equation 13]

[0086]

[Equation 14]

[0087]

[Equation 15]

As shown in FIG. 16(a), E_t above represents the distance between the adjacent grid points p(t,s) and p(t+1,s) on the same spoke s. The smaller the distance, the smaller E_t; when p(t,s) and p(t+1,s) coincide, E_t = 0.

[0088] E_s represents the distance between the adjacent grid points p(t,s) and p(t,s+1) on the same layer t. The smaller the distance, the smaller E_s; when p(t,s) and p(t,s+1) coincide, E_s = 0.

[0089] As shown in FIG. 16(b), E_tt indicates whether p(t,s) is smoothly connected to the grid points p(t-1,s) and p(t+1,s) immediately before and after it on the same spoke s. When the three grid points lie on a straight line, E_tt = 0.

[0090] E_ss indicates whether p(t,s) is smoothly connected to the grid points p(t,s-1) and p(t,s+1) to its left and right on the same layer t. When the three grid points lie on a straight line, E_ss = 0.

[0091] E_ts = 0 when p(t,s-1) - p(t,s) and p(t-1,s-1) - p(t-1,s) are the same vector, that is, when they are parallel and equal in magnitude.

[0092] For a given grid point p(t,s), E_int(t,s) as defined above is calculated from the positions of the surrounding grid points. This is the internal energy of that grid point. Calculating it for every grid point and summing gives the internal energy E_int of the net. The terms making up E_int(t,s) in equation (2) may be given suitable weights depending on the ratio of image size to number of grid points and on the range of the color likelihood.
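The images for [Equation 10] to [Equation 15] are not reproduced in this text. One set of definitions consistent with the verbal descriptions of the five terms is sketched below. Whether the patent uses squared norms, plain norms, or different coefficients cannot be determined from the text, and the weights $w_\bullet$ are hypothetical.

$$
\begin{aligned}
E_t &= \lVert \mathbf{p}(t+1,s) - \mathbf{p}(t,s) \rVert^2, \\
E_s &= \lVert \mathbf{p}(t,s+1) - \mathbf{p}(t,s) \rVert^2, \\
E_{tt} &= \lVert \mathbf{p}(t-1,s) - 2\,\mathbf{p}(t,s) + \mathbf{p}(t+1,s) \rVert^2, \\
E_{ss} &= \lVert \mathbf{p}(t,s-1) - 2\,\mathbf{p}(t,s) + \mathbf{p}(t,s+1) \rVert^2, \\
E_{ts} &= \lVert \bigl(\mathbf{p}(t,s-1) - \mathbf{p}(t,s)\bigr) - \bigl(\mathbf{p}(t-1,s-1) - \mathbf{p}(t-1,s)\bigr) \rVert^2, \\
E_{int}(t,s) &= w_t E_t + w_s E_s + w_{tt} E_{tt} + w_{ss} E_{ss} + w_{ts} E_{ts}.
\end{aligned}
$$

Under these definitions, $E_t$ and $E_s$ vanish when adjacent grid points coincide, $E_{ts}$ vanishes when the two difference vectors are identical, and $E_{tt}$ and $E_{ss}$ vanish when the middle point is the midpoint of its two neighbors. The last condition is stricter than the collinearity stated in the text, which is one reason to treat this form as an approximation.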

[0093] (3) Image energy of the net

The image energy E_image(t,s) of grid point p(t,s) is defined as follows, as the state of the pixel under that grid point.

[0094]

[Equation 16]

$$
E_{image}(t,s) =
\begin{cases}
-FRPI(t,s) & (t \ge 1) \\
\phantom{-}FRPI(t,s) & (t = 0)
\end{cases}
\tag{3}
$$

Here FRPI(t,s) denotes the brightness (the likelihood of the region) of the pixel under grid point p(t,s) in the region probability image. That is, for an inner grid point (t ≥ 1) the image energy is the brightness of the pixel under p(t,s) in the region probability image with a minus sign attached, and for an outermost grid point (t = 0) it is that brightness as it is, without a sign change.
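The sign rule can be written as a small function. The nested-list layout `frpi[y][x]`, the integer pixel coordinates, and the function name are illustrative assumptions.

```python
def image_energy(frpi, point, t):
    """Image energy of one grid point.

    frpi: region probability image as rows of likelihood values.
    point: (x, y) integer pixel position of the grid point.
    t: layer index; 0 is the outermost layer.
    """
    x, y = point
    value = frpi[y][x]
    # Outermost points are penalized for sitting on likely-face pixels,
    # inner points are rewarded for it.
    return value if t == 0 else -value
```

A plausible reading of this sign choice is that minimizing the energy pulls inner grid points onto high-likelihood pixels while keeping the outermost grid points on low-likelihood pixels, so that the outline settles just outside the region.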

[0095] For a given grid point p(t,s), E_image(t,s) as defined above is calculated. This is the image energy of that grid point. Calculating it for every grid point and summing gives the image energy E_image of the net. As with equation (2), the terms making up E_image(t,s) in equation (3) may be given suitable weights.

[0096] The energy E_net(t,s) of an arbitrary grid point p(t,s) is the sum of its internal energy E_int(t,s) and its image energy E_image(t,s).

[0097]

[Equation 17]

$$
E_{net}(t,s) = E_{int}(t,s) + E_{image}(t,s) \tag{4}
$$

The energy E_net of the whole net is obtained by calculating equation (4) for every grid point and summing, and is expressed as follows.

[0098]

[Equation 18]

$$
E_{net} = \sum_{t,s} E_{net}(t,s)
$$

The energy E_net of the whole net defined in this way is calculated, and the grid points are moved so that this energy decreases. When the grid points are stopped at the point where the energy no longer changes, the net has converged around the face region (see FIG. 13).

[0099] (4) Movement of the grid points

Next, how the grid points are moved is described with reference to the flowchart of FIG. 17.

[0100] First, as described above, the region probability calculation device 5 generates the face-region probability image (S21). Next, the net is set to its initial state so that it covers the entire input image, as shown in FIG. 13 (S22). One of the outermost grid points is selected first, and its energy E_net(t,s) is calculated (S23). As shown in FIG. 18, the grid point is then moved to one of the eight pixels surrounding the pixel on which it lies (hereinafter, the 8-neighborhood). The energy that would result from moving the grid point to each of the eight neighbors is calculated (S24), and the grid point is moved to the pixel that gives the lowest energy (S25). If the energy at the grid point's original position is the lowest, the grid point need not be moved.

[0101] Proceeding from the outermost grid points to the inner grid points, one grid point at a time, each grid point is examined to find where in its 8-neighborhood the energy is lowest, and is moved there. It is then checked whether this movement process, in which every grid point is moved by one pixel (hereinafter, one unit deformation), has finished (S26). If not, the process returns to S23; if so, it proceeds to S27.

[0102] In S27, the energy E_net of the whole net, obtained by summing the energies at all grid points, is compared with the value of E_net before the grid points were moved to check whether it has changed. If E_net has changed, the process returns to step S23 and steps S23 to S26 are performed again. If E_net has not changed, the movement of the grid points ends. When the movement ends, the net encloses the entire face region (see FIG. 13).
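Steps S23 to S27 amount to a greedy descent over the grid points. The following is a minimal sketch, assuming a caller-supplied function `energy(points, key)` that stands in for E_net(t, s); the names `one_unit_deformation`, `shrink_net`, and the `max_iters` guard are illustrative and do not appear in the specification.

```python
# Hypothetical sketch of steps S23-S27: greedy 8-neighbourhood descent on a net.
# `energy(points, key)` is an assumed stand-in for E_net(t, s).

NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def one_unit_deformation(points, energy):
    """Move each grid point to the lowest-energy pixel among itself and its 8 neighbours."""
    for key in list(points):  # the caller is assumed to order keys outermost-first
        x, y = points[key]
        best, best_e = (x, y), energy(points, key)  # S23: energy at the current position
        for dx, dy in NEIGHBOURS:                   # S24: try each of the 8 neighbours
            points[key] = (x + dx, y + dy)
            e = energy(points, key)
            if e < best_e:                          # strict '<': stay put on ties
                best, best_e = (x + dx, y + dy), e
        points[key] = best                          # S25: keep the best position


def total_energy(points, energy):
    """Energy of the whole net: the sum over all grid points."""
    return sum(energy(points, key) for key in points)


def shrink_net(points, energy, max_iters=1000):
    """Repeat one-unit deformations until the total energy stops changing (S27)."""
    previous = total_energy(points, energy)
    for _ in range(max_iters):  # max_iters is an added safety guard, not in the source
        one_unit_deformation(points, energy)
        current = total_energy(points, energy)
        if current == previous:
            break
        previous = current
    return points
```

With a toy energy such as the squared distance of each point from a fixed target pixel, every grid point walks one pixel per pass until it reaches the target, after which the total energy is unchanged and the loop stops.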

[0103] As described above, the region extraction device of this embodiment shrinks and deforms the net until the sum of two energies stops changing: the internal energy of the net, which is based on the positional relationship between grid points, and the image energy, which is based on the color likelihood of the pixel on which each grid point lies. It stops moving the grid points once the sum no longer changes.

[0104] Consequently, the net that initially covered the entire input image contracts toward the face region, where the color likelihood is high, and finally stops around the face region. The face region can thus be extracted.

[0105] Because the color likelihood of the face region is obtained from hue, which is not affected by lighting conditions, this region extraction device does not require images to be captured under the same lighting every time. In other words, the device is robust to changes in lighting, so there is no need to be concerned about where or when the image is captured, whether auxiliary lighting is used, its direction, and so on. Conventionally, the same lighting conditions had to be ensured every time by photographing the person from the same direction, with the same lighting equipment, at the same position in a room without windows.

[0106] In addition, because the net can take any shape, this region extraction device can handle input images of any size. In general, region extraction requires the image to be handled digitally. The image as originally supplied may be in any form, whether a digital image or an ordinary film photograph, but it must be a digital image by the time extraction is performed. Whatever the form of the input image, therefore, it must be converted to a digital image before region extraction.

[0107] Independence from image size means, for example, that it makes no difference whether a digital image is 100 × 100 dots or 512 × 512 dots. With a conventional technique such as template matching, the template must be sized to match the part to be extracted, which inevitably restricts the image size. Likewise, the deformable template technique can extract only the parts that correspond to the templates that have been set.

[0108] By contrast, because this region extraction device does not depend on image size, it can handle an image of any size. An imaging device of any resolution may therefore be used, and an analog image may be sampled at any rate, which gives more freedom in image capture.

[0109] Similarly, because this region extraction device does not depend on the position or size of the face in the image, it can extract the face as long as the face appears in the image, whether the person fills the screen or appears small in a corner. The person can therefore be photographed from any position and with any composition, which gives more freedom in image capture. With conventional template matching, as noted above, the template is set to match the region to be extracted, which restricts not only the size of the input image but also the size of the face within it. With projection-based methods, too, the accuracy of part extraction drops sharply unless the size of the face in the image is known to some extent.

[0110] Extracting the face region also makes it possible to use it for information compression. For example, information can be compressed by lowering the resolution of the areas outside the face region or by reducing their number of colors, so that many portrait images can be stored in a small amount of memory. This is a major advantage when building a person database or the like. Also, when person-centered images are transferred over a line with low transmission capacity per unit time, coarsening the information other than the person reduces the amount of data, so that smooth video with a high frame rate, or a large amount of other information, can be sent even over a narrow line. This is applicable to applications in which a person gives explanations, such as video conferencing, videophones, electronic secretaries, and navigation systems.
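One way to picture "lowering the resolution outside the face region" is block averaging applied only to non-face pixels. The sketch below is illustrative only: the function name `coarsen_background`, the grayscale list-of-lists image, the boolean `face_mask`, and the block size are all assumptions, and the specification does not prescribe any particular compression scheme.

```python
def coarsen_background(image, face_mask, block=4):
    """Replace every non-face pixel with the integer mean of its block.

    image:     list of rows of integer gray levels (assumed representation)
    face_mask: same shape, True where the extracted net encloses the face
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # face pixels are copied unchanged
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            mean = sum(image[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                if not face_mask[y][x]:
                    out[y][x] = mean  # background keeps only block-level detail
    return out
```

After this step the background contains long runs of identical values, which a general-purpose encoder can store far more compactly than the original detail, while the face region keeps full resolution.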

[0111] This region extraction device can also be used to control various kinds of equipment, such as home appliances. FIG. 19 is a basic block diagram of the case where the region extraction device 11 is applied to the control of a device 13. An imaging device 12 attached to the device 13 captures the user and sends the digital image to the region extraction device 11. The region extraction device 11 extracts the user's face region from the image and sends that information to a control device 14. The control device 14 is connected to the device 13 and controls it on the basis of the information sent from the region extraction device 11. By repeating this sequence at short intervals, a system that controls the device 13 in real time can be built.

[0112] Because the region extraction device 11 provided in the device 13 extracts the face region in this way, the positional relationship between the device 13 and the user can be determined. It is thus possible to learn whether a user is present in front of the device 13 and, if so, roughly where.

[0113] This makes it possible to direct the operation of home appliances and the like toward the direction of the person. For example, attaching the region extraction device to an air conditioner makes it possible to detect where in the room people are and where they are not, so the airflow can be directed toward people or, conversely, away from them. It is also possible to detect that no one is present and stop the airflow.

[0114] Furthermore, when the device is applied to televisions, audio products, and the like, it becomes possible to orient the screen or the sound toward the user, to adjust the volume or, for stereo, the balance between left and right volume, and to switch the unit off when no one is present and on when someone appears.

[0115] Switching a device on and off when the user comes near, and controlling a device according to the user's position, can be applied in this way to most electrical products. This reduces the effort of operating switches, the chance of forgetting to switch off, and the effort of adjusting the device.

[0116] In applications that control electrical products and the like according to whether a person is present, a person conventionally had to operate the switch each time, or infrared images were used to determine presence. The former is cumbersome, and the latter requires an expensive infrared imaging device or the like to raise recognition accuracy, which makes it costly. With this region extraction device, electrical products can be controlled according to human presence at low cost.

[0117] [Embodiment 2] Another embodiment of the present invention is described below with reference to FIG. 3, FIGS. 8 to 10, and FIGS. 20 to 31. For convenience of explanation, members identical to those shown in the drawings of the preceding embodiment are given the same reference numerals, and their description is omitted.

[0118] The region extraction device according to this embodiment extracts the face region and the lip region from an input image of a person. Here, after the face region is extracted, a lip region extraction net is set with the extracted face region as its initial state, and the net is moved to extract the lip region.

[0119] As shown in FIG. 20, in addition to the configuration of Embodiment 1, this region extraction device has a center-of-gravity calculation device (center-of-gravity calculation means) 21 in the arithmetic unit 2 and a lip region extraction net information storage unit (net information storage means) 22 in the storage device 3.

[0120] The center-of-gravity calculation device 21 calculates two quantities: the center of gravity (g) of the outermost frame, obtained by averaging the coordinates of the outermost grid points, and the center of gravity (G) of the region enclosed by the net, obtained by averaging the coordinates of the pixels inside the net weighted by the brightness of the lip region image.

[0121] The face region extraction net information storage unit 9 stores the two centers of gravity of the face region extraction net obtained by the center-of-gravity calculation device 21 and, as in Embodiment 1, the internal energy (E_int) and image energy (E_image) of the face region extraction net. The lip region extraction net information storage unit 22 stores, as the initial state, the coordinates of the net that has contracted onto the face region. It also stores the two centers of gravity of the lip region extraction net, the internal energy (E_int) and image energy (E_image) of the lip region extraction net, the shape energy (E_form) described later, and the movement energy (E_move) obtained from the two centers of gravity. The probability density function storage unit 7 stores in advance a probability density function representing the lip region in addition to the one representing the face region.

[0122] With the above configuration, the region probability calculation device 5 refers to the region probability image storage unit 8 and evaluates the probability density functions for the color of each pixel making up the face region and the lip region, thereby calculating the color likelihood of each pixel and generating a face region probability image and a lip region probability image. Next, as in Embodiment 1, the energy calculation device 6 calculates the internal energy (E_int) and image energy (E_image) of the face region extraction net and stores them in the face region extraction net information storage unit 9. The net is shrunk, deformed, and moved so that the sum of these energies decreases, and the face region is extracted first. The result is output to the output device 4.

[0123] The center-of-gravity calculation device 21 calculates the center of gravity (g) of the outermost frame by referring to the net coordinates stored in the face region extraction net information storage unit 9. It calculates the center of gravity (G) of the region enclosed by the face region extraction net by referring to those net coordinates and to the face region probability image stored in the region probability image storage unit 8. Both are stored in the face region extraction net information storage unit 9. In the same way, the center-of-gravity calculation device 21 calculates the center of gravity (g) of the outermost frame by referring to the coordinates of the lip region extraction net stored in the lip region extraction net information storage unit 22, and calculates the center of gravity (G) of the region enclosed by the lip region extraction net by referring to those coordinates and to the lip region probability image stored in the region probability image storage unit 8. Both are stored in the lip region extraction net information storage unit 22.
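The two centers of gravity can be sketched as follows. This is a minimal illustration, assuming grid points and pixels are `(x, y)` tuples and the probability image is a dictionary mapping pixel coordinates to brightness; the function names are hypothetical, and dividing G by the total brightness is an assumption about the normalization.

```python
def frame_centroid(outer_points):
    """g: plain average of the outermost grid-point coordinates."""
    n = len(outer_points)
    return (sum(x for x, _ in outer_points) / n,
            sum(y for _, y in outer_points) / n)


def region_centroid(pixels, brightness):
    """G: average of the pixel coordinates inside the net, weighted by brightness.

    Normalizing by the total brightness is an assumption made for this sketch.
    """
    total = sum(brightness[p] for p in pixels)
    if total == 0:
        return None  # no bright pixel inside the net; the caller must handle this
    gx = sum(brightness[(x, y)] * x for x, y in pixels) / total
    gy = sum(brightness[(x, y)] * y for x, y in pixels) / total
    return (gx, gy)
```

Because g depends only on the net's outline while G is pulled toward bright (high-likelihood) pixels, the offset between the two indicates which way the net would have to shift to center itself on the target region.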

[0124] The energy calculation device 6 also calculates the internal energy (E_int), image energy (E_image), shape energy (E_form), and movement energy (E_move) of the lip region extraction net and stores them in the lip region extraction net information storage unit 22. The lip region extraction net is deformed, shrunk, and moved so that the sum of these energies decreases, and the lip region is extracted. The result is output to the output device 4.

[0125] The operation of the region extraction device configured as above, the way the net is defined, and related matters are described concretely below.

[0126] First, to calculate the probability density function representing the lip region, the color distribution of the lips is examined as in Embodiment 1, for example by taking only the lips, as shown at B in FIG. 3(b), from the input image of a person shown in FIG. 3(a). FIG. 21 shows the frequency distribution of hue in the lip region when 100 pixels per image were sampled from image data of 24 people. FIG. 22 shows the frequency distribution of saturation in the lip region for the same sampling. The vertical and horizontal axes of each figure are the same as in FIGS. 4 and 5 of Embodiment 1.

[0127] FIG. 23 plots the two-dimensional normal probability density function representing the lip region, obtained in the same way as in Embodiment 1. In the figure, the height axis represents probability density, the left axis hue, and the right axis saturation.

[0128] The region probability calculation device 5 substitutes the hue and saturation values into the probability density function thus obtained, so that the likelihood of lip color is obtained in the range 0 to 1 for every pixel of the input digital image. It then generates a lip region probability image in which the likelihood value (0 to 1) obtained from each pixel's hue and saturation becomes the pixel's new value. FIG. 10 is the lip region probability image generated from the input image shown in FIG. 8; brighter pixels are more likely to belong to the lip region.

[0129] Next, (1) the outermost frame, (2) the center of gravity of the outermost frame, and (3) the center of gravity of the region enclosed by the net are defined.

[0130] As shown in FIG. 24, the outermost frame is the contour formed by connecting the outermost grid points in order, that is, layer 0. It represents the outline of the net.

[0131] The center of gravity of the outermost frame is the average of the coordinates of the outermost grid points. That is, letting n be the number of outermost grid points and p_i the vector representing the i-th outermost grid point, the vector g representing the center of gravity is expressed as follows.
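The image for Equation 19 is not reproduced in this text. Since g is stated to be the average of the n outermost grid-point vectors p_i, it presumably reads as follows; the typesetting is a reconstruction, not the original notation.

```latex
\mathbf{g} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{p}_i
```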

[0132]

[Equation 19]

As shown in FIG. 25, the center of gravity of the region enclosed by the net is the average of the coordinates of the pixels inside the net, weighted by their brightness. That is, letting N be the number of pixels inside the net, p_j the vector representing pixel j inside the net, and I(p_j) the brightness at pixel j, the center of gravity G of the region enclosed by the net is defined as follows, where Σ denotes the sum over all pixels inside the net.
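The image for Equation 20 is likewise not reproduced. A brightness-weighted average consistent with the description would be the following; normalizing by the total brightness, rather than by the pixel count N, is an assumption, since the text does not state the denominator.

```latex
\mathbf{G} = \frac{\sum_{j=1}^{N} I(\mathbf{p}_j)\,\mathbf{p}_j}{\sum_{j=1}^{N} I(\mathbf{p}_j)}
```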

[0133]

[Equation 20]

(1) Internal energy of the net

The internal energy of the net for extracting the lip region is defined as follows. The internal energy of this embodiment adds, to the internal energy of Embodiment 1, an energy that makes the net contract to the average shape and size of the lips as estimated from the face width. This energy is called the shape energy E_form(t, s).

[0134] Specifically, as shown in FIG. 26, a straight line is first drawn through the center of gravity g_f of the outermost frame F_f of the face region extraction net, perpendicular to the line joining g_f and the center of gravity G_m of the region A_m enclosed by the lip region extraction net. This is called the face-width line. The length of the segment joining the two intersections F_a and F_b of the face-width line with the outermost frame F_f is taken as the estimated face width F_ab. Multiplying F_ab by a fixed ratio gives an estimate of the average lip size relative to the face width, that is, the lip width and height.

[0135] Then, as shown in FIG. 27, the lips are approximated by an ellipse. Consider an ellipse e_m centered on the center of gravity G_m, whose major axis is the estimated lip width, whose minor axis is the estimated lip height, and whose major axis has the same inclination as the face-width line.

[0136] The shape energy E_form(t, s) of an outermost grid point is calculated as follows. A straight line is drawn from the center of the ellipse e_m (the center of gravity G_m) toward an outermost grid point m_1 of the lip region extraction net, and the energy is the distance between m_1 and the intersection of that line with e_m. For an inner grid point m_2, the shape energy E_form(t, s) is calculated in the same way as for m_1, except that E_form(t, s) = 0 when m_2 lies inside the ellipse e_m. The shape energy E_form(t, s) obtained in this way, added to equation (1) of Embodiment 1, gives the internal energy of the lip region extraction net.
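The shape energy described above can be sketched with the polar form of an ellipse. This is an illustration under stated assumptions: the function name and parameters are hypothetical, `tilt` is the inclination of the face-width line in radians, and the lip width and height are taken as full axis lengths (so the semi-axes are half of each).

```python
import math


def shape_energy(point, center, lip_width, lip_height, tilt, outermost=True):
    """Distance from a grid point to the ellipse e_m along the ray from its center.

    Inner grid points (outermost=False) that lie inside the ellipse get energy 0.
    """
    a, b = lip_width / 2.0, lip_height / 2.0      # semi-major and semi-minor axes
    dx, dy = point[0] - center[0], point[1] - center[1]
    dist = math.hypot(dx, dy)
    # Angle of the ray measured from the major axis. A point exactly at the
    # center has no defined ray; atan2(0, 0) returns 0, so -tilt is used there.
    phi = math.atan2(dy, dx) - tilt
    # Radius of the ellipse in direction phi (polar form about the center).
    r = a * b / math.hypot(b * math.cos(phi), a * math.sin(phi))
    if not outermost and dist <= r:
        return 0.0
    return abs(dist - r)
```

For an ellipse 4 pixels wide and 2 pixels high with no tilt, a point 5 pixels from the center along the major axis meets the ellipse at radius 2, giving an energy of 3.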

[0137] (2) Image energy of the net

The image energy uses the pixel brightness of the lip region probability difference image. This image is formed from the differences in brightness between corresponding pixels of the lip region probability image and the face region probability image generated by the region probability calculation device 5. Where the pixel of the lip region probability image is brighter than the pixel of the face region probability image, the difference in brightness is taken; otherwise the value is 0. That is, as shown in FIG. 28, letting I_m(x, y) be the pixel brightness of the lip region probability image at coordinates (x, y), I_f(x, y) that of the face region probability image, and I_mf(x, y) that of the lip region probability difference image, the following holds.
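The image for Equation 21 is not reproduced in this text. From the rule just stated (take the difference where the lip image is brighter, otherwise 0), it presumably has the following form; the case layout is a reconstruction.

```latex
I_{mf}(x,y) =
\begin{cases}
I_m(x,y) - I_f(x,y) & \text{if } I_m(x,y) > I_f(x,y) \\
0 & \text{otherwise}
\end{cases}
```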

[0138]

[Equation 21]

For an actual image, taking the difference between the face region probability image shown in FIG. 9 and the lip region probability image shown in FIG. 10 yields the lip region probability difference image shown in FIG. 29. The pixel brightness of the lip region probability difference image generated in this way, multiplied by −1, is used as the energy at inner grid points. For outermost grid points, the pixel brightness of the face region probability image multiplied by −1 is used. That is, the image energy at grid point p(t, s) of the lip region extraction net is as follows.
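The image for Equation 22 is not reproduced in this text. From the preceding sentences and the definitions of MFDRPI and FRPI that accompany the equation, it presumably takes the following form; the case layout is a reconstruction, not the original notation.

```latex
E_{image}(t,s) =
\begin{cases}
-\,\mathrm{MFDRPI}(t,s) & \text{if } p(t,s) \text{ is an inner grid point} \\
-\,\mathrm{FRPI}(t,s) & \text{if } p(t,s) \text{ is an outermost grid point}
\end{cases}
```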

[0139]

[Equation 22]

Here, MFDRPI(t, s) is the brightness at grid point p(t, s) in the lip region probability difference image, and FRPI(t, s) is the brightness at grid point p(t, s) in the face region probability image.

[0140] (3) Movement energy

To move the lip region extraction net, a movement energy of the net is defined in addition to its internal energy and image energy. This is the energy that moves the whole net toward the lip region so that the net extracts the lip region accurately. The movement energy E_move is defined as follows.

[0141]

[Equation 23]

Here, G_net is the vector representing the center of gravity of the region enclosed by the lip region extraction net, and g_net is the vector representing the center of gravity of the outermost frame of the lip region extraction net. That is, E_move produces a force that moves the net so that the two centers of gravity coincide. In the initial state, the center of gravity of the outermost frame of the lip region extraction net coincides with that of the outermost frame of the face region extraction net.

[0142] Next, the process by which the lip region extraction net defined above extracts the lip region is described with reference to the flowchart of FIG. 30.

[0143] First, the face region is extracted with the face region extraction net as in Embodiment 1 (S31). The virtual net therefore has a shape and size that enclose the face region, as shown in the upper part of FIG. 31. Because the lip region lies inside the face region, the net as it stood when the face region was extracted is used as the initial value of the lip region extraction net, and the lip region extraction net is set in motion (S32). As with the face region extraction net, every grid point is moved within its 8-neighborhood so that the net undergoes one-unit deformation (S33). Next, the center of gravity G_net of the region enclosed by the lip region extraction net and the center of gravity g_net of its outermost frame are calculated, and the net is moved in the direction that reduces the distance between them (S34). It is then checked whether the energy of the whole net has changed (S35). If it has, the process returns to step S33; once the energy no longer changes, deformation of the net is stopped. When the net stops, it encloses the lip region, as shown in the lower part of FIG. 31.
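Step S34 can be pictured as a rigid shift of every grid point toward the offset between the two centers of gravity. The sketch below assumes a shift of one pixel per axis per iteration; the specification states only that the net moves in the direction that reduces the distance, so the step size, the rounding, and the function names are assumptions.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def translate_net(points, g, G):
    """Shift all grid points by one pixel per axis from g toward G (assumed step size).

    points: dict mapping grid keys to (x, y) pixel positions
    g, G:   centers of gravity of the outermost frame and of the enclosed region
    """
    # Rounding suppresses jitter when the two centroids differ by less than a pixel.
    step = (sign(round(G[0] - g[0])), sign(round(G[1] - g[1])))
    return {key: (x + step[0], y + step[1]) for key, (x, y) in points.items()}
```

Because every grid point receives the same shift, the net's shape, and therefore the part of its internal energy that depends on grid spacing, is unaffected by this step.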

[0144] As described above, the region extraction device of this embodiment sets a lip region extraction net whose initial value is the net enclosing the face region, and shrinks, deforms, and moves the net until the sum of three energies stops changing: the internal energy of the net, based on the positional relationship between grid points; the image energy, based on the color likelihood of the pixel on which each grid point lies; and the movement energy, defined so that the center of gravity of the outermost frame of the lip region extraction net and the center of gravity of the region enclosed by the net come to coincide. It stops moving the grid points once the sum no longer changes.

[0145] Consequently, the net that covered the face region contracts toward the lip region, where the color likelihood is high, and finally stops around the lip region. The lip region can thus be extracted.

[0146] In addition to the effects of Embodiment 1, the rotation angle of the face can be recognized by finding the angle between the vertical axis and the straight line joining the center of gravity of the outermost frame of the face region and the center of gravity of the region enclosed by the lip region extraction net. Here, rotation of the face means tilt of the face, that is, rotation about the axis extending from the imaging device in the shooting direction.
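The angle between that line and the vertical axis can be computed with a two-argument arctangent. This is a minimal sketch: the function name is hypothetical, image coordinates are assumed to have y increasing downward, and the sign convention (0 degrees for an upright face, positive when the lips are displaced toward +x) is an assumption.

```python
import math


def face_roll_degrees(face_frame_centroid, lip_region_centroid):
    """Angle, in degrees, between the face-to-lip line and the downward vertical axis."""
    dx = lip_region_centroid[0] - face_frame_centroid[0]
    dy = lip_region_centroid[1] - face_frame_centroid[1]
    # atan2(dx, dy) measures the angle from the +y (downward) axis, so an
    # upright face, whose lips lie straight below the face center, gives 0.
    return math.degrees(math.atan2(dx, dy))
```

A scanned photograph could then be rotated by the negative of this angle to bring the face upright, which is the use suggested for database input.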

[0147] Once the rotation angle of the face is known, then when photographs are read in with a scanner or the like to build a person database, for example, they can be input without regard to their orientation, which greatly reduces the work involved. In the same way, the technique can be applied to a device that brings scattered photographs into a common orientation. Furthermore, there is no longer any need to worry about orientation errors caused by camera shake during shooting or introduced when film images are digitized. As a result, there is more freedom both in shooting and in processing.

【０１４８】 Furthermore, the approximate direction of the face can be estimated from the balance between the face region and the lip region. That is, if the center of the lip region lies close to the principal axis of the face region, the user is facing the imaging device; if it is displaced to the left or right, the user is facing in that direction.
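
A rough sketch of this direction estimate is given below. It assumes an upright face whose principal axis is the vertical line x = face_axis_x; the tolerance value and the returned labels are hypothetical and do not come from the patent text.

```python
def rough_face_direction(face_axis_x, face_width, lip_center_x, tolerance=0.05):
    """Classify the face direction from the horizontal offset of the lip
    center relative to the principal axis of the face region.

    tolerance is a hypothetical threshold, expressed as a fraction of the
    face width, below which the face is treated as facing the camera.
    """
    if face_width <= 0:
        raise ValueError("face_width must be positive")
    offset = (lip_center_x - face_axis_x) / face_width
    if abs(offset) <= tolerance:
        return "front"
    # Left and right are reported in image coordinates.
    return "image-right" if offset > 0 else "image-left"
```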

【０１４９】 In addition, lip movement can be detected by continuously extracting the lip region from a moving image. Combined with a microphone or the like, this can raise the accuracy of speech recognition, or make it possible to judge whether a sound picked up by the microphone was spoken by the user or came from elsewhere.

【０１５０】 Furthermore, as in Embodiment 1, this region extraction device can be used to control a variety of equipment.

【０１５１】 For example, by attaching this region extraction device to a television and detecting the direction of the user's face, the screen can be dimmed only while the user is not facing the television in order to protect the display surface. Combined with a sound sensor, a switch can be turned ON/OFF, or simple equipment control performed, by a simple utterance. Furthermore, because it is possible to judge whether a voice picked up by the microphone belongs to the user, control triggered by false detection can be prevented.

【０１５２】 [Embodiment 3] Another embodiment of the present invention is described below with reference to FIGS. 32 to 38. For convenience of explanation, members identical to those shown in the drawings of the preceding embodiments are given the same reference numerals, and their description is omitted.

【０１５３】 The region extraction device according to this embodiment extracts the face region, the lip region, and the left and right eye regions from an input image of a person. After the face region is extracted, a lip region extraction net, a left-eye region extraction net, and a right-eye region extraction net are set with the extracted face region as their initial state. These nets are then moved simultaneously, taking their mutual positional relationship into account, so that the lip region and the left and right eye regions are extracted at the same time.

【０１５４】 As shown in FIG. 32, in addition to the configuration of Embodiment 2, the region extraction device has, in the storage device 3, a left-eye region extraction net information storage unit 23, a right-eye region extraction net information storage unit 24, a balance energy (E_balance) storage unit 25, and a face part positional relationship storage unit 26.

【０１５５】 The left-eye and right-eye region extraction net information storage units 23 and 24 each store, as the initial state, the coordinates of the net contracted onto the face region. They also each store the two centroids of the corresponding eye region extraction net, together with the net's internal energy (E_int), image energy (E_image), and shape energy (E_form). The balance energy storage unit 25 stores the balance energy (E_balance), which controls the positional relationship among the three nets. The face part positional relationship storage unit 26 stores in advance the positional relationship of the face parts obtained beforehand by sampling.

【０１５６】 With the above configuration, as in Embodiment 2, the region probability calculation device 5 refers to the region probability image storage unit 8 and computes a probability density function based on the color of each pixel making up the face region and the lip region, thereby calculating the color likelihood of each pixel and generating a face region probability image and a lip region probability image. Next, the energy calculation device 6 calculates the internal energy (E_int) and image energy (E_image) of the face region extraction net and stores them in the face region extraction net information storage unit 9. The net is contracted, deformed, and moved so that the sum of these energies decreases, and the face region is extracted first. The result is output to the output device 4.

【０１５７】 The centroid calculation device 21 refers to the net coordinates stored in the face region extraction net information storage unit 9, the lip region extraction net information storage unit 22, the left-eye region extraction net information storage unit 23, and the right-eye region extraction net information storage unit 24, calculates the centroid (g) of the outermost frame of each net, and stores each result. It also refers to the coordinates of each net and to the face region probability image stored in the region probability image storage unit 8, calculates the centroid (G) of the region enclosed by each of the face region extraction net, the lip region extraction net, and the left and right eye region extraction nets, and stores the results in the storage units 9, 22, 23, and 24, respectively.
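
The two centroids can be illustrated with the sketch below. It assumes that g is the plain mean of the outermost grid points and that G is a brightness-weighted mean over the pixels the net encloses; this passage does not give the exact formulas, so both definitions and both function names are assumptions. The caller supplies the list of enclosed pixels.

```python
def frame_centroid(outer_points):
    """Centroid g: mean position of the outermost grid points (x, y)."""
    n = len(outer_points)
    if n == 0:
        raise ValueError("at least one grid point is required")
    return (sum(x for x, _ in outer_points) / n,
            sum(y for _, y in outer_points) / n)

def region_centroid(prob_image, inside_pixels):
    """Centroid G: mean of the enclosed pixel positions, weighted by the
    brightness of the probability image (prob_image[y][x]).

    Returns None when every enclosed pixel has zero brightness.
    """
    total = sum(prob_image[y][x] for x, y in inside_pixels)
    if total == 0:
        return None
    gx = sum(x * prob_image[y][x] for x, y in inside_pixels) / total
    gy = sum(y * prob_image[y][x] for x, y in inside_pixels) / total
    return (gx, gy)
```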

【０１５８】 The energy calculation device 6 also calculates the internal energy (E_int), image energy (E_image), and shape energy (E_form) of the lip region extraction net and of the left and right eye region extraction nets, and stores them in the lip region extraction net information storage unit 22, the left-eye region extraction net information storage unit 23, and the right-eye region extraction net information storage unit 24. Furthermore, the energy calculation device 6 calculates the balance energy (E_balance) by comparing the contents of the face part positional relationship storage unit 26 with the centroids (g and G) stored in the four net information storage units, and stores it.

【０１５９】 The three nets are deformed, contracted, and moved so that the total energy of each of the lip region extraction net and the left and right eye region extraction nets decreases, and the lip region and the left and right eye regions are extracted. The result is output to the output device 4.

【０１６０】 Next, the energies of the lip region extraction net, the left-eye region extraction net, and the right-eye region extraction net are described. The energy of the lip region extraction net is the same as in Embodiment 2.

【０１６１】 (1) Internal energy of the eye region extraction nets
For the internal energy of the left and right eye region extraction nets, as with the lip region extraction net, an energy is added that makes the net contract to the average shape and size of an eye as estimated from the face width. That is, an ellipse is assumed whose major and minor axes are obtained by multiplying the face width by fixed ratios, and the shape energy E_form derived from this ellipse is added to Equation (1) of Embodiment 1 to define the internal energy of the eye region extraction net.

【０１６２】 (2) Image energy of the eye region extraction nets
The image energy of the left and right eye region extraction nets uses the pixel brightness of the eye region extraction image. The eye region extraction image is formed by taking, at each pixel, the larger of the brightness values of the corresponding pixels in the lip region probability image and the face region probability image generated by the region probability calculation device 5. That is, as shown in FIG. 33, let I_m(x,y) be the pixel brightness of the lip region probability image at coordinates (x, y), I_f(x,y) that of the face region probability image, and I_e(x,y) that of the eye region extraction image. Then the following holds.

【０１６３】[0163]

【数２４】 I_e(x,y) = \max( I_m(x,y), I_f(x,y) )

The pixel brightness of the eye region extraction image generated in this way is used as the image energy at the internal grid points. For the outermost grid points, the image energy is the pixel brightness of the face region probability image multiplied by -1. That is, the image energy at a grid point p(t,s) of the eye region extraction net is as follows.

【０１６４】[0164]

【数２５】 E_image(t,s) = \begin{cases} ERPI(t,s) & \text{if } p(t,s) \text{ is an internal grid point} \\ -FRPI(t,s) & \text{if } p(t,s) \text{ is an outermost grid point} \end{cases}

Here, ERPI(t,s) is the brightness at grid point p(t,s) in the eye region extraction image, and FRPI(t,s) is the brightness at grid point p(t,s) in the face region probability image.
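
The construction of the eye region extraction image and the per-grid-point image energy described above can be sketched as follows. Images are represented as nested lists indexed as image[y][x]; the function names and this representation are assumptions made for illustration.

```python
def eye_extraction_image(lip_prob, face_prob):
    """Pixelwise maximum of the lip and face region probability images."""
    if len(lip_prob) != len(face_prob) or any(
        len(rm) != len(rf) for rm, rf in zip(lip_prob, face_prob)
    ):
        raise ValueError("the two probability images must have the same size")
    return [[max(m, f) for m, f in zip(rm, rf)]
            for rm, rf in zip(lip_prob, face_prob)]

def eye_net_image_energy(x, y, is_outermost, eye_image, face_prob):
    """Image energy of an eye-net grid point located at pixel (x, y).

    Internal grid points use the eye region extraction image; outermost
    grid points use the face region probability image multiplied by -1.
    """
    if is_outermost:
        return -face_prob[y][x]
    return eye_image[y][x]
```

Because the eyes are dark in both probability images, minimizing this energy pulls internal grid points toward the eyes while the outermost grid points remain on the brighter skin around them.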

【０１６５】 The internal energy and image energy of the eye region extraction nets are defined as described above.

【０１６６】 (3) Balance energy of the nets
The following describes the balance energy of the nets, defined so that the three nets (the left and right eye region extraction nets and the lip region extraction net) move in coordination with one another.

【０１６７】 In general, the positional relationship among a person's left and right eyes and lips is much the same from person to person. Although there are individual differences, quantities such as the distance between the eyes and the heights of the eyes and lips bear fixed ratios to the face width. This a priori knowledge of the positional relationship of the face parts is therefore used to control the positional relationship of the nets that extract the respective regions.

【０１６８】 Table 1 gives data on the sizes and positional relationships of face parts, measured from sample data of 24 people. As shown in FIGS. 34(a) and 34(b), it lists the mean, variance, and standard deviation of each measured width a to l when the face width h is taken as 1. Each value is measured at the location indicated in FIG. 34. The positional relationship of the face parts calculated from these data is stored in the face part positional relationship storage unit 26.

【０１６９】[0169]

【表１】

The balance energy E_balance is defined, in addition to the internal energy and image energy, so as to hold the positional relationship of the nets to specific constraints. The balance energy E_balance is expressed as follows.

【０１７０】[0170]

【数２６】

As shown in FIG. 35, E_mg is an energy that draws the centroid g_m of the outermost grid points of the lip region extraction net toward the centroid G_m of the region the net encloses.

【０１７１】 E_leg is an energy that draws the centroid g_le of the outermost grid points of the left-eye region extraction net toward the centroid G_le of the region the net encloses.

【０１７２】 E_reg is an energy that draws the centroid g_re of the outermost grid points of the right-eye region extraction net toward the centroid G_re of the region the net encloses. In the initial state, the centroids g_m, g_le, and g_re all coincide with the centroid g_f of the outermost grid points of the face region extraction net.

【０１７３】 E_iod is an energy that drives the distance between the centroids G_le and G_re of the regions enclosed by the left and right eye region extraction nets toward a specific ratio of the face width.

【０１７４】 E_emh is an energy that drives the distance between two points toward a specific ratio of the face width: the midpoint of the line segment connecting the centroids G_le and G_re of the regions enclosed by the left and right eye region extraction nets, and the centroid G_m of the region enclosed by the lip region extraction net.

【０１７５】 E_fa is an energy that drives toward 0 the angle between the axis representing the tilt of the face and the axis connecting the midpoint of the line segment from G_le to G_re with the centroid G_m of the region enclosed by the lip region extraction net.

【０１７６】 E_ema is an energy that drives toward a right angle the angle between the line segment connecting the centroids G_le and G_re and the axis connecting the midpoint of that segment with the centroid G_m of the region enclosed by the lip region extraction net.
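
The seven terms defined above can be combined as in the sketch below. The equation image for E_balance is not reproduced in this text, so the weighted sum, the squared-error form of each term, the weights, and the target ratios iod_ratio and emh_ratio are all assumptions. The patent takes its ratios from Table 1, which is also not reproduced here.

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def _angle_diff(a, b):
    """Signed difference a - b in radians, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi

def balance_energy(g, G, face_width, face_axis, iod_ratio, emh_ratio,
                   weights=None):
    """Hypothetical E_balance: weighted sum of seven squared penalties.

    g, G: dicts with keys 'm', 'le', 're' holding (x, y) centroids.
    face_axis: direction of the face tilt axis in radians (image coords).
    """
    keys = ("mg", "leg", "reg", "iod", "emh", "fa", "ema")
    w = weights or dict.fromkeys(keys, 1.0)
    mid = ((G["le"][0] + G["re"][0]) / 2, (G["le"][1] + G["re"][1]) / 2)
    # Direction of the axis from the eye midpoint to the mouth centroid,
    # and of the segment joining the two eye centroids.
    eye_mouth = math.atan2(G["m"][1] - mid[1], G["m"][0] - mid[0])
    eye_line = math.atan2(G["re"][1] - G["le"][1], G["re"][0] - G["le"][0])
    terms = {
        "mg": _dist(g["m"], G["m"]) ** 2,
        "leg": _dist(g["le"], G["le"]) ** 2,
        "reg": _dist(g["re"], G["re"]) ** 2,
        "iod": (_dist(G["le"], G["re"]) / face_width - iod_ratio) ** 2,
        "emh": (_dist(mid, G["m"]) / face_width - emh_ratio) ** 2,
        "fa": _angle_diff(eye_mouth, face_axis) ** 2,
        "ema": (abs(_angle_diff(eye_mouth, eye_line)) - math.pi / 2) ** 2,
    }
    return sum(w[k] * terms[k] for k in keys)
```

In this form the energy is zero only when every constraint is met at once, so any net that drifts out of proportion raises the total and is pulled back.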

【０１７７】 As described above, the internal energy and image energy of the lip, left-eye, and right-eye region extraction nets make each net contract while keeping it smooth, so that it shrinks onto its region in accordance with the image features of that region, while the balance energy E_balance keeps the balance among the nets in a fixed equilibrium state. As the three nets deform and contract toward their respective regions and move so that their positional relationship stays in a fixed equilibrium, each net finally converges so as to enclose its own region. FIG. 36 shows the result of applying this region extraction device to an input image. Parts (a) to (c) of the figure show, in order, the lip, left-eye, and right-eye region extraction nets extracting their regions.

【０１７８】 Next, the process by which the lip, left-eye, and right-eye region extraction nets extract their respective regions is described with reference to the flowchart of FIG. 37.

【０１７９】 First, the face region is extracted with the face region extraction net in the same manner as in Embodiment 1 (S41). The virtual net therefore has a shape and size that enclose the face region, as shown in the top row of FIG. 38. Because the lip region and the left and right eye regions lie inside the face region, the net as it stood when the face region was extracted is used as the initial value of the lip, left-eye, and right-eye region extraction nets, and each net starts to move (S42). As with the face region extraction net, every grid point of the lip region extraction net is moved within its 8-neighborhood and the net is checked for convergence (S43). If the net has not converged in S43, it is deformed by one unit (S44); if it has converged in S43, the left and right eye region extraction nets are then checked in the same way (S45 to S48).

【０１８０】 After each net has been deformed by one unit, the balance energy is calculated and each net is moved (S49). It is then checked whether the energy of the lip region extraction net has changed (S50); if there is no change, the converged flag of the lip region extraction net is set to ON (S51). The left and right eye region extraction nets are checked in the same way (S52 to S55). After that, it is checked whether the energy of any net has changed (S56). If there is an energy change, the process returns to S43; once the energy of every net has stopped changing, deformation of the nets is stopped. When all the nets have stopped, each net encloses its region, as shown in the lower three rows of FIG. 38.
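
The control flow of steps S42 to S56 can be sketched as below. The net representation is left opaque, and the three callables (deform_one_unit, apply_balance, energy) are hypothetical placeholders for the per-net deformation, the balance-energy movement, and the per-net total energy. The tolerance and iteration cap are also assumptions added so that the loop always terminates.

```python
def extract_parts(nets, deform_one_unit, apply_balance, energy,
                  tol=0.0, max_iters=1000):
    """Deform the lip and eye nets together until no net's energy changes.

    nets: dict mapping a net name to its state, initialised from the
          extracted face net (S42).
    """
    nets = dict(nets)                      # leave the caller's dict untouched
    converged = dict.fromkeys(nets, False)
    previous = {name: energy(name, nets) for name in nets}
    for _ in range(max_iters):
        for name in nets:                  # S43 to S48: one unit per net
            if not converged[name]:
                nets[name] = deform_one_unit(name, nets[name])
        nets = dict(apply_balance(nets))   # S49: move nets by balance energy
        any_change = False
        for name in nets:                  # S50 to S55: set converged flags
            current = energy(name, nets)
            if abs(current - previous[name]) <= tol:
                converged[name] = True
            else:
                any_change = True
            previous[name] = current
        if not any_change:                 # S56: every net has stopped
            break
    return nets
```

A practical implementation would likely use a small positive tol, since floating-point energies rarely repeat exactly.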

【０１８１】 As described above, the region extraction device of this embodiment sets lip and left and right eye region extraction nets whose initial state is the net surrounding the face region. It then contracts, deforms, and moves the nets until the sum of three energies stops changing: the internal energy of each net, based on the positional relationship of the grid points; the image energy, based on the color likelihood of the pixel on which each grid point lies; and the balance energy, defined so that the positional relationship of the centroids of the three regions becomes equal to the positional relationship of the lips and the left and right eyes measured and stored in advance. Once the sum no longer changes, the grid points stop moving.

【０１８２】 Accordingly, the nets that covered the face region contract toward the lip region and the left and right eye regions, where the color likelihood is high, and finally stop when each encloses its region. As a result, the lip region and the left and right eye regions can be extracted.

【０１８３】 As a result, in addition to the effects of Embodiments 1 and 2, the shapes of the lips and eyes can be recognized. Because these have values characteristic of each individual, they can be used for personal identification. Specifically, images of a number of people are input in advance, each region is extracted by the region extraction device, and the relative positions, sizes, and shapes of the parts are quantified and stored in a database for each person. An imaging device and the region extraction device are then attached to a database search device; the face, lip, and eye regions of a person captured by the imaging device are extracted, and the positional relationships and sizes of the parts are quantified in the same way. The resulting values are matched against the values in the database to determine which person in the database the input image shows. One conceivable application is for a company reception desk or hotel with many visitors to build such a database in advance, photograph each visitor, automatically retrieve that person's profile from the database, and so respond smoothly.
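
The matching step can be illustrated with a nearest-neighbor lookup. The text only says that the quantified values are matched against the database, so the Euclidean distance, the feature layout, and the function name below are assumptions.

```python
import math

def identify(features, database):
    """Return the name whose stored feature vector is closest to features.

    database: dict mapping a person's name to a list of numbers, for
              example part sizes and distances normalised by face width.
    """
    if not database:
        raise ValueError("the database is empty")
    return min(database, key=lambda name: math.dist(features, database[name]))
```

A real system would presumably also reject matches whose distance exceeds some threshold, so that unknown visitors are not forced onto the nearest registered person.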

【０１８４】 Moreover, once extraction extends beyond the face outline to the face parts, the information compression ratio rises further. As described in Embodiment 1, information can be compressed by extracting the face region and lowering the resolution or number of colors outside the face. Going further, the face part regions within the face region can be given the highest resolution and number of colors, followed by the rest of the face region and then the area outside the face region, with resolution and number of colors reduced in that order. This makes it possible to compress the information while keeping degradation of the face region small. In addition, the number of colors used in the face parts and the face region is limited, so giving each its own color codes allows still further compression. This makes it possible to store more images in less capacity than in Embodiments 1 and 2, or to transmit smooth video with a higher frame count over a narrow bandwidth.
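
The tiered reduction in the number of colors can be sketched for a single 8-bit channel as follows. The level counts per region are hypothetical values chosen for illustration; the patent does not specify them.

```python
# Hypothetical number of quantisation levels kept in each kind of region.
LEVELS = {"part": 256, "face": 32, "background": 4}

def quantize(value, levels):
    """Reduce an 8-bit value (0 to 255) to the given number of levels.

    levels is assumed to be a power of two no larger than 256.
    """
    step = 256 // levels
    return (value // step) * step

def compress_pixel(value, region):
    """Quantise one pixel value according to the region it belongs to."""
    return quantize(value, LEVELS[region])
```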

【０１８５】 In applications that need to file many images, such as a person database, the size of each image has a large bearing on the number of images the device can store. If, as described above, a single person image can be stored in less capacity, a database capable of storing many images can be built at low cost. Person images can also be transmitted without widening the bandwidth. Cost reductions can therefore be achieved for video conferencing, videophones, electronic secretaries, and the like.

【０１８６】 Furthermore, when building a person database, until now a person has had to register by hand, as words alongside the image, the data describing that individual that serves as the search key, and at search time either type in those keywords by hand or look through the database one entry after another from the beginning until the person was found. Because this region extraction device can extract the face region and face parts from an image automatically, the face parts are extracted when an entry is registered in the database, and at search time the person simply has to appear before the imaging device connected to the database search device. As a result, applications such as instantly finding out who a visitor is, or who someone not seen for a long time is, become possible.

【０１８７】 Furthermore, once the eye and lip regions are extracted, facial expressions can be recognized. Human expressions show their characteristics in face parts such as the eyes and mouth, so examining the shapes of the eyes and mouth through region extraction makes it possible to determine what expression a person has. This in turn makes it possible to build robots that change their responses to match a person's emotions, or to automate tasks such as psychoanalysis.

【０１８８】 Likewise, being able to recognize expressions means being able to generate them. Once the eye and mouth regions are extracted, an expression can be created by deforming those regions into a shape characteristic of an emotion. For example, if the typical shapes of the eyes and mouth when a person is laughing are measured in advance and stored in a database, and the eye and mouth regions extracted from an input image are deformed according to that database, the face in the image can be made to laugh. If a variety of expressions are stored in a database in advance, the face in a captured image can be made to laugh, look angry, and so on at will.

【０１８９】 This makes it possible to develop equipment with a friendly, human-like interface. For example, the expression of a face that appears in a device's operating instructions can be changed freely, providing equipment that users find approachable. Special-effects uses in video works, such as animation that continuously transforms one person into another person or into an animal, can also be carried out fully automatically with this technique. Its value as entertainment content in the coming multimedia era is considerable.

【０１９０】 The ability to generate expressions, create portraits, and so on automatically also lowers the cost of supplying multimedia content in the entertainment field. Work that used to demand manual labor, a distinctive sense, and skill can be done automatically, which makes it possible to keep up with demand that is expected to grow.

【０１９１】 As in Embodiments 1 and 2, extracting the face region and face parts in real time enables finer control of equipment. For example, the tilt of the face can be detected accurately from the positional relationship of the face parts. The volume of an audio device or the brightness of lighting can then be controlled according to the amount of face tilt. Alternatively, blinks or the like can be detected to operate equipment when both hands are occupied, for example to change the radio tuning while driving a car.

【０１９２】 Alternatively, by extracting face parts and identifying the individual, it becomes possible to automatically control the strength of an air conditioner or the volume of a television or audio system to suit personal preferences, and to apply the technique to security uses such as personal computer login and electronic locks. This reduces the trouble of adjusting equipment each time, the trouble of entering passwords, and input mistakes.

【０１９３】 In Embodiments 1 to 3 above, the region probability calculation device 5 first generates a region probability image, and the image energy is calculated using that image. This simplifies the processing and makes it faster. However, the image energy can also be calculated by computing, each time it is needed, the likelihood of each region from the color of the pixel under the grid point. In this case no region probability image needs to be generated, which has the advantage of reducing the memory required. On the other hand, the region likelihood of the pixel under the grid point must be computed every time, so processing is slower.
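
The trade-off between the two approaches can be made concrete with the sketch below. The likelihood argument stands in for the color likelihood computation, whose actual form is not given in this passage; both factory functions and their names are assumptions.

```python
def make_precomputed(image, likelihood):
    """Build the whole probability image once, then look values up.

    Costs one extra image of memory but no computation per lookup.
    """
    prob = [[likelihood(color) for color in row] for row in image]
    return lambda x, y: prob[y][x]

def make_on_demand(image, likelihood):
    """Compute the likelihood each time a grid point asks for it.

    Needs no extra image, but repeats the computation on every lookup.
    """
    return lambda x, y: likelihood(image[y][x])
```

Both functions return a lookup with the same interface, so an energy calculation could switch between them without other changes.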

[0194] In the above embodiments, the energy of the net is defined so that it is smallest when the net encloses the face region and the face parts. The same effect is obtained if the energy is instead defined so that it is largest when the net encloses the face region and the face parts.

[0195] The net in the present embodiment is radial, but depending on how the energy is set it may take other shapes, such as a lattice. That is, any shape is possible as long as the connection relationships between the points are available. However, when extracting regions with a nearly elliptical shape, such as the face, lips, and eyes in the present embodiment, a radial net is preferable because the energy is easier to set.

[0196] In the present embodiment the grid points are moved starting from the outermost grid points, but this is not limiting, and the movement may start from any grid point. However, the same grid point must not be moved twice before every grid point has moved by one unit.

[0197] In the present embodiment a grid point moves within its 8-neighborhood, but this is not limiting; for example, a 4-neighborhood may be used. A grid point may also move to a more distant pixel rather than an adjacent one, but in that case the move distance must be small enough that the point does not jump over the boundary of the region to be extracted.
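The movement rules described above (any visiting order, at most one move per grid point per pass, 8- or 4-neighborhood, bounded step length) can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the embodiments: `energy` is a hypothetical callable returning the total net energy, and the greedy "accept only a strict decrease" rule is an assumed simplification.

```python
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
NEIGHBORS_4 = [(-1, 0), (0, -1), (0, 1), (1, 0)]


def sweep(points, energy, neighbors=NEIGHBORS_8, step=1):
    """Move every grid point at most once; return True if any point moved.

    points : list of [y, x] positions, mutated in place
    energy : callable(points) -> float, a hypothetical total net energy
    step   : move distance in pixels; keep it small enough that a point
             cannot jump over the boundary of the region being extracted
    """
    moved = False
    for i in range(len(points)):  # any visiting order is allowed
        y, x = points[i]
        best_pos, best_e = (y, x), energy(points)
        for dy, dx in neighbors:
            candidate = (y + dy * step, x + dx * step)
            points[i] = [candidate[0], candidate[1]]
            e = energy(points)
            if e < best_e:  # assumed rule: accept only a strict decrease
                best_pos, best_e = candidate, e
        points[i] = [best_pos[0], best_pos[1]]
        moved = moved or best_pos != (y, x)
    return moved


def contract(points, energy, neighbors=NEIGHBORS_8, max_sweeps=1000):
    """Repeat sweeps until no grid point moves; return the number of moving sweeps."""
    for n in range(max_sweeps):
        if not sweep(points, energy, neighbors):
            return n
    return max_sweeps
```

Because each pass visits every index exactly once, no grid point can move twice before all others have had their turn. Passing `NEIGHBORS_4` selects the 4-neighborhood variant.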

[0198] The internal energy in Embodiments 2 and 3 includes a shape energy obtained by assuming an ellipse, but this need not be set when the shape of the object to be extracted is unknown. However, when the shape of the extraction target is known, as in the present embodiments, setting the shape energy is desirable because it improves accuracy.

[0199] Likewise, the movement energy in Embodiment 2 need not be set, but setting it is desirable because it improves accuracy.

[0200] [Embodiment 4] Another embodiment of the present invention is described below with reference to FIGS. 39 to 43. For convenience of explanation, members identical to those shown in the drawings of the preceding embodiments are given the same reference numerals, and their description is omitted.

[0201] The direction detection device according to the present embodiment uses the region extraction device of Embodiment 3 together with two imaging devices. It extracts the face parts from the image of the person captured by each imaging device and then detects the three-dimensional direction of the head by examining the displacement between the positions of the face parts in the two images.

[0202] As shown in FIG. 39, imaging devices (imaging means) 42L and 42R attached to equipment 43 capture the user and send the digital images to a region extraction device 41. The region extraction device 41 extracts the user's face region from each image and sends that information to a head direction detection device 45. The head direction detection device 45 detects the direction of the head by detecting three-dimensional information on the face parts from the two images, and sends that information to a control device 44. The control device 44 is connected to the equipment 43 and controls the equipment 43 based on the information sent from the head direction detection device 45.

[0203] The method of detecting the head direction using the two imaging devices 42L and 42R is as follows. First, the imaging devices 42L and 42R are arranged in accordance with the epipolar constraint. As shown in FIG. 40, the epipolar constraint is the restriction that whatever appears in digital image L of one imaging device 42L appears, in digital image R of the other imaging device 42R, displaced parallel to the scan lines. That is, when two imaging devices are installed, imaging devices with similar characteristics must be installed so that their optical axes lie in the same plane.

[0204] First, as shown in FIG. 41, the lip region and the left and right eye regions are extracted from each of images L and R, the two images of the person. Next, the position in image R of each of the three face parts in image L is examined. Let dm be the horizontal displacement between the lip position in image L and the lip position in image R, dle the horizontal displacement between the left eye position in image L and the left eye position in image R, and dre the horizontal displacement between the right eye position in image L and the right eye position in image R. The reciprocal of each of these displacements between images L and R corresponds to the relative distance of that face part from the image plane.

[0205] Therefore, as shown in FIG. 42, let pm, ple, and pre be the points located at distances 1/dm, 1/dle, and 1/dre, respectively, from each face part on the image plane, measured perpendicular to the image plane and toward its far side. The three-dimensional orientation of the head is then the direction from pm toward the midpoint, along the straight line passing through pm and the midpoint of the line segment connecting ple and pre.
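The construction of pm, ple, and pre and the resulting head direction can be sketched numerically. This is a minimal illustration under assumptions: image coordinates are (x, y) pixels, the depth axis is the third component, and the hypothetical parameter `depth_scale` stands in for the unspecified scale between the relative depth 1/d and image coordinates.

```python
import numpy as np


def head_direction(mouth_l, mouth_r, leye_l, leye_r, reye_l, reye_r,
                   depth_scale=1.0):
    """Each positional argument is an (x, y) face-part position in image L or R.

    Returns a unit vector pointing from pm toward the midpoint of ple and pre.
    depth_scale is a hypothetical factor; the text only gives relative depth.
    """
    def lift(p_l, p_r):
        d = abs(p_l[0] - p_r[0])  # horizontal displacement between L and R
        if d == 0:
            raise ValueError("zero displacement: relative depth is undefined")
        # Place the point behind the image plane at relative depth 1/d.
        return np.array([p_l[0], p_l[1], depth_scale / d], dtype=float)

    p_m = lift(mouth_l, mouth_r)
    p_le = lift(leye_l, leye_r)
    p_re = lift(reye_l, reye_r)
    v = (p_le + p_re) / 2.0 - p_m
    return v / np.linalg.norm(v)
```

When all three displacements are equal, the three lifted points share one depth and the returned vector lies in the image plane; unequal displacements tilt it toward or away from the cameras.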

[0206] The process of the direction detection device is described with reference to the flowchart of FIG. 43. First, images of the user are captured by the two imaging devices 42L and 42R (S61). The region extraction device 41 then extracts the lip region and the left and right eye regions from each image (S62). The horizontal displacement of each face part (lips, left eye, right eye) between the two images is calculated (S63). The distance of each face part from the image plane is calculated (S64). The three-dimensional direction of the head is then calculated (S65). Repeating this process makes it possible to detect, in real time, the head direction of the user in front of the imaging devices.

[0207] That is, face images are captured simultaneously by the two imaging devices 42L and 42R, and the face region and face parts are extracted from each image. The parallax between the images then determines three points in three-dimensional space, one each for the lips, the left eye, and the right eye, so the direction in which the head is facing can be detected.

[0208] Consequently, in contrast to conventional devices, which could not detect the head direction unless a dedicated device was worn on the head, the present direction detection device requires no dedicated hardware. A product can therefore be realized at low cost, and the inconvenience of wearing a device is eliminated. Not requiring dedicated hardware also widens the range of applications.

[0209] It also becomes possible to provide a mechanism that detects the direction of the face and operates only while the user is facing the equipment, or, for example, a television or audio product whose channel or volume can be changed by the orientation of the face.

[0210] Application to virtual reality is also possible. That is, the position and direction of a virtual user in a virtual world represented on a computer can be determined from the position and direction of the head of the person in front of the imaging devices, so the movement of the actual user's head and the movement of the user's head in the virtual world can be synchronized without special headgear or the like. In three-dimensional CAD, it also becomes possible to specify a point at a particular three-dimensional position, or to adjust several attributes of one item at the same time, for example the tint, contrast, and brightness of a television.

[0211] The number of imaging devices is not limited to two; three or more may be used. In that case, the arrangement of the imaging devices need not follow the epipolar constraint.

[0212] [Embodiment 5] Another embodiment of the present invention is described below with reference to FIGS. 44 to 46. For convenience of explanation, members identical to those shown in the drawings of the preceding embodiments are given the same reference numerals, and their description is omitted.

[0213] As shown in FIG. 44, the direction detection device according to the present embodiment has, in addition to the configuration of Embodiment 4, a gaze direction detection device 46 that receives the head direction information detected by the head direction detection device 45.

[0214] The method of detecting the gaze direction is as follows. First, as a preliminary experiment, the gaze detection accuracy is set. The number of directions to be detected is determined from that accuracy, and a subject is asked to gaze in each of those directions, as shown for example in FIGS. 45(a) to 45(f). An image of the eye is captured during each gaze. These eye images form a gaze database indicating where the pupil is located when a person looks in a given direction. The direction of the user's gaze can be estimated by comparing the eye image detected by the region extraction device with this gaze database. The true gaze direction can then be detected by combining the three-dimensional head direction detected by the head direction detection device 45 of Embodiment 4 with this gaze direction information.

[0215] The process of the direction detection device is described with reference to the flowchart of FIG. 46. First, the head detection device detects the three-dimensional direction of the head (S71). Because the eye regions have already been detected at this point, the extracted eye region is resized to match the images in the gaze database. Matching is then performed between the resized eye region image and the images in the gaze database (S72). A technique such as template matching is used for this matching. The difference between the two images is taken, and the gaze direction is estimated by selecting the database image with the smallest difference (S73). The true gaze direction is detected by combining the already detected three-dimensional head direction with the estimated gaze direction (S74).
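Steps S72 and S73 (resize, match against the gaze database, pick the smallest difference) can be sketched as below. The sketch makes several assumptions not stated in the specification: grayscale images as 2-D arrays, nearest-neighbor resizing, the sum of absolute differences as the difference measure, and a database held as a dictionary from a direction label to a reference image. The function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, shape):
    """Nearest-neighbor resize of a 2-D array to `shape` (rows, cols)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]


def estimate_gaze(eye_img, database):
    """Return the label of the database image closest to `eye_img`.

    database : dict mapping a direction label to a reference eye image;
               all reference images are assumed to share one shape.
    The difference measure (sum of absolute differences) is an assumption.
    """
    ref_shape = next(iter(database.values())).shape
    eye = resize_nearest(np.asarray(eye_img, dtype=float), ref_shape)
    scores = {
        label: float(np.abs(eye - np.asarray(ref, dtype=float)).sum())
        for label, ref in database.items()
    }
    return min(scores, key=scores.get)
```

The returned label is only the estimated gaze direction relative to the head; combining it with the head direction (S74) is outside this sketch.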

[0216] Consequently, the present direction detection device can detect the gaze direction without any special device. This eliminates the inconvenience of wearing hardware and allows the cost of the device to be reduced. It can also substitute for the pointing device of information equipment: for example, an arbitrary point on a computer display can be specified without a mouse, a pen, or a large-scale device such as existing eye trackers, and various electronic devices can be controlled. In addition, a point of interest can be tracked by eye while operating a keyboard, which improves working efficiency with the equipment. Other operations can also be performed while both hands are occupied. Application to people with physical disabilities is also conceivable.

[0217]

[Effects of the Invention] As described above, the region extraction device according to claim 1 of the present invention has: imaging means for imaging an object to produce a color digital image; probability density function storage means for storing in advance a probability density function representing a desired region that has uniformly similar color in the color digital image; region probability calculation means for calculating the likelihood of the color of each pixel by evaluating the probability density function based on the color of each pixel constituting the desired region; net information storage means for storing information on a virtual net that is formed by connecting any three or more points on the color digital image and is placed so as to cover the entire image; and energy calculation means for calculating the internal energy of the net, defined from the positional relationship of the points, and the image energy of the net, defined from the color likelihood of the pixels at which the points are located. The sum of the internal energy and the image energy of the net after one point is moved is compared with the sum before that point was moved; while the sum changes, points are moved in the direction in which the net contracts, and the movement of the points is stopped when the sum no longer changes.

[0218] Consequently, the present region extraction device extracts the desired region based on color likelihood, which is robust to changes in lighting conditions, so extraction is not affected by the shooting location, the time, the presence or absence of auxiliary lighting, the lighting direction, and so on. Furthermore, because the net can take any shape, there are no restrictions on the size of the input image or on the position and size of the face in the image. As a result, the degree of freedom in shooting is increased.

[0219] In the region extraction device according to claim 2, in addition to the configuration of claim 1, the object is a person and the desired region is the region of the person's face.

[0220] Because the face region can thus be extracted, the device can be used to control equipment such as various home appliances. That is, equipment can be controlled according to the position of the user.

[0221] In the region extraction device according to claim 3, in addition to the configuration of claim 1 or 2, a probability density function representing a partial region whose color differs from that of the desired region is stored in advance in the probability density function storage means, and the information on the net that has stopped around the desired region is stored in the net information storage means as the initial value of a partial region extraction net for extracting the partial region. When there is one partial region within the desired region, the region probability calculation means calculates the likelihood of the color of each pixel by evaluating the probability density function based on the color of each pixel constituting the partial region, and the energy calculation means extracts the partial region by calculating the internal energy and image energy of the partial region extraction net together with a movement energy defined so that the net converges on the partial region.

[0222] Consequently, the net used for the first region extraction can also be used to extract another region inside that region, so the net settings need not be changed for each region.

[0223] The region extraction device according to claim 4 has, in addition to the configuration of claim 3, centroid calculation means that obtains a frame centroid from the points forming the outer periphery of the partial region extraction net and obtains the centroid of the region enclosed by the partial region extraction net with image brightness as the weight, and the movement energy is defined so that the frame centroid is drawn toward the region centroid.

[0224] Because the movement energy is set so that the frame centroid is drawn toward the region centroid, the partial region can be extracted without the net being drawn to some other region, even if a region with a color similar to the partial region exists. As a result, partial regions can be extracted quickly and with high accuracy.
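The two centroids and an energy that pulls one toward the other can be sketched as follows. The centroid definitions follow the wording of claim 4, but the functional form of the movement energy (a constant times the squared distance between the centroids) is an assumption made for illustration, as are the parameter `k` and the function names.

```python
import numpy as np


def frame_centroid(outer_points):
    """Mean position of the grid points forming the outer periphery of the net."""
    return np.mean(np.asarray(outer_points, dtype=float), axis=0)


def region_centroid(weights, mask):
    """Centroid of the pixels selected by `mask`, weighted by `weights`.

    weights : 2-D array of per-pixel weights (the claim uses image brightness)
    mask    : 2-D boolean array, True inside the region enclosed by the net
    """
    w = np.where(mask, weights, 0.0).astype(float)
    total = w.sum()
    if total == 0:
        raise ValueError("region has zero total weight")
    ys, xs = np.indices(w.shape)
    return np.array([(ys * w).sum(), (xs * w).sum()]) / total


def movement_energy(outer_points, weights, mask, k=1.0):
    """Hypothetical form: k times the squared distance between the two centroids."""
    diff = frame_centroid(outer_points) - region_centroid(weights, mask)
    return k * float(diff @ diff)
```

With this form the energy is zero when the frame centroid coincides with the weighted region centroid and grows as the net drifts away from the bright part of the enclosed region.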

[0225] The region extraction device according to claim 5 has, in addition to the configuration of claim 2, centroid calculation means that obtains a frame centroid from the points forming the outer periphery of a net and obtains the centroid of the region enclosed by the net with image brightness as the weight. A probability density function representing the lip region is stored in advance in the probability density function storage means, and the information on the net that has stopped around the face region is stored in the net information storage means as the initial value of the nets for extracting the lip region and the left and right eye regions. The region probability calculation means calculates the likelihood of the color of each pixel by evaluating the above probability density function based on the color of each pixel constituting the lip region, and calculates the likelihood of the color of each pixel by evaluating the probability density function of the face region or the lip region based on the color of each pixel constituting the left and right eye regions. The energy calculation means extracts the lip region and the left and right eye regions by calculating the internal energy and image energy of each region together with a balance energy, which is defined so that each frame centroid is drawn toward its region centroid and so that the positional relationship of the centroids of the three regions becomes equal to the positional relationship of the lips and the left and right eyes measured and stored in advance.

[0226] Consequently, the face region extraction net can serve as the lip region extraction net and the left and right eye region extraction nets, so the net settings need not be changed for each region. In addition, because the positions of the face parts relative to the face region are known, the rotation angle of the face (its inclination) can be recognized, which increases the degree of freedom during shooting and processing. Furthermore, the approximate direction of the face can be estimated.

[0227] The direction detection device according to claim 6 uses the region extraction device according to claim 5, is provided with a plurality of the imaging means, extracts the lip region and the left and right eye regions from each of a plurality of images obtained simultaneously from the plurality of imaging means, and derives three-dimensional position information for the lips and eyes by comparing them with one another.

[0228] Consequently, the head direction can be detected easily without the dedicated hardware used conventionally. As a result, the inconvenience of wearing hardware is eliminated and the cost of the device can be reduced.

[0229] In addition to the configuration of claim 6, the direction detection device according to claim 7 calculates an estimated gaze direction by comparing the extracted eye image with eye images that were measured in advance and stored with gaze direction information attached, and then detects the true gaze direction by comparing the estimated gaze direction with the detected head orientation.

[0230] Consequently, the gaze direction can be detected easily without the dedicated hardware used conventionally. As a result, the inconvenience of wearing hardware is eliminated and the cost of the device can be reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a region extraction device according to one embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the region probability calculation device in the region extraction device.

FIG. 3(a) shows an input image, and FIG. 3(b) is an explanatory diagram showing the state in which the skin-colored portion and the lip portion of the face have been extracted from the input image.

FIG. 4 is a graph showing the frequency distribution of hue in the face region.

FIG. 5 is a graph showing the frequency distribution of saturation in the face region.

FIG. 6 is a graph showing a two-dimensional normal probability density function representing the face region.

FIG. 7 is a flowchart showing the process of generating a region probability image.

FIG. 8 is a photograph, substituted for a drawing, of a halftone image shown on a display, representing an image input to the input device.

FIG. 9 is a photograph, substituted for a drawing, of a halftone image shown on a display, representing the face region probability image generated from the input image shown in FIG. 8.

FIG. 10 is a photograph, substituted for a drawing, of a halftone image shown on a display, representing the lip region probability image generated from the input image shown in FIG. 8.

FIG. 11 is an explanatory diagram showing the grid points that form the net.

FIG. 12 is an explanatory diagram showing how a region is extracted using the net model, where (a) shows the input image, (b) the initial state of the net, (c) the net during convergence, and (d) the net after convergence.

FIG. 13 is an explanatory diagram showing how the face region is extracted using the net model, where (a) shows the input image, (b) the initial state of the net, (c) the net during convergence, and (d) the net after convergence.

FIG. 14 is an explanatory diagram showing the shape of the net.

FIG. 15 is an explanatory diagram showing another net shape.

FIGS. 16(a) and 16(b) are explanatory diagrams showing the definitions of the energies that make up the internal energy of the net.

FIG. 17 is a flowchart showing how the grid points are moved.

FIG. 18 is an explanatory diagram showing the 8-neighborhood of a pixel.

FIG. 19 is a block diagram showing a configuration in which the region extraction device is used in equipment.

FIG. 20 is a block diagram showing the configuration of a region extraction device according to another embodiment of the present invention.

FIG. 21 is a graph showing the frequency distribution of hue in the lip region.

FIG. 22 is a graph showing the frequency distribution of saturation in the lip region.

FIG. 23 is a graph showing a two-dimensional normal probability density function representing the lip region.

FIG. 24 is an explanatory diagram showing the outermost frame and its centroid.

FIG. 25 is an explanatory diagram showing the centroid of the region enclosed by the net.

FIG. 26 is an explanatory diagram showing the parameters needed to define the shape energy.

FIG. 27 is an explanatory diagram showing the definition of the shape energy.

FIG. 28 is an explanatory diagram showing how the lip region probability difference image is obtained.

FIG. 29 is a photograph, substituted for a drawing, of a halftone image shown on a display, representing the lip region probability difference image calculated from the face region probability image shown in FIG. 9 and the lip region probability image shown in FIG. 10.

FIG. 30 is a flowchart showing the process by which the lip region extraction net extracts the lip region.

FIG. 31 is a schematic diagram showing the process by which the lip region extraction net extracts the lip region.

FIG. 32 is a block diagram showing the configuration of a region extraction device according to still another embodiment of the present invention.

FIG. 33 is an explanatory diagram showing how the eye region extraction image is obtained.

FIG. 34 is an explanatory diagram showing the measurement locations used when data on the sizes and positional relationships of the face parts were measured.

FIG. 35 is an explanatory diagram showing the parameters needed to define the balance energy.

FIG. 36 is an explanatory diagram showing how (a) the lip region extraction net, (b) the left eye region extraction net, and (c) the right eye region extraction net each contract.

FIG. 37 is a flowchart showing the process by which the lip and left and right eye region extraction nets extract each region.

FIG. 38 is a schematic diagram showing the process by which the lip and left and right eye region extraction nets extract each region.

FIG. 39 is a block diagram showing the configuration of a direction detection device using the region extraction device, according to still another embodiment of the present invention.

【図４０】エピポーラ拘束の状態を示す説明図である。FIG. 40 is an explanatory diagram showing a state of epipolar constraint.

【図４１】２つの画像における顔部品のずれを算出する
過程を示す説明図である。FIG. 41 is an explanatory diagram showing a process of calculating a shift of a face part between two images.

【図４２】上記ずれの値から算出される顔部品の３次元
位置情報を示す説明図である。FIG. 42 is an explanatory diagram showing three-dimensional position information of a face part calculated from the value of the shift.

【図４３】上記方向検出装置の頭部検出動作を示すフロ
ーチャートである。FIG. 43 is a flowchart showing a head detecting operation of the direction detecting device.

【図４４】本発明のその他の実施の形態にかかる、領域
抽出装置を利用した方向検出装置の構成を示すブロック
図である。FIG. 44 is a block diagram showing a configuration of a direction detecting device using an area extracting device according to another embodiment of the present invention.

【図４５】（ａ）〜（ｆ）は、視線データベースの一例
を示す説明図である。FIGS. 45A to 45F are explanatory diagrams illustrating an example of a gaze database.

【図４６】上記方向検出装置の視線検出動作を示すフロ
ーチャートである。FIG. 46 is a flowchart showing a line-of-sight detection operation of the direction detection device.

【図４７】従来の投影による抽出手法を示すものであ
り、（ａ）は図８に示す入力画像を水平方向に微分した
状態を示す説明図、（ｂ）は（ａ）の画像を水平方向へ
投影したときのピクセル数をカウントしたグラフ、
（ｃ）は（ａ）の画像を垂直方向へ投影したときのピク
セル数をカウントしたグラフをそれぞれ示す。FIG. 47 shows a conventional extraction method using projection: (a) is an explanatory diagram showing the input image of FIG. 8 after differentiation in the horizontal direction, (b) is a graph of the pixel counts obtained by projecting the image of (a) in the horizontal direction, and (c) is a graph of the pixel counts obtained by projecting the image of (a) in the vertical direction.

【図４８】上記水平方向の微分処理を示す説明図であ
る。FIG. 48 is an explanatory diagram showing the horizontal differentiation process.

【図４９】従来のテンプレートによる抽出方法に用いら
れる入力画像とテンプレートとを示す説明図である。FIG. 49 is an explanatory diagram showing an input image and a template used in a conventional template-based extraction method.

【図５０】５×５の大きさのテンプレートと入力画像と
の比較の状態を示す説明図である。FIG. 50 is an explanatory diagram showing a state of comparison between a template having a size of 5 × 5 and an input image.

【図５１】テンプレートと入力画像との比較の順序を示
す説明図である。FIG. 51 is an explanatory diagram showing the order of comparison between a template and an input image.

【図５２】従来のDeformable template による抽出方法
を用いて目を抽出する様子を示す説明図である。FIG. 52 is an explanatory diagram showing a state in which eyes are extracted by using a conventional extraction method using a deformable template.

【図５３】上記抽出方法におけるテンプレートの定義に
必要な各パラメータを示す説明図である。FIG. 53 is an explanatory diagram showing parameters required for defining a template in the extraction method.

【図５４】従来の頭部方向の検出方法を示す説明図であ
る。FIG. 54 is an explanatory diagram showing a conventional head direction detection method.

【図５５】従来の他の頭部方向の検出方法を示す説明図
である。FIG. 55 is an explanatory diagram showing another conventional head direction detection method.

[Explanation of symbols]

１ 入力装置（撮像手段）
５ 領域確率計算装置（領域確率計算手段）
６ エネルギー計算装置（エネルギー計算手段）
７ 確率密度関数格納部（確率密度関数格納手段）
９ 顔領域抽出用網情報格納部（網情報格納手段）
２１ 重心計算装置（重心計算手段）
２２ 口唇領域抽出用網情報格納部（網情報格納手段）
２３ 左目領域抽出用網情報格納部（網情報格納手段）
２４ 右目領域抽出用網情報格納部（網情報格納手段）

1 Input device (imaging means)
5 Region probability calculation device (region probability calculation means)
6 Energy calculation device (energy calculation means)
7 Probability density function storage unit (probability density function storage means)
9 Net information storage unit for face region extraction (net information storage means)
21 Center-of-gravity calculation device (center-of-gravity calculation means)
22 Net information storage unit for lip region extraction (net information storage means)
23 Net information storage unit for left eye region extraction (net information storage means)
24 Net information storage unit for right eye region extraction (net information storage means)

フロントページの続き
(56)参考文献 坂上勝彦外１名，動的な網のモデルＡｃｔｉｖｅＮｅｔとその領域抽出への応用，テレビジョン学会誌，日本，社団法人テレビジョン学会，1991年10月20日，第45巻、第10号，ｐｐ．1155−1163
(58)調査した分野(Int.Cl.⁷，ＤＢ名) G06T 7/00 - 7/60，G06T 1/00，ＪＩＣＳＴファイル（ＪＯＩＳ)

Continuation of the front page
(56) References: Katsuhiko Sakaue and one other, "Dynamic Net Model Active Net and Its Application to Region Extraction," Journal of the Institute of Television Engineers of Japan, Institute of Television Engineers of Japan, October 20, 1991, Vol. 45, No. 10, pp. 1155-1163
(58) Fields searched (Int. Cl.⁷, DB name): G06T 7/00 - 7/60; G06T 1/00; JICST file (JOIS)

Claims

(57) [Claims]

1. A region extracting apparatus comprising: imaging means for capturing an image of an object to form a color digital image; probability density function storage means for storing in advance a probability density function representing a desired region of uniform color in the color digital image; region probability calculation means for calculating the likelihood of the color of each pixel by evaluating the probability density function on the basis of the color of each pixel constituting the desired region; net information storage means for storing information on a virtual net that is formed by connecting two or more points and is arranged so as to cover the entire image; and energy calculation means for calculating an internal energy of the net, defined on the basis of the positional relationship of the points, and an image energy of the net, defined on the basis of the likelihood of the color of the pixel at which each point is located, wherein the total of the internal energy and the image energy of the net after one point is moved is compared with the total before the point was moved, the point is moved in a direction in which the net contracts when the total changes, and the movement of the point is stopped when the total no longer changes.

2. The region extracting apparatus according to claim 1, wherein the object is a person and the desired region is the region of the person's face.

3. The region extracting apparatus according to claim 1, wherein: a probability density function representing a partial region whose color differs from that of the desired region is stored in advance in the probability density function storage means; the net information storage means stores the information on the net that has stopped while surrounding the desired region as the initial value of a partial region extraction net for extracting the partial region; and, when one partial region exists within the desired region, the region probability calculation means calculates the likelihood of the color of each pixel by evaluating the probability density function on the basis of the color of each pixel constituting the partial region, and the energy calculation means calculates the internal energy and the image energy of the partial region extraction net together with a movement energy defined so that the net converges on the partial region, whereby the partial region is extracted.

4. The region extracting apparatus according to claim 3, further comprising center-of-gravity calculation means for calculating a frame center of gravity from the points forming the outer periphery of the partial region extraction net and for calculating a region center of gravity of the region enclosed by the partial region extraction net, using image brightness as the weight, wherein the movement energy is defined so that the frame center of gravity is drawn toward the region center of gravity.

5. The region extracting apparatus according to claim 2, further comprising center-of-gravity calculation means for calculating a frame center of gravity from the points forming the outer periphery of a net and for calculating a region center of gravity of the region enclosed by the net, using image brightness as the weight, wherein: a probability density function representing the lip region is stored in advance in the probability density function storage means; the net information storage means stores the information on the net that has stopped while surrounding the face region as the initial values of the nets for extracting the lip region and the left and right eye regions; the region probability calculation means calculates the likelihood of the color of each pixel constituting the lip region by evaluating that probability density function on the basis of the color of each such pixel, and calculates the likelihood of the color of each pixel constituting the left and right eye regions by evaluating the probability density function of the face region or of the lip region on the basis of the color of each such pixel; and the lip region and the left and right eye regions are extracted by calculating a balance energy defined so that each frame center of gravity is drawn toward the corresponding region center of gravity and so that the positional relationship among the three region centers of gravity becomes equal to a previously measured and stored positional relationship among the lips and the left and right eyes.

6. A direction detecting device using the region extracting apparatus according to claim 5, comprising a plurality of the imaging means, wherein a plurality of lip regions and left and right eye regions are extracted from a plurality of images obtained simultaneously from the plurality of imaging means, and three-dimensional position information of the lips and eyes is thereby derived.

7. The direction detecting device using the region extracting apparatus according to claim 6, wherein an estimated line-of-sight direction is calculated by comparing the extracted eye image with eye images that were measured and stored in advance together with information on their line-of-sight directions, and a true line-of-sight direction is detected by comparing the estimated line-of-sight direction with the detected direction of the head.
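
The contraction procedure recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the net is reduced to a closed ring of four points rather than a mesh covering the image, the internal energy is taken as the sum of squared distances between connected points, the image energy is taken as a penalty `w_image` times the color likelihood at each point (so that a point is held back at the edge of the high-likelihood region), and a move is accepted only when it lowers the total energy. The names `total_energy`, `contract_net`, and `w_image` are hypothetical.

```python
def total_energy(points, likelihood, w_image=1000.0):
    """Total energy of a closed ring of points (assumed energy definitions)."""
    n = len(points)
    internal = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        # Assumed internal energy: squared length of each connection.
        internal += (x1 - x0) ** 2 + (y1 - y0) ** 2
    # Assumed image energy: penalty for sitting on a high-likelihood pixel.
    image = sum(w_image * likelihood[y][x] for x, y in points)
    return internal + image


def contract_net(points, likelihood, w_image=1000.0):
    """Move one point at a time; stop when no move lowers the total energy."""
    points = list(points)
    height, width = len(likelihood), len(likelihood[0])
    moves = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
             if (dx, dy) != (0, 0)]
    moved = True
    while moved:
        moved = False
        for i, (x, y) in enumerate(points):
            best = (x, y)
            best_e = total_energy(points, likelihood, w_image)
            for dx, dy in moves:
                nx, ny = x + dx, y + dy
                if not (0 <= nx < width and 0 <= ny < height):
                    continue  # candidate lies outside the image
                points[i] = (nx, ny)
                e = total_energy(points, likelihood, w_image)
                if e < best_e:  # compare with the total before the move
                    best, best_e = (nx, ny), e
            points[i] = best  # keep the best position (or the old one)
            if best != (x, y):
                moved = True
    return points


# Synthetic likelihood image: a 4x4 region of likelihood 1.0 in a 10x10 image.
likelihood = [[1.0 if 3 <= x <= 6 and 3 <= y <= 6 else 0.0
               for x in range(10)] for y in range(10)]
start = [(0, 0), (9, 0), (9, 9), (0, 9)]  # ring placed on the image corners
final = contract_net(start, likelihood)
print(final)
```

In this toy setting the ring shrinks from the image corners until every point rests just outside the high-likelihood square. The exact final coordinates depend on how ties between equally good moves are broken, so they are not stated here.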