JP2000268161A

JP2000268161A - Real time expression detector

Info

Publication number: JP2000268161A
Application number: JP7219499A
Authority: JP
Inventors: Tatsumi Sakaguchi; 竜己坂口; Atsushi Otani; 淳大谷; Katsuhiro Takematsu; 克浩竹松
Original assignee: ATR Media Integration and Communication Research Laboratories
Current assignee: ATR Media Integration and Communication Research Laboratories
Priority date: 1999-03-17
Filing date: 1999-03-17
Publication date: 2000-09-29
Anticipated expiration: 2019-03-17
Also published as: JP3062181B1

Abstract

PROBLEM TO BE SOLVED: To provide a non-contact device that places little burden on the user, is robust to rotation of the face, and relaxes restrictions on the user's movement. SOLUTION: The face image of a subject is captured by a CCD camera, and an imaging control unit 7 controls the CCD camera so that the centroid of the skin-color area coincides with the center of the input image. A feature point detection unit 4 estimates the positions of the eyes and the mouth from the face image signal. The inclination angle of the face is estimated from the estimated positions of both eyes, and the face image inside the detection area is rotated by that angle. An expression estimation unit 5 applies a two-dimensional discrete cosine transform to the rotated detection area and obtains the change in spatial frequency components relative to the expressionless state. An expression reproduction unit 6 uses parameters learned in advance by a genetic algorithm to convert this spatial frequency change data into a deformation of a three-dimensional face model, thereby reproducing the expression.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a real-time facial expression detection device, and more particularly to a real-time facial expression detection device that detects facial expressions in real time in order to realize communication among a plurality of persons at distant locations through a virtual space.

[0002]

2. Description of the Related Art A telepresence conference system, in which the communication environment is limited to teleconferencing, has already been proposed. In such a system, each person wears, for example, a helmet with a small camera fixed to it, so that the person's facial expression can always be observed from a fixed position. This eliminates the need for automatic face tracking.

[0003] Other techniques for detecting facial feature points have also been proposed, such as feature point detection using edge images and matching with two-dimensional templates.

[0004]

Problems to be Solved by the Invention In the method described above, in which a person wears a helmet with a small camera fixed to it, the inconvenience of wearing the helmet remains. The methods using edge images or two-dimensional templates, on the other hand, suffer from high computational cost.

[0005] Therefore, a main object of the present invention is to provide a real-time facial expression detection device that is non-contact, places little burden on the user, is robust to rotation of the face, and relaxes restrictions on the user's movement.

[0006]

Means for Solving the Problems The invention according to claim 1 is a real-time facial expression detection device that detects a subject's facial expression in real time, comprising: imaging means for capturing a face image of the subject; imaging control means that extracts a skin-color area from the face image signal output by the imaging means, obtains its centroid, and controls the imaging means so that the center of the face coincides with the center of the input image; feature point detection means that detects the positions of the eyes and the mouth from the face image signal output by the imaging means; image rotation means that estimates the inclination angle of the face from the estimated positions of both eyes and rotates the face image within the detection area by that angle; facial expression estimation means that performs a two-dimensional discrete cosine transform on the detection area in which the face image has been rotated by the image rotation means and obtains the change in spatial frequency components from the time the expressionless state was detected; and facial expression reproduction means that, using parameters learned in advance by a genetic algorithm, converts the spatial frequency component change data obtained by the facial expression estimation means into a deformation of a three-dimensional face model to reproduce the expression.

[0007] In the invention according to claim 2, the feature point detection means of claim 1, as preprocessing, creates from the image signal of the imaging means a binary image separating skin color from all other colors, searches outward from the centroid of that image to find elements where the pixel value changes, traces the contour, and obtains the positions of the eyes and the mouth from the centroid of the region inside the contour.

[0008]

Embodiments of the Invention FIG. 1 is a block diagram of an apparatus for implementing an embodiment of the present invention. In FIG. 1, a face image of a subject is captured by a CCD camera 1 serving as the imaging means, and the resulting image signal is supplied to an image processing device 2. The image processing device 2 detects the facial expression of the face image in real time according to the present invention, and the result is displayed on a display unit 3.

[0009] FIG. 2 is a diagram showing the processing performed by the image processing device shown in FIG. 1, FIG. 3 is a flowchart showing a more specific processing procedure of the feature point detection unit in FIG. 2, and FIG. 4 is a diagram for explaining a method of rotating the image within a detection area.

[0010] The image signal of the face captured by the CCD camera 1 shown in FIG. 1 is supplied to the image processing device 2. As shown in FIG. 2, the image processing device 2 functions as a feature point detection unit 4, a facial expression estimation unit 5, a facial expression reproduction unit 6, and an imaging control unit 7. These functions are implemented in software.

[0011] The imaging control unit 7 binarizes the input image using the subject's skin color information learned in advance, locates the face using the binary image, and feedback-controls the CCD camera 1 so that the center of the input image coincides with the center of the face.

[0012] The feature point detection unit 4 locates the eyes by one-dimensional template matching on the luminance information of the face image signal, and estimates the position of the mouth from the positions of both eyes. Then, the input image is binarized using the subject's skin color information learned in advance, the face is located using the binary image, and the CCD camera 1 is feedback-controlled so that the center of the face coincides with the center of the input image. The feature point detection unit 4 also estimates the inclination angle of the face from the estimated positions of both eyes and rotates the face image within the detection area by that angle.

[0013] The facial expression estimation unit 5 applies a DCT (two-dimensional discrete cosine transform) to the regions that are important for capturing changes in facial expression, and obtains the change in spatial frequency components relative to the expressionless state. The facial expression reproduction unit 6 then uses parameters learned in advance by a genetic algorithm to convert the spatial frequency component change data obtained by the facial expression estimation unit 5 into a deformation of a three-dimensional face model, and reproduces the expression on the display unit 3.

[0014] The specific operations of the feature point detection unit 4, the facial expression estimation unit 5, and the facial expression reproduction unit 6 are described next. First, regarding the feature point detection unit 4: expressions can be detected more accurately when the feature regions are captured with as many pixels as possible. However, if the face fills the entire frame, the subject must restrain head movement, which increases the burden placed on the subject. Therefore, the head movement is tracked by the CCD camera 1 so that the face is captured as large as possible while the subject remains free to adopt any posture.

[0015] Basically, head movement is tracked by tracking the subject's pupils. However, vigorous head movement, or actions such as looking down or turning around, can make tracking impossible. To prevent this as far as possible and to track quickly, the skin color used as the subject's tracking index is optimized with respect to the background colors. That is, a difference image based on chromaticity information is computed between an image of the background alone and an image that includes the subject, and thresholding divides it into a person region and a background region.
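A minimal sketch of the chromaticity difference image, assuming the standard NTSC YIQ coefficients for the (I, Q) chromaticity values mentioned in the next paragraph and a Euclidean distance threshold in the I-Q plane. The function names and the threshold value are hypothetical. Because only chromaticity is compared, a pure brightness change (gray to lighter gray) stays in the background class.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def rgb_to_iq(p: Pixel) -> Tuple[float, float]:
    """I and Q chromaticity components of the NTSC YIQ color space."""
    r, g, b = p
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return i, q


def segment_person(background: List[List[Pixel]], frame: List[List[Pixel]],
                   threshold: float) -> List[List[int]]:
    """1 = person region, 0 = background region, by chromaticity difference."""
    out = []
    for bg_row, fr_row in zip(background, frame):
        row = []
        for bp, fp in zip(bg_row, fr_row):
            bi, bq = rgb_to_iq(bp)
            fi, fq = rgb_to_iq(fp)
            # Distance in the I-Q plane ignores luminance (Y) differences.
            row.append(1 if math.hypot(fi - bi, fq - bq) > threshold else 0)
        out.append(row)
    return out
```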

[0016] Centered on the chromaticity value of a representative skin color obtained empirically, the following is done for each chromaticity value (I, Q value) in the chromaticity region: the ratio of the number of pixels having that chromaticity value in the person region to the number having it in the background region is computed, and if the share classified into the person region exceeds a certain value, that chromaticity value is regarded as skin color.
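The ratio test can be sketched as a histogram over quantized (I, Q) bins. The bin size, the square neighborhood `radius` around the representative skin chromaticity, and the use of the fraction person / (person + background) in place of the raw person:background ratio (to avoid dividing by zero) are all assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

IQ = Tuple[float, float]
Bin = Tuple[int, int]


def to_bin(iq: IQ, bin_size: float) -> Bin:
    """Quantize an (I, Q) chromaticity value into a 2-D histogram bin."""
    return int(iq[0] // bin_size), int(iq[1] // bin_size)


def learn_skin_bins(person_iq: Iterable[IQ], background_iq: Iterable[IQ],
                    center: IQ, radius: int, min_ratio: float,
                    bin_size: float = 4.0) -> Set[Bin]:
    """Chromaticity bins that fall mostly inside the person region."""
    person = Counter(to_bin(v, bin_size) for v in person_iq)
    background = Counter(to_bin(v, bin_size) for v in background_iq)
    cb = to_bin(center, bin_size)
    skin = set()
    for b, n_person in person.items():
        # Only bins near the representative skin chromaticity are candidates.
        if max(abs(b[0] - cb[0]), abs(b[1] - cb[1])) > radius:
            continue
        n_background = background[b]  # Counter returns 0 for unseen bins
        # Share of this chromaticity's pixels that lie in the person region.
        if n_person / (n_person + n_background) >= min_ratio:
            skin.add(b)
    return skin
```

The resulting set can serve as the `is_skin` test for tracking: a pixel is skin if its binned (I, Q) value is in the set.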

[0017] Face movement is tracked using the subject-specific skin color information obtained by the above processing. Next, the method of initially locating the eye and mouth regions is described.

[0018] Using the skin color information obtained as described above, a binary image separating skin color from everything else is created from the captured image of the subject. Starting from the centroid of that image, a search is made in a fixed angular direction for an element whose pixel value changes (that is, which is no longer skin-colored). The contour formed by the found pixels is traced, and if the contour length lies outside a certain range, the point where the contour intersects the extension of the line from the search start point through the found point becomes the new search start point. If the search fails and reaches the edge of the image, the search start point is changed and the search is repeated. When the desired contour is obtained, the centroid of the region inside the contour is computed and taken as the eye position. The initial position of the mouth is determined in the same way, and the relative positional relationship between that person's eyes and mouth is stored as initial information.
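The initial eye search can be sketched as follows. This is a simplified stand-in, not the patent's procedure: instead of tracing the contour point by point, it flood-fills the 4-connected non-skin region hit by the ray and uses the number of region pixels bordering skin as the contour length. Regions that touch the image border, or whose contour length falls outside `[min_len, max_len]`, are skipped and the march continues along the same ray; returning `None` signals that the caller should choose a new start point. All names are hypothetical.

```python
import math
from collections import deque
from typing import List, Optional, Set, Tuple

Mask = List[List[int]]  # mask[y][x]: 1 = skin, 0 = non-skin
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def _flood_non_skin(mask: Mask, sx: int, sy: int) -> Tuple[Set[Tuple[int, int]], bool]:
    """4-connected flood fill of the non-skin region containing (sx, sy)."""
    h, w = len(mask), len(mask[0])
    region, queue, touches_border = {(sx, sy)}, deque([(sx, sy)]), False
    while queue:
        x, y = queue.popleft()
        if x in (0, w - 1) or y in (0, h - 1):
            touches_border = True
        for ox, oy in NEIGHBOURS:
            nx, ny = x + ox, y + oy
            if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] == 0 and (nx, ny) not in region:
                region.add((nx, ny))
                queue.append((nx, ny))
    return region, touches_border


def _on_contour(mask: Mask, x: int, y: int) -> bool:
    """A non-skin pixel lies on the contour if any 4-neighbour is skin."""
    h, w = len(mask), len(mask[0])
    return any(0 <= x + ox < w and 0 <= y + oy < h and mask[y + oy][x + ox] == 1
               for ox, oy in NEIGHBOURS)


def find_feature(mask: Mask, start: Tuple[float, float], angle_deg: float,
                 min_len: int, max_len: int) -> Optional[Tuple[float, float]]:
    """Centroid of the first enclosed non-skin region along the ray whose
    contour length is within [min_len, max_len]; None at the image edge."""
    h, w = len(mask), len(mask[0])
    dx, dy = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    x, y = start
    rejected: Set[Tuple[int, int]] = set()
    while True:
        x, y = x + dx, y + dy
        px, py = int(round(x)), int(round(y))
        if not (0 <= px < w and 0 <= py < h):
            return None  # reached the edge: the caller should change the start point
        if mask[py][px] == 1 or (px, py) in rejected:
            continue  # still on skin, or inside a region already rejected
        region, touches_border = _flood_non_skin(mask, px, py)
        contour_len = sum(1 for rx, ry in region if _on_contour(mask, rx, ry))
        if not touches_border and min_len <= contour_len <= max_len:
            n = len(region)
            return sum(p[0] for p in region) / n, sum(p[1] for p in region) / n
        # Keep marching; where the ray leaves this region acts as the new start point.
        rejected |= region
```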

[0019] In this embodiment, the search range for the eye position is a 100 × 100 pixel rectangular area centered on the previous eye position. From the pupil coordinates obtained by the above processing, a one-dimensional template of luminance information, 100 pixels long in the y-axis direction, is created. When the next frame to be tracked is input, the correlation is calculated for each pixel in the search area according to equation (1) below, and the average of the coordinates of the three points with the highest correlation is taken as the new eye position.

[0020]

(Equation 1)

[0021] Template matching is not used for the mouth region, because its shape deforms greatly. However, since the eye positions are already known and the relative position of the eyes and mouth was obtained during the eye/mouth position initialization, the mouth position is calculated from the positions of both eyes using that relative coordinate position.
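One way to store and reuse the eye-mouth relation is sketched below, under the assumption (not stated in the source) that the mouth offset is expressed in a frame attached to the eyes: origin at their midpoint, one axis along the inter-eye vector, and lengths scaled by the inter-eye distance. The prediction then follows in-plane translation, rotation, and scale changes. Function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def eye_frame(left: Point, right: Point):
    """Midpoint, unit axis along the eyes, perpendicular axis, inter-eye distance."""
    mx, my = (left[0] + right[0]) / 2, (left[1] + right[1]) / 2
    ex, ey = right[0] - left[0], right[1] - left[1]
    d = math.hypot(ex, ey)
    ux, uy = ex / d, ey / d
    return (mx, my), (ux, uy), (-uy, ux), d


def learn_mouth_offset(left: Point, right: Point, mouth: Point) -> Tuple[float, float]:
    """Mouth position in eye-frame units, stored once at initialization."""
    (mx, my), (ux, uy), (vx, vy), d = eye_frame(left, right)
    px, py = mouth[0] - mx, mouth[1] - my
    return (px * ux + py * uy) / d, (px * vx + py * vy) / d


def predict_mouth(left: Point, right: Point, offset: Tuple[float, float]) -> Point:
    """Mouth position for the current frame from the tracked eye positions."""
    (mx, my), (ux, uy), (vx, vy), d = eye_frame(left, right)
    a, b = offset
    return mx + d * (a * ux + b * vx), my + d * (a * uy + b * vy)
```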

[0022] If the head moves only slightly, the entire face stays within the frame and the expression can be detected. On its own, however, this would force the subject to keep the face fixed so that it always stays within the camera's field of view. For example, tilting the head to one side is a relatively natural movement, yet with a fixed camera part of the face leaves the frame, the whole face is hard to capture, and the expression is not detected correctly. Therefore, in this embodiment, the problem is solved by mounting the CCD camera 1 on a motorized pan-tilt head and controlling it so that the subject is always captured.

[0023] To control the CCD camera 1, a coordinate matching function between the coordinate system of the camera control system and that of the input image is optimized in advance so that control works correctly under the current usage condition, in which the subject is positioned about 1 m from the CCD camera 1. The centroid of the binarized image obtained by extracting skin color from the input image is computed, and the pan-tilt head is controlled so that this point is centered in the camera coordinate system; in this way the center of the person's face is always captured.
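The coordinate matching function itself is not given. As a hypothetical stand-in, the sketch below converts the pixel offset of the skin centroid from the image center into pan and tilt angles with a pinhole camera model, then applies a proportional gain with a small deadband. The field-of-view values, gain, and deadband are placeholder assumptions, not parameters from the patent.

```python
import math
from typing import Tuple


def pixel_offset_to_angles(dx: float, dy: float, width: int, height: int,
                           hfov_deg: float, vfov_deg: float) -> Tuple[float, float]:
    """Pixel offset from the image center -> (pan, tilt) in degrees, pinhole model.

    Positive pan = right, positive tilt = down (the image y axis points down).
    """
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    return math.degrees(math.atan2(dx, fx)), math.degrees(math.atan2(dy, fy))


def pan_tilt_step(dx: float, dy: float, width: int, height: int,
                  hfov_deg: float = 48.0, vfov_deg: float = 36.0,
                  gain: float = 0.5, deadband_deg: float = 0.5) -> Tuple[float, float]:
    """One proportional control step toward centering the face."""
    pan, tilt = pixel_offset_to_angles(dx, dy, width, height, hfov_deg, vfov_deg)
    # Ignore tiny errors so the pan-tilt head does not jitter around the set point.
    pan = 0.0 if abs(pan) < deadband_deg else gain * pan
    tilt = 0.0 if abs(tilt) < deadband_deg else gain * tilt
    return pan, tilt
```

A gain below 1 moves only part of the way each frame, which trades response speed for stability while the subject is moving.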

[0024] When expression features are captured from the eye and mouth regions, there is no problem as long as the face directly faces the CCD camera 1. However, if the subject tilts his or her head as shown in FIG. 4(a), the eyes within the rectangular area are inclined, and correct expression features cannot be obtained.

[0025] Therefore, in this embodiment, when the face does not squarely face the CCD camera 1, the feature point detection unit 4 rotates the image within the detection area. Specifically, in step S1 of FIG. 3 (steps are abbreviated as S in the figure), the inclination angle θ of the face is determined from the detected positions of both eyes as shown in FIG. 4(b), and in step S2 the image within the eye region is rotated by −θ as shown in FIG. 4(c). Similarly, the mouth region position is determined in step S3, and the mouth region image is rotated by −θ in step S4.
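Steps S1 through S4 can be sketched as computing θ from the two eye positions and resampling each detection region rotated by −θ about its center. Nearest-neighbor inverse mapping is used here for brevity; the interpolation method, the image-coordinate convention (y pointing down), and the function names are assumptions.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # image[y][x], luminance


def tilt_angle(left_eye: Tuple[float, float], right_eye: Tuple[float, float]) -> float:
    """Face inclination theta (radians) from the two detected eye positions."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])


def rotate_region(region: Image, theta: float, fill: float = 0.0) -> Image:
    """Rotate a detection region by -theta about its center (nearest neighbour).

    Inverse mapping: output pixel p_out samples the input at
    p_in = R(theta) (p_out - c) + c, so the output is the input rotated by -theta.
    """
    h, w = len(region), len(region[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xs = cx + c * (x - cx) - s * (y - cy)
            ys = cy + s * (x - cx) + c * (y - cy)
            ix, iy = int(round(xs)), int(round(ys))
            if 0 <= ix < w and 0 <= iy < h:  # pixels mapped from outside keep `fill`
                out[y][x] = region[iy][ix]
    return out
```

Rotating only the small eye and mouth regions, rather than the whole frame, keeps the per-frame cost low.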

[0026] Next, the facial expression estimation performed by the facial expression estimation unit 5 is described in more detail. The expression is estimated by converting the luminance information of the person's face image into the spatial frequency domain and analyzing its changes. As the expression changes, the spatial frequency components of the face image change as well; to determine uniquely which movement of which part of the face caused which change, the face image is divided into several regions, each of which is converted into spatial frequencies separately.

[0027] In this embodiment, among the facial parts that form a person's expression, fixed rectangular regions are assigned to the image areas of both eyes and of the mouth, where changes are particularly pronounced, and each region is independently transformed into spatial frequencies to reproduce the facial expression. Because mouth movements are more complex than eye movements, the mouth area is further divided into six smaller regions.

[0028] As an example of the spatial frequency transformation, consider an eye image. As the eye closes, the gap between the upper and lower eyelids narrows, so the spatial frequency components of the image in the vertical direction increase. In the horizontal direction, the eyelid contours cross the image horizontally to a lesser degree, so the horizontal spatial frequency components decrease, inversely to the vertical ones.

[0029] Conversely, when the eye opens, the horizontal spatial frequency components increase and the vertical spatial frequency components decrease, exactly the opposite of closing. These changes in spatial frequency components are detected relative to a reference: at the start of detection, the subject is first asked to hold an expressionless face, and the expression is detected as the change from the spatial frequency component data of that state.

[0030] Because this method converts the image into spatial frequency components and detects expressions by analyzing power without using positional information, the estimation still works correctly even if the rectangular region is acquired somewhat inaccurately, as long as the eye (or mouth) lies within the rectangular region used for the transform.

[0031] As described above, detection in this embodiment is based on the power in the spatial frequency domain, so phase information is unnecessary. For this reason, the DCT, which has a low computational cost, is used.

[0032] The result F(u, v) of the DCT of a luminance image f(x, y) (x = 0, ..., M−1; y = 0, ..., N−1), where u and v are the spatial frequencies in the x- and y-axis directions, is given by equation (2).

[0033]

(Equation 2)
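Equation (2) itself is not reproduced in this text. For reference, a standard orthonormal form of the two-dimensional DCT-II consistent with the definitions in [0032] is given below; the normalization constant is an assumption and may differ from the patent's equation.

$$
F(u,v) = \frac{2}{\sqrt{MN}}\,C(u)\,C(v)\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\cos\frac{(2x+1)u\pi}{2M}\cos\frac{(2y+1)v\pi}{2N},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
$$

with u = 0, ..., M−1 and v = 0, ..., N−1. With this scaling the transform preserves the total power, so the sum of F(u, v)² equals the sum of f(x, y)².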

[0034] To reduce computational cost, the DCT size (M, N) is kept small: the rectangular region to be transformed is divided into small blocks of M = N = 8 pixels, and the DCT is applied to each block.
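A plain-Python sketch of the 8 × 8 block transform, using the orthonormal DCT-II (an assumed normalization). It is written for clarity rather than speed; a real-time system would use a fast DCT. Tiles that do not fill a complete 8 × 8 block at the right or bottom edge are simply dropped here, which is also an assumption. Function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Block = List[List[float]]  # block[y][x]


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an n x n block; result[v][u] holds F(u, v)."""
    n = len(block)
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    # cos_table[k][i] = cos((2i + 1) k pi / 2n), shared by rows and columns.
    cos_table = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
                 for k in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += block[y][x] * cos_table[u][x] * cos_table[v][y]
            out[v][u] = scale[u] * scale[v] * s
    return out


def blockwise_dct(region: Block, size: int = 8) -> Dict[Tuple[int, int], Block]:
    """Transform each full size x size tile of a region independently."""
    h, w = len(region), len(region[0])
    result = {}
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            tile = [row[bx:bx + size] for row in region[by:by + size]]
            result[(bx // size, by // size)] = dct2(tile)
    return result
```

A flat 8 × 8 tile puts all its power into the DC coefficient F(0, 0); edges such as eyelid contours show up in the AC coefficients.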

[0035] FIG. 5 is a diagram for explaining the vectorization of spatial frequency components obtained by the DCT. In this embodiment, to capture changes in the shape of facial parts, the total power of the spatial frequency components in each of the vertical, horizontal, and diagonal directions is computed over the ranges shown in FIG. 5. As shown in FIG. 5, the high-frequency part of the transformed spatial frequency domain is not used, because noise and similar contamination make the variance of its power large. The differences between the vertical, horizontal, and diagonal power totals obtained in this way and those of the expressionless state are used as the expression feature values.
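The directional power sums can be sketched as below. The exact frequency ranges of FIG. 5 are not reproduced in this text, so the band layout is an assumption: the horizontal band is the first row of coefficients (u > 0, v = 0), the vertical band the first column (u = 0, v > 0), the diagonal band the coefficients with both u > 0 and v > 0, and every index at or above `cutoff` is discarded as high frequency. Function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Coeffs = List[List[float]]  # coeffs[v][u] from one 8 x 8 block DCT


def directional_energy(coeffs: Coeffs, cutoff: int = 4) -> Tuple[float, float, float]:
    """(horizontal, vertical, diagonal) band power, excluding DC and indices >= cutoff."""
    h = sum(coeffs[0][u] ** 2 for u in range(1, cutoff))
    v = sum(coeffs[k][0] ** 2 for k in range(1, cutoff))
    d = sum(coeffs[k][u] ** 2 for k in range(1, cutoff) for u in range(1, cutoff))
    return h, v, d


def expression_features(blocks: Iterable[Coeffs], neutral_blocks: Iterable[Coeffs],
                        cutoff: int = 4) -> Tuple[float, ...]:
    """Band power summed over a region's blocks, minus the expressionless value."""
    def total(bs: Iterable[Coeffs]) -> List[float]:
        sums = [0.0, 0.0, 0.0]
        for c in bs:
            for i, e in enumerate(directional_energy(c, cutoff)):
                sums[i] += e
        return sums

    now, base = total(blocks), total(neutral_blocks)
    return tuple(a - b for a, b in zip(now, base))
```

Because only squared magnitudes are summed, the features ignore phase, which matches the reasoning in [0030] and [0031].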

[0036] Next, the facial expression reproduction unit 6 is described in more detail. In this embodiment, the expression is reproduced by deforming a three-dimensional face model, created in advance, according to the estimated expression data. This requires rules for deforming the three-dimensional face model. In this embodiment, ten types of three-dimensional model deformation rules (hereinafter referred to as basic expressions) are defined, taking into account the muscle movements that correspond to changes in a person's expression. The deformation rules basically simulate the movement of human muscles; however, because expression detection is performed only on representative feature regions, some regions' expression changes are not captured. Therefore, the expressions are reproduced slightly exaggerated to improve the quality of the rendered expression. For this reason, some of the ten basic expressions are transformation rules that express movements impossible for humans, and the person's expression is reproduced by suitably mixing these basic expressions.

[0037] Reproducing a person's expression on the three-dimensional face model requires reference data. In this embodiment, a model person was first asked to make various expressions, and the face images were recorded (for example, 1903 frames). For each frame, the basic expressions were adjusted to reproduce the expression, and the mixing ratio of each basic expression was stored as reference data. With P_j denoting the vertex vectors of the face model's wireframe and C_j the mixing ratios, the movement vector W_j of the three-dimensional coordinates of each wireframe vertex is expressed by equation (3).

[0038]

(Equation 3)

[0039] To reproduce an expression with the three-dimensional face model, the change in spatial frequency components from the expressionless state must be converted into mixing ratios of the basic expressions. This mapping uses the conversion formula shown in equation (4), and the values of its coefficients are determined with a genetic algorithm (GA). Equation (4) is the basic formula for computing the mixing ratios that the GA learns.

[0040]

(Equation 4)

[0041] C_j is the mixing ratio, A, B, C, and D are the coefficients to be determined, and Hdi, Vdi, and Ddi are the amounts of change in the horizontal, vertical, and diagonal spatial frequency components, respectively. In practice, the whole face is divided into six regions (eyes, eyebrows, nose, and so on), and an independent mixing ratio is obtained for each region. Moreover, the formula for the mixing ratio of a basic expression that depends on spatial frequency changes in several regions has correspondingly more parameters. The number of parameters actually used in this embodiment is 115.

[0042] FIG. 6 shows examples of expression reproduction according to this embodiment. In FIG. 6, the left shows the subject's face image, the center shows the result when the teacher information is input, and the right shows the result reproduced by the formula learned with the genetic algorithm. As FIG. 6 shows, this embodiment reproduces a person's expression accurately.

[0043] The embodiments disclosed herein are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[0044]

Effects of the Invention As described above, according to the present invention, a face image of a subject is captured, a skin-color area is extracted from the image, and the imaging means is controlled based on its centroid. The positions of the eyes and mouth are estimated, the inclination angle of the face is estimated from the estimated positions of both eyes, and the face image in the detection area is rotated by that angle. A two-dimensional discrete cosine transform is applied to the rotated detection area to obtain the change in spatial frequency components from the expressionless state, and parameters learned by a genetic algorithm convert this change data into a deformation of a three-dimensional face model, reproducing the expression. Detecting facial expressions from images supplied by an image input device independent of the head therefore achieves non-contact detection. In addition, performing image operations that compensate for head rotation before the expression features are extracted increases the user's freedom of movement and improves expression detection accuracy.

[Brief description of the drawings]

FIG. 1 is a block diagram of an apparatus for implementing an embodiment of the present invention.

FIG. 2 is a diagram showing the processing performed in the image processing device shown in FIG. 1.

FIG. 3 is a flowchart showing the procedure of the image rotation processing in the feature point detection unit in FIG. 2.

FIG. 4 is a diagram for explaining a method of rotating the image within a detection area.

FIG. 5 is a diagram for explaining the vectorization of spatial frequency components by the DCT.

FIG. 6 is a diagram showing examples of facial expression reproduction according to the embodiment of the present invention.

[Explanation of symbols]

Reference Signs List
1 CCD camera
2 Image processing device
3 Display unit
4 Feature point detection unit
5 Expression estimation unit
6 Expression reproduction unit

─────────────────────────────────────────────────────

[Procedure amendment]

[Submission date] December 27, 1999

[Procedure amendment 1]

[Document to be amended] Specification

[Item to be amended] Claims

[Amendment method] Change

[Amendment content]

[Claims]

[Procedure amendment 2]

[Document to be amended] Specification

[Item to be amended] 0006

[Amendment method] Change

[Amendment content]

【０００６】[0006]

[Means for solving the problems] The invention according to claim 1 is a real-time facial expression detection device for detecting a subject's facial expression in real time, comprising: imaging means for capturing a face image of the subject; imaging control means for extracting a skin-color region from the face image signal output by the imaging means, calculating its center of gravity, and controlling the imaging means so that the center of the face coincides with the center of the input image; feature point detection means for, as preprocessing, creating a binary image that separates skin color from all other colors in the image signal from the imaging means, searching from the center of gravity of that image to find an element at which the pixel value changes, tracing the contour from there, and determining the positions of the eyes and mouth from the centers of gravity of the regions inside the contour; image rotation means for estimating the tilt angle of the face from the determined positions of both eyes and rotating the face image within the detection area by that angle; facial expression estimation means for applying a two-dimensional discrete cosine transform to the rotated detection area of the face image and obtaining the change in spatial frequency components relative to the time when an expressionless face was detected; and facial expression reproduction means for converting the change data of the spatial frequency components obtained by the facial expression estimation means into a deformation of a three-dimensional face model, using parameters learned in advance by a genetic algorithm, thereby reproducing the facial expression.
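The imaging-control step above (binarize the image into skin and non-skin, take the center of gravity of the skin pixels, and steer the camera so that this point moves to the image center) can be sketched as follows. The patent does not state which skin-color criterion it uses; `is_skin` below substitutes a widely used RGB rule of thumb purely for illustration. The proportional controller `pan_tilt_command` and its `gain` parameter are likewise hypothetical.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def is_skin(p: RGB) -> bool:
    """Hypothetical skin test: a common RGB rule of thumb, not the patent's criterion."""
    r, g, b = p
    return (r > 95 and g > 40 and b > 20 and max(p) - min(p) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(img: List[List[RGB]]) -> List[List[int]]:
    """Binary image: 1 for skin-coloured pixels, 0 for everything else."""
    return [[1 if is_skin(p) else 0 for p in row] for row in img]


def centroid(mask: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Centre of gravity (x, y) of the 1-pixels, or None if there are none."""
    n = sx = sy = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                n += 1
                sx += x
                sy += y
    return None if n == 0 else (sx / n, sy / n)


def pan_tilt_command(mask: List[List[int]], gain: float = 0.1) -> Optional[Tuple[float, float]]:
    """Hypothetical proportional controller: a pan/tilt step proportional to the
    offset of the skin centroid from the image centre (positive = pan right,
    tilt down), which moves the face toward the centre of the next frame."""
    c = centroid(mask)
    if c is None:
        return None  # no skin pixels: leave the camera where it is
    h, w = len(mask), len(mask[0])
    return (gain * (c[0] - (w - 1) / 2), gain * (c[1] - (h - 1) / 2))
```

A proportional step is only the simplest choice; a real camera head would also need limits and dead-band handling that are outside the scope of this sketch.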

[Amendment 3]

[Document to be amended] Specification

[Item to be amended] Paragraph 0007

[Method of amendment] Deletion

[Amendment 4]

[Document to be amended] Specification

[Item to be amended] Paragraph 0044

[Method of amendment] Change

[Content of amendment]

[0044]

[Effects of the invention] As described above, according to the present invention, a face image of the subject is captured, a skin-color region is extracted from the image, and the imaging means is controlled based on the center of gravity of that region. As preprocessing, a binary image separating skin color from all other colors is created from the image signal of the imaging means; a search proceeds from the center of gravity of that image to find an element at which the pixel value changes, and the contour is traced from there. The positions of the eyes and mouth are determined from the centers of gravity of the regions inside the contour, the tilt angle of the face is estimated from the determined positions of both eyes, and the face image in the detection area is rotated by that angle. A two-dimensional discrete cosine transform is applied to the rotated detection area to obtain the change in spatial frequency components relative to the time when an expressionless face was detected, and, using parameters learned by a genetic algorithm, this change data is converted into a deformation of a three-dimensional face model to reproduce the facial expression. Because expression detection is based on images from an image input that is independent of the head, contactless detection is achieved. In addition, because the image computation that compensates for head rotation is performed before the facial-expression features are detected, the user has greater freedom of movement and expression detection accuracy is improved.
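The expression-estimation step above (2-D discrete cosine transform of the rotated detection area, then the change relative to the expressionless frame) can be sketched as follows. The orthonormal DCT-II itself is standard. How the patent selects and orders coefficients into a vector (FIG. 5) is not reproduced here, so keeping the top-left `k` x `k` low-frequency block in row-major order is an assumption, as are the names `low_freq_vector` and `expression_change`.

```python
import math
from typing import List

Matrix = List[List[float]]


def _alpha(k: int, size: int) -> float:
    # Orthonormal scaling factor of the DCT-II basis function of index k.
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2-D DCT-II of an n x m block (direct O(n^2 m^2) evaluation)."""
    n, m = len(block), len(block[0])
    out = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0.0
            for y in range(n):
                cu = math.cos(math.pi * (2 * y + 1) * u / (2 * n))
                for x in range(m):
                    s += block[y][x] * cu * math.cos(math.pi * (2 * x + 1) * v / (2 * m))
            out[u][v] = _alpha(u, n) * _alpha(v, m) * s
    return out


def low_freq_vector(coeffs: Matrix, k: int) -> List[float]:
    """Hypothetical vectorization: top-left k x k coefficients, row-major."""
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def expression_change(neutral: Matrix, current: Matrix, k: int = 4) -> List[float]:
    """Change of the low-frequency DCT vector relative to the expressionless frame."""
    a = low_freq_vector(dct2(neutral), k)
    b = low_freq_vector(dct2(current), k)
    return [bi - ai for ai, bi in zip(a, b)]
```

A uniform brightness change shows up only in the DC term (index 0), which is one reason to compare against a neutral reference rather than use raw coefficients. For real-time use a fast DCT would replace the direct sum.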

─────────────────────────────────────────────────────
Continued from the front page

(72) Inventor: Atsushi Otani, c/o ATR Media Integration and Communication Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto

(72) Inventor: Katsuhiro Takematsu, c/o ATR Media Integration and Communication Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto

F-terms (reference): 5B050 AA08 BA09 BA12 CA07 DA04 EA12 EA13 EA28 GA06; 5B057 CB13 CD03 CD11 CG05 DA07 DA08 DC06 DC17 DC25 DC33 DC40

Claims


1. A real-time facial expression detection device for detecting a facial expression of a subject in real time, comprising: imaging means for capturing a face image of the subject; imaging control means for extracting a skin-color region from the face image signal output by the imaging means, calculating its center of gravity, and controlling the imaging means so that the center of the face coincides with the center of the input image; feature point detection means for detecting the positions of the eyes and mouth from the face image signal output by the imaging means; image rotation means for estimating the tilt angle of the face based on the positions of both eyes detected by the feature point detection means and rotating the face image in the detection area by that angle; facial expression estimation means for applying a two-dimensional discrete cosine transform to the detection area of the face image rotated by the image rotation means and obtaining the change in spatial frequency components relative to the time of expressionless detection; and facial expression reproduction means for converting the change data of the spatial frequency components obtained by the facial expression estimation means into a deformation of a three-dimensional face model, using parameters learned in advance by a genetic algorithm, to reproduce the facial expression.
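The claim leaves open how the genetic algorithm represents and learns the parameters that map spatial-frequency changes to face-model deformation. The sketch below is therefore a hypothetical illustration only: it assumes a linear map from a DCT-change vector to one scalar deformation value (for example, the displacement of a single model control point) and fits the weights with a textbook elitist GA (truncation selection, uniform crossover, Gaussian mutation). None of these operator choices, nor the names `predict`, `mse` and `evolve`, come from the patent.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (DCT-change vector, target deformation)


def predict(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Hypothetical linear map from a DCT-change vector to one deformation value."""
    return sum(w * f for w, f in zip(weights, feats))


def mse(weights: Sequence[float], samples: List[Sample]) -> float:
    """Mean squared error of the linear map over the training samples."""
    return sum((predict(weights, f) - t) ** 2 for f, t in samples) / len(samples)


def evolve(samples: List[Sample], n_weights: int, pop_size: int = 30,
           generations: int = 200, sigma: float = 0.1, seed: int = 0):
    """Elitist GA: the better half survives unchanged, children are made by
    uniform crossover of two survivors plus Gaussian mutation.
    Returns the best weight vector and the best error at each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, samples))
        history.append(mse(pop[0], samples))
        parents = pop[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append([c + rng.gauss(0.0, sigma) for c in child])
        pop = parents + children
    pop.sort(key=lambda w: mse(w, samples))
    return pop[0], history
```

Because the best individual always survives, the recorded best error never increases from one generation to the next. For a purely linear map, least squares would solve the problem directly; a GA becomes attractive when the mapping or the error measure is non-differentiable, which may be why one was chosen, but the source does not say.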

2. The real-time facial expression detection device according to claim 1, wherein the feature point detection means, as preprocessing, creates a binary image separating skin color from all other colors from the image signal of the imaging means, searches from the center of gravity of that image to find an element at which the pixel value changes, traces the contour, and determines the positions of the eyes and mouth from the centers of gravity of the regions inside the contour.
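The feature-point procedure of claim 2 can be sketched on a binary skin mask as follows. Several details are assumptions, not taken from the patent: the search from the center of gravity runs rightward, the contour is followed with Moore-neighbour tracing (stopping when a pixel/backtrack state repeats), "inside the contour" is approximated row by row between the leftmost and rightmost contour pixels (adequate for a roughly convex face), and the two upper non-skin blobs are taken as the eyes and the lower one as the mouth. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]  # (x, y), y grows downward
Mask = List[List[int]]   # 1 = skin colour, 0 = anything else

# The 8 neighbour offsets in a fixed rotational order: W, NW, N, NE, E, SE, S, SW.
NBRS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]


def is_fg(mask: Mask, x: int, y: int) -> bool:
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x] == 1


def find_start(mask: Mask, cx: int, cy: int) -> Point:
    """Search rightward from the centroid until the next pixel value changes."""
    if not is_fg(mask, cx, cy):
        raise ValueError("centroid is not on a skin pixel")
    x = cx
    while is_fg(mask, x + 1, cy):
        x += 1
    return (x, cy)


def trace_contour(mask: Mask, start: Point) -> List[Point]:
    """Moore-neighbour tracing; stops when a (pixel, backtrack) state repeats."""
    contour: List[Point] = []
    seen: Set[Tuple[Point, Point]] = set()
    p, b = start, (start[0] + 1, start[1])  # east neighbour is background
    while (p, b) not in seen:
        seen.add((p, b))
        contour.append(p)
        d = NBRS.index((b[0] - p[0], b[1] - p[1]))
        for i in range(1, 8):  # rotate around p, starting just after the backtrack
            dx, dy = NBRS[(d + i) % 8]
            if is_fg(mask, p[0] + dx, p[1] + dy):
                bx, by = NBRS[(d + i - 1) % 8]  # last background pixel checked
                p, b = (p[0] + dx, p[1] + dy), (p[0] + bx, p[1] + by)
                break
        else:
            break  # isolated pixel: no skin neighbours
    return contour


def interior_holes(mask: Mask, contour: List[Point]) -> Set[Point]:
    """Non-skin pixels lying between the leftmost and rightmost contour pixel of each row."""
    rows: Dict[int, Tuple[int, int]] = {}
    for x, y in contour:
        lo, hi = rows.get(y, (x, x))
        rows[y] = (min(lo, x), max(hi, x))
    return {(x, y) for y, (lo, hi) in rows.items()
            for x in range(lo, hi + 1) if mask[y][x] == 0}


def components(points: Set[Point]) -> List[List[Point]]:
    """4-connected components of a pixel set (breadth-first search)."""
    remaining, comps = set(points), []
    while remaining:
        seed = remaining.pop()
        comp, queue = [seed], deque([seed])
        while queue:
            x, y = queue.popleft()
            for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if q in remaining:
                    remaining.remove(q)
                    comp.append(q)
                    queue.append(q)
        comps.append(comp)
    return comps


def locate_eyes_mouth(mask: Mask, cx: int, cy: int):
    """Return (left_eye, right_eye, mouth) centroids, or None if fewer than 3 blobs."""
    contour = trace_contour(mask, find_start(mask, cx, cy))
    comps = sorted(components(interior_holes(mask, contour)), key=len, reverse=True)[:3]
    if len(comps) < 3:
        return None
    cents = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) for c in comps]
    cents.sort(key=lambda c: c[1])          # top to bottom
    left_eye, right_eye = sorted(cents[:2])  # order the two upper blobs by x
    return left_eye, right_eye, cents[2]
```

Starting the trace from a pixel whose east neighbour is background guarantees the tracer follows the outer face boundary rather than the edge of an eye or mouth hole.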