JP3432816B2

JP3432816B2 - Head region extraction device and real-time expression tracking device

Info

Publication number: JP3432816B2
Application number: JP2001304116A
Authority: JP
Inventors: 昭二田中; 聡田中
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2001-09-28
Filing date: 2001-09-28
Publication date: 2003-08-04
Anticipated expiration: 2021-09-28
Also published as: JP2003108980A

Description

Detailed Description of the Invention

【０００１】[0001]

TECHNICAL FIELD OF THE INVENTION
The present invention relates to a head region extraction device that extracts the head region of a person from video of that person. The present invention also relates to a real-time facial expression tracking device based on proxy response, which is applied to communication systems such as videophones in which people exchange video by transmitting video of a CG character to the other party instead of their own face. In particular, the device measures three-dimensional head posture information and facial expression from facial video captured by a camera and controls the movement of the CG character based on the measurement results.

【０００２】[0002]

DESCRIPTION OF THE RELATED ART
FIG. 30 shows a conventional virtual transformation device (first prior art) disclosed in Japanese Patent Laid-Open No. 2000-331190. This device comprises a video camera that inputs face images; an electric pan head that rotates the video camera; a face image recognition device that, from the face image input by the video camera, detects the rotation of the face axis or the rotation about the face axis together with the gaze direction, and detects shape changes of both eyes and the mouth; and a virtual environment synthesis device that, based on these measurements, controls a character in a virtual space built with CG (computer graphics).

[0003] In the first prior art, the face image input from the video camera is binarized according to a skin color model constructed in a preset RGB space, with skin-color pixels set to 1 and all other pixels set to 0. Next, the centroid of the binarized face region is computed, and the electric pan head is controlled so that the centroid lies at the center of the image, which corrects the camera angle. Holes inside the face region are then detected as the eyes and mouth based on the centroid position. The eye regions are tracked by template matching with preset templates, and the gaze direction is obtained from the position of the irises. In addition, the angle between the line connecting the two eyes and the horizontal axis of the image is measured, and rotation about the face axis is detected from the distance between the eyes. Shape changes of the eyes and mouth are measured by capturing the change in power in each frequency band when a discrete cosine transform is applied to the image regions around the eyes and mouth. The head and facial expression of the character in the CG virtual space are controlled based on these measurement results.
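The centering step of the first prior art can be sketched as follows. This is a minimal illustration, not the cited device's implementation: it assumes the skin-color binarization has already produced a 0/1 mask, and the function name `centroid_offset` and the pan-head sign convention are hypothetical.

```python
from typing import Optional


def centroid_offset(mask: list[list[int]]) -> Optional[tuple[float, float]]:
    """Return (dx, dy): the centroid of the 1-pixels minus the image center.

    A pan-head controller (hypothetical here) would rotate the camera so as to
    drive both offsets toward zero. Returns None if the mask has no 1-pixels.
    """
    h, w = len(mask), len(mask[0])
    points = [(x, y) for y in range(h) for x in range(w) if mask[y][x]]
    if not points:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Pixel-center convention: the image center is at ((w - 1) / 2, (h - 1) / 2).
    return cx - (w - 1) / 2, cy - (h - 1) / 2


# Skin pixels only in the right-hand column: the centroid is 1 pixel right of center.
offset = centroid_offset([[0, 0, 1], [0, 0, 1], [0, 0, 1]])  # (1.0, 0.0)
```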

[0004] In the facial expression detection device of Japanese Patent Laid-Open No. 2000-259831 (second prior art), a plurality of selected feature points are tracked across the images of consecutive frames. For each frame, a Delaunay network whose vertices are those feature points is constructed, and this network is used to displace a facial muscle model according to the movement of the feature points, thereby obtaining the change in the facial muscle model.

[0005] Japanese Patent Laid-Open No. 11-306348 (third prior art) discloses an object detection device in which a window mask of fixed size is scanned over the entire image and the luminance variance within the mask is normalized, so that feature quantities of the target object can be extracted stably even when lighting conditions change.
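The per-window luminance normalization described for the third prior art can be illustrated with a short sketch. It assumes a grayscale image stored as a list of rows and normalizes each fixed-size window to zero mean and unit standard deviation; the function name is hypothetical, and the cited publication may normalize differently. Scaling all brightness values by a constant leaves the normalized windows unchanged, while the fixed `size` parameter is the limitation raised in paragraph [0008].

```python
import statistics
from typing import Iterator


def normalized_windows(
    img: list[list[float]], size: int
) -> Iterator[tuple[tuple[int, int], list[float]]]:
    """Yield ((row, col), values) for every size x size window,
    normalized to zero mean and unit (population) standard deviation."""
    h, w = len(img), len(img[0])
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            vals = [img[y + dy][x + dx] for dy in range(size) for dx in range(size)]
            mean = statistics.fmean(vals)
            sd = statistics.pstdev(vals)
            if sd == 0:
                # Flat window: there is no contrast to normalize, so return zeros.
                normalized = [0.0] * len(vals)
            else:
                normalized = [(v - mean) / sd for v in vals]
            yield (y, x), normalized
```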

【０００６】[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION
In the first prior art, the face image captured by the camera is binarized based on a skin color model, holes in the face region are found, and these holes are assigned to the eyes and mouth based on the centroid of the face region. However, because the face is inherently subject to shadows and highlights caused by its relief, it is very difficult for the first prior art to detect only the eyes and mouth as holes unless the lighting conditions are set carefully. In addition, the first prior art cannot measure rotation about the three axes of the head (X axis, Y axis, Z axis) simultaneously. Furthermore, because it derives rotation about the face axis from the distance between the eyes, and that distance inevitably changes when, for example, the face moves away from or toward the camera, the face is judged to be rotating even when it is not.

[0007] The second prior art must track a large number of feature points in the face image to measure the three-dimensional posture, so real-time processing is difficult on hardware with low computing power.

[0008] Because the third prior art uses a mask region of fixed size, it has difficulty coping with changes in the size of the face region caused by individual differences or by the shooting distance.

[0009] The present invention has been made in view of the above. One object of the invention is to obtain a head region extraction device that, by simple computation, can accurately extract the head region in real time from face images of unspecified persons captured under arbitrary lighting conditions, even on hardware with low computing power. A further object is to obtain a real-time facial expression tracking device that extracts the head region by simple computation, measures the three-dimensional movement of the head, measures the open/closed states of both eyes and the mouth, and uses the results to control the head movement and facial expression of a CG character.

【００１０】[0010]

MEANS FOR SOLVING THE PROBLEMS
To achieve the above objects, the head region extraction device according to the present invention, which extracts the head region of a person from video of that person, comprises: normalizing means that normalizes each pixel of an image of the target person, separately for the R, G, and B components, according to
$$c_1 = \arctan\left(\frac{R}{\max(G, B)}\right), \quad c_2 = \arctan\left(\frac{G}{\max(R, B)}\right), \quad c_3 = \arctan\left(\frac{B}{\max(R, G)}\right)$$
to obtain normalized data c1, c2, c3; data conversion means that converts each pixel containing the normalized data c1, c2, c3 into pixel data in the C1-C2 space according to
$$C_1 = \frac{c_2}{c_1}, \quad C_2 = \frac{c_3}{c_2}$$
and head region extraction means that extracts the head region from the captured image by judging a converted pixel to be a skin-color pixel when its C1 and C2 data satisfy
$$\mathrm{th}_1 < C_1 < \mathrm{th}_2, \quad \mathrm{th}_3 < C_2 < \mathrm{th}_4$$
where th1, th2, th3, and th4 are skin color extraction parameters.

[0011] The head region extraction device according to the next invention is the device of the above invention, further comprising: skin color sampling means that, under the same lighting environment as when the target person is imaged, samples the image of a predetermined region forming part of the target person's face; and skin color extraction parameter adjustment means that normalizes each pixel of the image of the predetermined region sampled by the skin color sampling means using the normalizing means, converts the result into pixel data in the C1-C2 space using the data conversion means, obtains from the converted pixel data of the predetermined region the maximum and minimum values of the C1 data and the maximum and minimum values of the C2 data, and corrects the skin color extraction parameters th1, th2, th3, and th4 with these maximum and minimum values.

[0012] In the head region extraction device according to the next invention, in the above invention, the head region extraction means extracts the head region by extracting the largest region from the skin color region extraction result.
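Selecting the largest skin-color region can be sketched as a breadth-first connected-component labeling that keeps only the biggest component. This is an illustrative sketch assuming a 0/1 mask and 4-connectivity; the text specifies neither the connectivity nor the labeling method, and `largest_region` is a hypothetical name.

```python
from collections import deque


def largest_region(mask: list[list[int]]) -> list[list[int]]:
    """Keep only the largest 4-connected region of 1-pixels in a 0/1 mask."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    best_label, best_size, next_label = 0, 0, 1
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            # Flood-fill a new region with breadth-first search, counting its pixels.
            labels[y][x] = next_label
            queue = deque([(y, x)])
            size = 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            if size > best_size:
                best_label, best_size = next_label, size
            next_label += 1
    # With no foreground at all, best_label stays 0 and the output is all zeros.
    return [[1 if best_label and labels[y][x] == best_label else 0 for x in range(w)]
            for y in range(h)]
```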

[0013] In the head region extraction device according to the next invention, in the above invention, the head region extraction means applies dilation/erosion (expansion/contraction) processing to the binary image obtained after head region extraction.
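The text does not state the order or kernel size of the dilation and erosion. A common choice, assumed here, is a morphological closing (3 x 3 dilation followed by 3 x 3 erosion), which fills small gaps in the skin mask. The functions below are a plain-Python sketch with hypothetical names, not the patented implementation.

```python
from typing import Callable, Iterable

Mask = list[list[int]]


def _filter3x3(mask: Mask, combine: Callable[[Iterable[int]], bool]) -> Mask:
    """Apply `combine` (any or all) over each pixel's in-bounds 3 x 3 neighborhood."""
    h, w = len(mask), len(mask[0])
    return [
        [
            1 if combine(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(mask: Mask) -> Mask:
    """A pixel becomes 1 if any neighbor (or itself) is 1."""
    return _filter3x3(mask, any)


def erode(mask: Mask) -> Mask:
    """A pixel stays 1 only if all in-bounds neighbors are 1 (out-of-bounds is ignored)."""
    return _filter3x3(mask, all)


def closing(mask: Mask) -> Mask:
    """Dilation followed by erosion: closes gaps up to about one pixel wide."""
    return erode(dilate(mask))
```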

[0014] In the head region extraction device according to the next invention, in the above invention, the head region extraction means obtains the exclusive OR of the binary image after the dilation/erosion processing and a mask image whose pixel values are all at the logic level corresponding to skin color; sets every part of the resulting XOR image other than the head region to the logic level corresponding to non-skin color; and obtains the logical OR of that image and the binary image after the dilation/erosion processing, thereby extracting the entire head region.
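The XOR/OR procedure above amounts to filling holes enclosed by the head mask. A minimal sketch follows, assuming 1 = skin and assuming that "other than the head region" means the inverted pixels 4-connected to the image border; the text does not say how that region is identified, and `fill_holes` is a hypothetical name.

```python
from collections import deque


def fill_holes(mask: list[list[int]]) -> list[list[int]]:
    """Fill holes enclosed by a 0/1 head mask (1 = skin) via XOR, clearing, and OR."""
    h, w = len(mask), len(mask[0])
    # XOR with an all-skin (all-1) mask inverts the image: background and holes become 1.
    inverted = [[mask[y][x] ^ 1 for x in range(w)] for y in range(h)]
    # Assumption: inverted pixels 4-connected to the image border lie outside the
    # head region; set them to non-skin (0) so that only enclosed holes remain.
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and inverted[y][x]:
                inverted[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and inverted[ny][nx]:
                inverted[ny][nx] = 0
                queue.append((ny, nx))
    # OR the remaining holes back into the original mask.
    return [[mask[y][x] | inverted[y][x] for x in range(w)] for y in range(h)]
```

A notch that opens onto the image border is not treated as a hole under this assumption, so it stays unfilled.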

[0015] The real-time facial expression tracking device according to the next invention comprises: video input means that captures video input sequentially at a predetermined frame rate; head region extraction means that extracts a head image from the captured image; part region candidate extraction means that extracts candidate regions for each part, including both eyes and the mouth, from the extracted head region; part detection and tracking means that detects the position of each part from among the extracted candidate regions; and head three-dimensional posture and facial expression measurement means that measures the three-dimensional posture of the head based on the detected positions of both eyes and the mouth, and measures the open/closed states of both eyes and the mouth. The device controls the movement of a CG character based on the measured three-dimensional head posture and the open/closed states of both eyes and the mouth. The head region extraction means comprises: normalizing means that normalizes each pixel of an image of the target person, separately for the R, G, and B components, according to
$$c_1 = \arctan\left(\frac{R}{\max(G, B)}\right), \quad c_2 = \arctan\left(\frac{G}{\max(R, B)}\right), \quad c_3 = \arctan\left(\frac{B}{\max(R, G)}\right)$$
to obtain normalized data c1, c2, c3; data conversion means that converts each pixel containing the normalized data c1, c2, c3 into pixel data in the C1-C2 space according to
$$C_1 = \frac{c_2}{c_1}, \quad C_2 = \frac{c_3}{c_2}$$
and skin color region extraction means that extracts the skin color region from the captured image by judging a converted pixel to be a skin-color pixel when it satisfies
$$\mathrm{th}_1 < C_1 < \mathrm{th}_2, \quad \mathrm{th}_3 < C_2 < \mathrm{th}_4$$
where th1, th2, th3, and th4 are skin color extraction parameters.

[0016] The real-time facial expression tracking device according to the next invention is the device of the above invention, further comprising: skin color sampling means that, under the same lighting environment as when the target person is imaged, samples the image of a predetermined region forming part of the target person's face; and skin color extraction parameter adjustment means that normalizes each pixel of the image of the predetermined region sampled by the skin color sampling means using the normalizing means, converts the result into pixel data in the C1-C2 space using the data conversion means, obtains from the converted pixel data of the predetermined region the maximum and minimum values of the C1 data and the maximum and minimum values of the C2 data, and corrects the skin color extraction parameters th1, th2, th3, and th4 with these maximum and minimum values.

[0017] In the real-time facial expression tracking device according to the next invention, in the above invention, the head region extraction means extracts the head region by extracting the largest region from the skin color region extraction result produced by the skin color region extraction means.

[0018] In the real-time facial expression tracking device according to the next invention, in the above invention, the head region extraction means applies dilation/erosion processing to the binary image obtained after skin color region extraction by the skin color region extraction means.

[0019] In the real-time facial expression tracking device according to the next invention, in the above invention, the head region extraction means obtains the exclusive OR of the binary image after the dilation/erosion processing and a mask image whose pixel values are all at the logic level corresponding to skin color; sets every part of the resulting XOR image other than the head region to the logic level corresponding to non-skin color; and obtains the logical OR of that image and the binary image after the dilation/erosion processing, thereby extracting the entire head region.

【００３２】[0032]

EMBODIMENTS OF THE INVENTION
Preferred embodiments of the head region extraction device and the real-time facial expression tracking device according to the present invention are described in detail below with reference to the accompanying drawings. This real-time facial expression tracking device is applied to communication systems such as videophones in which people exchange video by transmitting video of a CG character to the other party instead of their own face.

[0033] An embodiment of the present invention is described below with reference to FIGS. 1 to 20. FIG. 1 shows the conceptual configuration of the real-time facial expression tracking device of this embodiment.

[0034] The real-time facial expression tracking device shown in FIG. 1 represents the functional configuration of a program executed by, for example, a personal computer or a workstation. It comprises: video input means 1 for capturing video input from a video capture means such as a video camera 80; head region detection means 2 for detecting the head region in the person video input via the video input means 1; part region candidate extraction means 3 for extracting candidate regions for both eyes and the mouth from the head region extracted by the head region detection means 2; part detection and tracking means 4 for detecting the eye and mouth regions among the candidate regions extracted by the part region candidate extraction means 3, tracking their positions as they change from moment to moment, and measuring the open/closed state of each part; and head three-dimensional posture and facial expression measurement means 5 for measuring the three-dimensional head posture and facial expression from the positions of both eyes and the mouth detected by the part detection and tracking means 4.

[0035] The head region detection means 2 further comprises: skin color sampling means 6 that samples the person's skin color in the shooting environment (for example, under the actual lighting); skin color extraction parameter adjustment means 7 that adjusts the skin color extraction parameters based on the skin color information sampled by the skin color sampling means 6; skin color region extraction means 8 that extracts skin-color pixels from the input video based on the parameters adjusted by the skin color extraction parameter adjustment means 7 and classifies the extracted pixels into clusters (regions); and head region extraction means 9 that selects the head region from among the extracted skin color regions and fills in all small regions inside it, such as holes and crevices (replacing them with skin color), thereby extracting every pixel belonging to the person's head as one region.

[0036] The part region candidate extraction means 3 comprises head region luminance averaging means 10 that averages the luminance values of the head region, and pixel selection means 11 that extracts candidate regions for both eyes and the mouth.

[0037] The part detection and tracking means 4 comprises: part detection means 12 that identifies the regions corresponding to both eyes and the mouth among the candidate regions extracted by the part region candidate extraction means 3; initial position setting means 13 that stores the initial positions of both eyes and the mouth detected by the part detection means 12; and part tracking means 14 that detects the position of each part in the current frame from the positions stored up to the previous frame.

[0038] The head three-dimensional posture and facial expression measurement means 5 comprises: affine basis setting means 15 that, based on the initial position of each part set by the initial position setting means 13, sets an affine basis serving as the reference for obtaining the three-dimensional head posture; head rotation amount estimation means 16 that obtains provisional amounts of head rotation about the horizontal and vertical axes; posture measurement means 17 that measures the three-dimensional head posture from two-dimensional points in the video, derived from the part positions detected by the part detection means 12, that correspond to points in the virtual three-dimensional space set by the affine basis setting means 15; and open/closed state measurement means 18 that tracks facial expression by measuring the open/closed state of each part (both eyes and the mouth).

[0039] The character control device 90 controls a three-dimensional CG character using the three-dimensional head posture and the open/closed state of each part (both eyes and the mouth) input from the head three-dimensional posture and facial expression measurement means 5. As a result, the CG character's movements and facial expressions change in real time, following the movements and facial expressions of the user captured by the video camera 80.

[0040] FIG. 2 is a flowchart outlining the operation of the real-time facial expression tracking device of FIG. 1 in the calibration phase, and FIG. 3 is a flowchart outlining its operation in the tracking phase. The operation of the device is outlined below with reference to FIGS. 2 and 3.

[0041] The operating procedure of the real-time facial expression tracking device consists of two phases: a calibration phase, which acquires the information needed to track head movement, such as the positions of both eyes and the mouth and their state when the face is expressionless; and a tracking phase, which actually tracks the head movement, both eyes, and the mouth, and measures the head posture and the open/closed states of both eyes and the mouth, that is, the facial expression.

[0042] In the calibration phase, the video input means 1 first captures video from the video camera 80 (step S100). When the video camera 80 captures the person, the user is instructed to "face the camera, open both eyes, and close the mouth," so that video of the person with a neutral expression is obtained. Next, the head region detection means 2 samples the user's skin color in the shooting environment (step S110) and uses the sampled data to adjust the preset skin color extraction parameters (step S120). It then extracts the skin color region using the adjusted parameters (step S130) and detects the head region among the extracted regions (step S140). Next, the part region candidate extraction means 3 extracts candidate regions for both eyes and the mouth from the extracted head region (step S150). The part detection and tracking means 4 detects the eye regions and the mouth region (step S160) and stores the position, size, and initial template of each part from the detected regions (step S170). Finally, based on the obtained positions of both eyes and the mouth, the head three-dimensional posture and facial expression measurement means 5 sets the affine basis (virtual points in three-dimensional space) used to obtain three-dimensional head posture information in the tracking phase (step S180).

[0043] In the tracking phase, the video input means 1 captures video from the video camera 80 (step S200). The head region detection means 2 extracts skin color from the captured video using the skin color extraction parameters set in the calibration phase (step S210) and detects the head region among the extracted regions (step S220). Next, the part region candidate extraction means 3 extracts candidate regions for both eyes and the mouth (step S230). The part detection and tracking means 4 then detects the eye and mouth regions in the current frame from among those candidate regions, based on the eye and mouth positions detected in the previous frame (step S240). Next, the head three-dimensional posture and facial expression measurement means 5 measures three-dimensional head posture information from the positions of both eyes and the mouth detected by the part detection and tracking means 4 (two-dimensional image points) and the preset virtual points in three-dimensional space (step S250), and measures the open/closed states of both eyes and the mouth based on that measurement (step S260). Finally, the measured open/closed-state information for both eyes and the mouth and the head posture information are input to the character control device 90, which controls the head movement and facial expression of the CG character (step S270).

[0044] [Calibration Phase] Next, the operation of the real-time facial expression tracking device of FIG. 1 in the calibration phase is described in detail with reference to FIGS. 4 to 17.

[0045] (a) Processing by the head region detection means 2
First, the details of steps S110 to S140 of FIG. 2, performed by the head region detection means 2, are described with reference to FIGS. 4 to 10.

[0046] FIG. 4 illustrates the operation of the skin color sampling means 6 in the head region detection means 2. FIG. 5 is a flowchart illustrating the operations of the skin color sampling means 6 and the skin color extraction parameter adjustment means 7.

[0047] First, to sample the user's skin color under the lighting environment in use, a sampling window 20 for designating the sampling region is displayed over the captured video 19, as shown in FIG. 4 (step S300). Next, using a mouse, another pointing device, a keyboard, or the like, the user moves the sampling window 20 to a position such as the cheek or forehead where only skin color will be captured, and notifies the system that sampling can proceed (step S310). Alternatively, the user may move his or her head to align it with the sampling window 20 at its initially displayed position.

[0048] Next, the colors of all pixels in the sampling window 20 are mapped into the color space used for skin color extraction (the skin color model space) (step S320), and the preset skin color extraction parameters are adjusted using the maximum and minimum values of the mapped pixels in that space (step S330).

【００４９】The skin color extraction space can be built in several ways. One option is to construct a new color space that is relatively robust to changes in brightness. Another is to work directly in the pixels' own color data space (the R, G, B space). Here, the following color space, which is relatively robust to brightness changes, is used.

【００５０】Let R (red), G (green), and B (blue) be the three primary color components of each pixel. The color is first normalized by the following equations.

【００５１】c1 = arctan(R / max(G, B)) ……Equation (1)
c2 = arctan(G / max(R, B)) ……Equation (2)
c3 = arctan(B / max(R, G)) ……Equation (3)

【００５２】The normalized colors are then transformed further by the following equations.

【００５３】C1 = c2 / c1 ……Equation (4)
C2 = c3 / c2 ……Equation (5)
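The color transform of Equations (1) to (5) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The function name `skin_space` is hypothetical. Using `math.atan2` to avoid division by zero when both other channels are 0 is an assumption, because the text does not say how that case is handled.

```python
import math


def skin_space(r, g, b):
    """Map one RGB pixel to (C1, C2) via Equations (1)-(5).

    Returns None when C1 or C2 is undefined (c1 or c2 equals 0).
    """
    # math.atan2(y, x) equals arctan(y / x) for x > 0 and gives pi/2 when x == 0,
    # so a pixel whose other two channels are both 0 does not divide by zero.
    c1 = math.atan2(r, max(g, b))  # Equation (1)
    c2 = math.atan2(g, max(r, b))  # Equation (2)
    c3 = math.atan2(b, max(r, g))  # Equation (3)
    if c1 == 0 or c2 == 0:
        return None  # e.g. a pure black pixel, or R == 0 or G == 0
    return c2 / c1, c3 / c2  # Equations (4) and (5)


# A grey pixel maps to (1.0, 1.0) regardless of its brightness.
print(skin_space(100, 100, 100), skin_space(200, 200, 200))
```

Each of c1 to c3 depends only on a ratio of channels. Scaling R, G, and B by a common factor therefore leaves C1 and C2 unchanged, which is one way to see why this space is relatively robust to brightness changes.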

【００５４】The skin color region extracting means 8 extracts the skin color region from the input image as follows. Each color is converted from RGB space into C1-C2 space by Equations (4) and (5). The means then checks whether the converted color falls within the skin color range defined by Equations (6) and (7) below.
th1 < C1 < th2 ……Equation (6)
th3 < C2 < th4 ……Equation (7)

【００５５】The skin color extraction parameter adjusting means 7 uses the sampling data from the skin color sampling means 6 to vary the skin color extraction parameters (thresholds) th1 to th4 used in this extraction. This lets the thresholds adapt to different lighting conditions and to differences in skin color between individuals. Specifically, the skin color extraction parameter adjusting means 7 maps the RGB data of the pixels sampled by the skin color sampling means 6 into C1-C2 space and obtains the maximum and minimum values of C1 and of C2. It then sets the thresholds as follows:
- th1 to the minimum of C1
- th2 to the maximum of C1
- th3 to the minimum of C2
- th4 to the maximum of C2
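The sketch below shows how the thresholds th1 to th4 could be fitted from the sampled pixels and then applied. It assumes the samples have already been mapped to (C1, C2) pairs. The names `fit_thresholds` and `is_skin` are hypothetical.

```python
def fit_thresholds(samples):
    """Return (th1, th2, th3, th4) from sampled (C1, C2) pairs, as in [0055]."""
    c1_values = [c1 for c1, _ in samples]
    c2_values = [c2 for _, c2 in samples]
    return min(c1_values), max(c1_values), min(c2_values), max(c2_values)


def is_skin(c1, c2, thresholds):
    """Apply Equations (6) and (7) using the strict inequalities of the text."""
    th1, th2, th3, th4 = thresholds
    return th1 < c1 < th2 and th3 < c2 < th4


th = fit_thresholds([(0.4, 0.5), (0.6, 0.7), (0.5, 0.6)])
print(th, is_skin(0.5, 0.6, th))
```

Equations (6) and (7) use strict inequalities, so the extreme sampled values themselves fall just outside the range. Adding a small margin would be a natural refinement, but the text does not describe one.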

【００５６】As described above, sampling the user's skin color under the lighting environment in use improves skin color extraction performance. Using a color space that is robust to changes in illumination brightness improves that performance further, even with simple parameter adjustment.

【００５７】Next, the operations of the skin color region extracting means 8 and the head region extracting means 9 will be described with reference to FIGS. 6 to 10. FIG. 6 is a flowchart for explaining these operations.

【００５８】Skin color extraction alone is not enough to extract the head region accurately, even with the parameters adjusted by the skin color extraction parameter adjusting means 7. Depending on the lighting environment, highlights can still appear on parts of the face, and wrinkles and shadows also interfere. Therefore, the largest of the skin color regions extracted by the skin color region extracting means 8 is judged to be the head region. A head region repair process then removes small regions from the head region, such as holes and crevices caused by extraction omissions, other than facial parts such as the eyes, nose, and mouth. This allows the entire head to be extracted properly.

【００５９】The skin color region extracting means 8 maps the color data of every pixel of the captured image into the skin color model space (step S400). It then extracts the pixels lying within the thresholds th1 to th4 defined by Equations (6) and (7) (step S410). Next, it runs a labeling process (a process that groups connected figures and numbers each group) that joins the extracted pixels using 4-connectivity or 8-connectivity. This divides them into individual block regions (blobs) (step S420). Finally, from the block regions obtained by labeling, the region with the largest area (pixel count) is selected as the head region (step S430).
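The labeling and largest-blob selection of steps S420 and S430 might look like the following minimal breadth-first-search sketch. The function name and the list-of-lists mask representation are assumptions.

```python
from collections import deque


def largest_component(mask, connectivity=4):
    """Keep only the largest connected blob of 1-pixels in a binary mask."""
    h, w = len(mask), len(mask[0])
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:  # 8-connectivity
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    labels = [[0] * w for _ in range(h)]
    next_label, best_label, best_size = 0, 0, 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1  # start a new blob and flood-fill it
                labels[y][x] = next_label
                queue, size = deque([(y, x)]), 0
                while queue:
                    cy, cx = queue.popleft()
                    size += 1
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                if size > best_size:
                    best_label, best_size = next_label, size
    # best_label stays 0 for an empty mask, which yields an all-zero result.
    return [[1 if best_label and labels[y][x] == best_label else 0 for x in range(w)]
            for y in range(h)]
```

With 4-connectivity, diagonal neighbours count as separate blobs. With 8-connectivity, they merge, which tends to give fewer, larger regions.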

【００６０】FIG. 7 shows an image containing the head region selected in this way. At this stage, highlights, shadows, and dark parts such as the eyes, mouth, and nose have not been extracted. The head region therefore often contains small regions 21 such as holes and crevices, as shown in FIG. 7.

【００６１】The head region extracting means 9 therefore first repairs the crevices. It does this by applying dilation and erosion (expansion and contraction) processing to a binary image in which pixels found by skin color region extraction are 1 and all others are 0. A dilation mask 22 and an erosion mask 23 are set as shown in FIG. 8. The dilation and erosion steps described below are then repeated, which fills the crevices and small holes mentioned above.
- Dilation expands the region by replacing the values of the pixels around the pixel of interest with the pixel values set in the dilation mask 22.
- Erosion shrinks the region. Consider the pixels around the pixel of interest that correspond to non-zero entries of the erosion mask 23. If all of their values match the values in the erosion mask 23, the pixel of interest is kept; otherwise its value is set to 0.

This processing repairs a crevice 24 such as the one in FIG. 9(a), giving the result in FIG. 9(b). It can also fill very small holes.
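The crevice repair can be sketched as a morphological closing: dilation followed by erosion. The actual masks 22 and 23 are defined only in FIG. 8, so this sketch makes several assumptions: 3 × 3 all-ones masks, a single iteration by default, and pixels outside the image being ignored. All names are hypothetical.

```python
def _apply_3x3(img, keep_if_any):
    """Dilate (keep_if_any=True) or erode (False) with an assumed 3x3 all-ones mask."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # In-bounds 3x3 neighbourhood; pixels outside the image are ignored.
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if (any(vals) if keep_if_any else all(vals)) else 0
    return out


def dilate(img):
    return _apply_3x3(img, keep_if_any=True)


def erode(img):
    return _apply_3x3(img, keep_if_any=False)


def repair_crevices(img, iterations=1):
    """Dilate `iterations` times, then erode as many times (a closing)."""
    for _ in range(iterations):
        img = dilate(img)
    for _ in range(iterations):
        img = erode(img)
    return img
```

Under this simple border convention, a region whose dilation reaches the image edge cannot be eroded back there. A practical version would pad the image first.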

【００６２】Once dilation and erosion have repaired the crevices in the head region, the entire head can be extracted as a single region. This only requires filling every small region that corresponds to a hole inside the head region. The hole filling uses the logical operations shown in FIG. 10.

【００６３】First, the exclusive OR of the head region image 26 (obtained by the crevice repair) and a mask 27 whose pixel values are all 1 is computed. This yields the background region together with the holes inside the head region. Next, the regions of the resulting image 28 that touch its outer edges (the background region) are removed. The logical OR of the resulting image 29 and the original head region image 26 is then computed. This extracts the entire head as a single region (image 30 is the result of the logical OR; step S440).
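The logical operations of FIG. 10 can be sketched in three steps. First, invert the head image (the XOR with the all-ones mask 27). Second, remove the regions touching the image border with a flood fill. Third, OR the result with the original image. Using 4-connectivity for the flood fill is an assumption, and the function name is hypothetical.

```python
from collections import deque


def fill_holes(head):
    """Fill enclosed holes in a binary head mask following the steps of FIG. 10."""
    h, w = len(head), len(head[0])
    inverted = [[1 - v for v in row] for row in head]  # XOR with all-ones mask 27 (image 28)
    touches_border = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_edge = y in (0, h - 1) or x in (0, w - 1)
            if on_edge and inverted[y][x]:
                touches_border[y][x] = 1
                queue.append((y, x))
    while queue:  # flood-fill the background that is connected to the image edge
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and inverted[ny][nx] and not touches_border[ny][nx]:
                touches_border[ny][nx] = 1
                queue.append((ny, nx))
    # Image 29: only the enclosed holes remain.
    holes = [[inverted[y][x] & (1 - touches_border[y][x]) for x in range(w)] for y in range(h)]
    # Image 30: OR with the original head region image 26.
    return [[head[y][x] | holes[y][x] for x in range(w)] for y in range(h)]
```

A gap that opens onto the image border is treated as background and is not filled. This is why the crevices must be repaired first.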

【００６４】Because the head region is extracted using only simple logical operations, this processing is fast.

【００６５】(b) Processing by the part region candidate extracting means 3
Next, the details of step S150 in FIG. 2, performed by the part region candidate extracting means 3, will be described with reference to FIGS. 11 and 12. FIG. 11 is a flowchart for explaining the operation of the part region candidate extracting means 3.

【００６６】The part region candidate extracting means 3 must cope robustly with brightness changes caused by changing lighting conditions. To do so, it applies adaptive histogram averaging (that is, adaptive histogram equalization) to the head region extracted by the head region detecting means 2, keeping the contrast of the head region constant. First, the head region brightness averaging means 10 obtains the circumscribed rectangle of the head region and divides it into, for example, 8 × 8 small regions (step S500). Next, the head region brightness averaging means 10 performs histogram equalization on each small region (step S510).

【００６７】Histogram equalization is performed as follows:
1. For each small region, compute a histogram of pixel values against their frequencies.
2. Compute the cumulative frequency (the running total of frequencies up to each class, that is, each pixel value).
3. Divide each cumulative frequency by the maximum cumulative frequency to obtain its ratio.
4. Multiply each ratio by the maximum pixel value in the small region and round to the nearest integer. The result is the pixel value after equalization.
5. Obtain the frequency of each equalized pixel value from the frequencies before equalization.

【００６８】For example, suppose the pixel values in a small region lie in the range 0 to 7 with the frequencies shown in FIG. 13. The frequencies of the pixel values after equalization are then as shown in FIG. 14. For instance, the equalized pixel value 4 corresponds to the pre-equalization pixel values 2 and 3, so its frequency is 9 + 2 = 11.
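The per-block equalization of [0067] can be sketched as follows, assuming integer pixel values of at least 0. Following the text, the output range is scaled by the maximum pixel value in the block. "Rounding" is taken here as round-half-up, which is an assumption, and the function name is hypothetical.

```python
import math


def equalize_block(block):
    """Histogram-equalize one small region given as a list of rows of ints >= 0."""
    pixels = [v for row in block for v in row]
    vmax = max(pixels)
    hist = [0] * (vmax + 1)
    for v in pixels:
        hist[v] += 1
    cumulative, total = [], 0
    for count in hist:
        total += count
        cumulative.append(total)
    # cumulative[-1] is the maximum cumulative frequency (the pixel count).
    lut = [math.floor(c / cumulative[-1] * vmax + 0.5) for c in cumulative]
    return [[lut[v] for v in row] for row in block]


# Ratios 0.25, 0.5, 0.75, 1.0 times vmax = 3 give 0.75, 1.5, 2.25, 3.0,
# which round to 1, 2, 2, 3.
print(equalize_block([[0, 0, 1, 1], [2, 2, 3, 3]]))
```

Several original values can map to the same output value (here both 1 and 2 become 2). The frequency of an output value is then the sum of the original frequencies, as in the FIG. 14 example.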

【００６９】Adaptive histogram equalization has a drawback, particularly in low-contrast small regions. Most pixel values in such a region are assigned to the peak of the histogram, which can generate a lot of noise. Therefore, when there are pixel values 31 whose frequencies exceed a certain threshold, as shown in FIG. 12(a), those frequencies are redistributed to other pixel values, as shown in FIG. 12(b). This suppresses the generation of noise.
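The text does not specify how the excess frequencies are redistributed. The sketch below assumes the approach used in contrast-limited adaptive histogram equalization (CLAHE). Bins are clipped at `limit` and the excess is spread evenly over all bins, with any remainder going to the lowest bins. The function name and this redistribution rule are assumptions.

```python
def clip_histogram(hist, limit):
    """Clip bins above `limit` and spread the excess evenly (CLAHE-style assumption)."""
    clipped = [min(count, limit) for count in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, len(hist))
    out = [count + share for count in clipped]
    for i in range(remainder):  # leftover counts go to the lowest bins
        out[i] += 1
    return out


print(clip_histogram([1, 20, 1, 2], 5))  # total stays 24
```

Redistribution can push a clipped bin slightly above `limit` again. CLAHE implementations sometimes iterate, but a single pass keeps the sketch short.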

【００７０】The above processing always yields a constant contrast, so the pixel selecting means 11 can use a fixed threshold tha. Pixels in the head region whose luminance is at or below tha (dark pixels) are set to logical level 1, and all others to logical level 0 (step S520). The pixels with value 1 are then joined using 4-connectivity or 8-connectivity to divide them into regions (step S530). Finally, very small regions are removed, leaving the candidate regions for each part: both eyes, the mouth, and the nose (step S540).
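Steps S520 to S540 can be sketched as a dark-pixel threshold followed by 4-connected labeling and an area filter. The `min_area` cutoff for "very small regions" is a hypothetical parameter, because the text gives no value. The function name is also hypothetical.

```python
from collections import deque


def part_candidates(gray, head, tha, min_area):
    """Return dark regions (lists of (row, col)) inside the head mask with area >= min_area."""
    h, w = len(gray), len(gray[0])
    # Step S520: dark pixels inside the head region become 1.
    dark = [[1 if head[y][x] and gray[y][x] <= tha else 0 for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if dark[y][x] and not seen[y][x]:
                # Step S530: collect one 4-connected region.
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and dark[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                # Step S540: drop very small regions.
                if len(pixels) >= min_area:
                    regions.append(pixels)
    return regions
```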

【００７１】As described above, the whole head is extracted as one region and processed so that its contrast is always constant. As a result, the eye and mouth part regions can be extracted with a fixed threshold tha. This enables fast processing and yields a system that is robust to changes in brightness.

【００７２】(c) Processing by the part detecting/tracking means 4
Next, the operation of steps S160 and S170 in FIG. 2, performed by the part detecting/tracking means 4 in the calibration phase, will be described with reference to FIGS. 15 and 16. FIG. 15 is a flowchart for explaining the operation of the part detecting/tracking means 4 in the calibration phase.

【００７３】First, the part detecting means 12 obtains the center of gravity of the head region extracted by the head region detecting means 2 (step S600). This position is obtained using a well-known distance transform or a similar method.

【００７４】The distance transform replaces the value of each object pixel in the image with the shortest distance from that pixel to the background region. The simplest and most commonly used distance measures are the city-block distance (4-connected distance) and the chessboard distance (8-connected distance). The algorithm using the city-block distance is described here.

【００７５】Step 1. Let f_{i,j} be the pixel data of the binarized input image, and let D′_{i,j} be the initialized multi-valued data. Initialization works as follows. Pixels in the head region (value 1) are replaced with the multi-valued value ∞; in practice, a large value such as 100 is used. Background pixels (value 0) are replaced with 0.

【数１】 [Equation 1]
D′_{i,j} = ∞ (if f_{i,j} = 1); D′_{i,j} = 0 (if f_{i,j} = 0) ……Equation (8)

【００７６】Step 2. The initialized image is scanned from the upper left to the lower right, and D″_{i,j} is computed successively by the following rule.
D″_{i,j} = min(D′_{i,j}, D″_{i-1,j} + 1, D″_{i,j-1} + 1) ……Equation (9)

【００７７】Step 3. The D″_{i,j} obtained in Step 2 is then scanned from the lower right to the upper left, and D_{i,j} is computed successively by the following rule.
D_{i,j} = min(D″_{i,j}, D_{i+1,j} + 1, D_{i,j+1} + 1) ……Equation (10)

【００７８】The D_{i,j} values obtained by Equation (10) form the pixel data of the distance image. The pixel with the largest distance value in this distance image is found and taken as the center of gravity of the head region.

【００７９】A useful property of the distance transform is that it yields a stable center-of-gravity position even when the shape of the region changes. Alternatively, the center of gravity may be obtained by averaging the pixel coordinates, without the distance transform.
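The city-block distance transform can be sketched with two raster passes. The sketch follows the standard two-pass algorithm: the backward pass of Step 3 starts from the Step 2 result and propagates the values it has just updated. It uses 100 as the stand-in for ∞, as the text suggests, so distances above 100 would be capped. Pixels outside the image are treated as non-background, which is an assumption. The function names are hypothetical.

```python
def city_block_distance(binary, inf=100):
    """Two-pass city-block (4-connected) distance transform."""
    h, w = len(binary), len(binary[0])
    # Initialization: object pixels start at "infinity", background pixels at 0.
    d = [[inf if binary[i][j] else 0 for j in range(w)] for i in range(h)]
    # Forward raster scan, upper left to lower right.
    for i in range(h):
        for j in range(w):
            if i > 0:
                d[i][j] = min(d[i][j], d[i - 1][j] + 1)
            if j > 0:
                d[i][j] = min(d[i][j], d[i][j - 1] + 1)
    # Backward raster scan, lower right to upper left.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i][j] = min(d[i][j], d[i + 1][j] + 1)
            if j < w - 1:
                d[i][j] = min(d[i][j], d[i][j + 1] + 1)
    return d


def distance_center(binary):
    """Return (row, col) of the first pixel with the largest distance value."""
    d = city_block_distance(binary)
    cells = [(i, j) for i in range(len(d)) for j in range(len(d[0]))]
    return max(cells, key=lambda p: d[p[0]][p[1]])
```

Because only the deepest interior pixel is chosen, small bulges or notches on the region outline barely move the result. That matches the stability property described above.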

【００８０】The part region candidate extracting means 3 has extracted candidate regions for the eyes, mouth, and nose. Among them, the part detecting means 12 takes the candidate region closest to the head region's center of gravity (obtained in step S600) as the nose region (step S610).
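Step S610 can be sketched as picking the candidate whose center is nearest to the head center. Measuring the distance to each candidate's mean pixel position is an assumption, because the text does not say which point of a candidate region is used. The function names are hypothetical.

```python
import math


def region_center(pixels):
    """Mean (row, col) of a region given as a list of (row, col) pixels."""
    n = len(pixels)
    return sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n


def pick_nose(candidates, head_center):
    """Return the candidate region whose center is closest to the head center."""
    return min(candidates, key=lambda region: math.dist(region_center(region), head_center))
```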

【００８１】Next, as shown in FIG. 16, the part detecting means 12 sets a left-eye mask 33, a right-eye mask 34, and a mouth mask 35. Each mask is placed at a fixed direction and distance from the identified nose region and is sized in proportion to the head region.

【００８２】Within each of these masks, the region closest to the center-of-gravity position is taken as the right eye, left eye, or mouth region, respectively (step S620).

【００８３】Next, the initial position setting means 13 stores the center position of each part region and the positions of the outer end points 36a and 37a of both eyes (step S630). Finally, a part region mask image is generated for each of the detected right eye, left eye, and mouth regions. In each mask image, pixels inside the region are 1 and all others are 0. These part region mask images are stored (step S640) and used in the part tracking process for the first frame of the tracking phase.

The part detecting means 12 also measures the length of each part region (left eye, right eye, mouth) in the vertical image direction (Y direction) at its center position. It stores these measurements in the initial position setting means 13. In the subsequent tracking phase, these stored lengths are used to obtain the open/closed state of each part.

【００８４】(d) Processing by the head 3D posture and facial expression measuring means 5
Next, the operation of step S180 in FIG. 2, performed by the head 3D posture and facial expression measuring means 5 in the calibration phase, will be described with reference to FIGS. 17 to 19. FIG. 17 is a flowchart for explaining the operation of the head 3D posture and facial expression measuring means 5 in the calibration phase.

【００８５】As shown in FIG. 18, the affine basis setting means 15 proceeds as follows:
1. It obtains a straight line 38 connecting the outer end points 36a and 37a of both eyes, as found by the part detecting/tracking means 4 (step S700).
2. It rotates the image about the end point of either the left or the right eye so that line 38 becomes horizontal (step S710).
3. It obtains a straight line 39 that passes through the center of the mouth, is parallel to line 38, and has the same length (step S720).
4. It obtains the center coordinates 40 of the rectangle formed by the end points of lines 38 and 39, that is, by the four points 36a, 37a, 36b, and 37b (step S730).
5. It obtains the coordinates of the rectangle's four vertices relative to the center 40 and stores them as virtual points in three-dimensional space (step S740).
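Steps S700 to S740 can be sketched as follows. Points are (x, y) image coordinates, and the rotation is taken about the left eye's outer corner. Line 39's end points are placed directly below the eye corners so that the four points form a rectangle, consistent with the construction in step S1050 of the tracking phase. These choices and the function name are assumptions.

```python
import math


def affine_basis(left_outer, right_outer, mouth):
    """Return the four rectangle vertices relative to the rectangle's center."""
    (x0, y0), (x1, y1), (mx, my) = left_outer, right_outer, mouth
    theta = math.atan2(y1 - y0, x1 - x0)  # angle of line 38
    c, s = math.cos(-theta), math.sin(-theta)

    def rot(px, py):
        """Rotate (px, py) about left_outer by -theta so that line 38 is horizontal."""
        dx, dy = px - x0, py - y0
        return x0 + c * dx - s * dy, y0 + s * dx + c * dy

    xr, _ = rot(x1, y1)  # right corner now lies at height y0
    _, ym = rot(mx, my)  # height of line 39 through the mouth center
    corners = [(x0, y0), (xr, y0), (x0, ym), (xr, ym)]  # eye corners, then line-39 ends
    cx, cy = (x0 + xr) / 2, (y0 + ym) / 2  # center 40
    return [(px - cx, py - cy) for px, py in corners]


print(affine_basis((0, 0), (10, 0), (5, 8)))  # a 10 x 8 rectangle centred at the origin
```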

【００８６】These virtual points in three-dimensional space serve as reference points for measuring the three-dimensional posture of the head in the tracking phase.

【００８７】Next, as shown in FIG. 19, the head rotation amount estimating means 16 defines a coordinate system. Its X axis is the straight line connecting the eye end points 36a and 37a. Its Y axis is the straight line that passes through the center of the mouth perpendicular to the X axis. The X-direction length of the rectangle circumscribing the head region is taken as 1. The means then obtains the distances La and Lb from the inner end point of the left or right eye to the left and right sides of the circumscribed rectangle (step S750). Similarly, with the Y-direction length of the circumscribed rectangle taken as 1, it obtains the distances Lc and Ld from the center of the mouth to the upper and lower sides of the circumscribed rectangle (step S760).
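The normalized distances of steps S750 and S760 can be computed by projecting points onto the eye-line axis and its perpendicular, as sketched below. Two details are assumptions: the head region is represented as a list of (x, y) pixel coordinates, and La and Lc are measured from the sides with the smaller local coordinate. The same computation applies to La′ to Ld′ in the tracking phase (steps S1030 and S1040). The function name is hypothetical.

```python
import math


def relative_distances(head_pixels, left_outer, right_outer, eye_inner, mouth):
    """Return (La, Lb, Lc, Ld) normalized by the head's circumscribed rectangle."""
    ox, oy = left_outer
    theta = math.atan2(right_outer[1] - oy, right_outer[0] - ox)
    ex = (math.cos(theta), math.sin(theta))   # local X axis along the eye line
    ey = (-math.sin(theta), math.cos(theta))  # local Y axis, perpendicular to it

    def local(p):
        dx, dy = p[0] - ox, p[1] - oy
        return dx * ex[0] + dy * ex[1], dx * ey[0] + dy * ey[1]

    us, vs = zip(*(local(p) for p in head_pixels))
    umin, umax, vmin, vmax = min(us), max(us), min(vs), max(vs)  # circumscribed rectangle
    ui, _ = local(eye_inner)
    _, vm = local(mouth)
    width, height = umax - umin, vmax - vmin
    return ((ui - umin) / width, (umax - ui) / width,
            (vm - vmin) / height, (vmax - vm) / height)
```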

【００８８】These relative positions serve as the reference for estimating the amount of vertical and horizontal head rotation in the tracking phase.

【００８９】This completes the description of the real-time facial expression tracking device's operation in the calibration phase.

【００９０】[Tracking phase] Next, the operation of the real-time facial expression tracking device of FIG. 1 in the tracking phase will be described in detail with reference to FIGS. 20 to 29.

【００９１】(a)′ Processing by the head region detecting means 2
Current-frame images are input sequentially at a predetermined frame rate through the video input means 1. For each one, the head region detecting means 2 operates the skin color region extracting means 8 and the head region extracting means 9 to apply the same processing as in the calibration phase. This extracts the skin color region and then the head region (steps S200 to S220 in FIG. 3). In the tracking phase, however, the skin color sampling means 6 does not sample skin color, and the skin color extraction parameter adjusting means 7 does not adjust the skin color parameters.

【００９２】(b)′ Processing by the part region candidate extracting means 3
The part region candidate extracting means 3 performs the same processing as in the calibration phase to extract part (eye, mouth, nose) region candidates from the current frame image (step S230 in FIG. 3). That is:
1. It applies adaptive histogram equalization to the head region extracted by the head region detecting means 2, keeping the head region's contrast constant.
2. Using the fixed threshold tha, it sets pixels in the head region whose luminance is at or below tha (dark pixels) to 1 and all others to 0.
3. It joins the pixels with value 1 using 4-connectivity or 8-connectivity to divide them into regions.
4. It removes very small regions, leaving the candidate regions for each part (both eyes, the mouth, and the nose).

【００９３】(c)′ Processing by the part detecting/tracking means 4
The operation of the part detecting/tracking means 4 in the tracking phase will be described in detail with reference to FIGS. 20 to 23. FIGS. 20 and 21 are flowcharts for explaining this operation.

【００９４】The part tracking means 14 sets a rectangular region of fixed size, centered on the stored center coordinates of the part region in the previous frame. It then obtains the current-frame candidate regions lying within that rectangle (step S820). Next, an evaluation value E is computed for each candidate region using the following discriminant, Equation (11).

【数２】 [Equation 2]

【００９５】The terms in Equation (11) are defined as follows:
- E is the evaluation value.
- SP is the number of pixels in the part region in the previous frame.
- SC is the number of pixels in the candidate region in the current frame.
- OP is the number of pixels with value 1 in the exclusive OR of two mask images. The first is the mask image of the candidate region in the current frame, in which only the candidate region's pixels are 1 and all others are 0. The second is the mask image of the part region in the previous frame, in which only the part region's pixels are 1 and all others are 0.
- D is the distance between the center of the part region in the previous frame and the center of the candidate region.

【００９６】The candidate with the smallest value of E from Equation (11) is selected as the target region. This identifies the target region among the current-frame candidate regions that lie within a fixed range of the part region's position in the previous frame (step S830). Consider a small noise region 47 such as the one in FIG. 22 that lies entirely inside the previous frame's part region. For such a region, the terms |SP − SC| and OP in Equation (11) become large, so the noise region can be rejected.
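The exact form of Equation (11) is not reproduced in this text, so the sketch below does not combine its terms. It only computes SP, SC, OP, and D as defined in [0095]. Regions are represented as collections of (row, column) pixels and centers are taken as mean positions; both are assumptions, as is the function name. The example illustrates the point of [0096]: a small noise blob inside the previous region gets large |SP − SC| and OP even though D is 0.

```python
import math


def tracking_terms(prev_pixels, cand_pixels):
    """Compute the terms SP, SC, OP, and D of [0095] for one candidate region."""
    prev, cand = set(prev_pixels), set(cand_pixels)

    def center(pixels):
        n = len(pixels)
        return sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n

    sp, sc = len(prev), len(cand)
    op = len(prev ^ cand)  # pixels that are 1 in exactly one of the two masks (XOR)
    d = math.dist(center(prev), center(cand))
    return {"SP": sp, "SC": sc, "abs_SP_SC": abs(sp - sc), "OP": op, "D": d}


previous = [(r, c) for r in range(3) for c in range(3)]
print(tracking_terms(previous, [(1, 1)]))  # noise blob: large |SP - SC| and OP, D = 0
```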

【００９７】This processing is executed for each of the left eye, right eye, and mouth regions (steps S810 to S840).

【００９８】If all parts have been detected by the above processing, the part region mask images are updated with those of the current frame. The center position of the detected region for each part (left eye, right eye, mouth) is then obtained and stored (steps S850 and S860).

【００９９】If some part was not found (step S870), its position in the current frame is predicted from the motion vector of a part that was detected in the current frame. For example, as shown in FIG. 23, suppose there is a part (target part) 54 that could not be detected in the current frame. The inter-frame motion vector 50 is obtained from the position of another part 48 detected in the current frame and that part's position 49 in the previous frame. Adding the motion vector 50 to the position 51 of the target part 54 in the previous frame gives the target part's estimated position in the current frame (step S890). A predetermined rectangular region 53 (for example, 16 × 16) containing the estimated position is then examined. The processing of steps S820 and S830 described above is applied to the pixels in this rectangle to detect the target part 54 (step S900).
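The prediction of step S890 and the search window of step S900 can be sketched as follows. The sketch assumes (x, y) coordinates, a single reference part, and a window centred on the estimate. The function names are hypothetical, and 16 is the example window size from the text.

```python
def predict_position(prev_missing, prev_ref, curr_ref):
    """Add the reference part's inter-frame motion vector to the missing part's old position."""
    vx = curr_ref[0] - prev_ref[0]
    vy = curr_ref[1] - prev_ref[1]
    return prev_missing[0] + vx, prev_missing[1] + vy


def search_window(center, size=16):
    """Return (x0, y0, x1, y1) of a size x size window centred on `center` (end-exclusive)."""
    half = size // 2
    x0, y0 = center[0] - half, center[1] - half
    return x0, y0, x0 + size, y0 + size


estimate = predict_position((30, 40), (10, 12), (13, 10))  # reference moved by (+3, -2)
print(estimate, search_window(estimate))
```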

【０１００】If there is no candidate region at all within the rectangular region 53, the part is assumed to be hidden, for example because the face is tilted. In that case, the position estimated in step S890 is taken as the target part's position in the current frame, and the rectangular region 53 itself is stored as that part's region (steps S910 and S920).

【０１０１】If no part region at all could be detected in the current frame in step S870, the part detecting means 12 repeats steps S600 to S640 of FIG. 15 to detect the part regions again (step S880).

【０１０２】In this way, as long as one part is detected, the current-frame position of any missed part can be predicted from the detected part's motion vector, which makes part tracking robust. Furthermore, a provisional part region is set even when a part does not appear in the image, for example because it is hidden. The part can therefore be tracked immediately when it reappears, so the smooth movement of each part of the head can be reproduced.

【０１０３】(d)′ Processing by the head 3D posture and facial expression measuring means 5
Next, the operation of the head 3D posture and facial expression measuring means 5 in the tracking phase will be described in detail with reference to FIGS. 24 to 29. FIGS. 24 and 27 are flowcharts for explaining this operation.

[0104] First, as shown in FIG. 25, the head rotation amount estimating means 16 obtains the outer end points 70 and 71 of both eyes from the eye regions of the current frame found by the part detecting and tracking means 4. It then obtains the straight line 55 connecting the end points 70 and 71 (step S1000). It also obtains the straight line 56 that is orthogonal to the line 55 and passes through the center position 59 of the mouth (step S1010). A local coordinate system is set with the line 55 as the X axis and the line 56 as the Y axis, and a circumscribed rectangle 57 is obtained that has sides parallel to the X axis 55 and the Y axis 56 and circumscribes the extracted head region (step S1020). With the length of the X-direction side of the circumscribed rectangle 57 normalized to 1, the relative distances La′ and Lb′ from the inner end point 58 of the eye that was measured in the calibration phase to the two sides 72 and 73 parallel to the Y axis are obtained (step S1030). Similarly, with the Y-direction length of the circumscribed rectangle normalized to 1, the relative distances Lc′ and Ld′ from the mouth center 59 to the two sides 74 and 75 parallel to the X axis are obtained (step S1040).
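Steps S1000 to S1040 can be sketched in Python as follows. This is a sketch under stated assumptions, not the original implementation: the head region is assumed to be given as a list of (x, y) pixel coordinates, image y grows downward (so "up" in the local frame is the negative image y direction), and Lb′ and Ld′ are taken to be the distances toward the positive X (right) and positive Y (up) sides, as in paragraphs [0106] and [0109].

```python
import numpy as np


def relative_distances(head_pts, eye_outer_l, eye_outer_r, eye_inner, mouth):
    """Return (La', Lb', Lc', Ld') relative to the head region's circumscribed rectangle 57."""
    p70 = np.asarray(eye_outer_l, float)
    p71 = np.asarray(eye_outer_r, float)
    ex = (p71 - p70) / np.linalg.norm(p71 - p70)  # X axis along line 55, pointing right
    ey = np.array([ex[1], -ex[0]])                # Y axis (line 56 direction), pointing up in the image
    pts = np.asarray(head_pts, float)
    u, v = pts @ ex, pts @ ey                     # head-region pixels in local coordinates
    umin, umax, vmin, vmax = u.min(), u.max(), v.min(), v.max()
    ue = np.asarray(eye_inner, float) @ ex        # inner eye end point 58 along X
    vm = np.asarray(mouth, float) @ ey            # mouth center 59 along Y
    width, height = umax - umin, vmax - vmin      # rectangle sides normalized to 1 below
    la, lb = (ue - umin) / width, (umax - ue) / width     # to sides 72 (left) and 73 (right)
    lc, ld = (vm - vmin) / height, (vmax - vm) / height   # to lower and upper sides
    return la, lb, lc, ld
```

Because only differences of projections are used, the position of the local origin (the intersection of lines 55 and 56) cancels out and does not need to be computed.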

[0105] Next, a rectangle 60 is obtained whose vertices are the outer end points 70 and 71 of both eyes and the two points 76 and 77 at which the straight lines through the end points 70 and 71 parallel to the Y axis intersect the straight line through the mouth center parallel to the X axis (step S1050).
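The construction of rectangle 60 in step S1050 can be sketched as below, under the same assumptions as the relative-distance sketch (pixel coordinates, image y pointing down, X axis along the outer eye corners). The points 76 and 77 are found by moving each outer eye corner along the local Y axis until it reaches the mouth's X-parallel line.

```python
import numpy as np


def eye_mouth_rectangle(eye_outer_l, eye_outer_r, mouth):
    """Vertices of rectangle 60 in image coordinates: 70, 71, 77, 76."""
    p70 = np.asarray(eye_outer_l, float)
    p71 = np.asarray(eye_outer_r, float)
    m = np.asarray(mouth, float)
    ex = (p71 - p70) / np.linalg.norm(p71 - p70)  # local X axis
    ey = np.array([ex[1], -ex[0]])                # local Y axis, orthogonal to ex
    # Intersection of the Y-parallel line through a corner with the X-parallel line through m.
    p76 = p70 + np.dot(m - p70, ey) * ey
    p77 = p71 + np.dot(m - p71, ey) * ey
    return np.array([p70, p71, p77, p76])
```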

[0106] Here, rightward is taken as the positive X direction and upward as the positive Y direction. The left-right rotation amount of the head is obtained by equation (12) from the relative distance $dec$ (= Lb′) of one eye in the positive X direction in the current frame and the relative distance $dei$ (= Lb) in the positive X direction stored in the calibration phase:

$$Rf_E = \frac{dec}{dei} \qquad (12)$$

[0107] Here, $Rf_E$ is the left-right rotation amount, $dec$ is the relative distance of the eye in the positive X direction in the current frame, and $dei$ is the relative distance of the eye in the positive X direction stored in the calibration phase.

[0108] If the rotation amount $Rf_E$ is greater than 1, the head is rotated to the left. Conversely, if $Rf_E$ is less than 1, the head is rotated to the right.

[0109] Similarly, the vertical rotation amount of the head is obtained by equation (13) from the relative distance $dmc$ (= Ld′) of the mouth in the positive Y direction in the current frame and the relative distance $dmi$ (= Ld) in the positive Y direction stored in the calibration phase:

$$Rf_m = \frac{dmc}{dmi} \qquad (13)$$

[0110] Here, $Rf_m$ is the vertical rotation amount, $dmc$ is the relative distance of the mouth in the positive Y direction in the current frame, and $dmi$ is the relative distance of the mouth in the positive Y direction stored in the calibration phase.

[0111] If the rotation amount $Rf_m$ is greater than 1, the head is rotated downward. Conversely, if it is less than 1, the head is rotated upward.
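Equations (12) and (13), together with the direction rules of paragraphs [0108] and [0111], amount to two ratios and two comparisons. A minimal Python sketch follows; the argument names mirror the symbols in the text, and the "none" label for a ratio of exactly 1 is an assumption added for completeness.

```python
def rotation_amounts(dec, dei, dmc, dmi):
    """Eqs. (12) and (13): current relative distance divided by the calibration-phase value."""
    rf_e = dec / dei  # left-right rotation amount Rf_E
    rf_m = dmc / dmi  # up-down rotation amount Rf_m
    # Direction rules from [0108] and [0111]; a ratio of exactly 1 is treated as no rotation.
    horizontal = "left" if rf_e > 1 else ("right" if rf_e < 1 else "none")
    vertical = "down" if rf_m > 1 else ("up" if rf_m < 1 else "none")
    return rf_e, rf_m, horizontal, vertical
```

For example, if Lb′ grows from 0.60 at calibration to 0.66, then $Rf_E = 1.1$ and the head is judged to be turned to the left.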

[0112] Next, the rectangle 60 is distorted as follows, based on the left-right and up-down rotation amounts $Rf_E$ and $Rf_m$ obtained by equations (12) and (13) (step S1060).

[0113] When $Rf_E > 1$: the length of the left side of the rectangle (the side parallel to the Y axis on the negative X side) is shortened using equation (14):

$$l = w \cdot Rf_E \cdot ol \qquad (14)$$

where $l$ is the calculated length, $ol$ is the original length, and $w$ is a weighting factor.

[0114] When $Rf_E < 1$: the length of the right side of the rectangle (the side parallel to the Y axis on the positive X side) is shortened using equation (14).

[0115] When $Rf_m > 1$: the length of the lower side of the rectangle (the side parallel to the X axis on the negative Y side) is shortened using equation (15):

$$l = w \cdot Rf_m \cdot ol \qquad (15)$$

where $l$ is the calculated length, $ol$ is the original length, and $w$ is a weighting factor.

[0116] When $Rf_m < 1$: the length of the upper side of the rectangle (the side parallel to the X axis on the positive Y side) is shortened using equation (15).

[0117] For example, when the head is rotated to the left as shown in FIG. 26(a), the left side of the rectangle 60 becomes shorter. When the head is rotated upward as shown in FIG. 26(b), the upper side of the rectangle 60 becomes shorter. The coordinates of each vertex of the rectangle deformed in this way are then obtained with reference to the center coordinates of the rectangle 60 before deformation.
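A sketch of the distortion step (S1060) follows. The text fixes which side is shortened and its new length, but not where the shortened side sits. Here it is assumed to stay centred on the rectangle's own center line, with vertex coordinates expressed relative to the center of rectangle 60 in the local axes (X right, Y up). The weight `w` has no value in the text, so it is a required argument. Taken literally, equations (14) and (15) shorten a side only when $w \cdot Rf < 1$.

```python
def distort_rectangle(width, height, rf_e, rf_m, w):
    """Vertices (TL, TR, BR, BL) of the distorted rectangle 60 relative to its original center."""
    tl, tr = [-width / 2, height / 2], [width / 2, height / 2]
    br, bl = [width / 2, -height / 2], [-width / 2, -height / 2]
    if rf_e > 1:        # eq. (14) applied to the left side
        l = w * rf_e * height
        tl[1], bl[1] = l / 2, -l / 2
    elif rf_e < 1:      # eq. (14) applied to the right side
        l = w * rf_e * height
        tr[1], br[1] = l / 2, -l / 2
    if rf_m > 1:        # eq. (15) applied to the lower side
        l = w * rf_m * width
        bl[0], br[0] = -l / 2, l / 2
    elif rf_m < 1:      # eq. (15) applied to the upper side
        l = w * rf_m * width
        tl[0], tr[0] = -l / 2, l / 2
    return [tuple(p) for p in (tl, tr, br, bl)]
```

Adding the center coordinates of the undistorted rectangle 60 back to these offsets gives the four image points used by the posture measuring means 17.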

[0118] Next, the posture measuring means 17 measures the three-dimensional posture of the head based on the four vertex coordinates (two-dimensional coordinates) obtained as described above and the corresponding virtual points in three-dimensional space set by the affine base setting means 15. The three-dimensional posture is measured by the following method.

[0119] The relationship between the image captured by the camera and an object in three-dimensional space is as shown in FIG. 28. In FIG. 28, 63 is the plane in three-dimensional space set by the affine base setting means 15, 64 is the camera image plane, and 65 is the camera coordinate system.

[0120] A point $(X_f, Y_f, Z_f)$ in the coordinate system of the plane 63 in three-dimensional space and the corresponding point $(X_c, Y_c, Z_c)$ in the camera coordinate system 65 are related by equation (16).

[Equation 3]

[0121] In equation (16), R represents the rotation component and T the translation component. Together they constitute the three-dimensional posture information of the head.
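The image for equation (16) did not survive extraction. The text describes R and T as the rotation and translation that take points from the coordinate system of plane 63 into the camera coordinate system 65, so the equation presumably has the standard rigid-transform form below. This is an assumed reconstruction, not the original equation:

```latex
\begin{pmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{pmatrix}
=
\begin{pmatrix} R & T \\ \mathbf{0}^{\top} & 1 \end{pmatrix}
\begin{pmatrix} X_f \\ Y_f \\ Z_f \\ 1 \end{pmatrix},
\qquad R \in SO(3),\; T \in \mathbb{R}^{3}
```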

[0122] On the other hand, a point $(X_c, Y_c, Z_c)$ in three-dimensional space in the camera coordinate system 65 and the corresponding two-dimensional point $(dX_c, dY_c)$ on the camera image plane 64 are related by equation (17).

[Equation 4]

[0123] Here, the matrix containing the elements P is the perspective projection matrix of the video camera 80 in use. It can be obtained in advance by a well-known camera calibration technique.
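The image for equation (17) is also missing, but its form can be inferred from equations (20) and (21) below. Those equations follow exactly if the projection matrix is upper triangular as shown, with $h = Z_c$ a scale factor. The block is a reconstruction under that assumption, not the original figure. Substituting the projected coordinates into line equation (18) and multiplying through by $Z_c$ yields (20). Equation (21) follows in the same way from (19).

```latex
\begin{aligned}
h \begin{pmatrix} dX_c \\ dY_c \\ 1 \end{pmatrix}
  &= \begin{pmatrix} P_{11} & P_{12} & P_{13} & 0 \\ 0 & P_{22} & P_{23} & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
     \begin{pmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{pmatrix} && (17) \\
dX_c &= \frac{P_{11} X_c + P_{12} Y_c + P_{13} Z_c}{Z_c}, \qquad
dY_c = \frac{P_{22} Y_c + P_{23} Z_c}{Z_c} \\
0 &= a_1\, dX_c + b_1\, dY_c + c_1 \\
\Rightarrow\; 0 &= a_1 P_{11} X_c + (a_1 P_{12} + b_1 P_{22}) Y_c + (a_1 P_{13} + b_1 P_{23} + c_1) Z_c && (20)
\end{aligned}
```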

[0124] Now, the rectangle obtained by the head rotation amount estimating means 16 lies on the camera image plane 64, but in three-dimensional space its upper and lower sides are parallel, as are its left and right sides. From these two pairs of parallel sides, the horizontal and vertical direction vectors (X axis and Y axis) of the rectangle in three-dimensional space can be obtained.

[0125] Let the equations of a pair of parallel sides on the camera image plane 64 be

$$a_1 x + b_1 y + c_1 = 0 \qquad (18)$$

$$a_2 x + b_2 y + c_2 = 0 \qquad (19)$$

Then the equations of the three-dimensional planes in the camera coordinate system 65 that contain these lines (and pass through the camera center) are given by equations (20) and (21).

[0126]

$$a_1 P_{11} X_c + (a_1 P_{12} + b_1 P_{22}) Y_c + (a_1 P_{13} + b_1 P_{23} + c_1) Z_c = 0 \qquad (20)$$

$$a_2 P_{11} X_c + (a_2 P_{12} + b_2 P_{22}) Y_c + (a_2 P_{13} + b_2 P_{23} + c_2) Z_c = 0 \qquad (21)$$

[0127] The cross product of the normal vectors of these two planes (the coefficients of $X_c$, $Y_c$, and $Z_c$) gives the direction vector (X axis or Y axis) described above.
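The computation in paragraphs [0125] to [0127] can be sketched with NumPy as follows. It assumes the upper-triangular projection matrix inferred above (only $P_{11}, P_{12}, P_{13}, P_{22}, P_{23}$ are read, so a 3 × 3 or 3 × 4 array works). The helper `line_through`, which obtains the eq. (18) parameters of a side from two of its vertices, is an illustrative addition.

```python
import numpy as np


def line_through(p, q):
    """Image line (a, b, c) with a*x + b*y + c = 0 through pixels p and q, as in eq. (18)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def plane_normal(line, P):
    """Coefficients of X_c, Y_c, Z_c in eqs. (20)/(21) for an image line (a, b, c)."""
    a, b, c = line
    return np.array([a * P[0, 0],
                     a * P[0, 1] + b * P[1, 1],
                     a * P[0, 2] + b * P[1, 2] + c])


def direction_from_parallel_sides(line1, line2, P):
    """3-D direction shared by two sides that are parallel in space (unit vector, sign arbitrary)."""
    d = np.cross(plane_normal(line1, P), plane_normal(line2, P))
    return d / np.linalg.norm(d)
```

Both planes pass through the camera center and each contains one of the parallel 3-D sides, so their intersection line, given by the cross product of the normals, is parallel to those sides.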

[0128] In this way, the direction vectors corresponding to the X and Y axes of the rectangle in the camera coordinate system 65 can be obtained. However, because of errors in the information obtained from the image, the obtained direction vectors may not be orthogonal, as shown in FIG. 29. Therefore, letting the obtained direction vectors be S1 and S2, orthogonal vectors V1 and V2 are obtained from S1 and S2. The vector in the Z-axis direction is obtained as the cross product of V1 and V2. These three direction vectors form the rotation component R in equation (16).
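The text does not say how V1 and V2 are derived from S1 and S2. The sketch below uses one common choice, assumed here: the angular error is split symmetrically about the bisector of S1 and S2, so neither estimate is favored. The Z axis is then V1 × V2, giving a right-handed rotation matrix with V1, V2, V3 as its columns.

```python
import numpy as np


def orthonormal_rotation(s1, s2):
    """Rotation matrix [V1 V2 V3] from nearly orthogonal axis estimates S1 (X) and S2 (Y)."""
    s1 = np.asarray(s1, float) / np.linalg.norm(s1)
    s2 = np.asarray(s2, float) / np.linalg.norm(s2)
    b = (s1 + s2) / np.linalg.norm(s1 + s2)  # bisector of S1 and S2
    p = (s1 - s2) / np.linalg.norm(s1 - s2)  # orthogonal to b because |s1| == |s2|
    v1 = (b + p) / np.sqrt(2.0)              # corrected X axis V1
    v2 = (b - p) / np.sqrt(2.0)              # corrected Y axis V2
    v3 = np.cross(v1, v2)                    # Z axis from the cross product of V1 and V2
    return np.column_stack((v1, v2, v3))
```

When S1 and S2 are already orthogonal, V1 = S1 and V2 = S2, so the correction only changes inputs that actually contain error.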

[0129] Once the rotation component R is known, the translation component T can be obtained by substituting corresponding two-dimensional and three-dimensional points into equations (16) and (17).
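With R fixed, each 2-D/3-D correspondence gives two equations that are linear in T, so T can be found by least squares. The sketch assumes the projection model reconstructed above for (16) and (17), with a 3 × 3 intrinsic matrix `K` whose last row is [0, 0, 1]. The least-squares formulation is an illustrative choice, since the text only says the points are substituted.

```python
import numpy as np


def project(K, R, T, X):
    """Pixel coordinates of a plane point X under the assumed model of eqs. (16)/(17)."""
    q = K @ (R @ np.asarray(X, float) + T)
    return q[:2] / q[2]


def solve_translation(R, K, pts3d, pts2d):
    """Least-squares T from correspondences, given R and intrinsics K.

    Each pixel coordinate gives one linear equation: (K_row - coord * K[2]) . (R X + T) = 0.
    """
    rows, rhs = [], []
    for X, (u, v) in zip(pts3d, pts2d):
        Y = R @ np.asarray(X, float)
        for coord, k_row in ((u, K[0]), (v, K[1])):
            a = k_row - coord * K[2]
            rows.append(a)
            rhs.append(-a @ Y)
    T, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return T
```

The four vertices of the distorted rectangle supply eight equations for three unknowns, so small measurement errors are averaged out rather than propagated from a single point.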

[0130] The posture measuring means 17 first obtains the line parameters (equations) of each side, in the form of equation (18), from the coordinates of the four vertices of the rectangle found by the head rotation amount estimating means 16 (step S1100). Using these line parameters, it obtains the X and Y axes of the virtual three-dimensional plane set by the affine base setting means 15 based on equations (20) and (21) (step S1110). Then, as described above, the obtained axes are corrected to be orthogonal, the Z axis is obtained from the corrected X and Y axes, and the rotation matrix (rotation component) R is obtained from the direction vectors of these three axes (X, Y, and Z) (step S1120). Finally, the translation matrix (translation component) T is obtained by substituting the corresponding two-dimensional and three-dimensional points into equations (16) and (17) using this rotation component R (step S1130).

[0131] Using the projection matrix obtained as described above, the virtual points in three-dimensional space are actually projected onto the camera image plane, and the projection matrix is corrected according to the resulting error (step S1140). The projection matrix obtained when the error falls to or below a threshold is taken as the three-dimensional posture information of the head (step S1150). This three-dimensional posture information is output to the character control device 90 to control the three-dimensional posture of the CG character's head.
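The text does not say how the projection matrix is corrected in step S1140. The sketch below therefore covers only the error measure and the threshold test of steps S1140 and S1150. It assumes the projection model of equations (16) and (17) with a 3 × 3 intrinsic matrix `K`, and the default threshold of one pixel is a hypothetical value.

```python
import numpy as np


def project(K, R, T, X):
    """Pixel coordinates of a virtual point X under the assumed model of eqs. (16)/(17)."""
    q = K @ (R @ np.asarray(X, float) + T)
    return q[:2] / q[2]


def reprojection_error(K, R, T, pts3d, pts2d):
    """Mean pixel distance between projected virtual points and their image counterparts."""
    errors = [np.linalg.norm(project(K, R, T, X) - np.asarray(uv, float))
              for X, uv in zip(pts3d, pts2d)]
    return float(np.mean(errors))


def pose_converged(K, R, T, pts3d, pts2d, threshold=1.0):
    """Stopping test for the refinement loop: error at or below the threshold (pixels)."""
    return reprojection_error(K, R, T, pts3d, pts2d) <= threshold
```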

[0132] In this way, a rectangle (virtual plane) in three-dimensional space is defined from the three points of both eyes and the mouth detected in the face image. During tracking, the rectangle created from these three points is distorted according to the head movement, which pseudo-reproduces the distortion that arises when a three-dimensional plane is projected into two dimensions. As a result, three-dimensional posture information, which normally cannot be obtained without four or more corresponding three-dimensional and two-dimensional points, is estimated from only the three points of both eyes and the mouth that can be obtained from the image.

[0133] Next, the operation of the open/closed state measuring means 18 will be described. Using the projection matrix obtained by the posture measuring means 17, that is, the three-dimensional posture information of the head, the open/closed state measuring means 18 reproduces the eye and mouth regions as they would appear in the camera image if the user were facing the front. It then obtains the ratio between the vertical (Y-direction) image length of each reproduced region and the vertical image length of the same part region in the initial state stored in the initial position setting means 13. This ratio is the open/closed state information, indicating how far the eyes and mouth are open.
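One way to approximate the "frontal" reproduction is sketched below, under assumptions the text does not state. The eye and mouth regions are taken to lie close to the virtual plane $Z_f = 0$, so the top and bottom pixels of a region can be mapped back onto that plane through the plane-to-image homography implied by equations (16) and (17). Their distance in plane units then stands in for the frontal vertical length. The initial-state height is assumed to have been measured the same way at calibration. `K`, `R` and `T` are as in the earlier sketches.

```python
import numpy as np


def plane_to_image_homography(K, R, T):
    """Homography mapping (X_f, Y_f) on the plane Z_f = 0 to pixels (assumes eqs. 16/17)."""
    return K @ np.column_stack((R[:, 0], R[:, 1], T))


def height_on_plane(K, R, T, top_uv, bottom_uv):
    """Back-project two pixels onto the face plane and return their distance in plane units."""
    h_inv = np.linalg.inv(plane_to_image_homography(K, R, T))

    def to_plane(uv):
        p = h_inv @ np.array([uv[0], uv[1], 1.0])
        return p[:2] / p[2]

    return float(np.linalg.norm(to_plane(top_uv) - to_plane(bottom_uv)))


def open_close_ratio(current_height, initial_height):
    """Open/closed state information: current height relative to the initial-state height."""
    return current_height / initial_height
```

Because the pose is removed before measuring, a head tilted upward does not by itself make an open eye appear half closed, which is the benefit described in paragraph [0134].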

[0134] Because the three-dimensional posture information is used in this way to estimate the eye and mouth regions as they would appear in the camera image with the user facing the front, the frontal appearance can be estimated even from an image in which the head is turned sideways or upward, for example. The open/closed state of both eyes and the mouth can therefore be determined more accurately from a two-dimensional image alone.

[0135] The three-dimensional posture information of the head and the open/closed state information of both eyes and the mouth obtained in this way are input to the character control device 90. Using this input, the character control device 90 variably controls the movement of the CG character's head and the open/closed state of its eyes and mouth. The CG character's movement and facial expression thus change in real time, following the movement and facial expression of the user captured by the video camera 80.

[0136]

[Effects of the Invention] As described above, according to the present invention, the data of each pixel in an image of the target person is normalized for each of the R, G, and B components and then converted into pixel data that includes data in the C1-C2 space. A pixel is judged to be a skin-color pixel when its C1 data and C2 data fall within the ranges given by predetermined skin color extraction parameters, and the head region is extracted from the captured image in this way. As a result, the skin color component can be extracted very accurately with simple computation, and the head movement and facial expression of the CG character can be controlled accurately in accordance with the movement of the target person. In addition, the skin color of the target person (user) is sampled under the lighting environment in which the device is used, and the skin color extraction parameters are adjusted using the sampled data. The user's head region can therefore be extracted accurately while adapting to any lighting environment and to individual differences between users.


[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the real-time facial expression tracking device according to the present invention.

FIG. 2 is a flowchart outlining the operation of the real-time facial expression tracking device of FIG. 1 in the calibration phase.

FIG. 3 is a flowchart outlining the operation of the real-time facial expression tracking device of FIG. 1 in the tracking phase.

FIG. 4 is a diagram explaining skin color sampling.

FIG. 5 is a flowchart explaining the operation of the skin color sampling means and the skin color extraction parameter adjusting means.

FIG. 6 is a flowchart explaining the operation of the skin color region extracting means and the head region extracting means 9.

FIG. 7 is a diagram showing an example of the result of extracting a skin color region with the skin color region extracting means.

FIG. 8 is a diagram illustrating an expansion mask and a contraction mask.

FIG. 9 is a diagram explaining the process of filling a crack that has arisen in the detected head region.

FIG. 10 is a diagram explaining the logical operation process for filling all holes in the head region.

FIG. 11 is a flowchart explaining the operation of the part region candidate extracting means.

FIG. 12 is a diagram explaining the processing that suppresses noise generation, a drawback of the adaptive histogram averaging method.

FIG. 13 is a diagram explaining the adaptive histogram averaging method.

FIG. 14 is a diagram explaining the adaptive histogram averaging method.

FIG. 15 is a flowchart explaining the operation of the part detecting and tracking means in the calibration phase.

FIG. 16 is a diagram showing the mask regions used when the part detecting means identifies the eye and mouth regions.

FIG. 17 is a flowchart explaining the operation of the head three-dimensional posture/facial expression measuring means 5 in the calibration phase.

FIG. 18 is a diagram showing the virtual points in three-dimensional space set by the affine base setting means.

FIG. 19 is a diagram explaining the relative positions, with respect to the circumscribed rectangle of the head region, of the end points of both eyes and the center point of the mouth obtained by the head movement amount estimating means.

FIG. 20 is a flowchart (part 1) explaining the operation of the part detecting and tracking means in the tracking phase.

FIG. 21 is a flowchart (part 2) explaining the operation of the part detecting and tracking means in the tracking phase.

FIG. 22 is a diagram explaining how the part tracking means tracks a part region in the current frame.

FIG. 23 is a diagram explaining the process of predicting the position of a part region that could not be detected from the position of a part region that could be detected.

FIG. 24 is a flowchart explaining the operation of the head three-dimensional posture/facial expression measuring means in the tracking phase.

FIG. 25 is a diagram explaining the process by which the head rotation amount estimating means estimates the left-right and up-down rotation amounts of the head.

FIG. 26 is a diagram explaining the process by which the head rotation amount estimating means obtains the points corresponding to the virtual points (affine bases) in three-dimensional space.

FIG. 27 is a flowchart explaining the operation of the head three-dimensional posture/facial expression measuring means in the tracking phase.

FIG. 28 is a diagram explaining the process by which the posture measuring means obtains the three-dimensional posture information of the head from corresponding three-dimensional and two-dimensional points.

FIG. 29 is a diagram explaining the process of correcting errors when obtaining the posture information.

FIG. 30 is a diagram showing a conventional technique.

[Explanation of symbols]

1 video input means, 2 head region detecting means, 3 part region candidate extracting means, 4 part detecting and tracking means, 5 three-dimensional posture/facial expression measuring means, 6 skin color sampling means, 7 skin color extraction parameter adjusting means, 8 skin color region extracting means, 9 head region extracting means, 10 head region luminance averaging means, 11 pixel selecting means, 12 part detecting means, 13 initial position setting means, 14 part tracking means, 15 affine base setting means, 16 head rotation amount estimating means, 17 posture measuring means, 18 open/closed state measuring means, 20 sampling window, 22 expansion mask, 23 contraction mask, 33 left eye mask, 34 right eye mask, 35 mouth mask, 50 movement vector, 53 rectangular area, 57 circumscribed rectangle, 64 camera image plane, 80 video camera, 90 character control device.

Continuation of the front page

(56) References cited: JP 2000-331190 (JP, A); JP H8-272948 (JP, A); JP H8-272973 (JP, A); JP H11-85988 (JP, A); JP H11-15947 (JP, A)

(58) Fields searched (Int.Cl.⁷, DB name): G06T 1/00 340; G06T 7/00 100; G06T 7/20 300; H04N 7/14; JSTPlus file (JOIS)

Claims

(57) [Claims]

1. A head region extraction device for extracting the head region of a person from video of the person, comprising: normalizing means for normalizing the R, G, and B component data of each pixel of an image of the target person according to the equations $c_1 = \arctan(R / \max(G, B))$, $c_2 = \arctan(G / \max(R, B))$, and $c_3 = \arctan(B / \max(R, G))$ to obtain normalized data $c_1$, $c_2$, $c_3$; data conversion means for converting the data of each pixel, including the normalized data $c_1$, $c_2$, $c_3$, into pixel data including data in the C1-C2 space according to the equations $C_1 = c_2 / c_1$ and $C_2 = c_3 / c_2$; and head region extraction means for judging a pixel to be a skin-color pixel when its converted C1 data and C2 data satisfy $th_1 < C_1 < th_2$ and $th_3 < C_2 < th_4$, where $th_1$, $th_2$, $th_3$, and $th_4$ are skin color extraction parameters, and thereby extracting the head region from the captured image.
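The per-pixel test of claim 1 can be sketched in Python as follows. The formulas are taken from the claim. The small `eps` guard against division by zero for very dark pixels is an assumption that is not part of the claim, and the threshold values used when calling `is_skin` must come from calibration, since the text gives none.

```python
import math


def to_c1c2(r, g, b, eps=1e-6):
    """Claim 1: arctan normalisation (c1, c2, c3) followed by C1 = c2/c1, C2 = c3/c2.

    eps guards against division by zero for dark pixels; it is an assumption, not part of the claim.
    """
    c1 = math.atan(r / max(g, b, eps))
    c2 = math.atan(g / max(r, b, eps))
    c3 = math.atan(b / max(r, g, eps))
    return c2 / max(c1, eps), c3 / max(c2, eps)


def is_skin(rgb, th):
    """True when th1 < C1 < th2 and th3 < C2 < th4 (skin color extraction parameters)."""
    C1, C2 = to_c1c2(*rgb)
    th1, th2, th3, th4 = th
    return th1 < C1 < th2 and th3 < C2 < th4
```

With the hypothetical thresholds (0.5, 0.8, 0.7, 0.95), a typical skin tone such as (200, 140, 110) gives C1 ≈ 0.64 and C2 ≈ 0.82 and is accepted, while a blue pixel such as (50, 80, 200) is rejected.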

2. The head region extraction device according to claim 1, further comprising: skin color sampling means for sampling an image of a predetermined area of the target person's face under the same lighting environment as when the target person is imaged; and skin color extraction parameter adjusting means for normalizing the data of each pixel of the image of the predetermined area sampled by the skin color sampling means using the normalizing means, converting it into C1-C2 space pixel data using the data conversion means, obtaining the maximum and minimum values of the C1 data and the maximum and minimum values of the C2 data from the plurality of converted pixel data of the predetermined area, and correcting the skin color extraction parameters $th_1$, $th_2$, $th_3$, and $th_4$ using these maximum and minimum values.
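The parameter adjustment of claim 2 reduces to taking minima and maxima over the sampled pixels in C1-C2 space. In the sketch below, `margin` is a hypothetical widening term not found in the claim. With `margin=0` the extreme samples lie exactly on the bounds and would fail the strict inequalities of claim 1. Samples are assumed to be (R, G, B) tuples taken from the sampling window.

```python
import math


def to_c1c2(r, g, b, eps=1e-6):
    """Claim 1 normalisation followed by the C1-C2 conversion (eps guard is an assumption)."""
    c1 = math.atan(r / max(g, b, eps))
    c2 = math.atan(g / max(r, b, eps))
    c3 = math.atan(b / max(r, g, eps))
    return c2 / max(c1, eps), c3 / max(c2, eps)


def fit_skin_parameters(samples, margin=0.0):
    """Return (th1, th2, th3, th4) from the min/max of C1 and C2 over sampled pixels."""
    converted = [to_c1c2(*px) for px in samples]
    c1s = [c[0] for c in converted]
    c2s = [c[1] for c in converted]
    return (min(c1s) - margin, max(c1s) + margin,
            min(c2s) - margin, max(c2s) + margin)
```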

3. The head region extraction device according to claim 1 or 2, wherein the head region extraction means extracts the head region by extracting the largest region from the skin color region extraction result.

4. The head region extraction device according to claim 1, wherein the head region extraction means applies expansion/contraction processing to the binary image obtained after the head region is extracted.

5. The head region extraction device according to claim 4, wherein the head region extraction means obtains the exclusive OR of the binary image after the expansion/contraction processing and a mask image in which all pixel values are at the logical level corresponding to skin color, and extracts the entire head region by obtaining the logical OR of the binary image after the expansion/contraction processing and an image in which the portions of the exclusive-OR image other than the head region are set to the logical level corresponding to non-skin color.
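The hole-filling operation of claim 5 can be sketched with NumPy as below. XOR with an all-skin mask inverts the binary image. The "portions other than the head region" are assumed here to be the non-skin pixels 4-connected to the image border (the background), which are found by a flood fill. What remains are the enclosed holes, which are ORed back into the original. The connectivity and the border-seeding rule are assumptions, not taken from the text.

```python
from collections import deque

import numpy as np


def fill_head_holes(binary):
    """Fill holes enclosed by the skin region of a 2-D boolean image (True = skin)."""
    skin = binary.astype(bool)
    inv = np.logical_xor(skin, np.ones_like(skin))  # XOR with an all-skin mask: complement
    h, w = inv.shape
    outside = np.zeros_like(inv)
    # Seed the flood fill with non-skin pixels on the image border (assumed background).
    seeds = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and inv[y, x]]
    for y, x in seeds:
        outside[y, x] = True
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and inv[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    holes = inv & ~outside  # background set to the non-skin level, leaving only holes
    return skin | holes     # logical OR with the original binary image
```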

6. A real-time facial expression tracking device comprising: video input means for capturing video input sequentially at a predetermined frame rate; head region extraction means for extracting the head region from the captured image; part region candidate extracting means for extracting, from the extracted head region, candidate regions for each part including both eyes and the mouth; part detecting and tracking means for detecting the position of each part from the extracted candidate regions; and head three-dimensional posture/facial expression measuring means for measuring the three-dimensional posture of the head and the open/closed state of both eyes and the mouth based on the detected positions of both eyes and the mouth; the device controlling the movement of a CG character based on the measured three-dimensional posture of the head and open/closed state of both eyes and the mouth, wherein the head region extraction means comprises: normalizing means for normalizing the R, G, and B component data of each pixel of an image of the target person according to the equations $c_1 = \arctan(R / \max(G, B))$, $c_2 = \arctan(G / \max(R, B))$, and $c_3 = \arctan(B / \max(R, G))$ to obtain normalized data $c_1$, $c_2$, $c_3$; data conversion means for converting the data of each pixel, including the normalized data $c_1$, $c_2$, $c_3$, into pixel data including data in the C1-C2 space according to the equations $C_1 = c_2 / c_1$ and $C_2 = c_3 / c_2$; and skin color region extracting means for judging a pixel to be a skin-color pixel when its converted data satisfy $th_1 < C_1 < th_2$ and $th_3 < C_2 < th_4$, where $th_1$, $th_2$, $th_3$, and $th_4$ are skin color extraction parameters, and thereby extracting the skin color region from the captured image.

7. The real-time facial expression tracking device according to claim 6, further comprising: skin color sampling means for sampling an image of a predetermined area of the target person's face under the same lighting environment as when the target person is imaged; and skin color extraction parameter adjusting means for normalizing the data of each pixel of the image of the predetermined area sampled by the skin color sampling means using the normalizing means, converting it into C1-C2 space pixel data using the data conversion means, obtaining the maximum and minimum values of the C1 data and the maximum and minimum values of the C2 data from the plurality of converted pixel data of the predetermined area, and correcting the skin color extraction parameters $th_1$, $th_2$, $th_3$, and $th_4$ using these maximum and minimum values.

8. The real-time facial expression tracking device according to claim 6 or 7, wherein the head region extraction means extracts the head region by extracting the largest region from the result of skin color region extraction by the skin color region extracting means.

9. The real-time facial expression tracking device according to claim 6, wherein the head region extraction means applies expansion/contraction processing to the binary image obtained after the skin color region is extracted by the skin color region extracting means.

10. The real-time facial expression tracking device according to claim 9, wherein the head region extraction means obtains the exclusive OR of the binary image after the expansion/contraction processing and a mask image in which all pixel values are at the logical level corresponding to skin color, and extracts the entire head region by obtaining the logical OR of the binary image after the expansion/contraction processing and an image in which the portions of the exclusive-OR image other than the head region are set to the logical level corresponding to non-skin color.