JPH08249447A

JPH08249447A - Expression detector

Info

Publication number: JPH08249447A
Application number: JP5273795A
Authority: JP
Inventors: Tatsumi Sakaguchi; 竜己坂口; Atsushi Otani; 淳大谷
Original assignee: ATR TSUSHIN SYST KENKYUSHO KK
Current assignee: ATR TSUSHIN SYST KENKYUSHO KK
Priority date: 1995-03-13
Filing date: 1995-03-13
Publication date: 1996-09-27
Anticipated expiration: 2013-12-24
Also published as: JP2840816B2

Abstract

PURPOSE: To detect facial expressions accurately, with high robustness, in real time. CONSTITUTION: The detector comprises an extraction part 16 that extracts, from a facial video signal DVL, only the portions in which facial expressions appear distinctly; a wavelet filter 18 that generates a spatial frequency F by wavelet-transforming the extracted video signal DVLe; an average power calculation part 20 that calculates the average power Pc of the spatial frequency F for every prescribed band; and a difference calculation part 24 that obtains a feature vector FV by calculating the difference between the calculated average power Pc and the average power Pn of the expressionless state, which is stored in advance in a steady-state storage memory 22.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a facial expression detection device, and more specifically to a facial expression detection device that detects human facial expressions in real time, for example to detect the facial expressions of participants in a computer-graphics-based video conference connecting remote sites, or to detect expressions for a face generated by computer graphics in real time.

[0002]

[Prior Art and Problems to Be Solved by the Invention] Recognition of human facial expressions is an important basic technology in research on advanced coded communication and active computer interfaces. For example, in a computer-controlled communication conference system such as the one disclosed in "Otani, Kitamura, Takemura, Kishino: 'Real-time display of 3D face image in realistic communication conference', IEICE Technical Report, HC-92-61, pp. 23-28 (1993.1.)", realizing facial expression recognition would make it possible to achieve a high compression ratio for the information to be transmitted, and application to more advanced emphasis work and the like can be expected.

[0003] In the conventional facial expression detection method used in the above-mentioned realistic communication conference, markers are attached to several places on the face, and facial expressions are detected by tracking the movement of those markers. However, because this method requires detection aids such as markers, preparation before expression detection takes time. In addition, because robustness is low, malfunctions are easily caused by disturbances such as displacement of the helmet or the pulse.

[0004] In the field of computer vision, many approaches extract the positions and shapes of the characteristic parts of the face (eyes, nose, mouth, and so on) from an image and attempt facial expression recognition based on those measurements. However, accurately extracting the shape and position of each part as image features from a still image such as a photograph depends strongly on the imaging conditions and on the subject's facial appearance. It is therefore very difficult and also requires complicated processing algorithms, which has been an obstacle to real-time processing.

[0005] Several reports have been made on facial expression recognition that focuses on changes over time. For example, "M. Rosenblum, Y. Yacoob and L. Davis: 'Human Emotion Recognition from Motion Using a Radial Basis Function Network Architecture', Proceedings of the IEEE Workshop on Motion of Non-rigid and Articulated Objects, pp. 43-49 (1994)" focuses on the changes in the face surface caused by expressions in moving images and attempts recognition using optical flow. However, because the resulting features are very high-dimensional, the approach is not suited to recognition with a hidden Markov model (HMM), and the processing is complicated and computationally expensive.

[0006] As an example of moving-image recognition using an HMM, "Yamato, Otani, Ishii: 'Recognizing human behavior from moving images using hidden Markov models', IEICE Transactions D-II, Vol. J76-D-II, No. 12, pp. 2556-2563 (1993.12.)" attempts to recognize the movements of a tennis player. However, the mesh features of luminance values used there to characterize the images are not always effective when the similarity is high, as with human facial expressions, and problems remain, such as weakness in classifying into similar categories.

[0007] The present invention has been made to solve the above problems, and an object of the present invention is to provide a facial expression detection device with high robustness.

[0008] Another object of the present invention is to provide a facial expression detection device that can detect facial expressions in real time without preparation such as attaching markers to the face.

[0009] Still another object of the present invention is to provide a facial expression detection device whose processing is simple and whose amount of calculation is small.

[0010] Still another object of the present invention is to provide a facial expression detection device with which a high expression recognition rate can be obtained by applying an HMM.

[0011]

[Means for Solving the Problems] The facial expression detection device according to claim 1 comprises photographing means, wavelet transforming means, average power calculation means, and difference calculation means. The photographing means photographs a person's face and generates a video signal. The wavelet transforming means wavelet-transforms the video signal to generate a frequency signal in the spatial frequency domain for each predetermined band. The average power calculation means calculates the average power of the frequency signal for each band. The difference calculation means calculates the difference between the average power sequentially given by the average power calculation means and the corresponding average power obtained from the person's face when that face is expressionless.

[0012] The facial expression detection device according to claim 2 further comprises extraction means in addition to the configuration of claim 1. The extraction means extracts, from the video signal given by the photographing means, the portion corresponding to a predetermined region in which the person's facial expression readily appears, and gives it to the wavelet transforming means.

[0013]

[Operation] In the facial expression detection device according to claim 1, a person's face is photographed and a video signal is generated. The generated video signal is wavelet-transformed to generate a frequency signal in the spatial frequency domain for each predetermined band, and the average power of the generated frequency signal is calculated for each band. The difference between the calculated average power and the corresponding average power obtained from the expressionless face is then calculated. Because the video signal is first wavelet-transformed in this way, highly robust facial expression detection becomes possible.

[0014] In the facial expression detection device according to claim 2, in addition to the operation of claim 1, only the portions in which facial expressions readily appear are extracted from the video signal and wavelet-transformed, so more accurate facial expression detection becomes possible.

[0015]

[Embodiment] An embodiment of the facial expression detection device according to the present invention is described in detail below with reference to the drawings. In the drawings, the same reference numerals indicate the same or corresponding parts.

[0016] FIG. 1 is a block diagram showing the overall configuration of a facial expression detection device according to an embodiment of the present invention. Referring to FIG. 1, the facial expression detection device comprises: a CCD (charge-coupled device) camera 10 for photographing a human face; a low-pass filter (LPF) 12 that removes high-frequency components from the analog video signal AV supplied by the CCD camera 10; an A/D converter 14 that converts the low-frequency analog video signal AVL supplied by the low-pass filter 12 into a low-frequency digital video signal DVL; an extraction unit 16 that extracts only predetermined portions, described later, from the low-frequency digital video signal DVL supplied by the A/D converter 14; a wavelet filter 18 that wavelet-transforms the low-frequency digital video signal DVLe extracted by the extraction unit 16 to generate a frequency signal F in the spatial frequency domain for each predetermined band; an average power calculation unit 20 that calculates the average power Pc of the generated frequency signal F for each band; a steady-state storage memory 22 in which the average power Pn of the expressionless state, corresponding to the average power Pc, is stored in advance; and a difference calculation unit 24 that generates a feature vector FV by calculating the difference between the average powers Pc and Pn.

[0017] Next, the operation of the facial expression detection device is described. First, the CCD camera 10 is mounted in front of the subject's face. The CCD camera 10 is fixed to, for example, a helmet worn by the subject so that it always photographs the subject's face from a fixed position. FIG. 2 shows an image of the subject's face taken by the CCD camera 10.

[0018] Before facial expressions are detected, the average power Pn of the expressionless state is stored in the steady-state storage memory 22 in advance. The average power Pn is described in detail later.

[0019] To detect a facial expression, the subject's face is first photographed by the CCD camera 10, and the resulting analog video signal AV is supplied to the low-pass filter 12. The low-pass filter 12 removes high-frequency components from the supplied analog video signal AV and supplies the low-frequency analog video signal AVL to the A/D converter 14. Because a human face generally has pores, surface irregularities, downy hair, and the like, the analog video signal AV contains a large amount of high-frequency components caused by them, but such components are entirely unnecessary for detecting facial expressions. In addition, an analog video signal AV with a low S/N ratio contains a large amount of noise, most of which lies at high frequencies. The low-pass filter 12 therefore prevents malfunction of the device by passing only the low-frequency components needed for expression detection.

[0020] The A/D converter 14 converts the supplied low-frequency analog video signal AVL into the low-frequency digital video signal DVL. The low-frequency digital video signal DVL is supplied to the extraction unit 16, which extracts a part DVLe of it. In this embodiment, as shown in FIG. 2, a detection region 161 including the eyes and eyebrows and a detection region 162 including the mouth are cut out. That is, only the portion DVLe corresponding to the detection regions 161 and 162 is extracted from the video signal DVL. The eyes, eyebrows, and mouth are chosen as detection regions because they are the parts in which changes due to facial expression appear prominently. The detection regions are therefore not limited to those described above; for example, the forehead, where wrinkles readily form, may also be set as a detection region.
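The extraction step can be illustrated with a minimal Python sketch. It assumes the digitized frame is available as a list of rows of luminance values; the function name `extract_region` and the region coordinates are hypothetical, since the text does not give the size or position of regions 161 and 162.

```python
from typing import List, Tuple

Image = List[List[float]]
Region = Tuple[int, int, int, int]  # (top, left, height, width), in pixels


def extract_region(frame: Image, region: Region) -> Image:
    """Return the rectangular part of the frame covered by the region."""
    top, left, height, width = region
    if top < 0 or left < 0 or top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("region falls outside the frame")
    return [row[left:left + width] for row in frame[top:top + height]]


# Hypothetical coordinates for an 8x8 toy frame; real values depend on
# where the helmet-mounted camera sees the eyes, eyebrows, and mouth.
EYE_BROW_REGION: Region = (1, 1, 2, 6)   # stands in for detection region 161
MOUTH_REGION: Region = (5, 2, 2, 4)      # stands in for detection region 162

frame = [[float(8 * r + c) for c in range(8)] for r in range(8)]
eyes = extract_region(frame, EYE_BROW_REGION)
mouth = extract_region(frame, MOUTH_REGION)
```

Because the camera is fixed relative to the head, fixed rectangles of this kind are plausible; any additional region, such as the forehead, would simply be another `Region` tuple.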

[0021] The video signal DVLe extracted in this way is supplied to the wavelet filter 18, where the wavelet transform is performed. The wavelet transform is one method of transforming an image into the frequency domain, and can be regarded as an overlapped transform in which the width of the transform basis becomes narrower at higher frequencies. The wavelet filter 18, which transforms the image into the frequency domain, can be regarded as a band-splitting filter and can be realized by combining filter banks such as the one shown in FIG. 3 in multiple stages. As shown in FIG. 3, one filter bank comprises a low-pass filter 26 and a high-pass filter 28 for horizontal splitting, low-pass filters 34, 42 and high-pass filters 36, 44 for vertical splitting, and downsamplers 30, 32, 38, 40, 46, 48.

[0022] The video signal DVLe from the extraction unit 16 is supplied as the original image to the low-pass filter 26 and the high-pass filter 28. The signal that has passed through the low-pass filter 26 is compressed by the downsampler 30 and then supplied to the low-pass filter 34 and the high-pass filter 36. The signal that has passed through the low-pass filter 34 is compressed by the downsampler 38 and then supplied to the filter bank of the next stage. The signal that has passed through the high-pass filter 36 is compressed by the downsampler 40 and appears in the frequency band 183 shown in FIG. 4.

[0023] Meanwhile, the signal that has passed through the high-pass filter 28 is compressed by the downsampler 32 and then supplied to the low-pass filter 42 and the high-pass filter 44. The signal that has passed through the low-pass filter 42 is compressed by the downsampler 46 and appears in the frequency band 182 shown in FIG. 4. The signal that has passed through the high-pass filter 44 is compressed by the downsampler 48 and appears in the frequency band 181 shown in FIG. 4.

[0024] The video signal supplied to the filter bank of the next stage is similarly split into the three frequency bands 184 to 186 shown in FIG. 4. The video signal supplied to the filter bank of the stage after that is likewise split into the three frequency bands 187 to 189. The signal that has passed through the low-pass filter 34 in this final-stage filter bank is compressed by the downsampler 38 and appears in the frequency band 190 shown in FIG. 4. Accordingly, the frequency band 181 contains the highest spatial frequencies and the frequency band 190 contains the lowest spatial frequencies.
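The three-stage band splitting described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the embodiment's implementation: it substitutes two-tap Haar filters for the 32nd-order quadrature mirror filters, it requires image dimensions divisible by 8, and the function names are hypothetical. The band numbers for the first stage follow the text (181, 182, 183); the numbering within the later stages is assumed by analogy.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]
S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumed filter choice)


def split_rows(img: Image) -> Tuple[Image, Image]:
    """Horizontal split: low/high-pass each row, then downsample by 2."""
    low = [[(r[i] + r[i + 1]) * S for i in range(0, len(r) - 1, 2)] for r in img]
    high = [[(r[i] - r[i + 1]) * S for i in range(0, len(r) - 1, 2)] for r in img]
    return low, high


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def split_cols(img: Image) -> Tuple[Image, Image]:
    """Vertical split: the same operation applied along the columns."""
    low, high = split_rows(transpose(img))
    return transpose(low), transpose(high)


def filter_bank(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One stage: horizontal split followed by vertical split of each half."""
    lo, hi = split_rows(img)      # roles of filters 26 and 28
    ll, lh = split_cols(lo)       # roles of filters 34 and 36
    hl, hh = split_cols(hi)       # roles of filters 42 and 44
    return ll, lh, hl, hh


def decompose(img: Image, stages: int = 3) -> Dict[int, Image]:
    """Return the bands keyed by the reference numerals 181..190."""
    bands: Dict[int, Image] = {}
    current = img
    for stage in range(stages):
        current, lh, hl, hh = filter_bank(current)
        base = 181 + 3 * stage
        bands[base] = hh          # high-pass horizontally and vertically
        bands[base + 1] = hl      # high-pass horizontally, low-pass vertically
        bands[base + 2] = lh      # low-pass horizontally, high-pass vertically
    bands[181 + 3 * stages] = current  # remaining low-low output (band 190)
    return bands


image = [[float((r * c) % 7) for c in range(8)] for r in range(8)]
bands = decompose(image)
```

For an 8x8 input the ten bands hold 3×16 + 3×4 + 3×1 + 1 = 64 coefficients, the same number as the input has pixels.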

[0025] FIG. 5 shows the image obtained when the video signal of the detection region 161 including the eyebrows and eyes has passed through the three-stage filter bank described above. For visualization, a bias is applied to each band in this image.

[0026] The low-pass filters 26, 34, 42 and the high-pass filters 28, 36, 44 in the wavelet filter 18 are required to satisfy conditions such as perfect reversibility, phase characteristics close to linear phase, steep cutoff characteristics, and orthogonality of the frequency responses, but there are trade-offs among these conditions. In this embodiment, 32nd-order quadrature mirror filters are used, which have relatively weak cutoff characteristics but linear phase characteristics.
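As a small illustration of the quadrature mirror relationship, the sketch below derives a high-pass filter from a low-pass prototype by alternating the signs of its coefficients, h1[n] = (-1)^n h0[n], which mirrors the magnitude response about a quarter of the sampling frequency. The coefficients of the 32nd-order filter are not given in the text, so a two-tap Haar prototype is assumed here, and the function names are hypothetical.

```python
import cmath
import math
from typing import List


def mirror_highpass(h0: List[float]) -> List[float]:
    """Build h1[n] = (-1)^n * h0[n], i.e. H1(z) = H0(-z)."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h0)]


def magnitude(h: List[float], omega: float) -> float:
    """Magnitude of the frequency response at angular frequency omega."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h)))


# Assumed prototype: the symmetric two-tap Haar low-pass filter.
h0 = [1.0 / math.sqrt(2.0)] * 2
h1 = mirror_highpass(h0)

# The low-pass filter passes DC; its mirror image blocks DC and passes
# the highest frequency (omega = pi).
dc_low = magnitude(h0, 0.0)
dc_high = magnitude(h1, 0.0)
nyquist_high = magnitude(h1, math.pi)
```

A symmetric prototype such as this one has linear phase, which is the property the embodiment favors over a steep cutoff.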

[0027] In the image that has passed through the wavelet filter 18, as shown in FIG. 5, position information on the image is preserved along with the frequency characteristics. Because the amount of information is the same as in the original image, it is difficult to obtain features from it directly. Moreover, if it were used as a feature as is, it would be strongly affected by variations in the position of the CCD camera 10 and by individual differences. The present invention therefore adopts a method that uses only the increase or decrease of the average power in each band as the feature. Accordingly, the spatial frequency F generated for each band by the wavelet filter 18 is supplied to the average power calculation unit 20, where the average power Pc of the spatial frequency F is calculated for each band.
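A minimal sketch of the per-band average power follows. The text does not give a formula, so the mean of the squared coefficients in a band is assumed here; the names `average_power` and `band_powers` and the toy coefficients are hypothetical.

```python
from typing import Dict, List

Band = List[List[float]]


def average_power(band: Band) -> float:
    """Mean of the squared coefficients in one band (assumed definition)."""
    count = sum(len(row) for row in band)
    if count == 0:
        raise ValueError("band is empty")
    return sum(v * v for row in band for v in row) / count


def band_powers(bands: Dict[int, Band]) -> Dict[int, float]:
    """Average power Pc for every band, keyed by band number."""
    return {number: average_power(band) for number, band in bands.items()}


# Toy coefficients standing in for two wavelet bands of one frame.
bands = {
    184: [[1.0, -1.0], [3.0, -3.0]],
    185: [[0.0, 2.0], [0.0, 2.0]],
}
pc = band_powers(bands)
```

Reducing each band to one number discards the position information inside the band, which is what makes the feature less sensitive to small shifts of the camera.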

[0028] When a facial expression appears, the shape, size, inclination, and so on of the facial components change. The effect of these changes within each frequency band is used as the feature. For example, when the subject closes the eyes, the horizontal frequency components that previously held power decrease, and conversely the vertical high-frequency components that previously held little power increase.

[0029] When the facial expression changes, no significant change in power is observed in the high-frequency bands 181 to 183, but a significant change in power is observed in the low-frequency bands 184 to 189. This is because, after the wavelet filter 18, the high-frequency components mainly contain the edge information of the image, whereas the low-frequency components contain the shape information of the facial components. This indicates that the change in shape itself is a more effective feature than the change in edge direction or intensity that accompanies a change in expression. Therefore, in this embodiment, the average power Pc in the frequency bands 184 to 189 is preferably used.

[0030] FIG. 6 is a graph showing the change in the average power Pc of the frequency band 185 obtained from the detection region 162 including the mouth when the subject is surprised. The horizontal axis of the graph shows the frame number, and the vertical axis shows the average power. When an expression of surprise appears, the mouth opens vertically, so the power of the horizontal low-frequency components appearing in the band 185 increases markedly.

[0031] FIG. 7 is a graph showing the change in the average power of the frequency bands 185 and 186 obtained from the region 161 including the eyebrows and eyes when the subject blinks. As shown in FIG. 7, when the subject blinks, the eyes narrow, so the average power of the horizontal low-frequency components appearing in the band 185 decreases momentarily, and the average power of the vertical low-frequency components appearing in the band 186 increases momentarily.

[0032] As described above, the average power of the low-frequency spatial-frequency components obtained by the wavelet transform changes according to the facial expression. The average power Pc calculated by the average power calculation unit 20 is supplied to the difference calculation unit 24 for each frame. The difference calculation unit 24 calculates the difference between the calculated average power Pc and the expressionless average power Pn supplied from the steady-state storage memory 22. The steady-state storage memory 22 must hold an average power Pn that corresponds to the calculated average power Pc. Therefore, before facial expressions are detected, an expressionless face must first be photographed, the wavelet transform performed in the same way as described above, and the average power of the bands 184 to 189 calculated and stored in the steady-state storage memory 22 in advance. The difference between the calculated average power Pc and the expressionless average power Pn is then output from the difference calculation unit 24 as the feature vector FV. The feature vector FV indicates the displacement of the face relative to the expressionless state.
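The difference calculation can be sketched as follows for a single detection region. The sketch assumes the per-band average powers are held in dictionaries keyed by band number; the function name `feature_vector` and the numeric values are hypothetical, and how the values of the two regions 161 and 162 are combined into one vector is not specified in the text.

```python
from typing import Dict, Iterable, List

FEATURE_BANDS = range(184, 190)  # bands 184 to 189, as preferred in the text


def feature_vector(pc: Dict[int, float], pn: Dict[int, float],
                   band_numbers: Iterable[int] = FEATURE_BANDS) -> List[float]:
    """Per-band difference Pc - Pn between the current and neutral powers."""
    return [pc[b] - pn[b] for b in band_numbers]


# Neutral-face powers, measured once before detection starts (toy values).
pn = {b: 1.0 for b in FEATURE_BANDS}

# Powers for the current frame: band 185 rises and band 186 falls.
pc = dict(pn)
pc[185] = 3.5
pc[186] = 0.25

fv = feature_vector(pc, pn)
```

An all-zero vector corresponds to the expressionless state, so the magnitude and sign of each element indicate how far, and in which direction, each band has moved from neutral.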

[0033] As described above, according to this embodiment, the video signal of the face is first wavelet-transformed, so robustness is high, and facial expressions can be detected accurately even if the face moves slightly within the screen or the lighting conditions differ slightly. In addition, because the wavelet transform is performed by filters of simple construction, the processing is simple and the amount of calculation is small. Facial expressions can therefore be detected in real time.

[0034] Furthermore, because only the detection regions 161 and 162, which include the eyebrows, eyes, and mouth where expressions readily appear, are extracted, facial movements unrelated to changes in expression do not cause malfunctions, and more accurate facial expression detection becomes possible.

[0035] It is also possible to vector-quantize the feature vector FV obtained in this embodiment and to recognize facial expressions using an HMM. Combining the feature with the modeling of temporal change by an HMM in this way makes it possible to obtain a higher recognition rate.
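One way to picture this combination is the sketch below: each feature vector is mapped to the index of its nearest codeword, and the resulting symbol sequence is scored by a discrete HMM with the standard forward algorithm. The codebook, the model parameters, and the function names are all hypothetical; the text does not say how the codebook or the HMMs would be trained.

```python
from typing import List, Sequence


def quantize(fv: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to fv in squared Euclidean distance."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: dist2(fv, codebook[k]))


def forward_likelihood(symbols: Sequence[int], start: Sequence[float],
                       trans: Sequence[Sequence[float]],
                       emit: Sequence[Sequence[float]]) -> float:
    """P(symbols | model) for a discrete HMM, by the forward algorithm."""
    if not symbols:
        return 1.0
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    for s in symbols[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][s]
                 for j in range(n)]
    return sum(alpha)


# Hypothetical two-dimensional codebook and a short sequence of feature vectors.
codebook = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
frames = [[0.1, -0.2], [1.8, 0.3], [2.2, -0.1], [0.2, 1.7]]
symbols = [quantize(fv, codebook) for fv in frames]

# Hypothetical two-state model for one expression category.
start = [0.8, 0.2]
trans = [[0.6, 0.4], [0.1, 0.9]]
emit = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
score = forward_likelihood(symbols, start, trans, emit)
```

With one such model per expression, the expression whose model gives the highest score for the observed symbol sequence would be selected. For long sequences a scaled or log-domain version of the forward pass would be needed to avoid numerical underflow.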

[0036]

[Effects of the Invention] According to the facial expression detection device of claim 1, the video signal of the face is wavelet-transformed and the average power of the resulting spatial frequencies is compared with the average power of the expressionless state, so highly robust facial expression detection is possible.

[0037] According to the facial expression detection device of claim 2, only the part of the video signal in which facial expressions readily appear is extracted, so more accurate facial expression detection is possible.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a facial expression detection device according to an embodiment of the present invention.

FIG. 2 is an image taken by the CCD camera in FIG. 1, showing the detection regions extracted by the extraction unit in FIG. 1.

FIG. 3 is a block diagram showing the configuration of one of the filter banks constituting the wavelet filter in FIG. 1.

FIG. 4 is a diagram showing the structure of the spatial-frequency image obtained by the wavelet filter in FIG. 1.

FIG. 5 is a spatial-frequency image obtained when the video signal of the detection region including the eyebrows and eyes shown in FIG. 2 is actually passed through the wavelet filter in FIG. 1.

FIG. 6 is a graph showing the change in the average power obtained from the average power calculation unit in FIG. 1 when the subject is angry.

FIG. 7 is a graph showing the change in the average power obtained from the average power calculation unit in FIG. 1 when the subject blinks.

[Explanation of symbols]

10: CCD camera
12: Low-pass filter
14: A/D converter
16: Extraction unit
18: Wavelet filter
20: Average power calculation unit
22: Steady-state storage memory
24: Difference calculation unit
181 to 190: Frequency bands
AV: Analog video signal
AVL: Low-frequency analog video signal
DVL, DVLe: Low-frequency digital video signal
F: Spatial frequency
Pc, Pn: Average power

Claims

[Claims]

1. A facial expression detection device comprising: photographing means for photographing a face of a person to generate a video signal; wavelet transforming means for wavelet-transforming the video signal to generate a frequency signal in a spatial frequency domain for each predetermined band; average power calculation means for calculating the average power of the frequency signal for each of the bands; and difference calculation means for calculating a difference between the average power sequentially given from the average power calculation means and the corresponding average power obtained from the face when the person's face is expressionless.

2. The facial expression detection device according to claim 1, further comprising extraction means for extracting, from the video signal given from the photographing means, a portion corresponding to a predetermined region in which the facial expression of the person readily appears, and giving the portion to the wavelet transforming means.