JP4942197B2

JP4942197B2 - Template creation apparatus, facial expression recognition apparatus and method, program, and recording medium

Info

Publication number: JP4942197B2
Application number: JP2007284029A
Authority: JP
Inventors: 和弘大塚; 淳司大和; 美奈子澤木; 英作前田; 洋一佐藤; 史朗熊野
Original assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC
Current assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC
Priority date: 2007-10-31
Filing date: 2007-10-31
Publication date: 2012-05-30
Anticipated expiration: 2027-10-31
Also published as: JP2009110426A

Description

The present invention relates to a template creation apparatus, a facial expression recognition apparatus and method, a program, and a recording medium for human facial expression recognition based on image processing.

Human facial expression recognition is one of the key component technologies of human-machine interaction, and a method that can recognize expressions without restricting face orientation is desired. However, most facial expression recognition methods developed so far are limited to frontal faces and cannot handle changes in face orientation. Some methods can handle such changes, but they require a detailed three-dimensional face shape model for each target person, so the advance registration procedure takes time.

For example, Non-Patent Document 1 proposes a method for estimating head pose and facial expression in real time. However, because the three-dimensional coordinates of the tracked feature points, such as the corners of the eyes and mouth, are computed by stereo vision, the method cannot be applied to existing video footage captured with a single camera.

Non-Patent Document 2 proposes another method for estimating head pose and facial expression in real time, but the three-dimensional coordinates of the tracked feature points, such as the corners of the eyes and mouth, averaged over multiple people, must be measured in advance using stereo vision, a range scanner, or similar equipment. In addition, the method cannot accurately approximate face shapes and expression changes, which vary greatly between individuals.

Non-Patent Document 1: S. B. Gokturk, C. Tomasi, B. Girod and J.-Y. Bouguet: "Model-based face tracking for view-independent facial expression recognition", Proc. of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 287-293 (2002).
Non-Patent Document 2: F. Dornaika and F. Davoine: "Simultaneous facial action tracking and expression recognition using a particle filter", Proc. of the Tenth IEEE International Conference on Computer Vision, 2, pp. 1733-1738 (2005).

To recognize expressions without restricting face orientation, image features must be defined so that they are unaffected by changes in face orientation. If the three-dimensional shape of the face is given in advance, the influence of face orientation can be removed by normalizing the image using that shape. However, preparing an accurate face shape model for each person is costly and is a practical problem. A facial expression recognition method that uses only a rough geometric shape model is therefore desired.

To solve the above problem, the invention according to claim 1 is a template creation apparatus comprising: a shape model creation unit that creates a shape model approximating the shape of a face based on an input face image; an attention point extraction unit that extracts an attention point group consisting of a plurality of attention points by extracting attention points, on the shape model created by the shape model creation unit, based on the edges of each facial part; and a luminance distribution model creation unit that creates, for a plurality of facial expressions, a template representing a model of the luminance change of the attention point group extracted by the attention point extraction unit.

To solve the above problem, the invention according to claim 1 is a template creation apparatus comprising: a shape model creation unit that creates a shape model approximating the shape of a face based on an input face image; an attention point extraction unit that extracts, on the shape model created by the shape model creation unit, an attention point group consisting of a plurality of attention points satisfying a predetermined condition; and a luminance distribution model creation unit that creates, for a plurality of facial expressions, a template representing a model of the luminance change of the attention point group extracted by the attention point extraction unit.

The invention according to claim 2 is characterized in that the attention point extraction unit extracts the attention points on a shape model that the shape model creation unit has created based on an input frontal image of the face.

The invention according to claim 3 is characterized in that the luminance distribution model creation unit creates a template representing a model of the luminance change of the attention point group extracted on the shape model created based on the frontal image of the face.

The invention according to claim 4 is a facial expression recognition apparatus comprising: a template creation unit having a shape model creation unit that creates a shape model approximating the shape of a face based on an input face image, an attention point extraction unit that extracts, on the shape model created by the shape model creation unit, an attention point group consisting of a plurality of attention points satisfying a predetermined condition, and a luminance distribution model creation unit that creates, for a plurality of facial expressions, a template representing a model of the luminance change of the attention point group extracted by the attention point extraction unit; and a facial expression and head pose estimation unit having a prediction unit that predicts the state distribution of head pose and facial expression based on an input face image, an update unit that evaluates the predicted state distribution using the template created by the template creation unit, and an estimated value calculation unit that calculates estimated values of head pose and facial expression from the evaluated state distribution.

The invention according to claim 5 is a template creation method comprising: a shape model creation step of creating a shape model approximating the shape of a face based on an input face image; an attention point extraction step of extracting an attention point group consisting of a plurality of attention points by extracting attention points, on the shape model created in the shape model creation step, based on the edges of each facial part; and a luminance distribution model creation step of creating, for a plurality of facial expressions, a template representing a model of the luminance change of the attention point group extracted in the attention point extraction step.

The invention according to claim 6 is a facial expression recognition method comprising: a shape model creation step of creating a shape model approximating the shape of a face based on an input face image; an attention point extraction step of extracting, on the shape model created in the shape model creation step, an attention point group consisting of a plurality of attention points satisfying a predetermined condition; a luminance distribution model creation step of creating, for a plurality of facial expressions, a template representing a model of the luminance change of the attention point group extracted in the attention point extraction step; a prediction step of predicting the state distribution of head pose and facial expression based on an input face image; an update step of evaluating the predicted state distribution using the template created in the template creation step; and an estimated value calculation step of calculating estimated values of head pose and facial expression from the evaluated state distribution.

The invention according to claim 7 is a program executed on a computer, where the apparatus according to any one of claims 1 to 4 is configured using that computer.

The invention according to claim 8 is a computer-readable recording medium storing a program executed on a computer, where the apparatus according to any one of claims 1 to 4 is configured using that computer.

According to the present invention, a shape model approximating the shape of a face is created based on an input face image, an attention point group consisting of a plurality of attention points satisfying a predetermined condition is extracted on the created shape model, and a template representing a model of the luminance change of the extracted attention point group is created for a plurality of facial expressions. Using this template, expressions can be recognized accurately even under large pose changes, based on how the brightness at the attention points changes as the expression changes. In particular, unlike other methods, no precise face shape is required, and registering a model for each person is easy.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of the facial expression recognition apparatus 1 according to this embodiment. In FIG. 1, the input unit 2 supplies the facial expression recognition apparatus 1 with a moving image of a subject's face captured by an imaging means such as a CCD (charge-coupled device) camera. Specifically, the input unit 2 samples the captured moving image at a predetermined sampling period, converts it into data, and supplies the resulting sequence of per-time data to the facial expression recognition apparatus 1. The estimated values 3 of head pose and facial expression, estimated in real time by the facial expression recognition apparatus 1, are output from the output unit 4.

As shown in FIG. 1, the facial expression recognition apparatus 1 consists of a template creation unit 12, which creates a template 11 in advance, and a head pose and facial expression estimation unit 13, which simultaneously estimates the head pose and facial expression of a person in the moving image. These functions are implemented in software.

The template creation unit 12 comprises a shape model creation unit 121 that creates a shape model approximating the face shape, an attention point extraction unit 122 that extracts, on the shape model created by the shape model creation unit 121, an attention point group consisting of a plurality of attention points satisfying a predetermined condition, and a luminance distribution model creation unit 123 that prepares a template representing a model of the distribution of the luminance change of each attention point for each of the user's expressions. The attention points extracted by the attention point extraction unit 122 are used to estimate head pose and facial expression.

The head pose and facial expression estimation unit 13, which is based on the particle filter framework, comprises: a prediction unit 131 that predicts the state of head pose and facial expression at the current time by predicting their state distribution based on the input face image; an update unit 132 that evaluates the predicted state by evaluating the predicted state distribution using the template 11 created by the template creation unit 12 (that is, by evaluating the likelihood between the head pose and facial expression state and each state of the template); and an estimated value calculation unit 133 that calculates estimated values of head pose and facial expression from the predicted states and their evaluations.

FIG. 2 is a flowchart showing a more specific processing procedure of the template creation unit 12 of the facial expression recognition apparatus 1 shown in FIG. 1. In FIG. 2, input data 124 and 127 are image data supplied from the input unit 2 in FIG. 1, and output data 125, 126, and 128 are the data that make up the template 11 in FIG. 1. The template creation unit 12 is described below with reference to FIG. 2.

The shape model creation unit 121 creates a parametrically representable shape model 125 that approximates the face shape, based on a frontal, expressionless face image 124. This embodiment uses a cylinder as the shape model: it has only one parameter, so it is easy to create, and it is closer to the actual face shape than a plane. The radius of the cylinder is set to the width of the face region multiplied by a predetermined constant. The face region is detected automatically, in a single face image captured with the user expressionless and directly facing the camera, using an existing method such as the one described in P. A. Viola and M. J. Jones: "Rapid object detection using a boosted cascade of simple features," Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 511-518 (2001).
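
As a minimal sketch of the radius rule described above, the following Python computes the cylinder radius from a detected face width. The constant `scale=0.6` is a hypothetical placeholder, since the text only says "a predetermined constant", and the helper `cylinder_depth` (surface depth at a horizontal offset from a vertical cylinder axis) is an illustrative assumption rather than something the patent defines.

```python
import math

def cylinder_radius(face_width_px: float, scale: float = 0.6) -> float:
    """Radius of the cylindrical shape model: detected face width times a fixed constant.

    The default scale is a hypothetical value; the patent does not state it.
    """
    if face_width_px <= 0:
        raise ValueError("face width must be positive")
    return scale * face_width_px

def cylinder_depth(x_offset_px: float, radius: float) -> float:
    """Depth of a surface point at horizontal offset x from the cylinder axis.

    Assumes a vertical axis, so depth depends only on the horizontal offset.
    """
    if abs(x_offset_px) > radius:
        raise ValueError("point lies outside the cylinder silhouette")
    return math.sqrt(radius * radius - x_offset_px * x_offset_px)
```

Because the model has a single parameter, one face detection on one frontal image is enough to fix it.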

Although this embodiment uses a cylinder as the shape model, the framework of the present invention allows any parametrically representable shape model, such as a plane or an ellipsoid. In addition, instead of creating the shape model on the spot in the shape model creation unit 121, any shape prepared in advance, such as three-dimensional data of the actual user's face or the average face shape of multiple people, can be used as the shape model.

The attention point extraction unit 122 extracts, from the frontal, expressionless face image 124 used for the shape model 125, a plurality of attention points that satisfy a predetermined condition and are used to estimate head pose and facial expression, and stores their image coordinates.

The luminance distribution model creation unit 123 creates a luminance distribution model 128 from a plurality of frontal face images 127, one for each expression. As shown in FIG. 3, the expression luminance distribution model 128 models how the luminance of each attention point (61a, 62a, 61b, 62b, and so on) changes with each expression. In this embodiment, preparing this expression luminance distribution model 128 makes it possible to exploit the fact that the luminance distribution differs between expressions, so the expression at a given time is estimated from the luminance pattern of the attention points in the input image.

Here, the luminances of the attention points are assumed to be independent, each luminance is represented by a normal distribution, and each of these distributions is assumed to change with the expression. A normal distribution is used on the view that luminance has a spread caused by a combination of factors such as shape model error and imaging system noise. However, the representation of luminance is not limited to the normal distribution; other unimodal probability distributions (for example, the Cauchy distribution) can also be used.

FIG. 3 is a conceptual diagram explaining how the luminance of attention points 61a and 62a on the shape model 125, created based on the frontal, expressionless face image 124, changes at attention points 61b and 62b, which have the same coordinates on shape model 7, a model of the same form as shape model 125 that represents a frontal, angry expression. On shape model 125, the regions of facial part 51a (right eyebrow), facial part 52a (left eyebrow), facial part 53a (right eye), facial part 54a (left eye), and facial part 55a (mouth) are associated with the positions shown in the figure. Attention points 61a and 62a are set outside (attention point 61a) and inside (attention point 62a) facial part 52a of the left eyebrow. In this case, as shown in the lower part of FIG. 3, the distributions of luminance and its probability are such that the luminance of attention point 62a, inside the left eyebrow, takes lower values than the luminance of attention point 61a, outside the left eyebrow.

When the facial expression changes from expressionless to angry, on the other hand, attention point 61b (same coordinates as attention point 61a) and attention point 62b (same coordinates as attention point 62a) on the frontal, angry shape model 7 both lie above facial part 52b, which corresponds to the left eyebrow. The luminances at attention points 61b and 62b therefore have similar distributions that partly overlap.

On the frontal, angry shape model 7, in addition to facial part 52b corresponding to the left eyebrow, facial part 51b (right eyebrow), facial part 53b (right eye), facial part 54b (left eye), and facial part 55b (mouth) are associated with the positions shown in the figure.

The luminance distribution model 128 created by the luminance distribution model creation unit 123 in FIG. 2 is obtained from a plurality of face images 127 in which the user displays each expression in turn while keeping the head still, in the same state as when photographed for the shape model creation unit 121 and the attention point extraction unit 122 described above (directly facing the camera). This approach is used because, with the head still, the position of each attention point is fixed, so the luminance of each attention point for each expression can be obtained easily from the luminance of the corresponding pixels. Here, for example, a calibration is performed in which the luminance of each attention point for each expression obtained in this way is used as the mean of the corresponding normal distribution, and the standard deviation is set to 1/6 of the mean.
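
The calibration rule above (mean = observed luminance, standard deviation = mean / 6) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function names, the dictionary layout, and the small floor on the standard deviation (to avoid division by zero for black pixels) are all hypothetical. The log-likelihood assumes independent normal distributions per attention point, as the text states.

```python
import math

def build_template(luminance_by_expression):
    """Map each expression to a list of (mean, std) pairs, one per attention point.

    The observed luminance is used as the mean and std = mean / 6, as in the text.
    The 1e-6 floor on std is an added assumption to keep the density well defined.
    """
    template = {}
    for expression, luminances in luminance_by_expression.items():
        template[expression] = [
            (float(v), max(float(v) / 6.0, 1e-6)) for v in luminances
        ]
    return template

def log_likelihood(template, expression, observed):
    """Sum of per-point normal log-densities (attention points assumed independent)."""
    total = 0.0
    for (mean, std), value in zip(template[expression], observed):
        total += -0.5 * ((value - mean) / std) ** 2
        total -= math.log(std * math.sqrt(2.0 * math.pi))
    return total
```

For example, with `build_template({"neutral": [120, 60], "anger": [90, 90]})`, an observation of `[120, 60]` scores higher under "neutral" than under "anger".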

FIG. 4 is a flowchart showing a more specific processing procedure of the attention point extraction unit 122 shown in FIG. 2. The attention point extraction unit 122 is described below with reference to FIG. 4. First, a Laplacian-of-Gaussian filter is applied to the face image (step S11), and the zero-crossing pixels in the filtered image are detected as edges (step S12). Next, within the face region of the edge image, a set of point pairs like that in FIG. 5 is extracted as the attention point set (steps S13 to S15). Specifically, all point pairs that straddle an edge of a facial part region and whose center lies on the edge are first taken as candidate attention point pairs (step S13). The candidates are then sorted in descending order of the luminance difference between the two points of each pair (step S14). Finally, starting from the candidate with the largest luminance difference, a candidate is selected if its center is at least a prescribed distance away from the centers of the point pairs already selected (step S15). In this way, a set of point pairs (such as 10-1) like that in FIG. 5 is extracted as the attention point set (attention point group).
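
Steps S11 and S12 can be illustrated with a minimal sketch of the zero-crossing test alone. It assumes the Laplacian-of-Gaussian response has already been computed (for example with a library filter) and is passed in as a 2D list of numbers. The function name `zero_crossings` and the neighbor rule (a sign change against the right or lower neighbor) are illustrative assumptions, not details given in the patent.

```python
def zero_crossings(response):
    """Mark pixels where the filter response changes sign against a neighbor.

    response: 2D list (rows of numbers) holding a precomputed LoG response.
    Returns a 2D list of booleans; True marks an edge pixel.
    """
    rows, cols = len(response), len(response[0])
    edges = [[False] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            value = response[y][x]
            # Sign change against the right-hand neighbor.
            if x + 1 < cols and value * response[y][x + 1] < 0:
                edges[y][x] = True
            # Sign change against the neighbor below.
            if y + 1 < rows and value * response[y + 1][x] < 0:
                edges[y][x] = True
    return edges
```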

FIG. 5 is a diagram showing the face region 8 in the frontal, expressionless input image 124, with the edges detected in step S12. The face region 8 in FIG. 5 contains a facial part region 81 corresponding to the right eyebrow, a facial part region 82 corresponding to the left eyebrow, a facial part region 83 corresponding to the right eye, a facial part region 84 corresponding to the left eye, a facial part region 85 corresponding to the nose, and a facial part region 86 corresponding to the mouth. The extracted attention points 9, 9, ... are also shown on the face region 8. These attention points 9, 9, ... form point pairs, each consisting of two attention points 9, 9 associated with each other so as to straddle the boundary (edge) of one of the facial part regions 81 to 85 (for example, point pair 10-1 consisting of attention point 9-1 and attention point 9-2).

In this embodiment, such a set of attention points 9, 9, ... is extracted automatically by selecting a fixed number of point pairs, in descending order of the luminance difference between the two points of the pair, from among the candidate point pairs that satisfy the following three conditions.

(C1) The pair straddles an edge.
(C2) The midpoint between the two attention points of the pair (the center of the pair) lies on an edge of a facial part.
(C3) The center of the pair is at least a prescribed distance from the centers of all point pairs already selected.
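
The ordering by luminance difference together with condition (C3) amounts to a greedy selection, which the following sketch illustrates. It assumes the candidates passed in already satisfy (C1) and (C2). The tuple layout `(p, q, lum_p, lum_q)` and the parameter names `min_distance` and `max_pairs` are hypothetical names introduced for illustration.

```python
import math

def select_point_pairs(candidates, min_distance, max_pairs):
    """Greedily pick point pairs with the largest luminance difference.

    candidates: list of (p, q, lum_p, lum_q), with p and q as (x, y) tuples;
    each candidate is assumed to straddle an edge with its center on the edge.
    """
    # Sort by absolute luminance difference, largest first.
    ranked = sorted(candidates, key=lambda c: abs(c[2] - c[3]), reverse=True)
    selected, centers = [], []
    for p, q, _lum_p, _lum_q in ranked:
        center = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        # (C3): keep the pair only if it is far enough from every chosen center.
        if all(math.dist(center, c) >= min_distance for c in centers):
            selected.append((p, q))
            centers.append(center)
            if len(selected) == max_pairs:
                break
    return selected
```

The spacing condition keeps the selected pairs from clustering on a single high-contrast edge, so the attention points spread across the facial parts.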

FIG. 5 shows an example of the arrangement of the attention points 9, 9, ... extracted by the above method.

In this embodiment, edges are detected as the zero-crossing pixels of the Laplacian-of-Gaussian filtered image, but other edge detection methods, such as a Sobel filter, may be used. The attention point set also need not be configured as a set of point pairs centered on an edge. As shown in FIG. 6(B), it is sufficient to define the attention points 9, 9, ... so that they are placed at an appropriate distance from the edge in each of the two regions (the inner region and the outer region) divided by the edge of the facial part region 85 corresponding to the left eyebrow in FIG. 5.

FIG. 6 shows two ways of building an attention point set from a plurality of selected attention points: an example composed of point pairs 10, 10, ... that satisfy the above conditions (FIG. 6(A)), and an example composed of a plurality of attention points 9, 9, ... located at an appropriate distance from the edges of the facial parts (FIG. 6(B)).

Next, the specific processing of the head pose and facial expression estimation unit 13 in FIG. 1 is described. In this embodiment, the head pose and facial expression at each time are assumed to be conditionally independent given the face image at that time, and both are assumed to be random variables following a first-order Markov process. FIG. 7 expresses the relationship among head pose h, facial expression e, and face image z as a dynamic Bayesian network.

In this embodiment, the joint posterior probability distribution of head pose and facial expression at time t is estimated using all face images obtained up to time t. This can be written as equation (1).
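
The display for equation (1) is not reproduced in this text. As a reconstruction sketch only, a recursive Bayesian filtering form consistent with the first-order Markov assumption and with the symbols z_t, h_t, e_t, and α used in this description would be the following. It is an assumption based on the surrounding definitions, not the patent's verbatim equation, and the recursion is taken to start from the priors P(h_0) and P(e_0).

```latex
P(h_t, e_t \mid z_{1:t})
  = \alpha \, P(z_t \mid h_t, e_t)
    \sum_{e_{t-1}} \int
      P(h_t \mid h_{t-1}) \, P(e_t \mid e_{t-1}) \,
      P(h_{t-1}, e_{t-1} \mid z_{1:t-1}) \, dh_{t-1}
\tag{1}
```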

Here, z_t, h_t, and e_t are the face image, head pose, and facial expression at time t, respectively; P(h_t, e_t | z_{1:t}) is the joint posterior probability distribution of head pose and facial expression at time t given all face images z_{1:t} from time 1 to t; α = 1/P(z_t) is the normalization constant of the probability; and P(h_0) and P(e_0) are the prior distributions of head pose and facial expression, respectively. The right-hand side of equation (1) is obtained by applying Bayes' rule and conditional independence to the left-hand side and marginalizing.

The head pose is represented by six continuous variables in total: the position of the center of the face shape model on the input image; the head pose angles corresponding to nodding (pitch), head shaking (yaw), and tilting (roll), measured relative to the face orientation directly facing the camera; and the scale. The facial expression is represented by the number assigned to each expression.

In this embodiment, the state transition model of head pose, P(h_t | h_{t-1}), is a random walk model in which each variable is independent. For the state transition model of facial expression, P(e_t | e_{t-1}), transitions between all expressions are given equal probability. Other state transition models may be used as appropriate for the setting in which the facial expression recognition apparatus 1 is applied.
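
The two transition models can be sketched as sampling functions. The text specifies only "an independent random walk" and "equal probability for all transitions", so the zero-mean Gaussian step, the per-variable `sigmas`, and the choice to include the self-transition in the uniform draw are assumptions made here for illustration.

```python
import random

def predict_head_pose(pose, sigmas, rng):
    """Independent random walk: perturb each head-pose variable separately.

    Gaussian noise with per-variable standard deviation is an assumption;
    the patent does not state the noise distribution.
    """
    return [value + rng.gauss(0.0, sigma) for value, sigma in zip(pose, sigmas)]

def predict_expression(expressions, rng):
    """Uniform transition: the next expression does not depend on the current one.

    Drawing uniformly over all expressions (self-transition included) is one
    reading of "all transitions equally probable".
    """
    return rng.choice(expressions)

# Usage: six pose variables (x, y, pitch, yaw, roll, scale) and numbered expressions.
rng = random.Random(0)
next_pose = predict_head_pose([160.0, 120.0, 0.0, 0.0, 0.0, 1.0],
                              [2.0, 2.0, 0.05, 0.05, 0.05, 0.01], rng)
next_expression = predict_expression([0, 1, 2], rng)
```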

Because the head pose variables are continuous and their posterior distribution becomes complicated by occlusion and similar effects, equation (1) cannot be solved exactly. The present embodiment therefore estimates head pose and facial expression simultaneously using a particle filter, which is one method for estimating such probability density distributions. The following describes how head pose and facial expression are estimated simultaneously within the particle filter framework.

In the particle filter framework, the probability distribution over head pose and facial expression states is approximated by a set of samples called particles. Each particle holds one state of the estimation target and a weight. In the present embodiment, the state held by each particle consists of a head pose (continuous variables) and a facial expression (a discrete variable). These states correspond to the state of the template 11.

State estimation of the template 11 using the particle filter is performed by the head pose and facial expression estimation unit 13 of FIG. 1, as shown in FIG. 8. In the head pose and facial expression estimation unit 13, the prediction unit 131 predicts the posterior probability distribution of the state of the template 11 at the current time, and the update unit 132 updates that posterior probability distribution based on observation information. The estimate calculation unit 133 then calculates estimates from the resulting posterior probability distribution. These processes are described below with reference to FIG. 8. The observation information here is, for example, the posterior probability distribution of the state of the template 11 at that time as predicted by the prediction unit 131.

The prediction unit 131 predicts a particle set that approximates the head pose and facial expression state at the current time (step S21). Specifically, particles are first sampled from the particle set approximating the state at the previous time t-1, according to the weight of each particle. Then, for each sampled particle, the head pose and facial expression state it holds is transitioned according to the state transition models, which gives the predicted state of the particle at the current time t.
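The prediction step (step S21) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `predict`, the particle layout, and the per-variable random-walk standard deviations `pose_sigmas` are hypothetical, and the uniform expression transition is assumed to include staying in the same expression.

```python
import random

def predict(particles, weights, pose_sigmas, num_expressions, rng=random):
    """Resample by weight, then propagate each particle by one time step.

    particles: list of (pose, expression); pose is a list of six floats
               [x, y, pitch, yaw, roll, scale] (assumed ordering) and
               expression is an integer label.
    weights: nonnegative weights at time t-1, one per particle.
    pose_sigmas: assumed random-walk standard deviation per pose variable.
    """
    # Sample with replacement, proportionally to the weights at time t-1.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    predicted = []
    for pose, _previous_expression in resampled:
        # Independent Gaussian random walk for every pose variable.
        new_pose = [v + rng.gauss(0.0, s) for v, s in zip(pose, pose_sigmas)]
        # All expression transitions equally probable: uniform draw.
        new_expression = rng.randrange(num_expressions)
        predicted.append((new_pose, new_expression))
    return predicted
```

Because every expression is redrawn uniformly, the expression labels of the predicted set are decided entirely by the subsequent likelihood weighting; a different transition model would change only the line that draws `new_expression`.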

Next, for each particle whose state was predicted by the prediction unit 131, the update unit 132 calculates the likelihood of the image under that face state and assigns each particle a weight proportional to that likelihood (step S22).

The estimate calculation unit 133 calculates the head pose estimate at each time as the expected value of the head pose, and the estimated facial expression as the expression with the maximum posterior probability (step S23). These are calculated by the following equations.

Here, ĥ_t is the head pose estimate, h_t^(l) is the head pose held by the l-th particle (l is the lowercase letter L), ê_t is the estimated facial expression, and ω_t^(l) is the weight held by the l-th particle.
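As a sketch of step S23, the pose estimate can be computed as the weight-averaged pose over all particles, and the expression estimate as the label with the largest summed weight. The function name `estimate` and the particle layout are hypothetical, and the plain weighted mean assumes the angles do not wrap around.

```python
from collections import defaultdict

def estimate(particles, weights):
    """Return (mean pose, most probable expression) from weighted particles."""
    total = sum(weights)
    dim = len(particles[0][0])
    mean_pose = [0.0] * dim
    expression_mass = defaultdict(float)
    for (pose, expression), w in zip(particles, weights):
        w_norm = w / total
        # Expected value of each pose variable under the particle weights.
        for k in range(dim):
            mean_pose[k] += w_norm * pose[k]
        # Marginal posterior mass of each expression label.
        expression_mass[expression] += w_norm
    best_expression = max(expression_mass, key=expression_mass.get)
    return mean_pose, best_expression
```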

The specific processing in the update unit 132 is described below with reference to FIG. 9. First, for the head pose state held by each particle, it is determined whether each interest point is self-occluded (step S31). When a convex body such as a cylinder is used as the shape model, as in the present embodiment, a point whose normal vector faces the camera can be regarded as not occluded; that is, a point is not occluded if the camera-direction component of the normal vector at the position where the interest point of the shape model is projected is positive. The subsequent processing in the update unit applies only to interest points that are not self-occluded.
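The self-occlusion test of step S31 can be illustrated with a small sketch. For brevity it rotates the normal by yaw only and assumes the camera lies along the +z axis of the model coordinate system; both are assumptions made for illustration, and a full version would apply pitch and roll as well. The function names are hypothetical.

```python
import math

def rotate_yaw(v, yaw):
    """Rotate the 3-vector v about the vertical (y) axis by yaw radians."""
    x, y, z = v
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)

def is_visible(normal, yaw, to_camera=(0.0, 0.0, 1.0)):
    """True if the rotated normal has a positive camera-direction component."""
    n = rotate_yaw(normal, yaw)
    return sum(a * b for a, b in zip(n, to_camera)) > 0.0
```

This shortcut is valid only for convex shape models, where a surface point can be hidden only by the surface turning away from the camera.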

Next, the position of each interest point in the input image is calculated for the head pose state held by each particle (step S32).

Next, the luminance of the input image at the obtained position of each interest point is taken as the luminance of that interest point (step S33).

Next, the luminance of each interest point obtained in this way is used to calculate the likelihood of the image with respect to the model (step S34). In the present embodiment, the luminances of the interest points are assumed to be mutually independent, and the likelihood of each face image is calculated as the product of the likelihoods of the luminance at each interest point under the luminance distribution model. The luminance at each interest point is assumed to follow its own independent normal distribution. As stated in the description of the template creation unit 12, these normal distributions vary with facial expression. The mean and standard deviation of the normal distribution for each expression are the values obtained by the calibration process of the template creation unit 12 described above.

The likelihood of the face image for a given head pose and facial expression is then calculated as the product, over the interest points, of the likelihood of each point's luminance under the luminance distribution model. In addition, the luminance at each interest point is assumed to be an outlier with a constant probability that does not depend on the luminance value, and the larger of this uniform outlier probability and the normal-distribution likelihood is taken as the likelihood of that interest point.

The likelihood of the face image for a given head pose and facial expression, as described above, is calculated by the following equations.
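The equation images are not reproduced in this text. A reconstruction consistent with the symbol definitions that follow is given here; the set of non-occluded interest points is written as a calligraphic P to distinguish it from the probability P, and only the number (6), which the description cites for the normal density, is taken from the source.

```latex
P(z_t \mid h_t, e_t) = \prod_{i \in \mathcal{P}} P(z_{i,t} \mid h_t, e_t)

P(z_{i,t} \mid h_t, e_t) = \max\bigl( P_{N_i}(z_{i,t} \mid e_t),\; P_{\mathrm{outlier}} \bigr)

P_{N_i}(z_{i,t} \mid e_t) = \frac{1}{\sqrt{2\pi}\,\sigma_i(e_t)}
  \exp\!\left[ -\frac{\bigl(z_{i,t} - \mu_i(e_t)\bigr)^2}{2\,\sigma_i(e_t)^2} \right] \tag{6}
```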

Here, P(z_t | h_t, e_t) is the likelihood of the image z_t for head pose h_t and facial expression e_t at time t; P(z_{i,t} | h_t, e_t) is the likelihood of the luminance z_{i,t} of interest point i for that head pose and expression at time t; the set P is the set of indices of the interest points that are not self-occluded among all interest points; and μ_i(e_t) and σ_i(e_t) are the mean and standard deviation of the luminance distribution model of interest point i for expression e_t. P_outlier is the probability that the luminance at the interest point is an outlier and is constant regardless of the luminance value. Specifically, P_outlier is the value of P_Ni in equation (6) when z_{i,t} = μ_i(e_t) ± 3σ_i(e_t).
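The per-point likelihood with its outlier floor can be sketched as below, taking P_outlier as the normal density at μ ± 3σ as stated above. The function names are hypothetical, and summing logarithms instead of multiplying densities is a numerical convenience assumed here, not something the description prescribes.

```python
import math

def normal_pdf(z, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def point_likelihood(z, mu, sigma):
    """Larger of the normal likelihood and the constant outlier probability."""
    # Outlier floor: the density of the same normal at mu + 3 sigma.
    p_outlier = normal_pdf(mu + 3.0 * sigma, mu, sigma)
    return max(normal_pdf(z, mu, sigma), p_outlier)

def log_image_likelihood(luminances, means, sigmas):
    """Sum of per-point log-likelihoods over the non-occluded interest points."""
    return sum(math.log(point_likelihood(z, m, s))
               for z, m, s in zip(luminances, means, sigmas))
```

With this floor, a single badly mismatched interest point lowers the image likelihood by a bounded amount instead of driving it toward zero.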

Other definitions of the image likelihood are also possible. One example is the reciprocal of the mean, over all interest points, of a robust function value whose argument is the Mahalanobis distance between the luminance at the calculated position of each interest point and the luminance model.

In this case, the likelihood is calculated by the following equation.

Here, N_P is the number of interest points that are not self-occluded, ρ(·) is a robust function, and c is a constant (for example, c = 1).

Finally, the weight of each particle is calculated so that it is proportional to the likelihood of the face image for the head pose and facial expression held by that particle, calculated as described above, and so that the weights sum to 1 (step S35).
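The weight normalization of step S35 can be sketched as follows. The function name `normalize_weights` is hypothetical, and starting from log-likelihoods is an assumption made to avoid numerical underflow when many per-point likelihoods are multiplied.

```python
import math

def normalize_weights(log_likelihoods):
    """Convert per-particle log-likelihoods into weights that sum to 1."""
    peak = max(log_likelihoods)
    # Subtracting the peak keeps the exponentials in a representable range.
    unnormalized = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```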

The calculation of the image coordinates of each interest point (step S32 in FIG. 9) is described below with reference to FIGS. 10 and 11. The image coordinates of each interest point are calculated as follows. First, the shape model coordinate system 201, which is the three-dimensional coordinate system of the shape model 125 (see FIG. 2), is defined so that its center and vertical axis coincide with the face center and the vertical axis of the image on the image plane of the face image 124 used for interest point extraction. Each interest point on the face image 124, such as interest point 9a, is then projected into the shape model coordinate system 201, which gives the shape model coordinates of each interest point, such as interest point 9b corresponding to interest point 9a (step S41).

Next, the shape model 125 is transformed by two-dimensional translation, three-dimensional rotation, and scaling according to the head pose state held by each particle, and the image coordinates of each interest point, including interest point 9c corresponding to interest point 9b, are obtained on the transformed shape model 125a (step S42).

Each interest point on the transformed shape model 125a, such as interest point 9c, is then projected onto the input image 124a, which lies in the same plane as the input face image 124 (step S43). This gives the image coordinates of each interest point, such as interest point 9d corresponding to interest point 9c (and therefore to interest points 9b and 9a). The above processing completes the calculation of the image coordinates of each interest point in step S32 of FIG. 9.

In the present embodiment, weak perspective projection is assumed for these projections, but any projection may be used.
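Steps S42 and S43 under weak perspective projection can be sketched as below. The rotation order R = Rz(roll) Ry(yaw) Rx(pitch), the pose ordering (tx, ty, pitch, yaw, roll, scale), and the function names are assumptions made for illustration; the description does not fix these conventions.

```python
import math

def rotation_matrix(pitch, yaw, roll):
    """R = Rz(roll) * Ry(yaw) * Rx(pitch); the axis order is an assumed convention."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def project_point(model_point, pose):
    """Map a shape-model point to image coordinates under weak perspective.

    pose = (tx, ty, pitch, yaw, roll, scale), angles in radians.
    """
    tx, ty, pitch, yaw, roll, scale = pose
    r = rotation_matrix(pitch, yaw, roll)
    x, y, z = model_point
    xr = r[0][0] * x + r[0][1] * y + r[0][2] * z
    yr = r[1][0] * x + r[1][1] * y + r[1][2] * z
    # Weak perspective: drop depth, apply one global scale, then translate.
    return (scale * xr + tx, scale * yr + ty)
```

For example, `project_point((1.0, 2.0, 3.0), (10.0, 20.0, 0.0, 0.0, 0.0, 2.0))` leaves the point unrotated, doubles its x and y coordinates, and shifts them by the translation.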

According to the present embodiment, the following notable effects, not available with conventional techniques, can be expected.

(1) A face shape model and a model of how facial luminance changes with expression, both of which vary greatly between individuals and with the lighting environment, can be created on the spot with a simple procedure.

(2) The facial expression of a user whose head pose changes over a wide range can be recognized in real time and with high accuracy.

Embodiments of the present invention are not limited to the above. For example, the blocks shown in FIG. 1 may be integrated, divided, or distributed over a communication line or the like. An embodiment of the present invention can also be configured from a computer and a program executed by the computer. Furthermore, the whole program, or a program constituting part of it, for example the part relating to the template creation unit 12 of FIG. 1, can be distributed via a communication line or a computer-readable recording medium.

In addition, the facial expression recognition apparatus according to the present invention, as embodied above or with part of the embodiment modified, can be understood as having the following features, or can be modified to have them.

(1) A facial expression recognition apparatus that estimates the joint posterior probability distribution of the head pose state and the facial expression state at the current time using all face images obtained from the frame at which estimation started up to the current time.

(2) The facial expression recognition apparatus according to claim 1, in which a template of the luminance changes of the interest points in the face image that accompany expression changes is prepared in advance and varied according to the user's expression at each time in the video, so that the expression and head pose of the person in the face image are estimated simultaneously from the luminance of the interest points at each time.

(3) A facial expression recognition apparatus in which, in addition to expression changes, a model of the change in luminance distribution caused by head pose changes under ordinary lighting is prepared for each expression and varied according to the user's head pose and expression, so that the expression and head pose of the person in the face image are estimated simultaneously from the luminance of the interest points at each time.

(4) A facial expression recognition apparatus that prepares a luminance distribution model only for head-shaking motion, which has a large range of variation and causes large illumination changes.

(5) A facial expression recognition apparatus that prepares the template not as luminance values but as distributions of the luminance that arise from a combination of factors such as imaging system noise and the error of the template shape model relative to the actual face shape.

(6) A facial expression recognition apparatus that uses, as the template shape model, an easily created shape that can be represented parametrically, such as a plane model, a cylinder model, or an ellipsoid model.

(7) A facial expression recognition apparatus that uses, as the template shape model, an actual three-dimensional shape measured with a laser scanner, stereo vision, or the like.

(8) A facial expression recognition apparatus that uses only discretely defined points in the face region as the template.

(9) A facial expression recognition apparatus that uses a set of discrete interest points placed near edges in the face region.

(10) A facial expression recognition apparatus that defines a set of point pairs straddling edges as the discrete point set.

(11) A facial expression recognition apparatus that defines the likelihood of the image at each time, for a given head pose and facial expression, as the product of the likelihoods of the luminance at the calculated position of each interest point under a model expressed as a normal distribution (the luminance distribution model).

(12) A facial expression recognition apparatus that defines the likelihood of the face image for a given head pose and facial expression as the reciprocal of the sum, over all interest points, of a robust function value whose argument is the Mahalanobis distance between the luminance at the calculated position of each interest point and the luminance model.

(13) A facial expression recognition apparatus that uses the expected value, the mode, the median, or the like of the posterior probability density distribution as the head pose estimate at each time.

(14) A facial expression recognition apparatus that takes the expression with the maximum calculated posterior probability as the estimated expression.

(15) A facial expression recognition apparatus that estimates the joint posterior probability distribution of the head pose state and the facial expression state at each time within the particle filter framework.

FIG. 1 is a block diagram showing the configuration of an embodiment of the facial expression recognition apparatus of the present invention.
FIG. 2 is a flowchart for explaining the processing by the template creation unit 12 of FIG. 1.
FIG. 3 is a conceptual diagram for explaining the luminance distribution model 128 of FIG. 2.
FIG. 4 is a flowchart for explaining the processing by the interest point extraction unit 122 of FIG. 1.
FIG. 5 is a diagram showing an example arrangement of interest points for explaining the processing by the interest point extraction unit 122 of FIG. 1.
FIG. 6 is an explanatory diagram for explaining a modification of the processing by the interest point extraction unit 122 of FIG. 1.
FIG. 7 is a diagram for explaining the processing by the head pose and facial expression estimation unit 13 of FIG. 1, showing a dynamic Bayesian network consisting of head pose, facial expression, and face image.
FIG. 8 is a flowchart for explaining the processing by the head pose and facial expression estimation unit 13 of FIG. 1.
FIG. 9 is a flowchart for explaining the processing by the update unit 132 of FIG. 1.
FIG. 10 is a flowchart for explaining the processing of step S32 of FIG. 9.
FIG. 11 is a conceptual diagram for explaining the processing of FIG. 10.

Explanation of symbols

1 Facial expression recognition apparatus
11 Template
12 Template creation unit
13 Head pose and facial expression estimation unit
9, 9-1, 9-2, 9a to 9d, 61a, 62a, 61b, 62b Interest points
10, 10-1 Point pair (interest point pair)
125 Shape model
128 Luminance distribution model

Claims

1. A template creation apparatus comprising:
a shape model creation unit that creates a shape model approximating the shape of a face based on an input face image;
an interest point extraction unit that extracts, on the shape model created by the shape model creation unit, an interest point group consisting of a plurality of interest points by extracting interest points based on the edges of each facial part; and
a luminance distribution model creation unit that creates a template representing a luminance change model, for a plurality of facial expressions, of the interest point group extracted by the interest point extraction unit.

2. The template creation apparatus according to claim 1, wherein the interest point extraction unit extracts the interest points on a shape model that the shape model creation unit has created based on an input frontal image of the face.

3. The template creation apparatus according to claim 2, wherein the luminance distribution model creation unit creates a template representing a luminance change model of the interest point group extracted on the shape model created based on the frontal image of the face.

4. A facial expression recognition apparatus comprising:
a template creation unit having a shape model creation unit that creates a shape model approximating the shape of a face based on an input face image, an interest point extraction unit that extracts, on the shape model created by the shape model creation unit, an interest point group consisting of a plurality of interest points satisfying a predetermined condition, and a luminance distribution model creation unit that creates a template representing a luminance change model, for a plurality of facial expressions, of the interest point group extracted by the interest point extraction unit; and
a head pose and facial expression estimation unit having a prediction unit that predicts a state distribution of head pose and facial expression based on an input face image, an update unit that evaluates the predicted state distribution using the template created by the template creation unit, and an estimate calculation unit that calculates estimates of the head pose and facial expression from the evaluated state distribution.

5. A template creation method comprising:
a shape model creation process of creating a shape model approximating the shape of a face based on an input face image;
an interest point extraction process of extracting, on the shape model created in the shape model creation process, an interest point group consisting of a plurality of interest points by extracting interest points based on the edges of each facial part; and
a luminance distribution model creation process of creating a template representing a luminance change model, for a plurality of facial expressions, of the interest point group extracted in the interest point extraction process.

6. A facial expression recognition method comprising:
a shape model creation process of creating a shape model approximating the shape of a face based on an input face image;
an interest point extraction process of extracting, on the shape model created in the shape model creation process, an interest point group consisting of a plurality of interest points satisfying a predetermined condition;
a luminance distribution model creation process of creating a template representing a luminance change model, for a plurality of facial expressions, of the interest point group extracted in the interest point extraction process;
a prediction process of predicting a state distribution of head pose and facial expression based on an input face image;
an update process of evaluating the predicted state distribution using the template created in the template creation process; and
an estimate calculation process of calculating estimates of the head pose and facial expression from the evaluated state distribution.

7. A program executed on a computer, wherein the apparatus according to any one of claims 1 to 4 is configured using the computer.

8. A computer-readable recording medium on which is recorded a program executed on a computer, wherein the apparatus according to any one of claims 1 to 4 is configured using the computer.