JP2014119879A

JP2014119879A - Face expression evaluation result smoothing device and face expression evaluation result smoothing program

Info

Publication number: JP2014119879A
Application number: JP2012273587A
Authority: JP
Inventors: Makoto Okuda; 誠奥田
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2012-12-14
Filing date: 2012-12-14
Publication date: 2014-06-30

Abstract

PROBLEM TO BE SOLVED: To stably obtain a face expression evaluation result from video data including a series of face images.

SOLUTION: A smoothing processing part 80 serving as a face expression evaluation result smoothing device includes: a face expression intensity value acquisition part 81 that fetches, for each frame, a plurality of face expression intensity values obtained for each face expression type on the basis of a face image; and a face expression intensity value smoothing processing part 82 that refers to the face expression intensity values for a plurality of frames fetched by the face expression intensity value acquisition part 81 and calculates a representative face expression intensity value corresponding to the plurality of frames on the basis of a total value based on the face expression intensity values of the plurality of frames for each face expression type.

Description

The present invention relates to a facial expression evaluation result smoothing apparatus and a facial expression evaluation result smoothing program.

A technique is known that analyzes image data containing a person's face image, classifies the facial expression in the face image into six types (Anger, Disgust, Fear, Happiness, Sadness, Surprise), and calculates the intensity of that facial expression (see, for example, Non-Patent Document 1). An information processing apparatus to which the technique described in Non-Patent Document 1 is applied can, for a plurality of face images with different facial expressions, obtain intensities such that the order relation of the facial expression intensities is consistent, and can classify the facial expression in each face image into the six types above.

Peng Yang, Qingshan Liu, Dimitris N. Metaxas, "RankBoost with l1 regularization for Facial Expression Recognition and Intensity Estimation", IEEE International Conference on Computer Vision (ICCV), pp. 1018-1025, 2009

However, when video data is supplied to the above information processing apparatus and the apparatus evaluates each face image in the video data, a facial expression of a type different from that of the surrounding frames may abruptly appear among the facial expressions in the series of face images. A facial expression that changes from one frame to the next is normally unlikely, so such an abruptly appearing facial expression is very likely an error.

The present invention has been made to solve the above problem, and its object is to provide a facial expression evaluation result smoothing apparatus and a facial expression evaluation result smoothing program that stably obtain facial expression evaluation results from video data including a series of face images.

[1] To solve the above problem, a facial expression evaluation result smoothing apparatus according to one aspect of the present invention comprises: a facial expression intensity value acquisition unit that takes in, for each frame, a plurality of facial expression intensity values obtained for each facial expression type on the basis of a face image; and a facial expression intensity value smoothing processing unit that refers to the facial expression intensity values for a plurality of frames taken in by the facial expression intensity value acquisition unit and calculates a representative facial expression intensity value corresponding to the plurality of frames on the basis of a total value that is based on the facial expression intensity values of the plurality of frames for each facial expression type.

[2] In the facial expression evaluation result smoothing apparatus according to [1], the total value is the sum, for each facial expression type, of the facial expression intensity values over the plurality of frames.
[3] In the facial expression evaluation result smoothing apparatus according to [1], the total value is a total obtained by counting, for each facial expression type, the number of maximum facial expression intensity values in each of the plurality of frames.
[4] In the facial expression evaluation result smoothing apparatus according to [2] or [3], the facial expression intensity value smoothing processing unit calculates the total value with weighting according to the position of each frame within the plurality of frames.
[5] The facial expression evaluation result smoothing apparatus according to any one of [1] to [4] further comprises a facial expression type smoothing processing unit that selects, as the facial expression classification result corresponding to the plurality of frames, the facial expression type corresponding to the largest of the total values for each facial expression type obtained by the facial expression intensity value smoothing processing unit.
[6] In the facial expression evaluation result smoothing apparatus according to any one of [1] to [5], the facial expression intensity value smoothing processing unit uses a number of frames smaller than the plurality of frames as a shift amount and shifts the plurality of frames in the time direction by the shift amount.
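The variants in [2] to [6] can be illustrated with a minimal Python sketch. All names here (`summed_totals`, `counted_totals`, `smooth_section`, `section_starts`) are hypothetical, and taking the weighted mean of the winning type as the representative intensity value is an assumption made only for this example; the text above states only that the representative value is based on the total value.

```python
from typing import Dict, List, Optional, Sequence, Tuple

EXPRESSIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")
IntensitySet = Dict[str, float]  # one frame: expression type -> intensity value


def summed_totals(frames: Sequence[IntensitySet],
                  weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """[2]/[4]: per-type sum of intensity values, optionally weighted by frame position."""
    weights = [1.0] * len(frames) if weights is None else list(weights)
    return {e: sum(w * f[e] for w, f in zip(weights, frames)) for e in EXPRESSIONS}


def counted_totals(frames: Sequence[IntensitySet],
                   weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """[3]/[4]: per-type count of frames in which that type has the maximum value."""
    weights = [1.0] * len(frames) if weights is None else list(weights)
    totals = {e: 0.0 for e in EXPRESSIONS}
    for w, f in zip(weights, frames):
        totals[max(EXPRESSIONS, key=lambda e: f[e])] += w
    return totals


def smooth_section(frames: Sequence[IntensitySet],
                   weights: Optional[Sequence[float]] = None) -> Tuple[str, float]:
    """[5]: pick the type with the largest total for one section of frames.

    Assumption: the representative intensity is the weighted mean of that type.
    """
    weights = [1.0] * len(frames) if weights is None else list(weights)
    totals = summed_totals(frames, weights)
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(weights)


def section_starts(n_frames: int, section_len: int, shift: int) -> List[int]:
    """[6]: start indices of sections moved in time by a shift smaller than the section."""
    if not 0 < shift < section_len:
        raise ValueError("shift must be positive and smaller than the section length")
    return list(range(0, n_frames - section_len + 1, shift))
```

With a three-frame section in which "surprise" spikes in the middle frame only, both the summed and the counted totals still favor the type that dominates the neighboring frames, which is the stabilizing effect the claims describe.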

[7] To solve the above problem, a facial expression evaluation result smoothing program according to one aspect of the present invention causes a computer to function as: a facial expression intensity value acquisition unit that takes in, for each frame, a plurality of facial expression intensity values obtained for each facial expression type on the basis of a face image; and a facial expression intensity value smoothing processing unit that refers to the facial expression intensity values for a plurality of frames taken in by the facial expression intensity value acquisition unit and calculates a representative facial expression intensity value corresponding to the plurality of frames on the basis of a total value that is based on the facial expression intensity values of the plurality of frames for each facial expression type.

According to the present invention, facial expression evaluation results can be stably obtained from video data including a series of face images.

Block diagram showing the functional configuration of a facial expression analysis apparatus to which the facial expression evaluation result smoothing apparatus according to the first embodiment of the present invention is applied.
Block diagram showing the functional configuration of the smoothing processing unit in the same embodiment.
Diagram conceptually showing part of the data structure of the face image database used when the facial expression analysis apparatus is set to the learning mode and executes the learning process in the same embodiment.
Diagram showing, in association with face image data, the facial expression intensity teacher values used when the facial expression analysis apparatus is set to the learning mode and executes the learning process in the same embodiment.
Diagram schematically showing image data, face region data extracted from the image data, and normalized face region data obtained by normalizing the face region data in the same embodiment.
Line drawing that visualizes the analysis region determined by the analysis region determination unit from the normalized face region data in the same embodiment.
Diagram schematically showing the feature histogram of the upper analysis region, the feature histogram of the lower analysis region, and the feature histogram of the entire analysis region obtained by concatenating the two, as generated by the image feature amount calculation unit in the same embodiment.
Diagram schematically showing one regression model in the regression analysis process executed by the regression analysis unit in the same embodiment.
Flowchart showing the procedure of the learning process executed by the facial expression analysis apparatus in the same embodiment.
Flowchart showing the procedure of the facial expression evaluation process for one frame executed by the facial expression analysis apparatus in the same embodiment.
Diagram showing an example of a series of facial expression intensity value sets obtained by the facial expression evaluation unit when the facial expression analysis apparatus, set to the facial expression analysis mode, takes in evaluation video data and repeatedly executes the facial expression evaluation process in the same embodiment.
Flowchart showing the procedure of the facial expression evaluation result smoothing process for one section executed by the smoothing processing unit in the same embodiment.
Diagram schematically showing the facial expression evaluation results before and after the smoothing processing unit performs the facial expression evaluation result smoothing process in the same embodiment.
Diagram for explaining the movement of sections in the smoothing processing unit in the fourth embodiment of the present invention.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
[First Embodiment]
The facial expression analysis apparatus to which the facial expression evaluation result smoothing apparatus according to the first embodiment of the present invention is applied executes a learning process and a facial expression analysis process, switching between them under switching control. The learning process includes pre-processing. The facial expression analysis process includes a facial expression evaluation process and a facial expression evaluation result smoothing process. In the learning process, the facial expression analysis apparatus obtains, for each facial expression type, the parameter values of a regression model by regression analysis of the correspondence between the image feature amount of the face region of each face image in a sequence of face image data with differing degrees (intensities) of facial expression and the facial expression intensity teacher value, which indicates the degree of facial expression according to an evaluator's subjective evaluation. In the facial expression evaluation process for one frame, the facial expression analysis apparatus applies the image feature amount of the face region in the face image data for evaluation (evaluation face image data) to the regression model learned for each facial expression type, thereby calculating a facial expression intensity value for each facial expression type and generating a facial expression intensity value set. Here, a facial expression intensity value set is the collection, for one frame, of the facial expression intensity values corresponding to each of the plurality of facial expression types. In the facial expression evaluation result smoothing process for one section, which contains the facial expression intensity value sets of a plurality of frames, the facial expression analysis apparatus smooths the facial expression intensity value and the facial expression type on the basis of those facial expression intensity value sets.

FIG. 1 is a block diagram showing the functional configuration of a facial expression analysis apparatus to which the facial expression evaluation result smoothing apparatus according to the first embodiment of the present invention is applied. As shown in the figure, the facial expression analysis apparatus 1 includes an image data acquisition unit 10, a facial expression intensity teacher value acquisition unit 20, a face region extraction unit 30, an image feature amount calculation unit 40, a regression analysis unit 50, a regression model storage unit 60, a facial expression evaluation unit 70, a smoothing processing unit (facial expression evaluation result smoothing apparatus) 80, and a mode switching unit 90.

The mode switching unit 90 switches the facial expression analysis apparatus 1 between the learning mode and the facial expression analysis mode, for example by switching control realized when the facial expression analysis apparatus 1 executes a control program. The learning mode is the operation mode in which the facial expression analysis apparatus 1 executes the pre-processing and the learning process. The facial expression analysis mode is the operation mode in which the facial expression analysis apparatus 1 executes the facial expression analysis process. The mode switching unit 90 may also switch between the learning mode and the facial expression analysis mode in response to, for example, a switching operation performed on the facial expression analysis apparatus 1 by an operator.

When the mode switching unit 90 has set the facial expression analysis apparatus 1 to the learning mode, the facial expression analysis apparatus 1 operates the image data acquisition unit 10, the facial expression intensity teacher value acquisition unit 20, the face region extraction unit 30, the image feature amount calculation unit 40, the regression analysis unit 50, and the regression model storage unit 60. When the mode switching unit 90 has set the facial expression analysis apparatus 1 to the facial expression analysis mode, the facial expression analysis apparatus 1 operates the image data acquisition unit 10, the face region extraction unit 30, the image feature amount calculation unit 40, the regression model storage unit 60, the facial expression evaluation unit 70, and the smoothing processing unit 80.
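The relationship between the two modes and the units that operate in each can be written down as a small lookup table. The sketch below is only illustrative: the names `Mode` and `ACTIVE_UNITS` are hypothetical, and the integers are the reference numerals of the units in FIG. 1.

```python
from enum import Enum


class Mode(Enum):
    LEARNING = "learning"
    ANALYSIS = "facial_expression_analysis"


# Reference numerals of the units that operate in each mode (per the description).
ACTIVE_UNITS = {
    Mode.LEARNING: (10, 20, 30, 40, 50, 60),
    Mode.ANALYSIS: (10, 30, 40, 60, 70, 80),
}


def is_active(mode: Mode, unit: int) -> bool:
    """Return True if the unit with the given reference numeral operates in this mode."""
    return unit in ACTIVE_UNITS[mode]
```

The table makes it easy to see that units 10, 30, 40, and 60 are shared by both modes, while units 20 and 50 are used only for learning and units 70 and 80 only for analysis.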

The image data acquisition unit 10 takes in image data supplied by an external device (not shown). Specifically, when the facial expression analysis apparatus 1 is set to the learning mode, the image data acquisition unit 10 takes in a plurality of face image data from, for example, a face image database. The face image database is, for example, a database that stores, for each type of facial expression, a set of face image data sequences of multiple people in which the degree of facial expression differs from image to image. When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the image data acquisition unit 10 takes in evaluation video data for facial expression analysis supplied by, for example, a video capturing device or a video recording device. This evaluation video data contains a plurality of evaluation face image data corresponding to time-series frames.

When the facial expression analysis apparatus 1 is set to the learning mode, the image data acquisition unit 10 supplies the face image data it has taken in to the face region extraction unit 30. When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the image data acquisition unit 10 supplies face image data from the evaluation video data it has taken in to the face region extraction unit 30, either frame by frame in sequence or at intervals of a predetermined number of frames.
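Supplying frames either in sequence or at a fixed interval amounts to simple subsampling. A minimal sketch follows; the function name `sample_frames` and the parameter `step` are hypothetical, with `step=1` corresponding to supplying every frame in order.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_frames(frames: Iterable[T], step: int = 1) -> Iterator[T]:
    """Yield every `step`-th frame of the evaluation video, starting with the first."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame
```

For example, `list(sample_frames(range(10), 3))` selects frames 0, 3, 6, and 9.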

When the facial expression analysis apparatus 1 is set to the learning mode, the facial expression intensity teacher value acquisition unit 20 takes in facial expression intensity teacher values supplied by an external device (not shown) and supplies them to the regression analysis unit 50. The external device is, for example, an information processing device such as the face image database described above or a computer device. A facial expression intensity teacher value is a value that expresses, according to an evaluator's subjective evaluation, the degree of facial expression in each face image of the face image data sequences stored in the face image database for each facial expression type. As an example, the facial expression intensity teacher value is expressed as an integer from a lower limit (for example, 0) to an upper limit (for example, 100). The smaller the teacher value, the weaker the facial expression; the larger the teacher value, the stronger the facial expression. The degree of facial expression may be rated by a single evaluator or by several. When there are several evaluators, the average of the values assigned by the evaluators may be used as the facial expression intensity teacher value.
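As a small illustration of combining several evaluators' ratings, the sketch below averages ratings on the example 0 to 100 scale. The function name `teacher_value` is hypothetical, and returning the unrounded mean is an assumption; the text does not say whether the average is rounded back to an integer.

```python
from typing import Sequence


def teacher_value(ratings: Sequence[int], lower: int = 0, upper: int = 100) -> float:
    """Average the subjective ratings given by one or more evaluators to one face image."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not lower <= rating <= upper:
            raise ValueError(f"rating {rating} is outside [{lower}, {upper}]")
    return sum(ratings) / len(ratings)
```

With a single evaluator the function simply returns that evaluator's rating as a float.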

When the facial expression analysis apparatus 1 is set to the learning mode, each pair consisting of face image data taken in by the image data acquisition unit 10 and the corresponding facial expression intensity teacher value taken in by the facial expression intensity teacher value acquisition unit 20 constitutes the training data of the facial expression analysis apparatus 1.

The face region extraction unit 30 takes in the image data (face image data or evaluation face image data) supplied by the image data acquisition unit 10 and extracts the analysis region of the face from this image data. Specifically, the face region extraction unit 30 includes a face region detection unit 31 and an analysis region determination unit 32.

The face region detection unit 31 takes in the image data supplied by the image data acquisition unit 10 and detects a face region in the image data by executing face detection processing on it. The data of this face region (face region data) is, for example, rectangular image data.

A known face detection algorithm (for example, AdaBoost) can be applied as the algorithm for the face detection processing executed by the face region detection unit 31. Known face detection algorithms are described in detail in, for example, Paul Viola, Michael J. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, 2004, Vol. 57, No. 2, pp. 137-154.

The analysis region determination unit 32 normalizes the face region data detected by the face region detection unit 31 to a predetermined size and then extracts an analysis region from the normalized face region data. Specifically, the analysis region determination unit 32 normalizes the face region data into normalized face region data of, for example, 128 pixels horizontally by 128 pixels vertically. That is, the analysis region determination unit 32 generates the normalized face region data by executing image processing that enlarges or reduces the face region data to a rectangular image of the predetermined size. Because the size of the face varies from one image to another, the analysis region determination unit 32 enlarges or reduces the face region so that the face regions in all image data have roughly the same resolution. This makes the amount of information in face region data of different resolutions approximately equal.
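The size normalization can be sketched in plain Python as follows. Nearest-neighbor resampling is an assumption made to keep the example dependency-free; the description does not specify the interpolation method, and the function name `normalize_face_region` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def normalize_face_region(region: Image, size: int = 128) -> Image:
    """Enlarge or reduce a rectangular face region to size x size pixels.

    Assumption: nearest-neighbor resampling (the source does not name a method).
    """
    src_h, src_w = len(region), len(region[0])
    return [
        [region[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]
```

The same function covers both enlargement (a face smaller than 128 x 128) and reduction (a larger face), so every image reaches the feature extraction stage at the same resolution.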

The analysis region determination unit 32 determines, from the normalized face region data, the analysis region, which is the region over which the image feature amount is calculated, and extracts the data of this analysis region (analysis region data). The analysis region is, for example, a circular (elliptical or perfectly circular) region at the center of the normalized face region. The analysis region determination unit 32 then, for example, bisects the analysis region with a horizontal straight line passing through its center, and determines the region above the line to be the upper analysis region (first analysis partial region) and the region below it to be the lower analysis region (second analysis partial region). In other words, the analysis region determination unit 32 takes a circular or elliptical analysis region smaller than the circle or ellipse inscribed in the normalized face region and divides it into two in the vertical direction to determine the upper analysis region and the lower analysis region.
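A minimal sketch of the region split, assuming a perfectly circular analysis region centered in the normalized face region. The name `analysis_regions` and the parameter `radius_ratio` (radius as a fraction of the image size) are hypothetical; the description only requires the region to be smaller than the inscribed circle, which corresponds to a ratio below 0.5.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (x, y) with y increasing downward


def analysis_regions(size: int = 128,
                     radius_ratio: float = 0.4) -> Tuple[Set[Pixel], Set[Pixel]]:
    """Return the pixel sets of the upper and lower analysis regions."""
    if not 0.0 < radius_ratio < 0.5:
        raise ValueError("the analysis region must be smaller than the inscribed circle")
    center = (size - 1) / 2.0
    radius_sq = (radius_ratio * size) ** 2
    upper: Set[Pixel] = set()
    lower: Set[Pixel] = set()
    for y in range(size):
        for x in range(size):
            if (x - center) ** 2 + (y - center) ** 2 <= radius_sq:
                # The horizontal line through the center separates the two halves.
                (upper if y < center else lower).add((x, y))
    return upper, lower
```

Local features would then be computed separately over the pixels of each half.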

The image feature amount calculation unit 40 calculates the image feature amount of the analysis region data extracted by the face region extraction unit 30.

Specifically, in the pre-processing that the facial expression analysis apparatus 1 executes when set to the learning mode, the image feature amount calculation unit 40 calculates local feature amounts, such as Scale-Invariant Feature Transform (SIFT) features or Speeded Up Robust Features (SURF) features, for the upper analysis region of the analysis region determined by the analysis region determination unit 32. The image feature amount calculation unit 40 generates clusters by executing clustering on the local feature amounts of all the face image data and stores these clusters in a built-in storage unit. For example, the k-means method is applied as the clustering process. The image feature amount calculation unit 40 generates clusters for the lower analysis region in the same way as for the upper analysis region and stores them in the storage unit. Alternatively, the image feature amount calculation unit 40 may take in, from an external device, the clusters for the upper analysis region and the lower analysis region of all the face image data used in the learning process and store them in the storage unit.
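The clustering step can be sketched with a plain k-means loop. This is an illustrative sketch, not the apparatus's implementation: the function names are hypothetical, the descriptors are generic numeric vectors standing in for SIFT or SURF descriptors, and initializing the centers from the first `k` points with a fixed iteration count is an assumption chosen to keep the example deterministic.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_center(centers: Sequence[Vector], point: Vector) -> int:
    """Index of the cluster center closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centers[i], point)),
    )


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Cluster local feature vectors and return the k cluster centers."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for point in points:
            groups[nearest_center(centers, point)].append(point)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster received no points
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers
```

The function would be run once on the descriptors of the upper analysis regions and once on those of the lower analysis regions, giving two separate sets of cluster centers.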

In the learning process that the facial expression analysis apparatus 1 executes when set to the learning mode, and in the facial expression evaluation process that it executes when set to the facial expression analysis mode, the image feature amount calculation unit 40 calculates local feature amounts, such as SIFT features or SURF features, from the upper analysis region of the analysis region determined by the analysis region determination unit 32. The image feature amount calculation unit 40 then assigns these local feature amounts to the clusters for the upper analysis region stored in the pre-processing, and generates a Bag-of-Keypoints, which is a histogram whose bins are the clusters and whose frequencies are the numbers of elements in each cluster. The image feature amount calculation unit 40 generates a Bag-of-Keypoints for the lower analysis region in the same way as for the upper analysis region.

The image feature amount calculation unit 40 concatenates the Bag-of-Keypoints of the upper analysis region and that of the lower analysis region to generate the Bag-of-Keypoints of the entire analysis region. Specifically, the image feature amount calculation unit 40, for example, concatenates a 175-dimensional Bag-of-Keypoints for the upper analysis region with a 125-dimensional Bag-of-Keypoints for the lower analysis region to generate a 300-dimensional Bag-of-Keypoints for the entire analysis region.
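The histogram construction and the concatenation can be sketched as follows. The function names are hypothetical, and the descriptors and cluster centers are generic numeric vectors standing in for SIFT or SURF descriptors and the stored clusters.

```python
from typing import List, Sequence

Vector = Sequence[float]


def bag_of_keypoints(descriptors: Sequence[Vector],
                     centers: Sequence[Vector]) -> List[int]:
    """Histogram with one bin per cluster, counting descriptors nearest to each center."""
    histogram = [0] * len(centers)
    for descriptor in descriptors:
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((c - d) ** 2 for c, d in zip(centers[i], descriptor)),
        )
        histogram[nearest] += 1
    return histogram


def analysis_region_feature(upper_descriptors: Sequence[Vector],
                            lower_descriptors: Sequence[Vector],
                            upper_centers: Sequence[Vector],
                            lower_centers: Sequence[Vector]) -> List[int]:
    """Concatenate the upper-region and lower-region histograms into one feature vector."""
    return (bag_of_keypoints(upper_descriptors, upper_centers)
            + bag_of_keypoints(lower_descriptors, lower_centers))
```

With 175 cluster centers for the upper region and 125 for the lower region, the returned vector has 300 elements, matching the example dimensions in the text.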

Bag-of-Keypoints is described in detail in, for example, Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray, "Visual Categorization with Bags of Keypoints", Proc. of ECCV Workshop on Statistical Learning in Computer Vision, pp. 59-74, 2004.

When the facial expression analysis apparatus 1 is set to the learning mode, the image feature amount calculation unit 40 supplies the Bag-of-Keypoints of the entire analysis region to the regression analysis unit 50 as the image feature amount. When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the image feature amount calculation unit 40 supplies the Bag-of-Keypoints of the entire analysis region to the facial expression evaluation unit 70 as the image feature amount.

When the facial expression analysis apparatus 1 is set to the learning mode, the regression analysis unit 50 takes in the image feature amount of the face image data supplied by the image feature amount calculation unit 40, and also takes in the facial expression intensity teacher value for that face image data supplied by the facial expression intensity teacher value acquisition unit 20.

回帰分析部５０は、顔画像データに対する画像特徴量とその顔画像データに対応付けられた顔表情種別ごとの顔表情強度教師値とを用いて回帰分析処理を実行することにより、回帰モデルが有するパラメータ値を顔表情種別ごとに更新する。回帰モデルは、顔領域の画像特徴量から顔表情の度合を示す顔表情強度値を計算するための計算手段である。この回帰モデルは、可変のパラメータを有し、パラメータ値を更新可能とする数式モデルである。回帰分析部５０は、例えば、顔表情種別が“怒り”である場合の回帰分析において、顔表情種別が“怒り”である顔画像データについては顔表情強度教師値そのものを用いる一方、顔表情種別が“怒り”以外である顔画像データについては顔表情強度教師値を“０（ゼロ）”として用いて、回帰処理を実行する。そして、回帰分析部５０は、回帰処理によって得られるパラメータ値を、回帰モデル記憶部６０に記憶させる。なお、回帰分析部５０は、顔表情種別ごとに回帰分析処理を実行するのではなく、顔画像データに対する画像特徴量とその顔画像データに対応付けられた全顔表情における顔表情強度教師値とを用いて回帰分析処理を実行してもよい。 The regression analysis unit 50 updates the parameter values of the regression model for each facial expression type by executing a regression analysis process using the image feature amount for the face image data and the facial expression intensity teacher value for each facial expression type associated with that face image data. The regression model is a calculation means for calculating, from the image feature amount of the face region, a facial expression intensity value indicating the degree of a facial expression. This regression model is a mathematical model that has variable parameters whose values can be updated. For example, in the regression analysis for the facial expression type “anger”, the regression analysis unit 50 uses the facial expression intensity teacher value as is for face image data whose facial expression type is “anger”, and uses a teacher value of “0 (zero)” for face image data whose facial expression type is other than “anger”. The regression analysis unit 50 then stores the parameter values obtained by the regression process in the regression model storage unit 60. Instead of executing the regression analysis process separately for each facial expression type, the regression analysis unit 50 may execute it using the image feature amount for the face image data and the facial expression intensity teacher values for all facial expressions associated with that face image data.
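As an illustration only, the per-type handling of teacher values described above can be sketched in Python as follows. The function name `targets_for_type`, the label strings, and the sample values are assumptions made for this sketch and do not appear in the embodiment.

```python
import numpy as np

def targets_for_type(labels, teacher_values, expression_type):
    """Build regression targets for one facial expression type.

    Images labelled with `expression_type` keep their teacher value;
    every other image is given a target of 0, as described in the text.
    """
    labels = np.asarray(labels)
    teacher_values = np.asarray(teacher_values, dtype=float)
    return np.where(labels == expression_type, teacher_values, 0.0)

# Hypothetical training labels and teacher values (0 to 100).
labels = ["anger", "happiness", "anger", "surprise"]
teacher_values = [46, 83, 100, 88]

anger_targets = targets_for_type(labels, teacher_values, "anger")
fear_targets = targets_for_type(labels, teacher_values, "fear")
```

One such target vector would be built per facial expression type, and each would be paired with the same image feature amounts when fitting that type's regression model.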

回帰モデル記憶部６０は、回帰分析部５０が供給するパラメータ値を、顔表情種別ごとに記憶する。回帰モデル記憶部６０は、例えば、磁気ハードディスク装置または半導体記憶装置により実現される。 The regression model storage unit 60 stores the parameter values supplied by the regression analysis unit 50 for each facial expression type. The regression model storage unit 60 is realized by, for example, a magnetic hard disk device or a semiconductor storage device.

顔表情解析装置１が顔表情解析モードに設定されている場合、顔表情評価部７０は、画像特徴量計算部４０が供給する、評価顔画像データに対する画像特徴量を取り込む。また、顔表情評価部７０は、回帰モデル記憶部６０から、顔表情種別ごとに回帰モデルのパラメータ値を読み込む。そして、顔表情評価部７０は、各回帰モデルに画像特徴量を適用して顔表情種別ごとに顔表情強度値を計算することによって顔表情強度値セットを生成し、この顔表情強度値セットを平滑化処理部８０に供給する。この顔表情強度値セットは、１フレーム分の評価顔画像データに対するデータセットである。具体的に、顔表情強度値セットは、各顔表情種別と顔表情強度値とを対応付けたものである。 When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the facial expression evaluation unit 70 takes in the image feature amount for the evaluation face image data supplied by the image feature amount calculation unit 40. The facial expression evaluation unit 70 also reads the parameter values of the regression model for each facial expression type from the regression model storage unit 60. The facial expression evaluation unit 70 then generates a facial expression intensity value set by applying the image feature amount to each regression model and calculating a facial expression intensity value for each facial expression type, and supplies this facial expression intensity value set to the smoothing processing unit 80. The facial expression intensity value set is a data set for one frame of evaluation face image data. Specifically, it associates each facial expression type with its facial expression intensity value.

顔表情解析装置１が顔表情解析モードに設定されている場合、平滑化処理部８０は、顔表情評価部７０が供給する顔表情強度値セットを時系列に取り込んで記憶する。そして、平滑化処理部８０は、複数フレーム分の顔表情強度値セットを含む区間ごとに、顔表情強度値を平滑化することによって、当該区間内における各フレームに対する平滑化後の顔表情強度値と顔表情の分類結果である顔表情種別情報を得る。 When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the smoothing processing unit 80 takes in and stores, in time series, the facial expression intensity value sets supplied by the facial expression evaluation unit 70. Then, for each section containing facial expression intensity value sets for a plurality of frames, the smoothing processing unit 80 smooths the facial expression intensity values, thereby obtaining a smoothed facial expression intensity value for each frame in that section together with facial expression type information, which is the classification result of the facial expression.

次に、平滑化処理部８０の詳細を説明する。 Next, details of the smoothing processing unit 80 will be described.
図２は、平滑化処理部８０の機能構成を示すブロック図である。同図に示すように、平滑化処理部８０は、顔表情強度値取得部８１と、顔表情強度値平滑化処理部８２と、顔表情種別平滑化処理部８３とを備える。 FIG. 2 is a block diagram illustrating a functional configuration of the smoothing processing unit 80. As shown in the figure, the smoothing processing unit 80 includes a facial expression intensity value acquisition unit 81, a facial expression intensity value smoothing processing unit 82, and a facial expression type smoothing processing unit 83.

顔表情強度値取得部８１は、顔表情評価部７０が供給する顔表情強度値セットを時系列に取り込んで内蔵するバッファに記憶させる。このバッファは、複数フレームを含む区間における複数組の顔表情強度値セットを記憶可能な容量を有する、ＦＩＦＯ（ＦｉｒｓｔＩｎ／ＦｉｒｓｔＯｕｔ）形式の記憶部である。 The facial expression intensity value acquisition unit 81 takes a facial expression intensity value set supplied by the facial expression evaluation unit 70 in time series and stores it in a built-in buffer. This buffer is a FIFO (First In / First Out) type storage unit having a capacity capable of storing a plurality of sets of facial expression intensity value sets in a section including a plurality of frames.
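A minimal sketch of such a FIFO buffer, assuming Python's standard `collections.deque` with a fixed maximum length, is shown below. The section length `SECTION_FRAMES` and the helper names `make_buffer` and `store` are hypothetical; the text does not specify a number of frames per section.

```python
from collections import deque

SECTION_FRAMES = 5  # hypothetical number of frames per section

def make_buffer(section_frames=SECTION_FRAMES):
    # A deque with maxlen drops the oldest entry when a new one is
    # appended to a full buffer, which gives first-in/first-out behavior.
    return deque(maxlen=section_frames)

def store(buffer, intensity_set):
    """Store one frame's intensity value set; return True once the buffer is full."""
    buffer.append(dict(intensity_set))
    return len(buffer) == buffer.maxlen

buffer = make_buffer()
# Seven hypothetical frames arriving in time series.
full_flags = [store(buffer, {"happiness": 10.0 * i, "surprise": 5.0}) for i in range(7)]
```

After the seven calls the buffer holds only the five most recent sets, so the oldest remaining entry is the one from the third frame.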

顔表情強度値平滑化処理部８２は、顔表情強度値取得部８１がバッファに記憶させた複数フレーム分の顔表情強度値セットを含む区間ごとに、前記顔表情強度値取得部が取り込んだ複数フレーム分の顔表情強度値セットを参照し、顔表情種別ごとの複数フレームの顔表情強度値に基づく合計値に基づいて、複数フレームに対応する代表顔表情強度値を計算する。具体的に、顔表情強度値平滑化処理部８２は、顔表情強度値取得部８１がバッファに記憶させた複数フレーム分の顔表情強度値セットを含む区間ごとに、当該区間内の複数フレーム分の顔表情強度値セットについて、顔表情種別ごとに顔表情強度値の合計値を計算する。そして、顔表情強度値平滑化処理部８２は、顔表情種別ごとの顔表情強度値の合計値に基づいて、顔表情種別ごとの平均値を計算する。ここでの平均値は、単純平均値である。そして、顔表情強度値平滑化処理部８２は、顔表情種別ごとの顔表情強度値の平均値のうち最大の平均値、言い換えると、顔表情種別ごとの顔表情強度値の合計値のうち最大の合計値から求まる平均値を、当該区間に対応する平滑化後の代表顔表情強度値として出力する。 For each section containing the facial expression intensity value sets for a plurality of frames that the facial expression intensity value acquisition unit 81 has stored in the buffer, the facial expression intensity value smoothing processing unit 82 refers to those sets and calculates a representative facial expression intensity value corresponding to the plurality of frames, based on a total value derived from the facial expression intensity values of the plurality of frames for each facial expression type. Specifically, for each such section, the facial expression intensity value smoothing processing unit 82 calculates, for each facial expression type, the total of the facial expression intensity values over the sets of the frames in the section. It then calculates an average value for each facial expression type from that total. The average here is a simple (arithmetic) mean. Finally, the facial expression intensity value smoothing processing unit 82 outputs the largest of the per-type average values, in other words, the average obtained from the largest of the per-type totals, as the smoothed representative facial expression intensity value for the section.

顔表情種別平滑化処理部８３は、顔表情強度値取得部８１がバッファに記憶させた複数フレーム分の顔表情強度値セットを含む区間ごとに、顔表情強度値平滑化処理部８２が求めた最大の平均値に対応する顔表情種別を、当該区間に対応する平滑化後の顔表情の分類結果として選出する。そして、顔表情種別平滑化処理部８３は、その顔表情の分類結果を示す顔表情種別情報を生成し、この顔表情種別情報を出力する。 For each section containing the facial expression intensity value sets for a plurality of frames that the facial expression intensity value acquisition unit 81 has stored in the buffer, the facial expression type smoothing processing unit 83 selects the facial expression type corresponding to the largest average value found by the facial expression intensity value smoothing processing unit 82 as the smoothed facial expression classification result for that section. The facial expression type smoothing processing unit 83 then generates facial expression type information indicating that classification result and outputs it.
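The processing of units 82 and 83 for one section (per-type total, simple mean, largest mean, and the type that attains it) can be sketched as follows. This is an illustrative sketch only: the function name `smooth_section`, the dictionary representation of an intensity value set, and the sample values are assumptions.

```python
def smooth_section(frames):
    """Smooth one section of facial expression intensity value sets.

    `frames` is a list of dicts mapping facial expression type to the
    intensity value of one frame. Returns the type with the largest
    per-type mean and that mean (the representative intensity value).
    """
    if not frames:
        raise ValueError("a section must contain at least one frame")
    types = list(frames[0].keys())
    # Total per facial expression type over all frames in the section.
    totals = {t: sum(frame[t] for frame in frames) for t in types}
    # Simple (arithmetic) mean per type.
    means = {t: totals[t] / len(frames) for t in types}
    best_type = max(means, key=means.get)
    return best_type, means[best_type]

# Hypothetical three-frame section; the first frame alone would favor "surprise".
frames = [
    {"happiness": 60, "surprise": 70},
    {"happiness": 80, "surprise": 20},
    {"happiness": 70, "surprise": 30},
]
result = smooth_section(frames)
```

In this example the per-frame maximum of the first frame is "surprise", but the section-level result is "happiness" with a representative value of 70, which is the kind of stabilization the smoothing is intended to give.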

図３は、顔表情解析装置１が学習モードに設定されて学習処理を実行する際に用いる、顔画像データベースのデータ構造の一部分を概念的に示す図である。同図に示すように、顔画像データベースは、顔表情種別ごとに、各人物（被写体）のニュートラル顔表情からピーク顔表情まで顔表情の度合がそれぞれ異なる顔画像データ列の集合に、当該顔表情種別を示すラベルを対応付けて構成した顔画像データ群を格納している。顔表情種別は、例えば、「怒り」、「嫌悪」、「恐れ」、「幸せ」、「悲しみ」、および「驚き」の６種類である。ニュートラル顔表情は、人物の中立的な顔表情であり、例えば、人物の無表情な顔つきから表情の種類を判別困難な程度の顔つきまでを示す表情である。つまり、ニュートラル顔表情には、顔表情の幅がある。ピーク顔表情は、人物の感情を豊かに表現した顔表情であり、例えば、怒り、嫌悪、恐れ、幸せ、悲しみ、驚き等の感情が強く表現された顔つきを示す。 FIG. 3 is a diagram conceptually showing a part of the data structure of the face image database used when the facial expression analysis apparatus 1 is set to the learning mode and executes the learning process. As shown in the figure, the facial image database stores the facial expression in a set of facial image data sequences having different degrees of facial expression from the neutral facial expression to the peak facial expression of each person (subject) for each facial expression type. A face image data group configured by associating labels indicating types is stored. There are six types of facial expression types, for example, “anger”, “disgust”, “fear”, “happiness”, “sadness”, and “surprise”. The neutral facial expression is a neutral facial expression of a person, for example, a facial expression ranging from a person's expressionless face to a face whose degree of expression is difficult to distinguish. In other words, the neutral facial expression has a range of facial expressions. The peak facial expression is a facial expression that expresses a person's emotions abundantly, and indicates a facial expression in which emotions such as anger, disgust, fear, happiness, sadness, and surprise are strongly expressed.

顔画像データベースとして、例えば、Patrick Lucey, Jeffrey F. Cohn, Takeo Kanade, Jason Saragih, Zara Ambadar, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression", the Third IEEE Workshop on CVPR for Human Communicative Behavior Analysis, pp. 94-101, 2010に記載された、Cohn-Kanade Facial Expression Databaseを適用できる。 As the face image database, for example, the Cohn-Kanade Facial Expression Database described in Patrick Lucey, Jeffrey F. Cohn, Takeo Kanade, Jason Saragih, Zara Ambadar, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression", the Third IEEE Workshop on CVPR for Human Communicative Behavior Analysis, pp. 94-101, 2010 can be used.

図４は、顔表情解析装置１が学習モードに設定されて学習処理を実行する際に用いる顔表情強度教師値を、顔画像データに対応付けて示す図である。同図に示すように、顔表情強度教師値は、顔画像データ群における顔表情種別ごとの各被写体の顔画像データ列それぞれについて、各顔画像データの顔表情の度合を、評価者による主観評価にしたがって下限値“０（ゼロ）”から上限値“１００”までの整数で表される。 FIG. 4 is a diagram showing the facial expression intensity teacher values, used when the facial expression analysis apparatus 1 is set to the learning mode and executes the learning process, in association with the face image data. As shown in the figure, for each subject's face image data sequence of each facial expression type in the face image data group, the facial expression intensity teacher value expresses the degree of the facial expression in each face image data as an integer from the lower limit “0 (zero)” to the upper limit “100”, according to subjective evaluation by an evaluator.

図４では、顔表情種別が“幸せ”である第１の被写体の顔画像データ列について、ニュートラル顔表情に対応する顔表情強度教師値が“０（ゼロ）”、顔表情の度合が大きくなるにしたがって、顔表情強度教師値が例えば“８”、“４６”、“８３”等と大きくなり、ピーク顔表情に対応する顔表情強度教師値が“１００”となっている。また、顔表情種別が“幸せ”である第２の被写体の顔画像データ列について、ニュートラル顔表情に対応する顔表情強度教師値が“０（ゼロ）”、顔表情の度合が大きくなるにしたがって、顔表情強度教師値が例えば“６”、“５２”、“７９”等と大きくなり、ピーク顔表情に対応する顔表情強度教師値が“１００”となっている。また、顔表情種別が“驚き”である顔画像データ列について、ニュートラル顔表情に対応する顔表情強度教師値が“０（ゼロ）”、顔表情の度合が大きくなるにしたがって、顔表情強度教師値が例えば“７”、“４３”、“８８”等と大きくなり、ピーク顔表情に対応する顔表情強度教師値が“１００”となっている。なお、この例のように、ニュートラル顔表情からピーク顔表情に顔表情が変化する顔画像列に対し、顔表情強度教師値の下限値および上限値を設けることを必須の条件としてもよいし、必須の条件としなくてもよい。 In FIG. 4, for the face image data sequence of the first subject whose facial expression type is “happiness”, the facial expression intensity teacher value corresponding to the neutral facial expression is “0 (zero)”; as the degree of the facial expression increases, the teacher value increases to, for example, “8”, “46”, and “83”; and the teacher value corresponding to the peak facial expression is “100”. For the face image data sequence of the second subject whose facial expression type is “happiness”, the teacher value corresponding to the neutral facial expression is “0 (zero)”; as the degree of the facial expression increases, the teacher value increases to, for example, “6”, “52”, and “79”; and the teacher value corresponding to the peak facial expression is “100”. For the face image data sequence whose facial expression type is “surprise”, the teacher value corresponding to the neutral facial expression is “0 (zero)”; as the degree of the facial expression increases, the teacher value increases to, for example, “7”, “43”, and “88”; and the teacher value corresponding to the peak facial expression is “100”. Providing a lower limit and an upper limit for the facial expression intensity teacher value of a face image sequence in which the facial expression changes from the neutral facial expression to the peak facial expression, as in this example, may or may not be made a mandatory condition.

図５は、画像データと、この画像データから抽出された顔領域データと、この顔領域データを正規化して得られた正規化顔領域データとを模式的に示す図である。つまり、同図は、画像データ取得部１０が取得する画像データ２と、顔領域検出部３１が検出する顔領域データ２ａと、解析領域決定部３２が正規化（ここでは、縮小）する正規化顔領域データ２ｂとを時系列に示している。同図に示すように、画像データ２は、人物の首より上側を含む画像である。顔領域データ２ａは、画像データ２から抽出された顔を含む画像である。顔を含む画像とは、例えば、人物の顔表情を決定付ける顔の主要なパーツ（両眉毛、両目、鼻、口）を含む画像である。正規化顔領域データ２ｂは、顔領域データ２ａを水平画素数Ｌ_Ｘ×垂直画素数Ｌ_Ｙサイズに正規化した画像である。水平画素数Ｌ_Ｘと垂直画素数Ｌ_Ｙとの関係は、例えば、正規化顔領域が正方形となる関係である。 FIG. 5 is a diagram schematically showing image data, face region data extracted from the image data, and normalized face region data obtained by normalizing the face region data. That is, the figure shows, in chronological order, the image data 2 acquired by the image data acquisition unit 10, the face region data 2a detected by the face region detection unit 31, and the normalized face region data 2b normalized (here, reduced) by the analysis region determination unit 32. As shown in the figure, the image data 2 is an image containing the part of a person above the neck. The face region data 2a is an image containing the face extracted from the image data 2. An image containing a face is, for example, an image containing the main parts of the face that determine a person's facial expression (both eyebrows, both eyes, nose, and mouth). The normalized face region data 2b is an image obtained by normalizing the face region data 2a to a size of L_X horizontal pixels × L_Y vertical pixels. The relationship between the horizontal pixel count L_X and the vertical pixel count L_Y is, for example, one in which the normalized face region is square.

図６は、解析領域決定部３２が正規化顔領域データ２ｂから決定した解析領域を、視覚的に分かり易く線描画した図である。同図に示すように、解析領域決定部３２は、水平画素数Ｌ_Ｘ×垂直画素数Ｌ_Ｙの正規化顔領域データ２ｂの中心位置を中心として、正規化顔領域データ２ｂに含まれる円形の解析領域３を決定する。解析領域３の水平方向の径は、例えば水平画素数Ｌ_Ｘの０．８倍の大きさを有し、垂直方向の径は、例えば垂直画素数Ｌ_Ｙの０．８倍の大きさを有する。このように、解析領域３の径を正規化顔領域データ２ｂの内接円の径よりも小さくすることにより、顔の認識や顔表情認識にとって重要度が低い髪の毛、耳、イヤリング等の情報を除外することができる。解析領域決定部３２は、解析領域３の中心を通る水平線で、解析領域３を上部解析領域３Ｕと下部解析領域３Ｄとに区分する。このように区分することにより、上部解析領域３Ｕは両眉毛および両目を含み、下部解析領域３Ｄは鼻頭および口を含むこととなる。 FIG. 6 is a line drawing that makes the analysis region determined by the analysis region determination unit 32 from the normalized face region data 2b easy to see. As shown in the figure, the analysis region determination unit 32 determines a circular analysis region 3 that is contained in the normalized face region data 2b (L_X horizontal pixels × L_Y vertical pixels) and centered on its center position. The horizontal diameter of the analysis region 3 is, for example, 0.8 times the horizontal pixel count L_X, and its vertical diameter is, for example, 0.8 times the vertical pixel count L_Y. Making the diameter of the analysis region 3 smaller than that of the inscribed circle of the normalized face region data 2b in this way excludes information such as hair, ears, and earrings, which matters little for face recognition and facial expression recognition. The analysis region determination unit 32 divides the analysis region 3 into an upper analysis region 3U and a lower analysis region 3D along a horizontal line passing through the center of the analysis region 3. With this division, the upper analysis region 3U contains both eyebrows and both eyes, and the lower analysis region 3D contains the tip of the nose and the mouth.
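The geometry of the analysis region can be sketched with boolean masks, assuming NumPy. The function name `analysis_masks`, the pixel-center convention, and the default 128 × 128 size (taken from the example size mentioned for normalization) are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def analysis_masks(width=128, height=128, scale=0.8):
    """Return boolean masks for the upper and lower analysis regions.

    The analysis region is centered on the normalized face region, with
    horizontal and vertical diameters equal to `scale` times the width
    and height, and is split by the horizontal line through its center.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    ry, rx = scale * height / 2.0, scale * width / 2.0  # radii = half the diameters
    y, x = np.ogrid[:height, :width]
    inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
    upper = inside & (y < cy)   # rows above the center line (eyebrows, eyes)
    lower = inside & (y >= cy)  # rows below the center line (nose tip, mouth)
    return upper, lower

upper, lower = analysis_masks()
```

Local features would then be computed only from pixels where the corresponding mask is true, which leaves out the image corners where hair and ears tend to appear.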

図７は、画像特徴量計算部４０によって生成された、上部解析領域における特徴量のヒストグラムと、下部解析領域における特徴量のヒストグラムと、これら二つのヒストグラムが連結された、解析領域全体における特徴量のヒストグラムとを模式的に示した図である。同図は、上部解析領域における特徴量のヒストグラムの後に、下部解析領域における特徴量のヒストグラムを連結した例である。このように、画像特徴量計算部４０が、分割された各領域でヒストグラムを生成して連結することにより、Ｂａｇ−ｏｆ−Ｋｅｙｐｏｉｎｔｓに位置情報を加えることができる。なお、画像特徴量計算部４０は、下部解析領域における特徴量のヒストグラムの後に、上部解析領域における特徴量のヒストグラムを連結することによって、解析領域全体における特徴量のヒストグラムを生成してもよい。 FIG. 7 shows a feature amount histogram generated by the image feature amount calculation unit 40 in the upper analysis region, a feature amount histogram in the lower analysis region, and a feature amount in the entire analysis region in which these two histograms are connected. It is the figure which showed typically the histogram. This figure is an example in which a histogram of feature amounts in the lower analysis region is connected to a histogram of feature amounts in the upper analysis region. As described above, the image feature quantity calculation unit 40 can add position information to Bag-of-Keypoints by generating and connecting a histogram in each divided area. Note that the image feature amount calculation unit 40 may generate a feature amount histogram in the entire analysis region by concatenating the feature amount histogram in the upper analysis region after the feature amount histogram in the lower analysis region.

図８は、回帰分析部５０が実行する回帰分析処理における一つの回帰モデルを模式的に示した図である。同図において、横軸は回帰式における独立変数を表し、本実施形態では、顔画像データの顔領域の画像特徴量を表す。縦軸は回帰式における従属変数を表し、本実施形態では、顔表情強度教師値を表わす。同図における複数の四角形印の分布は、画像特徴量とこの画像特徴量に対する顔表情強度教師値との対応関係を示している。また、同図において曲線で表されている実線は、回帰分析部５０が実行する回帰分析処理によって得られる回帰式を示すグラフである。 FIG. 8 is a diagram schematically showing one regression model in the regression analysis process executed by the regression analysis unit 50. In the figure, the horizontal axis represents the independent variable of the regression equation, which in this embodiment is the image feature amount of the face region of the face image data. The vertical axis represents the dependent variable of the regression equation, which in this embodiment is the facial expression intensity teacher value. The distribution of the square marks in the figure shows the correspondence between image feature amounts and the facial expression intensity teacher values for those image feature amounts. The solid curve in the figure is a graph of the regression equation obtained by the regression analysis process executed by the regression analysis unit 50.

回帰分析部５０は、回帰モデルとして、例えば、線形回帰モデル、ロジスティック回帰モデル、またはサポートベクトル回帰モデルを適用して回帰分析処理を実行する。次に、各回帰モデルを適用した回帰分析処理について説明する。 The regression analysis unit 50 executes a regression analysis process by applying, for example, a linear regression model, a logistic regression model, or a support vector regression model as a regression model. Next, a regression analysis process using each regression model will be described.

［１］線形回帰モデル [1] Linear Regression Model
回帰モデルとして線形回帰モデルを適用した場合、回帰分析部５０は、線形回帰分析処理として、画像特徴量および顔表情強度教師値の関係を、下記の式（１）に示す積和関数にモデル化する。ただし、Ｙは顔表情強度教師値、Ｘ_ｉは画像特徴量（ｉ＝１，・・・，Ｉ）である。また、α、β_ｉはパラメータである。 When a linear regression model is applied as the regression model, the regression analysis unit 50 models, as the linear regression analysis process, the relationship between the image feature amount and the facial expression intensity teacher value as the product-sum function shown in Equation (1) below. Here, Y is the facial expression intensity teacher value and X_i is the image feature amount (i = 1, ..., I). α and β_i are parameters.
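Equation (1) itself is not reproduced in this text. As a reading aid only, a product-sum function consistent with the symbols defined above (Y, X_i, α, β_i) would take the following form; this is a reconstruction under that assumption, not the equation as published.

```latex
\[
  Y = \alpha + \sum_{i=1}^{I} \beta_i X_i \tag{1}
\]
```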

回帰分析部５０は、画像特徴量とこの画像特徴量に対する顔表情強度教師値との対応関係を例えば最小二乗法によって回帰させることにより、式（１）に示す積和関数を推計する。具体的に、式（１）が画像特徴量とこの画像特徴量に対する顔表情強度教師値との対データに対して最適な近似式となるように、回帰分析部５０は、近似誤差の二乗和が最小となるパラメータα、β_ｉを、例えば最急降下法によって求める。回帰分析部５０は、回帰分析処理において、相関が強い（例えば、相関係数が“０．５”以上である）独立変数の一方を削除することによって多重共線を排除または抑制してもよい。また、全ての顔表情種別に共通して“０（ゼロ）”である独立変数（画像特徴量）について、回帰分析部５０は、その独立変数を削除する処理を行ってもよい。 The regression analysis unit 50 estimates the product-sum function shown in Equation (1) by regressing the correspondence between the image feature amounts and the facial expression intensity teacher values for those image feature amounts, for example by the least squares method. Specifically, so that Equation (1) becomes the best approximation to the paired data of image feature amounts and their facial expression intensity teacher values, the regression analysis unit 50 finds the parameters α and β_i that minimize the sum of squared approximation errors, for example by the steepest descent method. In the regression analysis process, the regression analysis unit 50 may eliminate or suppress multicollinearity by deleting one of any pair of strongly correlated independent variables (for example, those with a correlation coefficient of “0.5” or more). The regression analysis unit 50 may also delete any independent variable (image feature amount) that is “0 (zero)” for all facial expression types.
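A least squares fit by steepest descent can be sketched as below, assuming the model Y = α + Σ β_i X_i and a mean-squared-error objective. The function name `fit_linear`, the learning rate, the step count, and the toy data are all assumptions for illustration; the text does not give these details.

```python
import numpy as np

def fit_linear(X, y, lr=0.1, steps=2000):
    """Fit Y = alpha + sum_i beta_i * X_i by steepest descent on the mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    alpha, beta = 0.0, np.zeros(d)
    for _ in range(steps):
        residual = alpha + X @ beta - y            # approximation error per sample
        alpha -= lr * 2.0 * residual.mean()        # gradient w.r.t. alpha
        beta -= lr * 2.0 * (X.T @ residual) / n    # gradient w.r.t. beta
    return alpha, beta

# Hypothetical toy data generated from Y = 3 + 2*X_1 + 5*X_2.
X = [[0, 1], [1, 0], [1, 1], [2, 1], [0, 0]]
y = [3 + 2 * a + 5 * b for a, b in X]
alpha, beta = fit_linear(X, y)
```

With real 300-dimensional Bag-of-Keypoints features the learning rate would need tuning, and a closed-form solver such as `numpy.linalg.lstsq` would be a reasonable alternative to iterative descent.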

［２］ロジスティック回帰モデル [2] Logistic Regression Model
回帰モデルとしてロジスティック回帰モデルを適用した場合、回帰分析部５０は、ロジスティック回帰分析処理として、画像特徴量および顔表情強度教師値の関係を、下記の式（２）に示す関数にモデル化する。ただし、Ｙは顔表情強度教師値、Ｘ_ｉは画像特徴量（ｉ＝１，・・・，Ｉ）である。また、α、β_ｉはパラメータである。 When a logistic regression model is applied as the regression model, the regression analysis unit 50 models, as the logistic regression analysis process, the relationship between the image feature amount and the facial expression intensity teacher value as the function shown in Equation (2) below. Here, Y is the facial expression intensity teacher value and X_i is the image feature amount (i = 1, ..., I). α and β_i are parameters.

回帰分析部５０は、画像特徴量とこの画像特徴量に対する顔表情強度教師値との対応関係を回帰させることによってパラメータα、β_ｉを求める。このロジスティック回帰モデルを適用することにより、回帰分析部５０は、画像特徴量Ｘ_ｉに対する顔表情強度教師値Ｙが０から１００までの間（０≦Ｙ≦１００）に収まる回帰式を得ることができる。 The regression analysis unit 50 obtains the parameters α and β_i by regressing the correspondence between the image feature amounts and the facial expression intensity teacher values for those image feature amounts. By applying this logistic regression model, the regression analysis unit 50 can obtain a regression equation in which the facial expression intensity teacher value Y for the image feature amount X_i stays between 0 and 100 (0 ≦ Y ≦ 100).
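Equation (2) is not reproduced in this text. The sketch below assumes one common form that satisfies the stated 0 to 100 range, namely a logistic function of α + Σ β_i X_i scaled by 100; both this form and the function name `logistic_intensity` are assumptions, not the published equation.

```python
import math

def logistic_intensity(x, alpha, beta, upper=100.0):
    """Evaluate upper / (1 + exp(-(alpha + sum_i beta_i * x_i))).

    The output always lies between 0 and `upper`, whatever the features are.
    """
    z = alpha + sum(b * xi for b, xi in zip(beta, x))
    if z >= 0:
        return upper / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp().
    e = math.exp(z)
    return upper * e / (1.0 + e)

mid_value = logistic_intensity([0.0, 0.0], alpha=0.0, beta=[1.0, 1.0])
low_value = logistic_intensity([1000.0], alpha=0.0, beta=[-5.0])
high_value = logistic_intensity([1000.0], alpha=0.0, beta=[5.0])
```

Unlike the linear model, a model of this shape cannot produce intensity values below 0 or above 100, which matches the range of the teacher values.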

［３］サポートベクトル回帰モデル [3] Support Vector Regression Model
回帰モデルとしてサポートベクトル回帰モデルを適用した場合、回帰分析部５０は、サポートベクトル回帰分析処理として、下記の式（３）の形で、画像特徴量Ｘ_ｉ（ｉ＝１，・・・，Ｉ）と顔表情強度教師値Ｙとを関係付ける。 When a support vector regression model is applied as the regression model, the regression analysis unit 50 relates, as the support vector regression analysis process, the image feature amount X_i (i = 1, ..., I) to the facial expression intensity teacher value Y in the form of Equation (3) below.

式（３）において、関数φは、Ｉ次元の特徴量ベクトルをＪ次元のベクトル（行ベクトル）に写像する写像関数である。このサポートベクトル回帰モデルは、関数φによるカーネルトリックを用いる。β_ｊ（ｊ＝１，・・・，Ｊ）は、関数φによる写像後のベクトルの要素それぞれに対応する重み係数である。また、αはバイアス項である。回帰分析部５０は、入力される多数の顔表情強度教師値を用いて式（３）の形の回帰を行い、パラメータα，β_１，・・・，β_Ｊを求める。なお、パラメータの計算自体は、例えば、ニュートン法に基づいて既存のサポートベクトル回帰の学習法を適用することができる。 In Equation (3), the function φ is a mapping function that maps an I-dimensional feature vector to a J-dimensional vector (row vector). This support vector regression model uses the kernel trick with the function φ. β_j (j = 1, ..., J) are weighting coefficients corresponding to the respective elements of the vector after mapping by φ, and α is a bias term. The regression analysis unit 50 performs regression of the form of Equation (3) using a large number of input facial expression intensity teacher values, and obtains the parameters α, β_1, ..., β_J. For the parameter calculation itself, an existing support vector regression learning method, for example one based on Newton's method, can be applied.

次に、顔表情解析装置１の動作について説明する。 Next, the operation of the facial expression analysis apparatus 1 will be described.
まず、学習モードに設定された顔表情解析装置１は、学習処理において用いる全ての顔画像データを顔画像データベースから取り込んで、以下に示す事前処理を実行する。すなわち、画像データ取得部１０が、顔画像データベースから顔画像データを取り込む。次に、顔領域抽出部３０がその取り込んだ顔画像データのサイズを正規化して解析領域（上部解析領域および下部解析領域）を抽出する。次に、画像特徴量計算部４０が、上部解析領域について、ＳＩＦＴ特徴量またはＳＵＲＦ特徴量等の局所特徴量を計算する。次に、画像特徴量計算部４０が、全ての顔画像データの上部解析領域に対する局所特徴量についてクラスタリング処理を実行することによってクラスタを生成し、このクラスタを記憶部に記憶させる。また、画像特徴量計算部４０は、下部解析領域についても上部解析領域と同様にクラスタを生成し、このクラスタを記憶部に記憶させる。 First, the facial expression analysis apparatus 1 set to the learning mode fetches all face image data used in the learning process from the face image database, and executes the following pre-processing. That is, the image data acquisition unit 10 captures face image data from the face image database. Next, the face area extraction unit 30 normalizes the size of the captured face image data and extracts analysis areas (upper analysis area and lower analysis area). Next, the image feature amount calculation unit 40 calculates a local feature amount such as a SIFT feature amount or a SURF feature amount for the upper analysis region. Next, the image feature amount calculation unit 40 generates a cluster by executing clustering processing on the local feature amount for the upper analysis region of all the face image data, and stores the cluster in the storage unit. Further, the image feature quantity calculation unit 40 also generates a cluster for the lower analysis region as in the upper analysis region, and stores the cluster in the storage unit.

次に、顔表情解析装置１の学習処理について説明する。 Next, the learning process of the facial expression analysis apparatus 1 will be described.
図９は、顔表情解析装置１が実行する学習処理の手順を示すフローチャートである。 FIG. 9 is a flowchart showing the procedure of the learning process executed by the facial expression analysis apparatus 1.
ステップＳ１において、画像データ取得部１０は、例えば、顔画像データベースに格納された複数の顔画像データから一つの顔画像データを取り込み、この顔画像データを顔領域抽出部３０に供給する。 In step S1, the image data acquisition unit 10 takes in, for example, one piece of face image data from the plurality of face image data stored in the face image database, and supplies this face image data to the face region extraction unit 30.
次に、ステップＳ２において、顔表情強度教師値取得部２０は、ステップＳ１の処理において画像データ取得部１０に取り込まれた顔画像データに対応する顔表情強度教師値を、外部装置（例えば、顔画像データベース）から取り込み、この顔表情強度教師値を回帰分析部５０に供給する。 Next, in step S2, the facial expression intensity teacher value acquisition unit 20 takes in, from an external device (for example, the face image database), the facial expression intensity teacher value corresponding to the face image data taken in by the image data acquisition unit 10 in step S1, and supplies this facial expression intensity teacher value to the regression analysis unit 50.

次に、ステップＳ３において、顔領域抽出部３０は、画像データ取得部１０が供給する顔画像データを取り込み、この顔画像データに対して顔検出処理を実行することによってその顔画像データから人物の顔領域を検出する。次に、解析領域決定部３２は、顔領域検出部３１が検出した顔領域データを所定サイズ（例えば、水平方向１２８画素×垂直方向１２８画素）に正規化する。次に、解析領域決定部３２は、正規化顔領域データから解析領域を抽出し、この解析領域から二つの解析部分領域（上部解析領域および下部解析領域）を決定する。 Next, in step S3, the face region extraction unit 30 takes in the face image data supplied by the image data acquisition unit 10 and detects a person's face region from the face image data by executing face detection processing on it. Next, the analysis region determination unit 32 normalizes the face region data detected by the face region detection unit 31 to a predetermined size (for example, 128 pixels horizontally × 128 pixels vertically). Next, the analysis region determination unit 32 extracts an analysis region from the normalized face region data, and determines two analysis partial regions (an upper analysis region and a lower analysis region) from this analysis region.

次に、ステップＳ４において、画像特徴量計算部４０は、顔領域抽出部３０が抽出した解析領域データの画像特徴量を計算する。具体的に、画像特徴量計算部４０は、上部解析領域からＳＩＦＴ特徴量またはＳＵＲＦ特徴量等の局所特徴量を計算する。次に、画像特徴量計算部４０は、これら局所特徴量を、事前処理において記憶した上部解析領域に対するクラスタに分類し、各クラスタをビン、各クラスタの要素数を頻度とするヒストグラムを生成する。また、画像特徴量計算部４０は、下部解析領域からＳＩＦＴ特徴量またはＳＵＲＦ等の局所特徴量を計算する。次に、画像特徴量計算部４０は、これら局所特徴量を、事前処理において記憶した下部解析領域に対するクラスタに分類し、各クラスタをビン、各クラスタの要素数を頻度とするヒストグラムを生成する。次に、画像特徴量計算部４０は、上部解析領域および下部解析領域それぞれについてのヒストグラムを連結して解析領域全体のヒストグラム、言い換えると、解析領域全体のＢａｇ−ｏｆ−Ｋｅｙｐｏｉｎｔｓを生成する。次に、画像特徴量計算部４０は、解析領域全体のＢａｇ−ｏｆ−Ｋｅｙｐｏｉｎｔｓを、画像特徴量として回帰分析部５０に供給する。 Next, in step S4, the image feature amount calculation unit 40 calculates the image feature amount of the analysis region data extracted by the face region extraction unit 30. Specifically, the image feature amount calculation unit 40 calculates local feature amounts, such as SIFT or SURF feature amounts, from the upper analysis region. It then classifies these local feature amounts into the clusters for the upper analysis region stored in the pre-processing, and generates a histogram in which each cluster is a bin and the number of elements in each cluster is the frequency. The image feature amount calculation unit 40 likewise calculates local feature amounts, such as SIFT or SURF feature amounts, from the lower analysis region, classifies them into the clusters for the lower analysis region stored in the pre-processing, and generates a histogram in which each cluster is a bin and the number of elements in each cluster is the frequency. Next, the image feature amount calculation unit 40 concatenates the histograms of the upper and lower analysis regions to generate a histogram of the entire analysis region, in other words, the Bag-of-Keypoints of the entire analysis region. The image feature amount calculation unit 40 then supplies the Bag-of-Keypoints of the entire analysis region to the regression analysis unit 50 as the image feature amount.
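The histogram generation and concatenation in step S4 can be sketched as follows, assuming NumPy, clusters represented by their center vectors, and nearest-center assignment by Euclidean distance. The function names and the tiny two-dimensional descriptors are assumptions for illustration; real SIFT descriptors are 128-dimensional, and the example sizes mentioned earlier are 175 and 125 clusters.

```python
import numpy as np

def bok_histogram(descriptors, centers):
    """Count how many local descriptors fall into each cluster (one bin per cluster)."""
    centers = np.asarray(centers, dtype=float)
    descriptors = np.asarray(descriptors, dtype=float).reshape(-1, centers.shape[1])
    # Squared Euclidean distance from every descriptor to every cluster center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(centers))

def bag_of_keypoints(upper_desc, upper_centers, lower_desc, lower_centers):
    """Concatenate the upper-region histogram followed by the lower-region histogram."""
    return np.concatenate([
        bok_histogram(upper_desc, upper_centers),
        bok_histogram(lower_desc, lower_centers),
    ])

# Hypothetical toy clusters and descriptors.
upper_centers = [[0, 0], [10, 10]]
lower_centers = [[0, 0], [5, 5], [10, 0]]
upper_desc = [[1, 0], [9, 9], [0, 1]]
lower_desc = [[5, 4], [9, 1]]
feature = bag_of_keypoints(upper_desc, upper_centers, lower_desc, lower_centers)
```

Because each region has its own clusters and its own block of bins, the concatenated vector records whether a keypoint came from the upper or the lower half of the face.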

次に、ステップＳ５において、顔画像データベースから取り込むべき全ての顔画像データの取り込みが完了した場合（ステップＳ５：ＹＥＳ）、顔表情解析装置１はステップＳ６の処理に移す。一方、顔画像データベースから取り込むべき全ての顔画像データの取り込みが完了していない場合（ステップＳ５：ＮＯ）は、顔表情解析装置１はステップＳ１の処理に戻す。 Next, in step S5, when the acquisition of all the face image data to be acquired from the face image database is completed (step S5: YES), the facial expression analysis apparatus 1 proceeds to the process of step S6. On the other hand, when all the facial image data to be captured from the facial image database has not been captured (step S5: NO), the facial expression analysis apparatus 1 returns to the process of step S1.

ステップＳ６において、回帰分析部５０は、顔画像データに対する画像特徴量とその顔画像データに対応付けられた顔表情種別ごとの顔表情強度教師値とを用いて回帰分析処理を実行することにより、回帰モデルが有するパラメータ値を顔表情種別ごとに更新する。次に、回帰分析部５０は、回帰処理を行って得られるパラメータ値を、回帰モデル記憶部６０に供給する。 In step S6, the regression analysis unit 50 executes the regression analysis process by using the image feature amount for the facial image data and the facial expression intensity teacher value for each facial expression type associated with the facial image data. The parameter value of the regression model is updated for each facial expression type. Next, the regression analysis unit 50 supplies parameter values obtained by performing the regression process to the regression model storage unit 60.
次に、ステップＳ７において、回帰モデル記憶部６０は、回帰分析部５０が供給するパラメータ値を、顔表情種別ごとに記憶する。 Next, in step S7, the regression model storage unit 60 stores the parameter values supplied by the regression analysis unit 50 for each facial expression type.

FIG. 10 is a flowchart showing the procedure of the facial expression evaluation process for one frame executed by the facial expression analysis apparatus 1.
In step S21, the image data acquisition unit 10 takes in evaluation face image data for facial expression analysis, supplied by, for example, a video shooting device or a video recording device, and supplies this evaluation face image data to the face region extraction unit 30.

Next, in step S22, the face region extraction unit 30 takes in the evaluation face image data supplied by the image data acquisition unit 10 and executes face detection processing on it, thereby detecting a person's face region in the evaluation face image data. Next, the analysis region determination unit 32 normalizes the evaluation face region data detected by the face region detection unit 31 to a predetermined size (for example, 128 pixels horizontally × 128 pixels vertically). The analysis region determination unit 32 then extracts an analysis region from the normalized face region data and determines two partial analysis regions (an upper analysis region and a lower analysis region) within this analysis region.

Next, in step S23, the image feature amount calculation unit 40 calculates the image feature amount of the analysis region data extracted by the face region extraction unit 30. For example, for the data of each of the upper and lower analysis regions in the analysis region determined by the analysis region determination unit 32, the image feature amount calculation unit 40 calculates a histogram in the same manner as in step S4 of the learning process (see FIG. 9). Next, the image feature amount calculation unit 40 concatenates the histograms for the upper and lower analysis regions to generate a histogram of the entire analysis region, that is, the Bag-of-Keypoints of the entire analysis region. The image feature amount calculation unit 40 then supplies the Bag-of-Keypoints of the entire analysis region to the facial expression evaluation unit 70 as the image feature amount.

Next, in step S24, the facial expression evaluation unit 70 takes in the image feature amount for the evaluation face image data supplied by the image feature amount calculation unit 40. Next, the facial expression evaluation unit 70 reads the parameter values of the regression model for each facial expression type from the regression model storage unit 60. The facial expression evaluation unit 70 then applies the image feature amount to each regression model to calculate a facial expression intensity value for each facial expression type, thereby generating a facial expression intensity value set.

FIG. 11 is a diagram showing an example of a series of facial expression intensity value sets obtained by the facial expression evaluation unit 70 when the facial expression analysis apparatus 1, set to the facial expression analysis mode, takes in evaluation video data and repeatedly executes the facial expression evaluation process. In the figure, each shaded facial expression intensity value is the maximum facial expression intensity value in its facial expression intensity value set, that is, in its frame (the facial expression intensity maximum value). According to the figure, among the eight frames from time (t−3) to time (t+4), the facial expression type corresponding to the facial expression intensity maximum value (the representative facial expression type) is "sadness" in the facial expression intensity value sets at times (t−3), (t−2), (t−1), (t+1), (t+2), (t+3), and (t+4). In contrast, the representative facial expression type corresponding to the facial expression intensity maximum value in the facial expression intensity value set at time t is "happiness". With reference to the figure, the facial expression evaluation result smoothing process executed by the smoothing processing unit 80 is described for the case where the six frames from time (t−2) to time (t+3) form one section.

In the smoothing processing unit 80, the facial expression intensity value acquisition unit 81 takes in the facial expression intensity value sets supplied by the facial expression evaluation unit 70 in time series and stores them in its built-in buffer. That is, the facial expression intensity value acquisition unit 81 stores the facial expression intensity value sets for the six frames from time (t−2) to time (t+3) in the buffer.

When the facial expression intensity value sets for six frames have been stored in the buffer of the facial expression intensity value acquisition unit 81, the facial expression intensity value smoothing processing unit 82 calculates, for the six frames of facial expression intensity value sets stored in the buffer, the average of the facial expression intensity values within the section for each facial expression type.

Specifically, within the section, the facial expression intensity value smoothing processing unit 82 calculates the average of the six facial expression intensity values whose facial expression type is "anger", {(6.1 + 2.4 + 3.2 + 3.5 + 4.1 + 5.2) / 6}, and obtains the average facial expression intensity "4.1" for the facial expression type "anger". It calculates the average of the six facial expression intensity values whose facial expression type is "disgust", {(2.7 + 3.3 + 2.4 + 2.1 + 0.8 + 0.1) / 6}, and obtains the average facial expression intensity "1.9" for the facial expression type "disgust". It calculates the average of the six facial expression intensity values whose facial expression type is "fear", {(8.9 + 11.1 + 5.2 + 7.8 + 2.3 + 1.7) / 6}, and obtains the average facial expression intensity "6.2" for the facial expression type "fear". It calculates the average of the six facial expression intensity values whose facial expression type is "happiness", {(18.8 + 48.3 + 78.2 + 25.5 + 60.2 + 40.1) / 6}, and obtains the average facial expression intensity "45.2" for the facial expression type "happiness". It calculates the average of the six facial expression intensity values whose facial expression type is "sadness", {(68.3 + 70.1 + 72.3 + 74.5 + 72.2 + 74.5) / 6}, and obtains the average facial expression intensity "72.0" for the facial expression type "sadness". Finally, it calculates the average of the six facial expression intensity values whose facial expression type is "surprise", {(1.8 + 9.2 + 12.8 + 6.5 + 2.1 + 4.4) / 6}, and obtains the average facial expression intensity "6.1" for the facial expression type "surprise".

Then, the facial expression intensity value smoothing processing unit 82 outputs "72.0", the largest of the average facial expression intensity values for the six facial expression types, as the smoothed representative facial expression intensity value corresponding to time t in the section.

In other words, the smoothing processing unit 80 calculates the representative facial expression intensity value I_t at time t in the section by the following equation (4). Here, e denotes the facial expression type (for example, the six types "anger", "disgust", "fear", "happiness", "sadness", and "surprise"). I_eT is the facial expression intensity value corresponding to the facial expression type e at time T ((t−m) ≦ T ≦ (t+n)). The total value is therefore the sum of the facial expression intensity values I_eT from time (t−m) to time (t+n). When FIG. 11 is applied, m is "2" and n is "3".

The facial expression type smoothing processing unit 83 selects the facial expression type "sadness", which corresponds to the largest average "72.0" among the average facial expression intensity values for the facial expression types obtained by the facial expression intensity value smoothing processing unit 82, as the smoothed facial expression classification result corresponding to time t in the section. The facial expression type smoothing processing unit 83 then generates facial expression type information indicating "sadness", the classification result, and outputs this facial expression type information.
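The simple-average smoothing of this worked example can be reproduced with a short sketch. The per-type values are copied from the sums given in the text; that they are listed in time order from (t−2) to (t+3) is an assumption, and the function name `smooth_by_mean` and the tie-breaking behavior are hypothetical.

```python
# Intensity values per facial expression type for the six frames of the section,
# copied from the sums in the worked example (time order t-2 .. t+3 is assumed).
SECTION = {
    "anger":     [6.1, 2.4, 3.2, 3.5, 4.1, 5.2],
    "disgust":   [2.7, 3.3, 2.4, 2.1, 0.8, 0.1],
    "fear":      [8.9, 11.1, 5.2, 7.8, 2.3, 1.7],
    "happiness": [18.8, 48.3, 78.2, 25.5, 60.2, 40.1],
    "sadness":   [68.3, 70.1, 72.3, 74.5, 72.2, 74.5],
    "surprise":  [1.8, 9.2, 12.8, 6.5, 2.1, 4.4],
}

def smooth_by_mean(section):
    """Return (representative type, representative intensity, per-type means)."""
    means = {e: sum(values) / len(values) for e, values in section.items()}
    # Ties resolve to the first type in dictionary order (an assumption;
    # the text does not describe tie handling).
    best = max(means, key=means.get)
    return best, means[best], means

best_type, best_value, means = smooth_by_mean(SECTION)
print(best_type, round(best_value, 1))  # sadness 72.0
```

Although "happiness" has the largest single-frame value at time t (78.2), its average over the section is only about 45.2, so the isolated peak does not change the smoothed result.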

In other words, the smoothing processing unit 80 calculates the facial expression type information E_t at time t in the section by the following equation (5).
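Equations (4) and (5) themselves are not reproduced in this text. One form consistent with the surrounding description (a per-type total over the section, divided by the number of frames, followed by taking the maximum over the facial expression types) would be the following; the exact notation of the original equations is an assumption.

```latex
% Reconstruction under the stated assumptions; not the original typesetting.
I_t = \max_{e} \; \frac{1}{m+n+1} \sum_{T=t-m}^{t+n} I_{eT} \qquad (4)

E_t = \operatorname*{arg\,max}_{e} \; \frac{1}{m+n+1} \sum_{T=t-m}^{t+n} I_{eT} \qquad (5)
```

With m = 2 and n = 3 the divisor is 6, which matches the six-frame averages of the worked example. Because the divisor is the same for every facial expression type, taking the arg max over the totals instead of the averages would select the same type.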

In this way, the smoothing processing unit 80 smooths the facial expression evaluation result in a section by using the plurality of facial expression intensity value sets in that section, that is, the facial expression intensity values for each facial expression type in the section, as reliabilities. The smoothing processing unit 80 can therefore obtain a highly reliable facial expression intensity value and highly reliable facial expression type information within the section.

FIG. 12 is a flowchart showing the procedure of the facial expression evaluation result smoothing process for one section executed by the smoothing processing unit 80.
In step S41, the facial expression intensity value acquisition unit 81 takes in the facial expression intensity value set for one frame supplied by the facial expression evaluation unit 70 and stores it in its built-in buffer.

Next, in step S42, when the facial expression intensity value sets for one section have been stored in the buffer (step S42: YES), the facial expression intensity value acquisition unit 81 proceeds to step S43; when the facial expression intensity value sets for one section have not yet been stored in the buffer (step S42: NO), it returns to step S41.

In step S43, the facial expression intensity value smoothing processing unit 82 calculates, for the six frames of facial expression intensity value sets stored in the buffer of the facial expression intensity value acquisition unit 81, the average of the facial expression intensity values within the section for each facial expression type. Next, the facial expression intensity value smoothing processing unit 82 outputs the largest of the average facial expression intensity values for the facial expression types as the smoothed representative facial expression intensity value corresponding to the section.

Next, in step S44, the facial expression type smoothing processing unit 83 selects the facial expression type corresponding to the largest of the average facial expression intensity values for the facial expression types obtained by the facial expression intensity value smoothing processing unit 82 as the smoothed facial expression classification result corresponding to each frame in the section. The facial expression type smoothing processing unit 83 then generates facial expression type information indicating the classification result and outputs this facial expression type information.
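Steps S41 to S44 can be sketched as a small buffering class. The class name `SectionSmoother` and its `push` method are hypothetical, and the sketch assumes that the buffer is emptied after each section so that sections do not overlap; the text does not say whether sections overlap.

```python
class SectionSmoother:
    """Buffers one intensity value set per frame and smooths once per section."""

    def __init__(self, section_length):
        self.section_length = section_length
        self.buffer = []

    def push(self, intensity_set):
        """Feed one frame; return (intensity, type) when a section completes, else None."""
        # S41: store the intensity value set for one frame in the buffer.
        self.buffer.append(intensity_set)
        # S42: has one full section been stored?
        if len(self.buffer) < self.section_length:
            return None
        # S43: per-type average within the section; the largest is the representative value.
        means = {
            e: sum(frame[e] for frame in self.buffer) / len(self.buffer)
            for e in self.buffer[0]
        }
        # S44: the type with the largest average is the classification result.
        best = max(means, key=means.get)
        self.buffer.clear()  # assumption: sections do not overlap
        return means[best], best

# Toy three-frame section with two facial expression types (hypothetical values).
smoother = SectionSmoother(section_length=3)
results = [
    smoother.push({"happiness": 10.0, "sadness": 70.0}),
    smoother.push({"happiness": 80.0, "sadness": 72.0}),
    smoother.push({"happiness": 20.0, "sadness": 74.0}),
]
print(results)  # [None, None, (72.0, 'sadness')]
```

A sliding-window variant would drop only the oldest frame instead of clearing the buffer, producing one smoothed result per frame rather than one per section.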

FIG. 13 is a diagram schematically showing the facial expression evaluation results before and after the smoothing processing unit 80 performs the facial expression evaluation result smoothing process. The upper graph in the figure shows, in time series, the facial expression evaluation results before the smoothing processing unit 80 executes the facial expression evaluation result smoothing process. In the upper graph, the horizontal axis is time and the vertical axis is the facial expression intensity maximum value in each facial expression intensity value set. As the upper graph shows, the facial expression intensity maximum value before smoothing fluctuates over time.

The △, ○, and □ symbols shown immediately below the upper graph (referred to as facial expression symbols) represent the facial expression indicated by the representative facial expression type corresponding to the facial expression intensity maximum value, and are plotted in correspondence with the time axis of the graph. Here, △ denotes "happiness", ○ denotes "surprise", and □ denotes "anger". According to the series of facial expression symbols immediately below the upper graph, "surprise" and "anger" expressions appear abruptly within a run of "happiness" expressions.

In contrast, the lower graph in FIG. 13 shows, in time series, the facial expression evaluation results after the smoothing processing unit 80 has executed the facial expression evaluation result smoothing process. In the lower graph, the horizontal axis is time and the vertical axis is the smoothed facial expression intensity value. As the lower graph shows, the facial expression intensity value after smoothing is constant within each group of frames (10 frames in the figure), that is, within each of the times T_1, T_2, T_3, T_4, and so on. In addition, the series of facial expression symbols immediately below the lower graph shows a stable facial expression classification result in which no abrupt facial expressions appear.

As described above, for each section containing a plurality of frames, the smoothing processing unit 80 smooths the facial expression evaluation result using the facial expression intensity value sets for those frames. This removes errors in the facial expression evaluation result and yields a stable facial expression intensity value and a stable facial expression classification result.

[Second Embodiment]
In the first embodiment, the facial expression intensity value smoothing processing unit 82 in the smoothing processing unit 80 calculates, for the facial expression intensity value sets of the plurality of frames in one section, the average (simple average) of the facial expression intensity values for each facial expression type. In contrast, in the second embodiment of the present invention, the facial expression intensity value smoothing processing unit calculates, for the facial expression intensity value sets of the plurality of frames in one section, a weighted average of the facial expression intensity values for each facial expression type. Since the configuration of the facial expression analysis apparatus in the second embodiment is the same as that of the facial expression analysis apparatus 1 in the first embodiment, FIGS. 1 and 2 are referred to, and only the functions that differ from the first embodiment are described.

In the second embodiment, for each section containing the facial expression intensity value sets for a plurality of frames stored in the buffer by the facial expression intensity value acquisition unit 81, the facial expression intensity value smoothing processing unit 82 calculates, for each facial expression type, the total of the facial expression intensity values weighted according to frame position. The facial expression intensity value smoothing processing unit 82 then calculates an average for each facial expression type based on the weighted total for that type. That is, this average is a weighted average. The weighting coefficients (weight coefficients) for the frames are values that, over the series of frames in one section, increase monotonically, decrease monotonically, or increase monotonically and then decrease monotonically. Taking the section shown in FIG. 11 as an example, the weight coefficients for the six frames increase as the frame approaches time t from time (t−2) and decrease as the frame approaches time (t+3) from time t. Specifically, for example, the weight coefficients are "0.05" for time (t−2), "0.2" for time (t−1), "0.4" for time t, "0.2" for time (t+1), "0.1" for time (t+2), and "0.05" for time (t+3).

Then, the facial expression intensity value smoothing processing unit 82 outputs the largest of the weighted averages of the facial expression intensity values for the facial expression types, in other words, the average obtained from the largest of the weighted totals of the facial expression intensity values for the facial expression types, as the smoothed representative facial expression intensity value corresponding to time t in the section.

With this configuration, the facial expression intensity value smoothing processing unit 82 can smooth the facial expression intensity values while giving greater reliability to a predetermined frame in a section and to the frames close to it.

Also in the second embodiment, for each section containing the facial expression intensity value sets for a plurality of frames stored in the buffer by the facial expression intensity value acquisition unit 81, the facial expression type smoothing processing unit 83 selects the facial expression type corresponding to the largest of the weighted averages obtained by the facial expression intensity value smoothing processing unit 82 as the smoothed facial expression classification result corresponding to each frame in the section. The facial expression type smoothing processing unit 83 then generates facial expression type information indicating the classification result and outputs this facial expression type information.
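The weighted-average variant of the second embodiment can be sketched as follows, using the example weight coefficients given above and the FIG. 11 values taken from the worked example of the first embodiment. The function name `smooth_by_weighted_mean` is hypothetical, the time ordering of the values is an assumption, and the numeric results of this sketch are not stated in the text.

```python
# FIG. 11 section values (time order t-2 .. t+3 is assumed).
SECTION = {
    "anger":     [6.1, 2.4, 3.2, 3.5, 4.1, 5.2],
    "disgust":   [2.7, 3.3, 2.4, 2.1, 0.8, 0.1],
    "fear":      [8.9, 11.1, 5.2, 7.8, 2.3, 1.7],
    "happiness": [18.8, 48.3, 78.2, 25.5, 60.2, 40.1],
    "sadness":   [68.3, 70.1, 72.3, 74.5, 72.2, 74.5],
    "surprise":  [1.8, 9.2, 12.8, 6.5, 2.1, 4.4],
}
# Example weight coefficients for times t-2, t-1, t, t+1, t+2, t+3.
WEIGHTS = [0.05, 0.2, 0.4, 0.2, 0.1, 0.05]

def smooth_by_weighted_mean(section, weights):
    """Return (representative type, representative intensity, per-type weighted means)."""
    total_weight = sum(weights)
    weighted_means = {
        e: sum(w * x for w, x in zip(weights, values)) / total_weight
        for e, values in section.items()
    }
    best = max(weighted_means, key=weighted_means.get)
    return best, weighted_means[best], weighted_means

best_type, best_value, weighted_means = smooth_by_weighted_mean(SECTION, WEIGHTS)
print(best_type)
```

Dividing by the sum of the weights keeps the result on the same scale as the input intensities even when the coefficients do not sum to one.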

[Third Embodiment]
In the first and second embodiments, the facial expression intensity value smoothing processing unit 82 in the smoothing processing unit 80 calculates, for the facial expression intensity value sets of the plurality of frames in one section, the average (simple average or weighted average) of the facial expression intensity values for each facial expression type, and takes the largest average as the representative facial expression intensity value. In contrast, in the third embodiment of the present invention, the facial expression intensity value smoothing processing unit obtains the representative facial expression intensity value, for each section, by counting the number of facial expression intensity maximum values in the facial expression intensity value sets of the plurality of frames in the section for each representative facial expression type. Since the configuration of the facial expression analysis apparatus in the third embodiment is the same as that of the facial expression analysis apparatus 1 in the first embodiment, FIGS. 1 and 2 are referred to, and only the functions that differ from the first embodiment are described.

In the third embodiment, the facial expression intensity value smoothing processing unit 82 refers to the facial expression intensity values for one section stored in the buffer and, for each representative facial expression type, counts over the section the number of frames in which the facial expression intensity value of that type is the maximum value within the frame (the facial expression intensity maximum value). The facial expression intensity value smoothing processing unit 82 takes this count as the total value for that representative facial expression type. It then obtains the simple average of the facial expression intensity maximum values within the section for the representative facial expression type with the largest total value, and takes this simple average as the representative facial expression intensity value for the section.

The facial expression type smoothing processing unit 83 selects the facial expression type corresponding to the largest total value obtained by the facial expression intensity value smoothing processing unit 82 as the smoothed facial expression classification result corresponding to the section.

Specifically, taking the facial expression intensity values for the section shown in FIG. 11 as an example, the facial expression intensity value smoothing processing unit 82 extracts "happiness" and "sadness" as the facial expression types (representative facial expression types) corresponding to the facial expression intensity maximum values (the shaded values) among the facial expression intensity values for the six frames in the section. The facial expression intensity value smoothing processing unit 82 then counts the number of facial expression intensity maximum values for each extracted representative facial expression type, obtaining a total value of "1" for the representative facial expression type "happiness" and a total value of "5" for the representative facial expression type "sadness". It then obtains the average of the facial expression intensity values within the section for the representative facial expression type "sadness", which corresponds to the largest total value "5", as "(68.3 + 70.1 + 72.3 + 74.5 + 72.2 + 74.5) / 6 = 72.0", and takes this average "72.0" as the representative facial expression intensity value for the section. The facial expression type smoothing processing unit 83 selects the facial expression type "sadness", which corresponds to the largest total value "5" obtained by the facial expression intensity value smoothing processing unit 82, as the smoothed facial expression classification result corresponding to the section.
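The counting scheme of the third embodiment can be sketched as below. The sketch follows the worked example in averaging all six "sadness" values of the section; the function name `smooth_by_count`, the time ordering of the FIG. 11 values, and the tie-breaking behavior are assumptions.

```python
# FIG. 11 section values (time order t-2 .. t+3 is assumed).
SECTION = {
    "anger":     [6.1, 2.4, 3.2, 3.5, 4.1, 5.2],
    "disgust":   [2.7, 3.3, 2.4, 2.1, 0.8, 0.1],
    "fear":      [8.9, 11.1, 5.2, 7.8, 2.3, 1.7],
    "happiness": [18.8, 48.3, 78.2, 25.5, 60.2, 40.1],
    "sadness":   [68.3, 70.1, 72.3, 74.5, 72.2, 74.5],
    "surprise":  [1.8, 9.2, 12.8, 6.5, 2.1, 4.4],
}

def smooth_by_count(section):
    """Return (representative type, representative intensity, per-type frame counts)."""
    n_frames = len(next(iter(section.values())))
    counts = {e: 0 for e in section}
    for i in range(n_frames):
        # Representative type of frame i: the type with the largest intensity.
        top = max(section, key=lambda e: section[e][i])
        counts[top] += 1
    # Ties resolve to the first type in dictionary order (an assumption).
    best = max(counts, key=counts.get)
    # As in the worked example, average the winning type's values over the section.
    value = sum(section[best]) / n_frames
    return best, value, counts

best_type, best_value, counts = smooth_by_count(SECTION)
print(best_type, counts["happiness"], counts["sadness"])  # sadness 1 5
```

Because only the per-frame winner is counted, a single frame with an unusually large intensity for another type contributes one vote at most, however large that intensity is.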

[Fourth Embodiment]
In the third embodiment, the facial expression intensity value smoothing processing unit 82 in the smoothing processing unit 80 obtains the representative facial expression intensity value by counting the number of facial expression intensity maximum values in the facial expression intensity value sets in one section for each representative facial expression type. In contrast, in the fourth embodiment of the present invention, the facial expression intensity value smoothing processing unit obtains the representative facial expression intensity value by counting the number of facial expression intensity maximum values in the facial expression intensity value sets in one section for each representative facial expression type, with weighting according to frame position. Since the configuration of the facial expression analysis apparatus in the fourth embodiment is the same as that of the facial expression analysis apparatus 1 in the first embodiment, FIGS. 1 and 2 are referred to, and only the functions that differ from the first embodiment are described.

In the fourth embodiment, for each section containing the facial expression intensity value sets for a plurality of frames stored in the buffer by the facial expression intensity value acquisition unit 81, the facial expression intensity value smoothing processing unit 82 weights the number of facial expression intensity maximum values in the facial expression intensity value sets of the plurality of frames in the section according to frame position, and counts the weighted number for each facial expression type. The weighting values (weights) according to frame position are values that, over the series of frames in one section, increase monotonically, decrease monotonically, or increase monotonically and then decrease monotonically. Taking the section shown in FIG. 11 as an example, the weights for the six frames increase as the frame approaches time t from time (t−2) and decrease as the frame approaches time (t+3) from time t. Specifically, for example, the weights are "1" for time (t−2), "2" for time (t−1), "4" for time t, "3" for time (t+1), "2" for time (t+2), and "1" for time (t+3). The facial expression intensity value smoothing processing unit 82 then takes the largest of the total values of the counts as the representative facial expression intensity value.

Specifically, the following description uses the above weights and the facial expression intensity values for the one section shown in FIG. 11. The facial expression intensity value smoothing processing unit 82 extracts “happiness” and “sadness” as the representative facial expression types corresponding to the facial expression intensity maximum values among the facial expression intensity values for the six frames in the section. For each extracted representative facial expression type, the unit then counts the number of facial expression intensity maximum values, weighting each by the position of its frame. That is, it obtains the total “1 × 4 = 4” for the representative facial expression type “happiness” and the total “1 × 1 + 1 × 2 + 1 × 3 + 1 × 2 + 1 × 1 = 9” for the representative facial expression type “sadness”. The facial expression intensity value smoothing processing unit 82 then calculates the weighted average of the facial expression intensity values in the section for the representative facial expression type “sadness”, which corresponds to the largest total “9”, as “(1 × 68.3 + 2 × 70.1 + 4 × 72.3 + 3 × 74.5 + 2 × 72.2 + 1 × 74.5) / 13 = 72.3”, and sets this weighted average “72.3” as the representative facial expression intensity value for the section. The facial expression type smoothing processing unit 83 selects the facial expression type “sadness”, which corresponds to the largest total “9” obtained by the facial expression intensity value smoothing processing unit 82, as the smoothed facial expression classification result for the section.
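The weighted counting described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `weighted_vote` and the data layout are assumptions, the “sadness” values are the ones quoted in the text, and the “happiness” values are hypothetical placeholders chosen so that happiness is the maximum only at time t (the actual FIG. 11 values are not given here).

```python
from collections import defaultdict


def weighted_vote(frames, weights):
    """Weighted count of per-frame maxima, then weighted mean of the winner.

    frames  -- list of dicts mapping expression type -> intensity, one per frame
    weights -- one positional weight per frame
    Returns (totals, winning_type, representative_value).
    """
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per frame")
    totals = defaultdict(int)
    for frame, weight in zip(frames, weights):
        # The type holding the maximum intensity in this frame gets the frame's weight.
        top_type = max(frame, key=frame.get)
        totals[top_type] += weight
    winner = max(totals, key=totals.get)
    # Weighted average of the winning type's intensity over the whole section.
    representative = sum(w * f[winner] for f, w in zip(frames, weights)) / sum(weights)
    return dict(totals), winner, representative


# "sadness" values are taken from the worked example in the text.
sadness = [68.3, 70.1, 72.3, 74.5, 72.2, 74.5]
# Hypothetical placeholder values: happiness is the per-frame maximum only at time t.
happiness = [60.0, 65.0, 80.0, 66.0, 61.0, 58.0]
frames = [{"happiness": h, "sadness": s} for h, s in zip(happiness, sadness)]
weights = [1, 2, 4, 3, 2, 1]  # times (t-2) .. (t+3)

totals, winner, representative = weighted_vote(frames, weights)
print(totals, winner, round(representative, 1))
```

With these inputs the totals are 9 for sadness and 4 for happiness, and the weighted average for sadness rounds to 72.3, matching the figures in the text.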

[Fifth Embodiment]
In the first embodiment, the smoothing processing unit 80 obtained a smoothed facial expression intensity value and facial expression type information for each section. In contrast, the fifth embodiment of the present invention obtains the facial expression intensity value and the facial expression type information while shifting the section in the time direction. Since the configuration of the facial expression analysis apparatus in the fifth embodiment is the same as that of the facial expression analysis apparatus 1 in the first embodiment, FIGS. 1 and 2 are referred to and only the functions that differ from those of the first embodiment are described.

In the fifth embodiment, the facial expression intensity value smoothing processing unit 82 uses a number of frames smaller than the number of frames in one section as the shift amount, and shifts the section in the time direction by that shift amount at a time. For example, the unit sets the shift amount to one frame and shifts the section one frame at a time. For each shifted section, the unit calculates, for each facial expression type, the average of the facial expression intensity values in the facial expression intensity value sets in that section. It then outputs the largest of these per-type averages as the smoothed representative facial expression intensity value for the section.

Also in the fifth embodiment, for each section shifted by the shift amount, the facial expression type smoothing processing unit 83 selects the facial expression type corresponding to the smoothed facial expression intensity value obtained by the facial expression intensity value smoothing processing unit 82 as the smoothed facial expression classification result for each frame in that section. The facial expression type smoothing processing unit 83 then generates facial expression type information indicating this classification result and outputs it.
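The shifted-section processing of the fifth embodiment amounts to a sliding window over the per-frame intensity sets. The following is a minimal sketch under stated assumptions: the function name `sliding_window_smooth`, the dict-per-frame layout, and the sample values are all hypothetical and serve only to illustrate the per-type averaging and selection of the largest average.

```python
def sliding_window_smooth(frames, window, shift=1):
    """Slide a section of `window` frames forward by `shift` frames at a time.

    frames -- list of dicts mapping expression type -> intensity, one per frame
    Returns a list of (start_index, expression_type, mean_intensity) tuples,
    one per section position.
    """
    if not 0 < shift < window:
        raise ValueError("shift must be positive and smaller than the section length")
    types = list(frames[0])
    results = []
    for start in range(0, len(frames) - window + 1, shift):
        section = frames[start:start + window]
        # Average intensity of each expression type within this section.
        means = {t: sum(f[t] for f in section) / window for t in types}
        # The type with the largest average gives the smoothed result.
        best = max(means, key=means.get)
        results.append((start, best, means[best]))
    return results


# Hypothetical sample data: five frames, two expression types.
happiness = [10.0, 20.0, 30.0, 80.0, 90.0]
sadness = [50.0, 50.0, 50.0, 50.0, 50.0]
frames = [{"happiness": h, "sadness": s} for h, s in zip(happiness, sadness)]

results = sliding_window_smooth(frames, window=3, shift=1)
for start, expression, value in results:
    print(start, expression, round(value, 1))
```

Because the section advances by fewer frames than it contains, consecutive sections overlap, so one smoothed value is produced per shift rather than per section.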

Note that the processing in which the facial expression intensity value smoothing processing unit 82 shifts the section in the time direction by the shift amount can be applied to any of the second to fourth embodiments.

FIG. 14 is a diagram for explaining the movement of the section in the smoothing processing unit 80. In each graph in the figure, the horizontal axis is the time axis and the vertical axis is the facial expression intensity value. Times t_1, t_2, and t_3 are the times of consecutive frames. That is, time t_2 corresponds to the frame following the frame at time t_1, and time t_3 corresponds to the frame following the frame at time t_2. The duration (t_p + t_f) is one section. The upper, middle, and lower graphs in the figure therefore show the section from time (t_1 − t_p) to time (t_1 + t_f) being shifted successively in the time direction with a shift amount of one frame.

According to the fifth embodiment, the smoothing processing unit 80 can output, for each shift amount in the time direction, a facial expression intensity value whose reliability is improved by suppressing the variation within one section.

[Other Embodiments]
In addition to obtaining Bag-of-Keypoints as the image feature amount, the image feature amount calculation unit 40 in the first to fourth embodiments of the present invention described above may obtain, for example, Local Binary Patterns (LBP) or extended Local Binary Patterns (extended LBP).

A local binary pattern is an image feature amount given by the binary pattern obtained when the image feature amount calculation unit 40 compares the target pixel, which it selects by scanning the analysis region, with each of that pixel's peripheral pixels (for example, its eight adjacent pixels) and binarizes the resulting magnitude relationships. The image feature amount calculation unit 40 may scan every pixel in the analysis region in turn as the target pixel, or may scan discretely at intervals of a predetermined number of pixels.
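As a rough illustration of the comparison-and-binarization step, the sketch below computes a basic eight-neighbour LBP code in plain Python. The details are assumptions not fixed by the text: a neighbour that is greater than or equal to the target pixel contributes a 1 bit, bits are assigned clockwise starting from the top-left neighbour, and border pixels without eight neighbours are skipped. The names `lbp_code` and `lbp_image` are hypothetical.

```python
# (row, column) offsets of the eight neighbours, clockwise from the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the pixel at (row, col).

    image is a 2-D list of grey levels; (row, col) must not lie on the border.
    """
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # Binarize the magnitude relationship between neighbour and target pixel.
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_image(image, step=1):
    """Scan interior pixels, every `step` pixels, and return their LBP codes."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1, step)]
            for r in range(1, rows - 1, step)]


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
print(lbp_code(flat, 1, 1), lbp_code(corner, 1, 1))
```

Setting `step` to a value greater than 1 corresponds to the discrete scan at a fixed pixel interval mentioned above.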

Local binary patterns are described in detail in, for example, Timo Ojala, Matti Pietikainen, and Topi Maenpaa, "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, July 2002.

Specifically, the analysis region determination unit 32 of the face region extraction unit 30 divides the normalized face region data into a grid (for example, into eight parts in each of the horizontal and vertical directions). That is, the analysis region determination unit 32 treats each divided block of the normalized face region data as analysis region data. The image feature amount calculation unit 40 calculates the LBP for each divided block, for example pixel by pixel. It then generates a histogram whose bins are all the LBP patterns and whose frequencies are the numbers of times each pattern occurs. Finally, it uses the concatenated histogram obtained by joining the histograms of the divided blocks as the face image feature amount.
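The block-wise histogram step can be sketched as follows. This is a simplified sketch under assumptions: it takes a 2-D list of already computed pattern codes for the whole normalized face region and then partitions it into a grid, whereas the text divides the region first and computes LBP per block. The function name `block_histograms` and the small 4 × 4 sample with four pattern values are hypothetical.

```python
def block_histograms(codes, grid=8, n_patterns=256):
    """Concatenate per-block histograms of pattern codes.

    codes      -- 2-D list of integer pattern codes in range(n_patterns)
    grid       -- number of blocks along each of the two directions
    Returns a flat list of length grid * grid * n_patterns.
    """
    rows, cols = len(codes), len(codes[0])
    feature = []
    for by in range(grid):
        r0, r1 = by * rows // grid, (by + 1) * rows // grid
        for bx in range(grid):
            c0, c1 = bx * cols // grid, (bx + 1) * cols // grid
            # One bin per possible pattern; the frequency is its occurrence count.
            hist = [0] * n_patterns
            for r in range(r0, r1):
                for c in range(c0, c1):
                    hist[codes[r][c]] += 1
            feature.extend(hist)
    return feature


# Hypothetical 4x4 code image split into a 2x2 grid, with 4 possible patterns.
codes = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [2, 2, 3, 3],
         [2, 2, 3, 3]]
feature = block_histograms(codes, grid=2, n_patterns=4)
print(feature)
```

With the default arguments (an 8 × 8 grid and 256 patterns) the concatenated feature would have 8 × 8 × 256 = 16384 elements.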

An extended local binary pattern is a feature amount given by the binary pattern obtained by extending the above local binary pattern in the time-series direction. The extended local binary pattern is therefore a useful feature amount when the facial expression analysis apparatus 1 takes in evaluation video data. When obtaining the feature amount of the current evaluation face image data included in the evaluation video data as a local binary pattern, the image feature amount calculation unit 40 also includes in the binary pattern the results of comparing pixels between the current evaluation face image data and the evaluation face image data preceding and following it in time.

Extended LBP is described in detail in, for example, Guoying Zhao and Matti Pietikainen, "Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, June 2007.

When the parameter values are obtained by applying either a linear regression model or a support vector regression model to the regression analysis unit 50, the facial expression intensity value output from the facial expression evaluation unit 70 may fall outside the range from the lower limit (for example, “0 (zero)”) to the upper limit (for example, “100”). Therefore, when either model is applied to the regression analysis unit 50, the facial expression evaluation unit 70 may output “0 (zero)” when the obtained facial expression intensity value is less than “0 (zero)” and “100” when it exceeds “100”.
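The range limiting described above is a simple clamp. A minimal sketch, with the hypothetical function name `clamp_intensity` and the example limits 0 and 100 taken from the text as defaults:

```python
def clamp_intensity(value, lower=0.0, upper=100.0):
    """Limit a regression output to the range [lower, upper]."""
    return max(lower, min(upper, value))


# Values below the lower limit and above the upper limit are pulled to the limits.
print(clamp_intensity(-3.2), clamp_intensity(57.5), clamp_intensity(104.1))
```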

Alternatively, the facial expression analysis apparatus 1 in which either a linear regression model or a support vector regression model is applied to the regression analysis unit 50 may further include a face recognition processing unit that identifies a person by executing face recognition processing on each piece of evaluation face image data. In this case, when the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the facial expression evaluation unit 70 may, for each person recognized by the face recognition processing unit in a predetermined period, normalize the facial expression intensity value int to a value int′ in the range from 0 to 100 by equation (6), using the maximum value int_max and the minimum value int_min of the facial expression intensity values.
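Equation (6) itself is not reproduced in this text. As an assumption consistent with the description (a mapping of int to the range 0 to 100 using int_max and int_min), a conventional min-max normalization would look like the sketch below; the actual equation (6) may differ. The function name `normalize_intensity` and the handling of the degenerate case int_max = int_min are hypothetical.

```python
def normalize_intensity(value, int_min, int_max):
    """Min-max normalize an intensity value to the range 0..100.

    Assumed form: int' = 100 * (int - int_min) / (int_max - int_min).
    """
    if int_max == int_min:
        # Assumption: with no spread in the observed values, return the lower bound.
        return 0.0
    return 100.0 * (value - int_min) / (int_max - int_min)


# Hypothetical per-person extremes observed over the predetermined period.
print(normalize_intensity(70.0, int_min=20.0, int_max=120.0))
```

In a per-person setting, int_min and int_max would be tracked separately for each identity returned by the face recognition processing unit.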

In the first to fourth embodiments, linear regression analysis processing, logistic regression analysis processing, and support vector regression analysis processing were shown as the regression analysis processing executed by the regression analysis unit 50. The regression analysis processing executed by the regression analysis unit 50 is not limited to these examples, and other regression analysis processing can also be applied. For example, the regression analysis unit 50 may apply learning processing by a neural network to the regression analysis processing.

In the first to fifth embodiments, the facial expression analysis apparatus 1 may be an apparatus that operates only in the facial expression analysis mode. Specifically, the facial expression intensity teacher value acquisition unit 20, the regression analysis unit 50, and the mode switching unit 90 may be removed from the facial expression analysis apparatus 1 so that it executes only the facial expression analysis processing. In this case, parameter values optimized for each facial expression type are stored in advance in the regression model storage unit 60.

Some of the functions of the facial expression analysis apparatus 1 in each of the embodiments described above, for example the functions of the smoothing processing unit 80, may be realized by a computer. In this case, a facial expression evaluation result smoothing program for realizing those functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The computer system here includes an operating system (OS) and hardware such as peripheral devices. The computer-readable recording medium refers to a portable recording medium such as a flexible disk, a magneto-optical disk, an optical disk, or a memory card, or to a storage device such as a magnetic hard disk or a solid state drive provided in the computer system. The computer-readable recording medium may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a computer network such as the Internet, a telephone line, or a mobile phone network, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as the server device or client in that case. The facial expression evaluation result smoothing program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Although the embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like within a range that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 Facial expression analysis apparatus
10 Image data acquisition unit
20 Facial expression intensity teacher value acquisition unit
30 Face region extraction unit
31 Face region detection unit
32 Analysis region determination unit
40 Image feature amount calculation unit
50 Regression analysis unit
60 Regression model storage unit
70 Facial expression evaluation unit
80 Smoothing processing unit (facial expression evaluation result smoothing apparatus)
81 Facial expression intensity value acquisition unit
82 Facial expression intensity value smoothing processing unit
83 Facial expression type smoothing processing unit
90 Mode switching unit

Claims

A facial expression evaluation result smoothing apparatus comprising:
a facial expression intensity value acquisition unit that captures a plurality of facial expression intensity values obtained for each facial expression type on the basis of a face image for each frame; and
a facial expression intensity value smoothing processing unit that refers to the facial expression intensity values for a plurality of frames captured by the facial expression intensity value acquisition unit and calculates a representative facial expression intensity value corresponding to the plurality of frames on the basis of a total value, for each facial expression type, based on the facial expression intensity values of the plurality of frames.

The facial expression evaluation result smoothing apparatus according to claim 1, wherein the total value is the sum, for each facial expression type, of the facial expression intensity values over the plurality of frames.

The facial expression evaluation result smoothing apparatus according to claim 1, wherein the total value is obtained by counting, for each facial expression type, the number of maximum facial expression intensity values in the respective frames of the plurality of frames.

The facial expression evaluation result smoothing apparatus according to claim 2, wherein the facial expression intensity value smoothing processing unit calculates the total value by weighting according to the position of each frame in the plurality of frames.

The facial expression evaluation result smoothing apparatus according to claim 1, further comprising a facial expression type smoothing processing unit that selects, as the classification result of the facial expression corresponding to the plurality of frames, the facial expression type corresponding to the largest of the total values for each facial expression type obtained by the facial expression intensity value smoothing processing unit.

The facial expression evaluation result smoothing apparatus according to claim 1, wherein the facial expression intensity value smoothing processing unit uses a number of frames smaller than the plurality of frames as a shift amount and shifts the plurality of frames in the time direction by the shift amount.

A facial expression evaluation result smoothing program for causing a computer to function as:
a facial expression intensity value acquisition unit that captures a plurality of facial expression intensity values obtained for each facial expression type on the basis of a face image for each frame; and
a facial expression intensity value smoothing processing unit that refers to the facial expression intensity values for a plurality of frames captured by the facial expression intensity value acquisition unit and calculates a representative facial expression intensity value corresponding to the plurality of frames on the basis of a total value, for each facial expression type, based on the facial expression intensity values of the plurality of frames.