JP2013228847A - Facial expression analyzing device and facial expression analyzing program

Info

Publication number: JP2013228847A
Application number: JP2012099904A
Authority: JP
Inventors: Makoto Okuda; 誠奥田
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2012-04-25
Filing date: 2012-04-25
Publication date: 2013-11-07
Anticipated expiration: 2032-04-25
Also published as: JP5879188B2

Abstract

PROBLEM TO BE SOLVED: To make classification of neutral facial expressions easy, and to improve the accuracy of facial expression classification. SOLUTION: A facial expression analyzing device includes: a facial region extraction unit 20 which extracts a facial analysis region from image data captured by an image data acquisition unit 10; an image feature quantity analysis unit 30 which calculates the image feature quantity in the analysis region; a facial expression intensity evaluation unit 40 which generates a first facial image feature vector by executing first cluster classification processing on the image feature quantity, and calculates a facial expression intensity value that is the distance from a predetermined first boundary surface to the first facial image feature vector; and a facial expression evaluation unit 50 which generates a second facial image feature vector by executing second cluster classification processing on the image feature quantity, and generates facial expression type information indicating the facial expression type corresponding to the analysis region, on the basis of the positional relationship of the second facial image feature vector with respect to a predetermined second boundary surface and the facial expression intensity value.

Description

The present invention relates to a facial expression analysis apparatus and a facial expression analysis program.

A technique is known for analyzing image data that contains a person's face image and classifying the facial expression into six types: anger, disgust, fear, happiness, sadness, and surprise (see, for example, Non-Patent Document 1).

Zisheng Li, Jun-ichi Imai, Masahide Kaneko, "Facial Expression Recognition Using Facial-component-based Bag of Words and PHOG Descriptors", Journal of the Institute of Image Information and Television Engineers, Vol. 64, No. 2, pp. 230-236, 2010

With the conventional technique, however, it has been difficult to classify neutral facial expressions, which range from an expressionless face to a face whose type of expression is difficult to determine.
The present invention has been made to solve this problem, and its object is to provide a facial expression analysis apparatus and a facial expression analysis program that facilitate the classification of neutral facial expressions and improve the accuracy of facial expression classification.

[1] To solve the above problem, a facial expression analysis apparatus according to one aspect of the present invention comprises: an image data acquisition unit that captures image data; a face region extraction unit that extracts a facial analysis region from the image data captured by the image data acquisition unit; an image feature amount calculation unit that calculates an image feature amount of the analysis region extracted by the face region extraction unit; a facial expression intensity evaluation unit that executes first cluster classification processing on the image feature amount calculated by the image feature amount calculation unit to generate a first face image feature vector, and calculates a facial expression intensity value, which is the distance from a first boundary surface determined in advance in a face image feature vector space to the first face image feature vector; and a facial expression evaluation unit that executes second cluster classification processing on the image feature amount to generate a second face image feature vector, and generates facial expression type information indicating the facial expression type corresponding to the analysis region, on the basis of the positional relationship of the second face image feature vector with respect to a second boundary surface determined in advance in the face image feature vector space and the facial expression intensity value calculated by the facial expression intensity evaluation unit.

[2] In the facial expression analysis apparatus according to [1], the facial expression evaluation unit determines, on the basis of the facial expression intensity value, whether the facial expression type corresponding to the analysis region is a neutral facial expression and, when it determines that the facial expression type is not the neutral facial expression, generates the facial expression type information on the basis of the positional relationship of the second face image feature vector with respect to the second boundary surface.
[3] In the facial expression analysis apparatus according to [1] or [2], the facial expression intensity evaluation unit calculates the facial expression intensity value as the distance from the boundary surface corresponding to the facial expression type information generated by the facial expression evaluation unit to the first face image feature vector.
[4] In the facial expression analysis apparatus according to any one of [1] to [3], the first boundary surface is calculated by a support vector machine to which face image feature vectors are applied, the face image feature vectors being obtained by: calculating an image feature amount for the analysis region of each of a plurality of facial expression teacher data acquired from a facial expression teacher data group in which, for each type of facial expression, a set of facial expression teacher data with differing degrees of facial expression is associated with a label indicating that type of facial expression; performing cluster analysis on the image feature amounts of the plurality of facial expression teacher data; and classifying, into the clusters resulting from the cluster analysis, the image feature amounts corresponding to the facial expression teacher data whose degree of facial expression is the minimum and the maximum in the set for each type of facial expression.
[5] In the facial expression analysis apparatus according to [4], the second boundary surface is calculated by a support vector machine to which face image feature vectors are applied, the face image feature vectors being obtained by classifying, into the clusters, the image feature amounts corresponding to all or some of the plurality of facial expression teacher data.
[6] In the facial expression analysis apparatus according to any one of [1] to [5], the face region extraction unit divides the analysis region into a plurality of analysis partial regions; the image feature amount calculation unit calculates an image feature amount for each of the plurality of analysis partial regions; the facial expression intensity evaluation unit executes the first cluster classification processing on the image feature amount of each of the plurality of analysis partial regions and generates the first face image feature vector by concatenating the classification results; and the facial expression evaluation unit executes the second cluster classification processing on the image feature amount of each of the plurality of analysis partial regions and generates the second face image feature vector by concatenating the classification results.
[7] In the facial expression analysis apparatus according to any one of [1] to [6], the facial expression evaluation unit calculates, for each predetermined section containing image data for a plurality of frames, the sum of the facial expression intensity values for each facial expression type, and generates facial expression type information indicating the facial expression type with the largest sum.
[8] In the facial expression analysis apparatus according to [7], the facial expression evaluation unit shifts the predetermined section, at intervals of a number of frames smaller than the plurality of frames, by that number of frames.
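
As a non-limiting illustration, the section-wise aggregation of [7] and [8] can be sketched as follows. The function name classify_sections, the per-frame (type, intensity) tuples, and the window and step values are assumptions introduced only for this example; the specification does not prescribe them.

```python
from collections import defaultdict

def classify_sections(frames, window, step):
    """Return the winning expression type for each section of `window` frames.

    frames: list of (expression_type, intensity_value), one entry per frame.
    step:   number of frames by which the section is shifted (step < window
            gives the overlapping sections described in [8]).
    """
    winners = []
    for start in range(0, len(frames) - window + 1, step):
        totals = defaultdict(float)
        # Sum the intensity values per expression type within the section.
        for expression_type, intensity in frames[start:start + window]:
            totals[expression_type] += intensity
        # The type with the largest sum represents the section.
        winners.append(max(totals, key=totals.get))
    return winners

# Hypothetical per-frame results (type, intensity value).
frames = [("happiness", 0.9), ("happiness", 0.8), ("surprise", 0.4),
          ("surprise", 0.5), ("surprise", 1.2), ("happiness", 0.1)]
section_types = classify_sections(frames, window=4, step=2)
```

With a section of four frames shifted two frames at a time, consecutive sections share half of their frames, so a brief change of expression influences two neighbouring results rather than one.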

[9] To solve the above problem, a facial expression analysis program according to one aspect of the present invention causes a computer to function as: an image data acquisition unit that captures image data; a face region extraction unit that extracts an analysis region from the image data captured by the image data acquisition unit; an image feature amount calculation unit that calculates an image feature amount of the analysis region extracted by the face region extraction unit; a facial expression intensity evaluation unit that executes first cluster classification processing on the image feature amount calculated by the image feature amount calculation unit to generate a first face image feature vector, and calculates a facial expression intensity value, which is the distance from a first boundary surface determined in advance in a face image feature vector space to the first face image feature vector; and a facial expression evaluation unit that executes second cluster classification processing on the image feature amount to generate a second face image feature vector, and generates facial expression type information indicating the facial expression type corresponding to the analysis region, on the basis of the positional relationship of the second face image feature vector with respect to a second boundary surface determined in advance in the face image feature vector space and the facial expression intensity value calculated by the facial expression intensity evaluation unit.

According to the present invention, the classification of neutral facial expressions is facilitated and the accuracy of facial expression classification is improved.

A block diagram showing the functional configuration of the facial expression analysis apparatus according to the first embodiment of the present invention.
A diagram conceptually showing part of the data structure of the facial expression teacher database used when the facial expression analysis apparatus is set to the machine learning mode and executes the machine learning processing.
A diagram schematically showing image data, face region data extracted from the image data, and normalized face region data obtained by normalizing the face region data.
A line drawing that shows, in a visually clear form, the analysis region determined from the normalized face region data by the analysis region determination unit.
A diagram schematically showing the histogram that the machine learning unit generates by classifying image feature amounts into clusters.
A conceptual diagram of a support vector machine showing face image feature vectors of facial expression teacher data classified into two classes.
A flowchart showing the procedure of the machine learning processing executed by the facial expression analysis apparatus according to the same embodiment.
A flowchart showing the procedure of the facial expression analysis processing executed by the facial expression analysis apparatus according to the same embodiment.
A block diagram showing the functional configuration of the facial expression analysis apparatus according to the second embodiment of the present invention.
A diagram schematically showing an output result of the facial expression analysis apparatus according to the third embodiment of the present invention.
A diagram schematically showing an output result of a facial expression analysis apparatus that is a modification of the same embodiment.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of a facial expression analysis apparatus according to the first embodiment of the present invention. As shown in the figure, the facial expression analysis apparatus 1 includes an image data acquisition unit 10, a face region extraction unit 20, an image feature amount analysis unit (image feature amount calculation unit) 30, a facial expression intensity evaluation unit 40, a facial expression evaluation unit 50, and a mode switching unit 60.

The facial expression analysis apparatus 1 executes facial expression analysis processing to calculate and output a facial expression intensity value for a person's face contained in the captured image data, and to classify the facial expression of that face, generating and outputting facial expression type information. The facial expression intensity value is a numerical value that expresses, as an intensity, the degree of a facial expression between a neutral facial expression and a peak facial expression. A neutral facial expression is a person's neutral expression, ranging, for example, from an expressionless face to a face whose type of expression is difficult to determine. In other words, neutral facial expressions span a range. A peak facial expression is a facial expression that richly conveys a person's emotion, for example a face that strongly expresses anger, disgust, fear, happiness, sadness, surprise, or the like.

The facial expression analysis apparatus 1 executes machine learning processing as preprocessing for the facial expression analysis processing. In the machine learning processing, the facial expression analysis apparatus 1 captures a plurality of facial expression teacher data from an external facial expression teacher database and uses them to train a classifier for calculating the facial expression intensity value and a classifier for classifying facial expressions. The facial expression teacher database is a database that stores a facial expression teacher data group in which, for each type of facial expression, a set of facial expression teacher data with differing degrees of facial expression is associated with a label indicating that type of facial expression. Each classifier is realized by, for example, a support vector machine (SVM). The support vector machine is disclosed in, for example, C. Cortes and V. Vapnik, "Support-Vector Networks", Machine Learning, Vol. 20, No. 3, pp. 273-297, 1995.

In FIG. 1, the mode switching unit 60 switches the facial expression analysis apparatus 1 from the machine learning mode to the facial expression analysis mode, or from the facial expression analysis mode to the machine learning mode, for example under switching control realized by the facial expression analysis apparatus 1 executing a program. Alternatively, the mode switching unit 60 sets the apparatus to the machine learning mode or the facial expression analysis mode from a state in which neither mode is set (the initial state). The machine learning mode is the operation mode in which the facial expression analysis apparatus 1 executes the machine learning processing, and the facial expression analysis mode is the operation mode in which it executes the facial expression analysis processing.
Note that the mode switching unit 60 may also switch between the machine learning mode and the facial expression analysis mode in accordance with, for example, a switching operation performed on the facial expression analysis apparatus 1 by an operator.

The image data acquisition unit 10 captures image data supplied by an external device (not shown). Specifically, when the facial expression analysis apparatus 1 is set to the machine learning mode, the image data acquisition unit 10 captures a plurality of facial expression teacher data from the facial expression teacher database. When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the image data acquisition unit 10 captures evaluation image data supplied by, for example, an imaging device or a recording device.

The image data (facial expression teacher data and evaluation image data) is still image data or moving image data. When the image data is still image data, the image data acquisition unit 10 supplies the captured still image data to the face region extraction unit 20. When the image data is moving image data, the image data acquisition unit 10 detects key frames from the captured moving image data and supplies the key frames as image data to the face region extraction unit 20, either sequentially or at intervals of a predetermined number of frames.

The face region extraction unit 20 captures the image data supplied by the image data acquisition unit 10 and extracts an analysis region of a person's face from the image data.
As its functional configuration, the face region extraction unit 20 includes a face region detection unit 21 and an analysis region determination unit 22.

The face region detection unit 21 executes face detection processing on the captured image data and detects a person's face region from the image data. The data of this face region (face region data) is, for example, rectangular image data containing the person's face. A known face detection algorithm, such as AdaBoost, can be applied as the algorithm for the face detection processing executed by the face region detection unit 21.
Details of a known face detection algorithm are disclosed in, for example, Paul Viola and Michael J. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, 2004, Vol. 57, No. 2, pp. 137-154.

The analysis region determination unit 22 normalizes the face region data detected by the face region detection unit 21 to a predetermined pixel size, and then extracts an analysis region from the normalized face region data. Specifically, the analysis region determination unit 22 normalizes the face region data into normalized face region data of a predetermined pixel size (for example, 128 pixels horizontally by 128 pixels vertically). That is, the analysis region determination unit 22 generates the normalized face region data by executing image processing that enlarges or reduces the face region data to a rectangular image of the predetermined pixel size. Because the size of a person's face varies from one image to another, the analysis region determination unit 22 enlarges or reduces the face region so that the face regions in all image data have approximately the same resolution. This makes the amount of information in face region data of differing resolutions substantially equal (including exactly equal).
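
As a non-limiting illustration, the normalization to a fixed pixel size can be sketched as follows. The specification does not name an interpolation method; the index-mapping (nearest-neighbour style) scaling used here, the function name resize_nearest, and the toy input images are assumptions made only for this example.

```python
def resize_nearest(image, out_w=128, out_h=128):
    """Scale a 2-D pixel grid (a list of rows) to out_w x out_h pixels."""
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to a source pixel
        # (floor of the proportionally scaled index).
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Hypothetical face regions of different sizes.
small_face = [[10, 20], [30, 40]]                                # enlarged
large_face = [[x + y for x in range(256)] for y in range(256)]   # reduced

normalized_small = resize_nearest(small_face)
normalized_large = resize_nearest(large_face)
```

Both results are 128 by 128 pixels regardless of the size of the detected face region, which is the property the subsequent feature calculation relies on.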

The analysis region determination unit 22 determines, from the normalized face region data, an analysis region for calculating the image feature amount, and extracts the data of this analysis region (analysis region data). The analysis region is, for example, a circular region (an ellipse or a perfect circle) that is centered on the center of the normalized face region and contained within it. The analysis region determination unit 22, for example, bisects the analysis region with a horizontal straight line passing through the center of the normalized face region, and determines the region above the line as the upper analysis region (first analysis partial region) and the region below it as the lower analysis region (second analysis partial region). In other words, the analysis region determination unit 22 takes a circular or elliptical analysis region smaller than the circle or ellipse inscribed in the normalized face region and bisects it in the up-down (vertical) direction to determine the upper analysis region and the lower analysis region. That is, the analysis region determination unit 22 divides the analysis region into two analysis partial regions.
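
As a non-limiting illustration, the circular analysis region and its division into upper and lower partial regions can be sketched with boolean masks. The radius ratio of 0.8 is an assumption chosen only so that the region is smaller than the inscribed circle; the specification gives no value, and the function name analysis_masks is likewise hypothetical.

```python
def analysis_masks(size=128, radius_ratio=0.8):
    """Return (upper, lower) boolean masks of a centred circular region.

    radius_ratio < 1.0 keeps the region smaller than the circle
    inscribed in the size x size normalized face region (assumed value).
    """
    centre = (size - 1) / 2.0
    radius = radius_ratio * size / 2.0
    upper = [[False] * size for _ in range(size)]
    lower = [[False] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            inside = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            if inside and y < centre:
                upper[y][x] = True      # above the horizontal centre line
            elif inside:
                lower[y][x] = True      # below the horizontal centre line
    return upper, lower

upper_mask, lower_mask = analysis_masks()
```

A pixel belongs to at most one of the two masks, and the two partial regions contain the same number of pixels because the dividing line passes through the center.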

The image feature amount analysis unit 30 calculates image feature amounts, which are local feature amounts of the analysis region data extracted by the face region extraction unit 20. For example, the image feature amount analysis unit 30 calculates SURF (Speeded-Up Robust Features) feature amounts for the data of each of the upper analysis region and the lower analysis region in the analysis region determined by the analysis region determination unit 22. Alternatively, for example, the image feature amount analysis unit 30 calculates SIFT (Scale-Invariant Feature Transform) feature amounts for the data of each of the upper analysis region and the lower analysis region. The image feature amount analysis unit 30 then supplies the calculated image feature amounts of the two analysis partial regions to the facial expression intensity evaluation unit 40 and the facial expression evaluation unit 50.

When the facial expression analysis apparatus 1 is set to the machine learning mode, the facial expression intensity evaluation unit 40 trains the classifier for calculating the facial expression intensity value, using the image feature amounts of the analysis regions obtained from the plurality of facial expression teacher data. When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the facial expression intensity evaluation unit 40 calculates the facial expression intensity value with the trained classifier, using the image feature amount of the analysis region obtained from the evaluation image data.
As its functional configuration, the facial expression intensity evaluation unit 40 includes a machine learning unit 41 and a facial expression intensity value calculation unit 42.

When the facial expression analysis apparatus 1 is set to the machine learning mode, the machine learning unit 41 captures the image feature amounts of the analysis regions obtained from the plurality of facial expression teacher data, which are supplied by the image feature amount analysis unit 30. The machine learning unit 41 then executes cluster analysis (clustering) on the image feature amounts of the plurality of facial expression teacher data. The k-means method, for example, can be applied as the cluster analysis. Specifically, the machine learning unit 41 executes cluster analysis of the image feature amounts for the upper analysis region and generates, for example, 350 clusters. The machine learning unit 41 also executes cluster analysis of the image feature amounts for the lower analysis region and generates, for example, 250 clusters.
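
As a non-limiting illustration, the cluster analysis can be sketched with a plain k-means (Lloyd's algorithm) loop. The two-dimensional toy points, k = 2, the fixed iteration count, and the deterministic initialization are assumptions made to keep the example small; in the embodiment the inputs would be local feature descriptors and the cluster counts would be, for example, 350 and 250.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; points is a list of equal-length tuples."""
    # Deterministic initialization: k evenly spaced input points (assumption).
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Hypothetical 2-D "descriptors" pooled from all teacher images of one region.
descriptors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = kmeans(descriptors, k=2)
```

The resulting centroids play the role of the clusters into which image feature amounts are later classified, once for the upper analysis region and once for the lower analysis region.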

The machine learning unit 41 then generates face image feature vectors by classifying, into the clusters resulting from the cluster analysis, the image feature amounts corresponding to the facial expression teacher data whose degree of facial expression is the minimum and the maximum in the set of facial expression teacher data for each type of facial expression, and generating histograms from the result (cluster classification). The facial expression with the minimum degree is the neutral facial expression, and the facial expression with the maximum degree is the peak facial expression.

Specifically, the machine learning unit 41 classifies into clusters the image feature amounts corresponding to the upper analysis region of the facial expression teacher data whose degree of facial expression is the minimum and the maximum in the set of facial expression teacher data for each type of facial expression. The machine learning unit 41 then generates a histogram (first histogram) in which the clusters are the classes and the number of elements in each cluster is the frequency. The machine learning unit 41 also classifies into clusters the image feature amounts corresponding to the lower analysis region of the facial expression teacher data whose degree of facial expression is the minimum and the maximum in the set of facial expression teacher data for each type of facial expression, and generates a histogram (second histogram) in which the clusters are the classes and the number of elements in each cluster is the frequency. The machine learning unit 41 then concatenates the first histogram and the second histogram, which are the classification results, to generate a histogram for the entire analysis region (whole histogram). For example, the machine learning unit 41 appends the second histogram to the first histogram to generate the whole histogram. Alternatively, the machine learning unit 41 appends the first histogram to the second histogram to generate the whole histogram. The machine learning unit 41 then normalizes the whole histogram to generate the face image feature vector. For example, the machine learning unit 41 divides the frequency of each class in the whole histogram by the sum of the frequencies of all classes to generate the face image feature vector.
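The histogram construction described above can be illustrated by the following minimal sketch. It assumes that each image feature amount is a numeric descriptor, that a descriptor is assigned to the cluster whose center is nearest in Euclidean distance, and that the upper and lower analysis regions share one set of cluster centers. These points, and the names nearest_cluster, region_histogram, and face_feature_vector, are assumptions made for illustration and are not specified in the description.

```python
from collections import Counter
from math import dist


def nearest_cluster(descriptor, centers):
    """Index of the cluster center closest to the descriptor (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: dist(descriptor, centers[k]))


def region_histogram(descriptors, centers):
    """Histogram with one class per cluster; the frequency is the number of members."""
    counts = Counter(nearest_cluster(d, centers) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(centers))]


def face_feature_vector(upper_desc, lower_desc, centers):
    """Append the lower-region histogram to the upper-region one, then normalize."""
    whole = region_histogram(upper_desc, centers) + region_histogram(lower_desc, centers)
    total = sum(whole)
    # Divide each frequency by the sum of all frequencies (guard against an empty input).
    return [f / total for f in whole] if total else [0.0] * len(whole)
```

With K clusters, the resulting vector has 2K elements, and the order of concatenation fixes which half corresponds to which analysis region.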

The machine learning unit 41 performs machine learning using, for example, a support vector machine, calculates a boundary surface (first boundary surface) that separates face images with the minimum degree of facial expression from face images with the maximum degree of facial expression, and supplies the data of this boundary surface to the facial expression intensity value calculation unit 42. The boundary surface is also called a hyperplane, a separating hyperplane, a separating plane, or the like. The facial expression intensity value calculation unit 42 takes in the boundary surface data supplied by the machine learning unit 41 and stores it.

Because the machine learning unit 41 performs the cluster analysis using image feature amounts obtained from facial expression teacher data of various facial expressions, it can obtain face image feature vectors that reflect changes in facial expression intensity, which improves the accuracy of the facial expression intensity value.

When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the facial expression intensity value calculation unit 42 takes in the image feature amounts of the analysis region obtained from the evaluation image data, which are supplied by the image feature amount analysis unit 30. The facial expression intensity value calculation unit 42 then classifies these image feature amounts into the clusters resulting from the cluster analysis performed by the machine learning unit 41 (first cluster classification process) and generates a face image feature vector (first face image feature vector). Next, the facial expression intensity value calculation unit 42 calculates the distance from the stored boundary surface to the face image feature vector and outputs the value of this distance as the facial expression intensity value. This distance is the Euclidean distance from the face image feature vector to the boundary surface in the feature vector space. The facial expression intensity value is, for example, a value centered on 0 (zero) that indicates an expression closer to the neutral facial expression the larger it becomes in the negative direction, and closer to the peak facial expression the larger it becomes in the positive direction. The facial expression intensity value calculation unit 42 also supplies the facial expression intensity value to the facial expression evaluation unit 50.
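As a minimal sketch of this distance calculation, assume that the stored boundary surface is a linear hyperplane w·x + b = 0 whose normal vector w points toward the peak facial expression side. The symbols w and b and the function name expression_intensity are assumptions introduced for illustration; the description does not state how the boundary surface data is represented.

```python
from math import sqrt


def expression_intensity(x, w, b):
    """Signed Euclidean distance from the hyperplane w.x + b = 0 to the point x.

    Positive values lie on the side that w points to (assumed here to be the
    peak facial expression side); negative values lie on the opposite side.
    """
    norm = sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("normal vector w must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
```

Dividing by the norm of w is what turns the raw decision value into a geometric distance, so that intensity values remain comparable regardless of how w is scaled.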

When the facial expression analysis apparatus 1 is set to the machine learning mode, the facial expression evaluation unit 50 performs machine learning of a classifier for classifying facial expressions, using the image feature amounts of each analysis region obtained from a plurality of facial expression teacher data. When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the facial expression evaluation unit 50 classifies the facial expression with the machine-learned classifier, based on the image feature amounts of the analysis region obtained from the evaluation image data and the facial expression intensity value supplied by the facial expression intensity evaluation unit 40, and generates facial expression type information.
The facial expression evaluation unit 50 includes, as its functional configuration, a machine learning unit 51 and a facial expression classification unit 52.

When the facial expression analysis apparatus 1 is set to the machine learning mode, the machine learning unit 51 takes in the image feature amounts of each analysis region, supplied by the image feature amount analysis unit 30, that were obtained from all or part of the plurality of facial expression teacher data. The partial facial expression teacher data is, for example, a predetermined proportion of the set of facial expression teacher data for each type of facial expression, taken from the side with the larger degree of facial expression. The machine learning unit 51 then classifies these image feature amounts into the clusters resulting from the cluster analysis performed by the machine learning unit 41 and generates a histogram, thereby generating face image feature vectors.
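The selection of the partial facial expression teacher data can be sketched as follows. The tuple layout (label, degree, features), the function name select_strong_samples, and the rounding up of the number of retained samples are assumptions for illustration; the description states only that a predetermined proportion with the larger degree of facial expression is used.

```python
from math import ceil


def select_strong_samples(samples, fraction):
    """Keep, for each expression type, the given fraction of samples with the highest degree.

    samples: list of (label, degree, features) tuples (hypothetical layout).
    """
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[0], []).append(sample)
    selected = []
    for group in by_label.values():
        # Sort each type's samples from the strongest expression to the weakest.
        group.sort(key=lambda s: s[1], reverse=True)
        keep = max(1, ceil(len(group) * fraction))
        selected.extend(group[:keep])
    return selected
```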

Specifically, the machine learning unit 51 classifies into clusters the image feature amounts corresponding to the upper analysis region of all or part of the plurality of facial expression teacher data. The machine learning unit 51 then generates a histogram (third histogram) in which the clusters are the classes and the number of elements in each cluster is the frequency. The machine learning unit 51 also classifies into clusters the image feature amounts corresponding to the lower analysis region of all or part of the plurality of facial expression teacher data, and generates a histogram (fourth histogram) in which the clusters are the classes and the number of elements in each cluster is the frequency. The machine learning unit 51 then concatenates the third histogram and the fourth histogram, which are the classification results, to generate a histogram for the entire analysis region (whole histogram). For example, the machine learning unit 51 appends the fourth histogram to the third histogram to generate the whole histogram. Alternatively, the machine learning unit 51 appends the third histogram to the fourth histogram to generate the whole histogram. The machine learning unit 51 then normalizes the whole histogram to generate the face image feature vector. For example, the machine learning unit 51 divides the frequency of each class in the whole histogram by the sum of the frequencies of all classes to generate the face image feature vector.

The machine learning unit 51 performs machine learning using, for example, a support vector machine, calculates a boundary surface (second boundary surface) that classifies face images by type of facial expression, and supplies the data of this boundary surface to the facial expression classification unit 52. Because a support vector machine is a two-class classifier, the machine learning unit 51 repeats the two-class classification according to the number of types of facial expression. The facial expression classification unit 52 takes in the boundary surface data supplied by the machine learning unit 51 and stores it.

When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the facial expression classification unit 52 takes in the image feature amounts of the analysis region obtained from the evaluation image data, which are supplied by the image feature amount analysis unit 30. The facial expression classification unit 52 also takes in the facial expression intensity value supplied by the facial expression intensity value calculation unit 42. The facial expression classification unit 52 then classifies the image feature amounts into the clusters resulting from the cluster analysis performed by the machine learning unit 41 (second cluster classification process) and generates a face image feature vector (second face image feature vector).

Then, based on the positional relationship of the face image feature vector with respect to the stored boundary surface and on the facial expression intensity value taken in from the facial expression intensity value calculation unit 42, the facial expression classification unit 52 generates facial expression type information indicating the facial expression type corresponding to the analysis region, and outputs this facial expression type information.
Specifically, the facial expression classification unit 52 compares the facial expression intensity value with a threshold that it holds in advance. If the facial expression intensity value is less than or equal to the threshold, the facial expression classification unit 52 determines that the facial expression in the analysis region is a neutral facial expression; if the facial expression intensity value exceeds the threshold, it determines that the facial expression in the analysis region is a non-neutral facial expression. When the facial expression classification unit 52 determines that the facial expression type is the neutral facial expression, it generates facial expression type information that includes information indicating the neutral facial expression. When it determines that the facial expression type is not the neutral facial expression, it generates the facial expression type information by determining the position of the face image feature vector with respect to each boundary surface and narrowing down the classification.
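The two-stage decision can be sketched as below. The threshold comparison follows the description. The narrowing step, however, is shown here as a one-vs-rest choice of the expression type with the largest linear decision value, which is only one possible way of narrowing down the classification and is an assumption, as are the name classify_expression and the (w, b) representation of each boundary surface.

```python
def classify_expression(intensity, threshold, x, boundaries):
    """Return "neutral" if intensity <= threshold, else the best-scoring expression type.

    boundaries: dict mapping an expression type to (w, b) of a hypothetical
    one-vs-rest linear boundary w.x + b = 0 that is positive on that type's side.
    """
    # Stage 1: a low intensity value means a neutral facial expression.
    if intensity <= threshold:
        return "neutral"

    # Stage 2: check the feature vector against every boundary and keep the best type.
    def score(item):
        w, b = item[1]
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return max(boundaries.items(), key=score)[0]
```

Because the neutral decision is made from the intensity value alone, the boundaries for the individual expression types do not need a separate neutral class.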

Note that the facial expression classification unit 52 may hold a separate threshold for each type of facial expression. The types of facial expression are, for example, anger, disgust, fear, happiness, sadness, and surprise.

FIG. 2 conceptually shows part of the data structure of the facial expression teacher database used when the facial expression analysis apparatus 1 is set to the machine learning mode and executes the machine learning process. As shown in the figure, the facial expression teacher database stores, for each type of facial expression, a facial expression teacher data group in which a set of facial expression teacher data whose degrees of facial expression differ from one another, ranging from the neutral facial expression to the peak facial expression, is associated with a label indicating that type of facial expression. There are, for example, six types of facial expression: "anger", "disgust", "fear", "happiness", "sadness", and "surprise".

As the facial expression teacher database, it is possible to use, for example, the Cohn-Kanade Facial Expression Database described in Patrick Lucey, Jeffrey F. Cohn, Takeo Kanade, Jason Saragih, Zara Ambadar, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression", the Third IEEE Workshop on CVPR for Human Communicative Behavior Analysis, pp. 94-101, 2010.

FIG. 3 schematically shows image data, face region data extracted from the image data, and normalized face region data obtained by normalizing the face region data. That is, the figure shows, in chronological order, the image data 2 acquired by the image data acquisition unit 10, the face region data 2a detected by the face region detection unit 21, and the normalized face region data 2b obtained when the analysis region determination unit 22 normalizes (here, reduces) the face region data. As shown in the figure, the image data 2 is an image that includes the part of the person above the neck. The face region data 2a is an image, extracted from the image data 2, that contains the person's face. The normalized face region data 2b is, for example, an image normalized to a size of L_X horizontal pixels × L_Y vertical pixels so as to include the main parts of the face that determine the person's facial expression (both eyebrows, both eyes, the nose, and the mouth). The relationship between the horizontal pixel count L_X and the vertical pixel count L_Y is, for example, L_X = L_Y.

FIG. 4 shows, as a line drawing for visual clarity, the analysis region that the analysis region determination unit 22 determines from the normalized face region data 2b. As shown in the figure, the analysis region determination unit 22 determines a circular analysis region 3 that is contained in the normalized face region data 2b of L_X horizontal pixels × L_Y vertical pixels and is centered on the center position of the normalized face region data 2b. The horizontal diameter of the analysis region 3 is, for example, 0.8 times the horizontal pixel count L_X, and the vertical diameter is, for example, 0.8 times the vertical pixel count L_Y. Making the diameter of the analysis region 3 smaller than that of the inscribed circle of the normalized face region data 2b in this way excludes information such as hair, ears, and earrings, which is of low importance for recognizing a person's face and facial expression. The analysis region determination unit 22 divides the analysis region 3 into an upper analysis region 3U and a lower analysis region 3D along a horizontal straight line passing through the center of the analysis region 3. With this division, the upper analysis region 3U contains both eyebrows and both eyes, and the lower analysis region 3D contains the tip of the nose and the mouth.
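The geometry of the analysis region 3 can be sketched as a pair of pixel masks. The sketch assumes that row 0 is the top of the image, that the center lies between the two middle pixels, and that a pixel belongs to the region when its center falls inside the circle (an ellipse if L_X and L_Y differ). The function name analysis_region_masks and the parameter scale are hypothetical.

```python
def analysis_region_masks(lx, ly, scale=0.8):
    """Return (upper, lower) boolean masks, indexed [y][x], for an lx-by-ly image."""
    cx, cy = (lx - 1) / 2.0, (ly - 1) / 2.0      # image center in pixel coordinates
    rx, ry = scale * lx / 2.0, scale * ly / 2.0  # radii: half of 0.8 * L_X and 0.8 * L_Y
    upper = [[False] * lx for _ in range(ly)]
    lower = [[False] * lx for _ in range(ly)]
    for y in range(ly):
        for x in range(lx):
            inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
            if inside:
                # Rows above the horizontal center line form 3U, the rest form 3D.
                (upper if y < cy else lower)[y][x] = True
    return upper, lower
```

For a 128 × 128 image this yields two half-discs of equal area, and the four image corners fall outside both masks.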

FIG. 5 schematically shows the histogram that the machine learning unit 41 generates by classifying image feature amounts into clusters. The figure is an example in which the machine learning unit 41 obtains the histogram of the entire analysis region by appending the histogram of the image feature amounts in the lower analysis region after the histogram of the image feature amounts in the upper analysis region. This histogram represents the feature vector of the face image. Because the machine learning unit 41 performs the cluster classification for each analysis partial region in this way, position information (upper analysis region or lower analysis region) is associated with the image feature amounts.
Note that the machine learning unit 41 may instead obtain the histogram of the entire analysis region by appending the histogram of the image feature amounts in the upper analysis region after the histogram of the image feature amounts in the lower analysis region.

Next, the support vector machine applied to the facial expression analysis apparatus 1 is described.
FIG. 6 is a conceptual diagram of a support vector machine, showing face image feature vectors of the facial expression teacher data classified into two classes. For convenience, the figure shows the case where the face image feature vector has two dimensions. The two classes are the "neutral facial expression" class and the "peak facial expression" class. Each of the eight face images (face images of the facial expression teacher data) and the one face image (the face image of the evaluation image data corresponding to the face image feature vector X) shown in the figure visualizes the facial expression corresponding to a face image feature vector, and the position at which each face image is placed indicates the position of that face image feature vector in the feature vector space (here, the feature vector plane).

In the present embodiment, when the facial expression analysis apparatus 1 is set to the machine learning mode, it takes in a plurality of facial expression teacher data from an external facial expression teacher database and uses them to calculate the boundary surface H with a support vector machine. In FIG. 6 the boundary surface H is drawn as a line because the face image feature vector is two-dimensional, but in practice it is a hyperplane whose number of dimensions is that of the face image feature vector minus one. For example, when the face image feature vector has 600 dimensions (clusters), the boundary surface H is a 599-dimensional hyperplane.
In the figure, the face image feature vectors of the eight face images of the facial expression teacher data are classified by the boundary surface H into class A, the neutral facial expression, and class B, the peak facial expression.

When the facial expression analysis apparatus 1 is set to the facial expression analysis mode, the support vector machine after machine learning calculates the distance (Euclidean distance) D from the boundary surface H to the face image feature vector X of the evaluation image data. In the present embodiment, the distance D takes, for example, the value 0 (zero) on the boundary surface H, positive values on the peak facial expression side, and negative values on the neutral facial expression side. This distance D is the facial expression intensity value.

Next, the operation of the facial expression analysis apparatus 1 is described separately for the machine learning process and the facial expression analysis process.

FIG. 7 is a flowchart showing the procedure of the machine learning process executed by the facial expression analysis apparatus 1.
In step S1, the mode switching unit 60 sets the machine learning mode.
Next, in step S2, the image data acquisition unit 10 takes in one piece of facial expression teacher data from the plurality of facial expression teacher data stored in the external facial expression teacher database, and supplies it to the face region extraction unit 20.

Next, in step S3, the face region extraction unit 20 takes in the image data supplied by the image data acquisition unit 10 and extracts the analysis region of the person's face from this image data.
Specifically, the face region detection unit 21 performs face detection processing on the image data and detects the person's face region in it.
Next, the analysis region determination unit 22 normalizes the face region data detected by the face region detection unit 21 to a predetermined pixel size (for example, 128 horizontal pixels × 128 vertical pixels).
Next, the analysis region determination unit 22 extracts the analysis region from the normalized face region data and determines two analysis partial regions (the upper analysis region and the lower analysis region) from this analysis region.

Next, in step S4, the image feature amount analysis unit 30 calculates the image feature amounts of the analysis region data extracted by the face region extraction unit 20. For example, the image feature amount analysis unit 30 calculates image feature amounts (for example, SURF features or SIFT features) for the data of each of the upper analysis region and the lower analysis region in the analysis region determined by the analysis region determination unit 22. The image feature amount analysis unit 30 then supplies the calculated image feature amounts of the upper analysis region and the lower analysis region to the facial expression intensity evaluation unit 40 and the facial expression evaluation unit 50.

Next, in step S5, if all the facial expression teacher data to be taken in from the facial expression teacher database has been taken in (S5: YES), the process proceeds to step S6; if not all of it has been taken in (S5: NO), the process returns to step S2.

In step S6, the facial expression intensity evaluation unit 40 performs machine learning of a classifier for calculating the facial expression intensity value, using the image feature amounts of each analysis region obtained from the plurality of facial expression teacher data.
Specifically, the machine learning unit 41 performs cluster analysis (for example, K-means clustering) on the image feature amounts of the plurality of facial expression teacher data. Next, for each type of facial expression, the machine learning unit 41 takes the image feature amounts corresponding to the facial expression teacher data whose degree of facial expression is the minimum and the maximum in the set of facial expression teacher data for that type, classifies them into the clusters, and generates a histogram (cluster classification), thereby generating face image feature vectors.
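A minimal sketch of K-means clustering as it might be used for this cluster analysis follows. It uses plain Lloyd iterations from caller-supplied initial centers. The function name k_means, the fixed iteration count, and the handling of empty clusters are assumptions for illustration, since the description names only the K-means method as an example.

```python
from math import dist


def k_means(points, centers, iterations=10):
    """Plain Lloyd iterations; returns the updated cluster centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[k].append(p)
        # Update step: move each center to the mean of its members;
        # a cluster with no members keeps its previous center.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers
```

The centers returned here play the role of the clusters into which image feature amounts are later classified when the histograms are generated.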

Next, in step S7, the machine learning unit 41 performs machine learning using, for example, a support vector machine, calculates the first boundary surface that separates face images with the minimum degree of facial expression from face images with the maximum degree of facial expression, and supplies the data of this first boundary surface to the facial expression intensity value calculation unit 42. The facial expression intensity value calculation unit 42 then takes in the first boundary surface data supplied by the machine learning unit 41 and stores it.

Next, in step S8, the facial expression evaluation unit 50 performs machine learning of a classifier for classifying facial expressions, using the image feature amounts of each analysis region obtained from the plurality of facial expression teacher data.
Specifically, the machine learning unit 51 classifies the image feature amounts of each analysis region, obtained from all or part of the plurality of facial expression teacher data, into the clusters resulting from the cluster analysis performed by the machine learning unit 41 and generates a histogram (cluster classification), thereby generating face image feature vectors.

Next, in step S9, the machine learning unit 51 performs machine learning using, for example, a support vector machine, calculates the second boundary surface that classifies face images by type of facial expression, and supplies the data of this second boundary surface to the facial expression classification unit 52. The facial expression classification unit 52 takes in the second boundary surface data supplied by the machine learning unit 51 and stores it.

FIG. 8 is a flowchart showing the procedure of the facial expression analysis process executed by the facial expression analysis apparatus 1.
In step S21, the mode switching unit 60 sets the facial expression analysis mode.
Next, in step S22, the image data acquisition unit 10 takes in evaluation image data supplied by, for example, an imaging device or a recording device, and supplies this evaluation image data to the face region extraction unit 20.

Next, in step S23, the face region extraction unit 20 takes in the evaluation image data supplied by the image data acquisition unit 10 and extracts the analysis region of the person's face from this evaluation image data.
Specifically, the face region detection unit 21 performs face detection processing on the evaluation image data and detects the person's face region in it.
Next, the analysis region determination unit 22 normalizes the face region data detected by the face region detection unit 21 to a predetermined pixel size (for example, 128 horizontal pixels × 128 vertical pixels).
Next, the analysis region determination unit 22 extracts the analysis region from the normalized face region data and determines two analysis partial regions (the upper analysis region and the lower analysis region) from this analysis region.

Next, in step S24, the image feature amount analysis unit 30 calculates the image feature amounts of the analysis region data extracted by the face region extraction unit 20, in the same way as in step S4 described above. That is, the image feature amount analysis unit 30 calculates, for example, image feature amounts (for example, SURF features or SIFT features) for the data of each of the upper analysis region and the lower analysis region in the analysis region determined by the analysis region determination unit 22. The image feature amount analysis unit 30 then supplies the calculated image feature amounts of the upper analysis region and the lower analysis region to the facial expression intensity evaluation unit 40 and the facial expression evaluation unit 50.

Next, in step S25, the facial expression intensity evaluation unit 40 calculates the facial expression intensity value with the machine-learned classifier, using the image feature amounts of the analysis region obtained from the evaluation image data.
Specifically, the facial expression intensity value calculation unit 42 classifies the image feature amounts into the clusters resulting from the cluster analysis performed by the machine learning unit 41 (first cluster classification process) and generates a face image feature vector (first face image feature vector).

Next, in step S26, the facial expression intensity value calculation unit 42 calculates the distance from the stored boundary surface to the facial image feature vector, outputs this distance as the facial expression intensity value, and supplies it to the facial expression evaluation unit 50.
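The distance calculation of step S26 can be sketched as follows. This is a minimal illustration that assumes a linear boundary surface described by a weight vector `w` and a bias `b`; the passage does not state the form of the boundary surface or whether the distance is signed, so the function name `signed_distance`, the parameters, and the sample values are all hypothetical.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from the hyperplane w.x + b = 0 to the point x.

    Assumption: the stored boundary surface is linear. The sign tells on
    which side of the surface the feature vector lies.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical boundary surface and first facial image feature vector.
w = [3.0, 4.0]
b = -1.0
feature_vector = [0.6, 0.4]
intensity = signed_distance(w, b, feature_vector)
```

Dividing by the norm of `w` turns the raw decision score into a geometric distance, so intensity values stay comparable if the weights are rescaled.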

Next, in step S27, the facial expression evaluation unit 50 classifies the facial expression with the machine-learned classifier and generates facial expression type information, based on the image feature amounts of the analysis region obtained from the evaluation image data and the facial expression intensity value supplied by the facial expression intensity evaluation unit 40.
Specifically, the facial expression classification unit 52 classifies the image feature amounts into the clusters resulting from the cluster analysis performed by the machine learning unit 41 (second cluster classification process) and generates a facial image feature vector (second facial image feature vector).

Next, the facial expression classification unit 52 generates facial expression type information indicating the facial expression type corresponding to the analysis region, based on the positional relationship of the facial image feature vector with respect to the stored boundary surfaces and on the facial expression intensity value taken in from the facial expression intensity value calculation unit 42, and outputs this facial expression type information.
Specifically, the facial expression classification unit 52 compares the facial expression intensity value with a predetermined threshold. If the facial expression intensity value is less than or equal to the threshold, the facial expression classification unit 52 determines that the facial expression in the analysis region is a neutral facial expression; if the value exceeds the threshold, it determines that the facial expression is a non-neutral facial expression. When the facial expression type is determined to be a neutral facial expression, the facial expression classification unit 52 generates facial expression type information that includes information indicating the neutral facial expression. When the facial expression type is determined not to be a neutral facial expression, the facial expression classification unit 52 generates the facial expression type information by determining the position of the facial image feature vector with respect to each boundary surface and narrowing down the classification.
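The two-stage decision described above (threshold test for neutrality, then type classification) might be sketched as below. The narrowing-down procedure is not detailed in this passage, so the sketch substitutes a simple rule: pick the type whose boundary surface gives the largest signed score. That rule, the function name `classify_expression`, and the score dictionary are assumptions made for illustration only.

```python
def classify_expression(intensity, threshold, boundary_scores):
    """Return 'neutral' if intensity <= threshold, else the best-scoring type.

    boundary_scores maps a facial expression type to a hypothetical signed
    score of the second facial image feature vector against that type's
    boundary surface. Taking the maximum is an assumed stand-in for the
    narrowing-down step.
    """
    if intensity <= threshold:
        return "neutral"
    return max(boundary_scores, key=boundary_scores.get)


# Hypothetical scores for three of the six expression types.
scores = {"happiness": 0.8, "surprise": 0.1, "anger": -0.4}
weak = classify_expression(0.05, 0.2, scores)
strong = classify_expression(0.60, 0.2, scores)
```

Note that an intensity exactly equal to the threshold is treated as neutral, matching the "less than or equal to" condition in the text.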

[Second Embodiment]
FIG. 9 is a block diagram showing the functional configuration of a facial expression analysis apparatus according to a second embodiment of the present invention. Components identical to those of the facial expression analysis apparatus 1 of the first embodiment described above are given the same reference numerals, and their description is omitted. As shown in the figure, the facial expression analysis apparatus 1a has a configuration in which the facial expression intensity evaluation unit 40 and the facial expression evaluation unit 50 of the facial expression analysis apparatus 1 are replaced with a facial expression intensity evaluation unit 40a and a facial expression evaluation unit 50a.

When the facial expression analysis apparatus 1a is set to the machine learning mode, the facial expression intensity evaluation unit 40a performs, for each facial expression, machine learning of a classifier for calculating facial expression intensity values, using the image feature amounts of each analysis region obtained from a plurality of facial expression teacher data. When the facial expression analysis apparatus 1a is set to the facial expression analysis mode, the facial expression intensity evaluation unit 40a calculates the facial expression intensity value with the classifier that corresponds to the facial expression type information supplied by the facial expression evaluation unit 50a, using the image feature amounts of the analysis region obtained from the evaluation image data.
As its functional configuration, the facial expression intensity evaluation unit 40a includes a machine learning unit 41a and a facial expression intensity value calculation unit 42a.

When the facial expression analysis apparatus 1a is set to the machine learning mode, the machine learning unit 41a takes in the image feature amounts of each analysis region, obtained from the plurality of facial expression teacher data, that are supplied by the image feature amount analysis unit 30. As in the first embodiment, the machine learning unit 41a then performs cluster analysis (clustering) on the image feature amounts of the plurality of facial expression teacher data.

For each type of facial expression, the machine learning unit 41a then generates facial image feature vectors by classifying the image feature amounts that correspond to the facial expression teacher data with the minimum and maximum degrees of facial expression in the set of facial expression teacher data into the clusters resulting from the cluster analysis and generating histograms (cluster classification). When the types of facial expression are, for example, anger, disgust, fear, happiness, sadness, and surprise, the machine learning unit 41a generates facial image feature vectors for each of these six types.

Specifically, for each type of facial expression, the machine learning unit 41a classifies into the clusters the image feature amounts corresponding to the upper analysis region of the facial expression teacher data with the minimum and maximum degrees of facial expression in the set of facial expression teacher data. It then generates a histogram (fifth histogram) whose classes are the clusters and whose frequencies are the numbers of elements in each cluster. Likewise, for each type of facial expression, the machine learning unit 41a classifies into the clusters the image feature amounts corresponding to the lower analysis region of the same facial expression teacher data, and generates a histogram (sixth histogram) whose classes are the clusters and whose frequencies are the numbers of elements in each cluster. For each facial expression, the machine learning unit 41a then concatenates the fifth histogram and the sixth histogram, which are the classification results, to generate a histogram for the entire analysis region (whole histogram). For example, the machine learning unit 41a appends the sixth histogram to the fifth histogram for each facial expression to generate the whole histogram. Alternatively, it appends the fifth histogram to the sixth histogram. The machine learning unit 41a then normalizes each whole histogram to generate a facial image feature vector for each facial expression. For example, it divides the frequency of each class in each whole histogram by the sum of the frequencies of all classes.
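The histogram construction described above can be illustrated with a short sketch. It assumes that each image feature amount is assigned to the nearest cluster center by Euclidean distance and that the upper and lower regions share one set of cluster centers; neither detail is stated in this passage. The function names and sample data are hypothetical.

```python
import math


def nearest_cluster(descriptor, centers):
    """Index of the cluster center closest to the descriptor (assumed rule)."""
    return min(range(len(centers)), key=lambda k: math.dist(descriptor, centers[k]))


def region_histogram(descriptors, centers):
    """Histogram with clusters as classes and element counts as frequencies."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest_cluster(d, centers)] += 1
    return hist


def face_feature_vector(upper, lower, centers):
    """Concatenate upper and lower histograms, then divide by the total count."""
    whole = region_histogram(upper, centers) + region_histogram(lower, centers)
    total = sum(whole)
    if total == 0:
        return [0.0] * len(whole)
    return [count / total for count in whole]


# Hypothetical 2-D descriptors and two cluster centers.
centers = [(0.0, 0.0), (1.0, 1.0)]
upper = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)]
lower = [(0.0, 0.2)]
vector = face_feature_vector(upper, lower, centers)
```

Here the upper histogram is placed first; the text allows either order, provided the same order is used for training and analysis.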

For example, the machine learning unit 41a performs machine learning with a support vector machine, calculates for each facial expression a boundary surface (first boundary surface) that separates facial images with the minimum degree of facial expression from facial images with the maximum degree of facial expression, and supplies the data of these boundary surfaces to the facial expression intensity value calculation unit 42a. The facial expression intensity value calculation unit 42a takes in the boundary surface data for each facial expression supplied by the machine learning unit 41a and stores it.

Because the machine learning unit 41a performs machine learning separately for each facial expression, the accuracy of the facial expression intensity value can be further improved.

When the facial expression analysis apparatus 1a is set to the facial expression analysis mode, the facial expression intensity value calculation unit 42a takes in the image feature amounts of the analysis region, obtained from the evaluation image data, that are supplied by the image feature amount analysis unit 30. It also takes in the facial expression type information supplied by the facial expression evaluation unit 50a. The facial expression intensity value calculation unit 42a then classifies the image feature amounts into the clusters resulting from the cluster analysis performed by the machine learning unit 41a (first cluster classification process) and generates a facial image feature vector (first facial image feature vector). It calculates the distance from the boundary surface corresponding to the facial expression type information to the facial image feature vector, outputs this distance as the facial expression intensity value, and supplies the facial expression intensity value to the facial expression evaluation unit 50a.
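The selection of a boundary surface by facial expression type might look like the following sketch. It assumes linear boundary surfaces stored per type as a weight vector and bias; the `BOUNDARIES` table, its values, and the function name `intensity_for_type` are hypothetical.

```python
import math

# Hypothetical per-expression boundary surfaces: type -> (weights, bias).
BOUNDARIES = {
    "happiness": ([1.0, 0.0], -0.5),
    "surprise": ([0.0, 2.0], -1.0),
}


def intensity_for_type(expression_type, feature_vector, boundaries=BOUNDARIES):
    """Distance from the boundary surface of the given type to the vector.

    Assumption: each surface is the hyperplane weights.x + bias = 0 and the
    signed distance is used as the intensity value.
    """
    weights, bias = boundaries[expression_type]
    norm = math.sqrt(sum(w * w for w in weights))
    score = sum(w * x for w, x in zip(weights, feature_vector)) + bias
    return score / norm


vector = [0.75, 0.25]
happy_intensity = intensity_for_type("happiness", vector)
surprise_intensity = intensity_for_type("surprise", vector)
```

The same feature vector yields a different intensity for each type, which is why the type information is needed before the intensity can be computed in this embodiment.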

When the facial expression analysis apparatus 1a is set to the machine learning mode, the facial expression evaluation unit 50a performs machine learning of a classifier for classifying facial expressions, in the same manner as the facial expression evaluation unit 50 of the first embodiment. When the facial expression analysis apparatus 1a is set to the facial expression analysis mode, the facial expression evaluation unit 50a classifies the facial expression with the machine-learned classifier and generates facial expression type information, again in the same manner as the facial expression evaluation unit 50 of the first embodiment. Unlike the first embodiment, however, the facial expression evaluation unit 50a supplies the generated facial expression type information to the facial expression intensity evaluation unit 40a.

As its functional configuration, the facial expression evaluation unit 50a includes a machine learning unit 51 and a facial expression classification unit 52a. The machine learning unit 51 is equivalent to the machine learning unit 51 of the first embodiment, so its description is omitted here.
When the facial expression analysis apparatus 1a is set to the facial expression analysis mode, the facial expression classification unit 52a generates facial expression type information in the same manner as the facial expression classification unit 52 of the first embodiment. The facial expression classification unit 52a then outputs the generated facial expression type information and also supplies it to the facial expression intensity evaluation unit 40a.

[Third Embodiment]
When the facial expression analysis apparatus 1 of the first embodiment described above is set to the facial expression analysis mode and supplied with moving image data to execute the facial expression analysis process, facial expression type information of a type different from its surroundings may appear abruptly in the facial expression type information that the apparatus generates for each of a series of key frames. Such isolated results arise because, for example, shadows cast by the lighting or the orientation of the face relative to the camera when the person's face is captured affect the facial expression, or because of variation in the facial expression intensity values.
The facial expression analysis apparatus according to the third embodiment of the present invention treats this abruptly appearing facial expression type information as noise and removes it.

The configuration of the facial expression analysis apparatus of this embodiment is the same as that of the first embodiment, so the following description refers to the block diagram of FIG. 1.
For each section (a time span or number of frames) containing image data for a plurality of frames, the facial expression intensity value calculation unit 42 in the facial expression intensity evaluation unit 40 of the facial expression analysis apparatus 1 calculates the average of the facial expression intensity values and takes this average as the representative facial expression intensity value for the section.

In addition, for each of these sections, the facial expression classification unit 52 in the facial expression evaluation unit 50 of the facial expression analysis apparatus 1 calculates the sum of the facial expression intensity values for each type of facial expression, and generates facial expression type information indicating the type of facial expression whose sum (importance) is the largest (the representative face type).
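The per-section processing of the third embodiment can be sketched as follows, assuming the per-frame results are available as (type, intensity) pairs. The function name `summarize_sections`, the handling of a short final section, and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def summarize_sections(frames, section_len):
    """Return (representative type, mean intensity) for each section.

    frames is a list of (expression_type, intensity) pairs, one per frame.
    A final section shorter than section_len is still summarized, which is
    an assumption; the text does not say how a partial section is handled.
    """
    results = []
    for start in range(0, len(frames), section_len):
        section = frames[start:start + section_len]
        mean_intensity = sum(value for _, value in section) / len(section)
        totals = defaultdict(float)
        for label, value in section:
            totals[label] += value  # importance: summed intensity per type
        representative = max(totals, key=totals.get)
        results.append((representative, mean_intensity))
    return results


# One isolated "surprise" frame among "happiness" frames.
frames = [("happiness", 0.5), ("surprise", 0.25),
          ("happiness", 0.75), ("happiness", 0.5)]
summary = summarize_sections(frames, section_len=4)
```

Weighting each frame by its intensity means that a single weak outlier frame contributes little to the total, so it rarely changes the representative type.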

FIG. 10 is a diagram schematically showing output results of the facial expression analysis apparatus 1. The upper graph in the figure shows, in time series, the facial expression intensity values output by the facial expression analysis apparatus 1 of the first embodiment when it is supplied with moving image data. The horizontal axis of the graph is time and the vertical axis is the facial expression intensity value. As the graph shows, the facial expression intensity values output by the facial expression analysis apparatus 1 of the first embodiment fluctuate over time.

The △, ▲, and □ symbols immediately below this graph (called facial expression symbols for convenience) indicate the facial expression type information output by the facial expression analysis apparatus 1 and are drawn in correspondence with the time axis of the graph. Here, △ indicates happiness, ▲ indicates surprise, and □ indicates anger. According to these facial expression symbols, expressions of surprise and anger appear abruptly among the expressions of happiness over the course of the sequence.

The lower graph in FIG. 10 shows, in time series, the facial expression intensity values output by the facial expression analysis apparatus 1 of this embodiment when it is supplied with moving image data. Again, the horizontal axis is time and the vertical axis is the facial expression intensity value. As the graph shows, the facial expression analysis apparatus 1 of this embodiment outputs values only once per section of a plurality of frames (for example, 10 frames) (T_1, T_2, T_3, ...), but the facial expression intensity values it outputs have reduced variation, which improves the reliability of the facial expression intensity values over a time span containing a plurality of sections.

The facial expression symbols immediately below this graph show a stable facial expression classification result, with no abruptly appearing facial expressions over the course of the sequence. In other words, the facial expression analysis apparatus 1 of this embodiment removes facial expression noise by classifying facial expressions so that the importance, that is, the sum of the facial expression intensity values, is maximized, and thereby improves the accuracy of facial expression classification.

[Modification of Third Embodiment]
In the third embodiment described above, the facial expression analysis apparatus 1 obtains a facial expression intensity value and facial expression type information for each section (for example, every 10 frames).
A facial expression analysis apparatus that is a modification of the third embodiment of the present invention obtains facial expression intensity values and facial expression type information while shifting the section in the time direction.
That is, at intervals of a number of frames smaller than the number of frames contained in one section, the facial expression intensity value calculation unit 42 shifts the section by that number of frames, calculates the average of the facial expression intensity values, and takes the average as the representative facial expression intensity value for the section.

In addition, for each of these sections, the facial expression classification unit 52 calculates the sum of the facial expression intensity values for each type of facial expression, and generates facial expression type information indicating the type of facial expression whose sum is the largest (the representative face type).
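The shifted sections of this modification can be sketched as overlapping windows whose stride is smaller than the window length. The function name `sliding_summary`, the choice to skip incomplete windows at the end of the sequence, and the sample data are assumptions.

```python
from collections import defaultdict


def sliding_summary(frames, window, stride):
    """Summarize overlapping sections of per-frame (type, intensity) pairs.

    Each result is (start index, representative type, mean intensity).
    The stride must be smaller than the window so that sections overlap.
    Incomplete windows at the end are skipped (an assumed policy).
    """
    if not 0 < stride < window:
        raise ValueError("stride must be positive and smaller than window")
    results = []
    for start in range(0, len(frames) - window + 1, stride):
        section = frames[start:start + window]
        totals = defaultdict(float)
        for label, value in section:
            totals[label] += value
        representative = max(totals, key=totals.get)
        mean_intensity = sum(value for _, value in section) / window
        results.append((start, representative, mean_intensity))
    return results


frames = [("happiness", 0.5), ("happiness", 0.5), ("surprise", 0.25),
          ("happiness", 0.75), ("happiness", 0.5), ("happiness", 0.5)]
summary = sliding_summary(frames, window=4, stride=2)
```

With a window of 4 frames and a stride of 2, the isolated "surprise" frame falls inside both sections and is outvoted in each.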

FIG. 11 is a diagram schematically showing output results of the facial expression analysis apparatus 1. Each graph in the figure shows, in time series, the facial expression intensity values output by the facial expression analysis apparatus 1 of the modification of the third embodiment when it is supplied with moving image data. In each graph, the horizontal axis is time and the vertical axis is the facial expression intensity value. Times t_1, t_2, and t_3 are the times of consecutive frames. The duration (t_p + t_f) is one section.

The △ symbol (facial expression symbol) immediately below each graph indicates the facial expression type information output by the facial expression analysis apparatus 1 (for example, happiness) and is drawn in correspondence with the time axis of the graph. According to these facial expression symbols, a stable facial expression classification result is obtained at each of the consecutive times t_1, t_2, and t_3.

The upper graph and facial expression symbol in FIG. 11 show that, for the section from time (t_1 - t_p) to time (t_1 + t_f), the facial expression intensity value calculation unit 42 calculates the representative facial expression intensity value and the facial expression classification unit 52 generates facial expression type information indicating the representative face type.
The middle graph and facial expression symbol in the figure show the same processing for the section from time (t_2 - t_p) to time (t_2 + t_f).
The lower graph and facial expression symbol in the figure show the same processing for the section from time (t_3 - t_p) to time (t_3 + t_f).

In other words, according to FIG. 11, the facial expression analysis apparatus 1 can output, for every frame, facial expression intensity values with reduced variation and improved reliability, together with stable facial expression type information.
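Per-frame output over a section that reaches back t_p and forward t_f from each frame could be sketched as below. The sketch measures `past` and `future` in frames and clips the section at the ends of the sequence; both choices, and the function name `smooth_per_frame`, are assumptions rather than details given in the text.

```python
def smooth_per_frame(intensities, past, future):
    """Average each frame with `past` earlier and `future` later frames.

    The section for frame t covers indices t - past through t + future,
    clipped to the valid range at the start and end (assumed behavior).
    """
    smoothed = []
    for t in range(len(intensities)):
        lo = max(0, t - past)
        hi = min(len(intensities), t + future + 1)
        section = intensities[lo:hi]
        smoothed.append(sum(section) / len(section))
    return smoothed


# Hypothetical strongly fluctuating per-frame intensity values.
values = [0.0, 1.0, 0.0, 1.0, 0.0]
smoothed = smooth_per_frame(values, past=1, future=1)
```

Because the section is recomputed around every frame, one smoothed value is produced per input frame rather than one per section.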

As described above, according to the first to third embodiments and the modification, the facial expression analysis apparatuses 1 and 1a include the facial expression intensity evaluation unit 40, which performs the first cluster classification process on the image feature amounts to generate the first facial image feature vector and calculates the facial expression intensity value, that is, the distance from the first boundary surface determined in advance in the facial image feature vector space to the first facial image feature vector.
The facial expression analysis apparatuses 1 and 1a also include the facial expression evaluation unit 50, which performs the second cluster classification process on the image feature amounts to generate the second facial image feature vector and generates facial expression type information indicating the facial expression type corresponding to the analysis region, based on the positional relationship of the second facial image feature vector with respect to the second boundary surface determined in advance in the facial image feature vector space and on the facial expression intensity value calculated by the facial expression intensity evaluation unit 40.

This configuration makes it easy to classify neutral facial expressions, which range from an expressionless face to a face whose type of expression is difficult to determine. Therefore, according to the first to third embodiments and the modification, the classification of neutral facial expressions is made easier and the accuracy of facial expression classification can be improved.

In the first to third embodiments and the modification, the analysis region determination unit 22 divides the analysis region into two analysis partial regions, but the number of divisions is not limited to two. That is, the analysis region determination unit 22 may leave the analysis region undivided, or may divide it into three or more analysis partial regions.

Part of the functions of each facial expression analysis apparatus in the embodiments and modification described above may be realized by a computer. In this case, a facial expression analysis program for realizing those functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the facial expression analysis program recorded on the recording medium. The computer system here includes an operating system (OS) and hardware such as peripheral devices. The computer-readable recording medium refers to a portable recording medium such as a flexible disk, a magneto-optical disk, an optical disk, or a memory card, or to a storage device such as a magnetic hard disk or solid-state drive provided in the computer system. The computer-readable recording medium may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a computer network such as the Internet, a telephone line, or a mobile phone network, as well as a medium that holds the program for a certain period of time, such as volatile memory inside a computer system serving as a server device or client in that case.
The facial expression analysis program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like within a scope that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1, 1a Facial expression analysis apparatus
10 Image data acquisition unit
20 Face region extraction unit
21 Face region detection unit
22 Analysis region determination unit
30 Image feature amount analysis unit (image feature amount calculation unit)
40, 40a Facial expression intensity evaluation unit
41, 41a Machine learning unit
42, 42a Facial expression intensity value calculation unit
50, 50a Facial expression evaluation unit
51 Machine learning unit
52, 52a Facial expression classification unit
60 Mode switching unit

Claims

1. A facial expression analysis apparatus comprising:
an image data acquisition unit that captures image data;
a face region extraction unit that extracts a facial analysis region from the image data captured by the image data acquisition unit;
an image feature amount calculation unit that calculates an image feature amount of the analysis region extracted by the face region extraction unit;
a facial expression intensity evaluation unit that performs a first cluster classification process on the image feature amount calculated by the image feature amount calculation unit to generate a first face image feature vector, and calculates a facial expression intensity value that is the distance from a first boundary surface, determined in advance in a face image feature vector space, to the first face image feature vector; and
a facial expression evaluation unit that performs a second cluster classification process on the image feature amount to generate a second face image feature vector, and generates facial expression type information indicating the facial expression type corresponding to the analysis region, based on the positional relationship of the second face image feature vector with respect to a second boundary surface determined in advance in the face image feature vector space and on the facial expression intensity value calculated by the facial expression intensity evaluation unit.
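The intensity computation recited in the apparatus claim above can be illustrated with a minimal sketch. It assumes that the cluster classification assigns each local descriptor to its nearest cluster center, that the face image feature vector is the normalized histogram of those assignments, and that the boundary surface is a hyperplane w·x + b = 0. The function names, cluster centers, and hyperplane parameters are hypothetical and are not taken from the claims.

```python
import math

def nearest_cluster(descriptor, centers):
    """Return the index of the cluster center closest to the descriptor."""
    return min(range(len(centers)),
               key=lambda k: math.dist(descriptor, centers[k]))

def face_feature_vector(descriptors, centers):
    """Build a histogram of cluster assignments, normalized to sum to 1."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest_cluster(d, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def signed_distance(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0 (assumed boundary form)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
```

Under these assumptions, the value returned by `signed_distance` for the first face image feature vector would play the role of the facial expression intensity value.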

2. The facial expression analysis apparatus according to claim 1, wherein the facial expression evaluation unit determines, based on the facial expression intensity value, whether the facial expression type corresponding to the analysis region is a neutral facial expression, and, when it determines that the facial expression type is not the neutral facial expression, generates the facial expression type information based on the positional relationship of the second face image feature vector with respect to the second boundary surface.
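The two-stage decision recited above (neutral check first, type classification second) might be sketched as follows. The comparison against a fixed threshold, and the representation of the positional relationship as one score per expression type, are assumptions made for illustration. The names `classify_expression`, `type_scores`, and `neutral_threshold` are hypothetical.

```python
def classify_expression(intensity, type_scores, neutral_threshold):
    """Return "neutral" when the intensity is below the assumed threshold;
    otherwise return the expression type with the largest boundary score."""
    if intensity < neutral_threshold:
        return "neutral"
    # type_scores maps each expression type to an assumed score relative to
    # the second boundary surface (larger means further on the positive side).
    return max(type_scores, key=type_scores.get)
```

For example, `classify_expression(0.1, {"happiness": 0.8, "anger": -0.3}, 0.5)` yields `"neutral"`, while the same scores with an intensity of 0.9 yield `"happiness"`.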

3. The facial expression analysis apparatus according to claim 1, wherein the facial expression intensity evaluation unit calculates, as the facial expression intensity value, the distance from the boundary surface corresponding to the facial expression type information generated by the facial expression evaluation unit to the first face image feature vector.

4. The facial expression analysis apparatus according to any one of claims 1 to 3, wherein the first boundary surface is calculated by a support vector machine as follows: a plurality of facial expression teacher data are acquired from a facial expression teacher data group in which a label indicating the facial expression type is associated, for each facial expression type, with a set of facial expression teacher data having different degrees of facial expression; the image feature amount is calculated for the analysis region of each of the plurality of facial expression teacher data; the image feature amounts of the plurality of facial expression teacher data are subjected to cluster analysis; and a face image feature vector, obtained by classifying the image feature amounts corresponding to the facial expression teacher data having the minimum and maximum degrees of facial expression in the set for each facial expression type into the clusters resulting from the cluster analysis, is applied to the support vector machine.
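The selection of training samples for the first boundary surface, as recited above, could look like the following minimal sketch. It assumes that each teacher sample carries a numeric degree of expression and an already computed face image feature vector, and that minimum-degree samples are labeled -1 and maximum-degree samples +1 before being passed to a support vector machine. The labeling scheme and the name `extreme_samples` are hypothetical.

```python
def extreme_samples(teacher_sets):
    """teacher_sets: {expression_type: [(degree, feature_vector), ...]}.
    Return (vectors, labels) holding only the minimum-degree and
    maximum-degree sample of each expression type."""
    vectors, labels = [], []
    for expression_type, samples in teacher_sets.items():
        weakest = min(samples, key=lambda s: s[0])
        strongest = max(samples, key=lambda s: s[0])
        vectors.extend([weakest[1], strongest[1]])
        labels.extend([-1, +1])  # assumed labels: weak versus strong expression
    return vectors, labels
```

The returned vectors and labels would then serve as the training input from which a support vector machine derives the boundary surface.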

5. The facial expression analysis apparatus according to claim 4, wherein the second boundary surface is calculated by a support vector machine to which is applied a face image feature vector obtained by classifying the image feature amounts corresponding to all or a part of the plurality of facial expression teacher data into the clusters.

6. The facial expression analysis apparatus according to any one of claims 1 to 5, wherein
the face region extraction unit divides the analysis region into a plurality of analysis partial regions,
the image feature amount calculation unit calculates an image feature amount of each of the plurality of analysis partial regions,
the facial expression intensity evaluation unit performs the first cluster classification process on the image feature amount of each of the plurality of analysis partial regions and concatenates the classification results to generate the first face image feature vector, and
the facial expression evaluation unit performs the second cluster classification process on the image feature amount of each of the plurality of analysis partial regions and concatenates the classification results to generate the second face image feature vector.
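The per-region classification and concatenation recited above can be sketched as follows, assuming each analysis partial region has already been reduced to a list of cluster indices and that the classification result for a region is a histogram over the clusters. The function names and the histogram representation are assumptions for illustration.

```python
def region_histogram(assignments, num_clusters):
    """Count how many descriptors of one partial region fall in each cluster."""
    hist = [0] * num_clusters
    for k in assignments:
        hist[k] += 1
    return hist

def concatenated_feature_vector(assignments_per_region, num_clusters):
    """Concatenate the per-region histograms into one face image feature vector."""
    vector = []
    for assignments in assignments_per_region:
        vector.extend(region_histogram(assignments, num_clusters))
    return vector
```

With two partial regions and three clusters, the resulting vector has six elements, so the region in which a cluster occurs is preserved in the feature vector.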

7. The facial expression analysis apparatus according to any one of claims 1 to 6, wherein the facial expression evaluation unit calculates, for each predetermined section containing image data for a plurality of frames, the sum of the facial expression intensity values for each facial expression type, and generates facial expression type information indicating the facial expression type with the largest sum.

8. The facial expression analysis apparatus according to claim 7, wherein the facial expression evaluation unit shifts the predetermined section by a number of frames smaller than the plurality of frames.
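The section-wise aggregation and the shifted sections recited in the two claims above might be sketched as follows. The sketch assumes each frame has already been reduced to a pair of expression type and intensity value. The names `dominant_expression`, `sliding_sections`, `section_length`, and `shift` are hypothetical.

```python
from collections import defaultdict

def dominant_expression(frames):
    """frames: list of (expression_type, intensity) pairs for one section.
    Return the type whose summed intensity is largest."""
    totals = defaultdict(float)
    for expression_type, intensity in frames:
        totals[expression_type] += intensity
    return max(totals, key=totals.get)

def sliding_sections(frames, section_length, shift):
    """Evaluate overlapping sections, advancing the start by `shift` frames,
    where `shift` is assumed to be smaller than `section_length`."""
    results = []
    for start in range(0, len(frames) - section_length + 1, shift):
        section = frames[start:start + section_length]
        results.append(dominant_expression(section))
    return results
```

Choosing a shift smaller than the section length makes consecutive sections overlap, so the reported expression type changes more smoothly over time than with non-overlapping sections.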

9. A facial expression analysis program for causing a computer to function as:
an image data acquisition unit that captures image data;
a face region extraction unit that extracts an analysis region from the image data captured by the image data acquisition unit;
an image feature amount calculation unit that calculates an image feature amount of the analysis region extracted by the face region extraction unit;
a facial expression intensity evaluation unit that performs a first cluster classification process on the image feature amount calculated by the image feature amount calculation unit to generate a first face image feature vector, and calculates a facial expression intensity value that is the distance from a first boundary surface, determined in advance in a face image feature vector space, to the first face image feature vector; and
a facial expression evaluation unit that performs a second cluster classification process on the image feature amount to generate a second face image feature vector, and generates facial expression type information indicating the facial expression type corresponding to the analysis region, based on the positional relationship of the second face image feature vector with respect to a second boundary surface determined in advance in the face image feature vector space and on the facial expression intensity value calculated by the facial expression intensity evaluation unit.