JP2007272398A

JP2007272398A - Classification device, program, and method

Info

Publication number: JP2007272398A
Application number: JP2006095204A
Authority: JP
Inventors: Masaki Ishii; 雅樹石井; Kazuto Sato; 和人佐藤; Hirokazu Madokoro; 洋和間所
Original assignee: Akita Prefecture
Current assignee: Akita Prefecture
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2007-10-18

Abstract

PROBLEM TO BE SOLVED: To classify classification target data as a classification problem in which the number of categories is unknown.

SOLUTION: A self-organizing map is trained using face image data 62 for learning, and the connection weights of map layer units U0 to U4 of a map layer 64 are obtained and shown as visualized images 66. Correlation coefficients r0 to r3 of the connection weights between adjacent map layer units are then calculated, and the gap between map layer units U3 and U4, which gives the smallest coefficient r3, is set as the division boundary 70 of the first layer. Map layer unit U4 forms a category consisting of a single unit, so no second-layer classification is performed for it. Map layer units U0 to U3, on the other hand, are not yet sufficiently classified, so second-layer classification is performed for them. This processing is repeated until the classification characteristics indicate that the classification is sufficient. As a result, a facial expression feature space with a binary tree structure corresponding to the face image data for learning is formed.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a data classification technique using a self-organizing map, and more particularly to a technique for classifying already classified data more finely.

Techniques for classifying face image data according to facial expression are known.

Patent Document 1 listed below discloses a technique that estimates a facial expression by quantifying changes in the contour shapes of facial organs (such as the eyes and mouth). This technique estimates four facial expressions: surprise, joy, anger, and neutral.

Patent Document 2 listed below discloses a technique in which the categories of the six basic facial expressions (surprise, joy, sadness, fear, disgust, and anger) are fixed from the outset and facial expressions are recognized by matching against a codebook built from their feature vectors. Recognition processing using an HMM (hidden Markov model) is applied to the transition from a neutral face to each of the six basic expressions.

The technique of Patent Document 3 listed below determines only whether a face is smiling, based on a measurement of how far the mouth is open. Specifically, the lip region (mouth region) and the tooth region are extracted using color information, and the degree of mouth opening is recognized from the outer shape, inner diameter, and area of the mouth region, the area of the tooth region, and the like.

With the techniques of these documents, if extraction of the contour shape of a facial organ or of feature points on a facial organ fails, the recognition processing breaks down or the processing load increases, which makes real-time recognition difficult. In addition, all of these techniques classify face image data into a small number of predetermined facial expressions. However, because factors such as individual differences can cause the features of a facial expression to differ even for the same emotion, there may be cases where it is better to base the classification on expressions other than the six basic facial expressions.

In contrast, the techniques of Non-Patent Documents 1 to 3 listed below classify an individual's facial expressions using a self-organizing map. These techniques do not fix the criteria for the classification categories of face image data in advance. Instead, they classify facial expressions by repeatedly dividing the data in two, proceeding from large features to small features. As a result, each facial expression is classified into a category suited to the features of that individual's expressions.

Patent Document 1: JP 2005-293539 A
Patent Document 2: JP H08-249453 A
Patent Document 3: JP 2005-234686 A
Non-Patent Document 1: Masaki Ishii, Kazuto Sato, Hirokazu Madokoro, Sakura Kadowaki, Makoto Nishida, "A Study on Optimization of a Facial Expression Space Model Focusing on Dynamic Changes of the Face", IEICE Technical Report, IEICE, March 11, 2005, Vol. 104, No. 742, PRMU2004-250, pp. 109-114
Non-Patent Document 2: Masaki Ishii, Kazuto Sato, Hirokazu Madokoro, Makoto Nishida, "Formation of a Person-Specific Facial Expression Feature Space Based on Topological Characteristics of Face Images", Proceedings of the Meeting on Image Recognition and Understanding 2005, Information Processing Society of Japan, July 18, 2005, pp. 763-770
Non-Patent Document 3: Masaki Ishii, Makoto Nishida, Kazuto Sato, Hirokazu Madokoro, "A Study on the Formation of a Person-Specific Facial Expression Feature Space Based on Topological Characteristics of Face Images", Proceedings of the 2005 Joint Convention of the Tohoku Chapters of the Electrical Engineering-Related Societies, Executive Committee of the same convention, August 25, 2005, p. 297

Unlike Patent Documents 1 to 3, the techniques of Non-Patent Documents 1 to 3 do not classify into fixed categories. However, the technique of Non-Patent Document 1 fixes the depth of the hierarchical classification in advance. As a result, it may fail to capture the full variety of facial expressions. In particular, when individual, gender, and ethnic differences as well as differences in face orientation and lighting are taken into account, it may be preferable to classify into more categories than planned. Conversely, even when the expressions are not actually very diverse, classification may be forced to continue until the planned number of categories is reached.

Facial expression classification is technically difficult, yet it is expected to find application in various fields, including human-computer interaction. Moreover, facial expression classification techniques are not necessarily limited to face data. In many cases they can also be applied to various sensor data such as general images, video, and audio, as well as to other data such as figures and documents.

An object of the present invention is to establish a technique for classifying classification target data as a classification problem in which the number of categories is unknown.

Another object of the present invention is to realize a technique that changes the number of classification categories for face image data according to the diversity of the facial expressions.

Still another object of the present invention is to develop a new technique for forming, from an individual's face image data, a facial expression feature space unique to that individual.

The classification device of the present invention comprises: classification processing means that performs a learning operation in which a plurality of learning data are sequentially mapped onto a map layer by self-organizing mapping, evaluates the classification characteristics of the resulting self-organizing map, and classifies the map layer into two or more categories; determination means that evaluates the classification characteristics and determines whether a given category needs to be reclassified; and hierarchizing means that, when the determination means determines that reclassification is necessary, causes the classification processing means to perform classification again on the learning data belonging to that category.

The classification device can be built from computer hardware having arithmetic and storage functions, together with software that controls its operation. It may be built as a distributed system using a plurality of pieces of hardware that can communicate with one another, or as a centralized system using a single piece of hardware.

The classification processing means performs the learning operation of the self-organizing map and classifies its map layer into two or more categories. A self-organizing map is a mathematical model comprising an input layer, to which classification target data is input, and a map layer, onto which that input is mapped (an intermediate layer may also be provided). Within the classification device it is constructed in a storage device (memory). Learning of a self-organizing map uses no teacher signal. Instead, in response to each input of learning data serving as classification target data, the map layer units that make up the map layer compete and are updated. As a result, a self-organizing map generally acquires classification characteristics such that similar classification target data are mapped to the same or neighboring map layer units. The classification processing means evaluates these classification characteristics, for example by applying preset classification conditions, and classifies the map layer into two or more categories.
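The competition-and-update learning described above can be sketched as follows. This is a minimal illustration, not the implementation of the present embodiment: the function names, the linear decay of the learning rate and neighborhood radius, and all parameter values are assumptions chosen for brevity.

```python
import random

def best_matching_unit(weights, x):
    """Index of the map layer unit whose weight vector is closest to x (the winner)."""
    return min(
        range(len(weights)),
        key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)),
    )

def train_som_1d(samples, n_units=5, epochs=100, lr0=0.5, radius0=2.0, seed=0):
    """Train a one-dimensional SOM without teacher signals.

    Assumed schedule: learning rate and neighborhood radius decay linearly.
    Returns one connection-weight vector per map layer unit.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = radius0 * decay
        for x in samples:
            winner = best_matching_unit(weights, x)   # competition
            for u in range(n_units):                  # update winner and neighbors
                if abs(u - winner) <= radius:
                    weights[u] = [w + lr * (xi - w) for w, xi in zip(weights[u], x)]
    return weights
```

In this sketch each input vector would correspond to a flattened, normalized face image, and the returned list plays the role of the connection weights of units U0 to U4.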

The determination means determines whether a given category needs to be reclassified. The determination is made by evaluating the classification characteristics. The manner of evaluation may be the same as that used by the classification processing means, or it may differ. For example, the determination means evaluates the classification characteristics within a category and determines the need for reclassification on the basis of preset reclassification conditions.

The hierarchizing means, when reclassification is determined to be necessary, causes the classification processing means to perform classification again on the learning data belonging to that category. The category is thereby reclassified into a plurality of subcategories.

With this configuration, self-organizing maps corresponding to the learning data are formed hierarchically. Because the decision on whether to perform a lower-level classification is made by evaluating the classification characteristics, categories are formed whose number and classification characteristics suit the learning data and the determination criteria. In other words, a feature space corresponding to the features of the learning data is formed in a self-organizing manner.

The classification space thus formed can be used to classify not only the learning data but also new classification target data. That is, the classification target data can be classified by applying the self-organizing maps in order, starting from the top level of the hierarchy. Because this classification requires only as many operations as there are levels in the hierarchy, it is also suitable for real-time processing.
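The top-down classification of new data can be sketched as below, assuming a binary hierarchy. The `Node` structure, its field names, and the convention that units below `boundary` form the left category are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    weights: List[List[float]]        # connection weights, one vector per map layer unit
    boundary: int                     # units [0, boundary) form the left category
    left: Optional["Node"] = None     # lower-level SOM for the left category, if any
    right: Optional["Node"] = None    # lower-level SOM for the right category, if any

def classify(root, x):
    """Descend from the top-level SOM; return the path as a string of 0s and 1s."""
    path, node = "", root
    while node is not None:
        winner = min(
            range(len(node.weights)),
            key=lambda u: sum((w - xi) ** 2 for w, xi in zip(node.weights[u], x)),
        )
        if winner < node.boundary:
            path, node = path + "0", node.left
        else:
            path, node = path + "1", node.right
    return path
```

Only one winner search is performed per level, which is why the cost grows with the depth of the hierarchy rather than with the total number of categories.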

When the hierarchizing means causes the classification processing means to perform classification, the same number of map layer units is normally used regardless of the depth in the hierarchy. This is effective for classifying with comparable accuracy at every depth. However, the number of map layer units used may also be varied. For example, if the number of map layer units is increased with the depth of the hierarchy, relatively small differences can be analyzed accurately.

In one aspect of the classification device of the present invention, the map layer comprises a plurality of map layer units for each of which connection weights are set; in the learning operation, the connection weights are changed so that similar learning data are mapped to the same or neighboring map layer units; the classification characteristics are characteristics of the connection weights obtained by the learning operation; and the determination means evaluates the correlation of the connection weights between map layer units to determine the need for reclassification.

The connection weights are data corresponding to the topology of the input layer and are set for each map layer unit. In the learning operation, adjusting the connection weights changes how the units compete for the input data and hence changes the classification characteristics. The determination means determines the need for reclassification from the correlation of the connection weights between map layer units. The correlation may be evaluated by its absolute magnitude or by the relative magnitudes of a plurality of correlations. Evaluating the correlation makes it possible, for example, to evaluate how similar the classification characteristics are within the same category, or how different they are from those of another category. The determination means decides whether a lower-level classification should be performed by, for example, comparing such an evaluation result with a suitable reference value. In this aspect, the classification processing means evaluates the connection weights of the resulting self-organizing map to classify the map layer. As a specific example, the map layer may be divided using, as the classification boundary, the position at which the correlation of the connection weights between adjacent map layer units is small.
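A minimal sketch of this boundary selection follows, assuming a one-dimensional map layer and the Pearson correlation coefficient as the correlation measure. The function names are illustrative only.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length weight vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / math.sqrt(va * vb)

def split_at_weakest_link(weights):
    """Return (r, boundary): r[i] correlates units i and i+1, and the map layer
    is divided into units [0, boundary) and [boundary, n)."""
    r = [pearson(weights[i], weights[i + 1]) for i in range(len(weights) - 1)]
    weakest = min(range(len(r)), key=r.__getitem__)
    return r, weakest + 1
```

With five units, a returned boundary of 4 corresponds to a division between U3 and U4, as in the example given in the abstract.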

In one aspect of the classification device of the present invention, the determination means determines the need to reclassify the categories at a given level of the hierarchy on the basis of the number of categories finalized by the classification up to that level. That is, the number of categories already judged not to need reclassification is used to evaluate the classification characteristics of the self-organizing map and to decide whether a finer reclassification should be performed.

In one aspect of the classification device of the present invention, the classification processing means classifies the map layer into two categories. Classifying into two categories makes it possible to divide the data broadly on the basis of clear features. In this case, a relatively small number of map layer units is considered sufficient. The specific number depends on parameters such as the amount of learning data and is preferably determined by experiment, but in many cases about 20 or fewer, or even 10 or fewer, will be adequate. Furthermore, when classifying into two categories, a one-dimensional arrangement of the map layer units is considered sufficient in many cases. This is expected to speed up the computation. The binary tree structure of the classification formed in this way represents the result of using the features contained in the learning data group as classification criteria in descending order of their contribution.

In one aspect of the classification device of the present invention, the device comprises: means for classifying classification target data into the hierarchical binary tree classification structure that has been formed; means for assigning to the classified data an index representing the hierarchical binary tree classification structure; and search means for searching the classified data using the index. The classification target data may be the learning data used for learning, or it may be new data. Various search algorithms that use binary trees are known. Here, because the binary tree structure obtained as a result of classification is used directly for searching, such search algorithms can be applied immediately.
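One simple way to realize such an index, shown here only as a sketch, is to record for each item the path taken through the binary tree as a string of 0s and 1s. A prefix then selects a whole subtree. The string encoding and the item names below are assumptions made for illustration.

```python
def search_by_prefix(index, prefix):
    """Return the names of all items whose binary-tree index starts with prefix."""
    return sorted(name for name, code in index.items() if code.startswith(prefix))

# Hypothetical index: item name -> path from the root ("0" = left, "1" = right).
index = {
    "img_001": "00",
    "img_002": "010",
    "img_003": "011",
    "img_004": "1",
}

# All items under the category reached by going left, then right.
matches = search_by_prefix(index, "01")
```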

In one aspect of the classification device of the present invention, the learning data is sensor data generated from the detection results of a sensor. In a further aspect, the sensor data is image data of a human face captured by a camera. This yields a classification device that might be called a facial expression recognition device or a facial expression classification device. In this case, the learning data is typically a plurality of face images of the same individual, which allows a facial expression feature space unique to that individual to be formed. Naturally, data from a plurality of individuals may also be classified together. In that case, differences between individuals are likely to appear first, although a distinctive facial expression may instead be separated first.

In one aspect of the classification device of the present invention, the device comprises calculation means that performs facial expression analysis on a visualized image generated from the connection weights of a category and, by referring to a preset correspondence table, calculates semantic information characterizing that category. Semantic information may be calculated for all categories, or only for the categories at the terminal (lowest) level of the hierarchy. Examples of facial expression analysis include rule-based analysis based on the movement of feature points and template-based analysis that compares the image with template data. Semantic information is information that gives a category its meaning, such as information about emotions (for example, joy or sadness) or descriptive information (for example, a smiling face or a crying face). By setting and storing in advance, as a correspondence table, the semantic information corresponding to each facial expression analysis result, semantic information can be calculated for a category.

In one aspect of the classification device of the present invention, the device comprises inspection means that checks whether the calculation means has calculated contradictory semantic information for a category at an upper level and a category at the level below it. For example, if a category classified as joy is divided at a lower level into anger and sadness, the classification can be said to be contradictory (normally, the contradictory combinations are defined as rules and set in advance). The inspection means is provided to detect such contradictions. When a contradictory classification has been made, the amount of learning data may be increased, for example, or the classification into lower-level categories, which is based on smaller differences than the upper-level classification, may be rejected.
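Such an inspection could be sketched as follows, assuming that each category is identified by its binary path and that the contradictory label pairs are held in a preset table. The table contents and the function name are hypothetical.

```python
# Hypothetical rule table: pairs of labels regarded as contradictory.
CONFLICTS = {frozenset({"joy", "anger"}), frozenset({"joy", "sadness"})}

def find_conflicts(labels, conflicts=CONFLICTS):
    """labels maps a category path (e.g. "01") to its semantic label.

    Returns sorted (ancestor_path, descendant_path) pairs whose labels contradict.
    """
    found = []
    for code, label in labels.items():
        for depth in range(1, len(code)):
            ancestor = labels.get(code[:depth])
            if ancestor is not None and frozenset({ancestor, label}) in conflicts:
                found.append((code[:depth], code))
    return sorted(found)

labels = {"0": "joy", "00": "joy", "01": "anger", "1": "sadness", "10": "sadness"}
problems = find_conflicts(labels)  # the "joy" category "0" contains an "anger" child
```

A caller could then reject the flagged lower-level categories or request more learning data, which are the two responses mentioned above.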

In one aspect of the classification device of the present invention, the device comprises: means for causing the classification processing means, the determination means, the hierarchizing means, and the calculation means to perform their respective processing on partial data forming part of the learning data; and means for classifying new face image data into both the classification structure based on the learning data and the classification structure based on the partial data, comparing the semantic information obtained from each, and outputting semantic information characterizing the face image data. The semantic information may be decided from this comparison by majority vote, for example (this is particularly effective when the same processing is applied to a plurality of sets of partial data and semantic information is obtained from each), or by selecting the part common to the results. It is also effective to give more weight to the semantic information based on the learning data (the wide-range data) than to that based on the partial data (the narrow-range data).
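A weighted majority vote of this kind can be sketched as below. The default weight of 2.0 for the result based on the full learning data is an arbitrary assumption, as is the rule that ties go to that result.

```python
from collections import defaultdict

def combine_labels(full_label, part_labels, full_weight=2.0):
    """Weighted majority vote over semantic labels.

    full_label  : label from the structure trained on all learning data
    part_labels : labels from the structures trained on partial data
    Ties are resolved in favor of full_label because it is scored first.
    """
    score = defaultdict(float)
    score[full_label] += full_weight
    for label in part_labels:
        score[label] += 1.0
    return max(score, key=score.get)
```

For example, `combine_labels("joy", ["surprise", "surprise", "joy"])` scores joy at 3.0 and surprise at 2.0, so joy is output.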

The present invention makes it possible to determine an appropriate number of categories according to the features contained in the data to be processed, and to classify the data into those categories. For example, when an individual's face image data is used as the data to be processed, a facial expression feature space suited to that individual can be formed.

FIG. 1 is a block diagram illustrating the schematic configuration of a facial expression recognition system 10 according to the present embodiment. The facial expression recognition system 10 includes a facial expression recognition device 20, an imaging device 40 that inputs face image data to the facial expression recognition device 20, and a display terminal 50 that displays the classification results produced by the facial expression recognition device 20.

The facial expression recognition device 20 is normally implemented using a computer such as a PC (personal computer). The facial expression recognition device 20 is provided with a signal input processing unit 22, a facial expression learning processing unit 24, a facial expression recognition processing unit 30, and a recognition result output unit 34. The signal input processing unit 22 receives face image data from the imaging device 40 and outputs it to the facial expression learning processing unit 24 and the facial expression determination processing unit 32. In doing so, the signal input processing unit 22 extracts still images from video data, delimits the face region, normalizes its size, and so on, as necessary.

The facial expression learning processing unit 24 forms a facial expression feature space using the input face image data as learning data. It is provided with a facial expression feature space formation processing unit 26 and a facial expression feature space database 28. The facial expression feature space formation processing unit 26 incorporates an SOM (self-organizing map) and successively divides the classification results in two to form a facial expression feature space having a binary tree structure. The facial expression feature space thus formed is stored in the facial expression feature space database 28.

The facial expression learning processing unit 24 can also analyze what kind of facial expression each classified region of the facial expression feature space reflects. That is, it analyzes the visualized image representing a classified feature space to obtain semantic information about basic expressions such as surprise, joy, sadness, fear, disgust, anger, and neutral, semantic information indicating that basic expressions are expressed in combination, semantic information describing expressions finer than the basic expressions, and so on. When the feature space reflects differences in face orientation or in imaging conditions, semantic information based on those differences can also be assigned. Semantic information does not necessarily have to be assigned to the feature space, but assigning it makes the classification results easier to understand and easier to search and use. The analysis for calculating semantic information can be performed automatically using conventionally known methods. Examples include a template-based method, in which a plurality of templates corresponding to various facial expressions are prepared and similarity to them is determined, and a rule-based method, in which distances, areas, and the like of facial organs are measured. By defining in advance rules that associate these analysis results with semantic information, semantic information can be obtained from the analysis results.

The facial expression recognition processing unit 30 includes the facial expression determination processing unit 32 and shares the facial expression feature space database 28 with the facial expression learning processing unit 24. The facial expression determination processing unit 32 classifies newly input face image data according to the facial expression feature space formed in the facial expression feature space database 28. Specifically, the face image data is input to the trained SOM and the category to which it belongs is determined. This step is repeated down to the SOM at the lowest level to complete the classification.

The recognition result output unit 34 receives the result of facial expression recognition from the facial expression recognition processing unit 30 and outputs it to the display terminal 50. In doing so, the recognition result output unit 34 organizes the data into a form that the user can easily understand, for example by displaying the recognition results hierarchically or by displaying the semantic information obtained.

FIG. 2 is a flowchart showing the procedure by which the facial expression learning processing unit 24 shown in FIG. 1 forms a facial expression feature space. When processing starts, the facial expression feature space formation processing unit 26 sequentially receives face image data from the signal input processing unit 22 (S10) and performs SOM learning (S12). The SOM and its learning method are described in detail later. Next, the connection weights set in the map layer units of the SOM are visualized (S14), and the correlation coefficients of the connection weights of adjacent map layer units are calculated (S16). The map layer is then divided in two between the pair of adjacent map layer units with the smallest correlation coefficient (S18), and the division ratio of the divided map layer is examined (S20). Five map layer units are assumed here. When the division is 4:1 or 1:4, the side consisting of a single unit is not reclassified, and classification ends for that side (S22). The four-unit side of a 4:1 or 1:4 division, and both sides of a 3:2 or 2:3 division, remain candidates for further classification. Only a single-unit side is exempt from redivision because a single unit can be regarded as already divided finely enough. However, depending on the overall division depth and the number of map layer units, it may be better to redivide even a single unit, particularly in the upper layers.
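As a non-limiting illustration, the division-ratio check of steps S20 and S22 can be sketched as follows. The function names and the convention that `boundary` is the index of the unit immediately to the left of the division are assumptions made for this sketch and do not appear in the embodiment.

```python
def split_sides(n_units, boundary):
    """Split unit indices 0..n_units-1 between `boundary` and `boundary + 1`."""
    if not 0 <= boundary < n_units - 1:
        raise ValueError("boundary must lie between two adjacent units")
    left = list(range(0, boundary + 1))
    right = list(range(boundary + 1, n_units))
    return left, right


def reclassification_candidates(n_units, boundary):
    """Sides that remain candidates for further classification (S20).

    A side consisting of a single unit is treated as finished (S22).
    """
    return [side for side in split_sides(n_units, boundary) if len(side) > 1]


# A 4:1 division between U3 and U4 leaves only U0..U3 as a candidate.
print(reclassification_candidates(5, 3))  # [[0, 1, 2, 3]]
# A 3:2 division leaves both sides as candidates.
print(reclassification_candidates(5, 2))  # [[0, 1, 2], [3, 4]]
```

Whether a candidate side is actually reclassified is decided separately by evaluating its correlation coefficients.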

Next, the correlation coefficients of the connection weights between the map layer units subject to reclassification are calculated and evaluated (S24). If the map layer units classified into the same category have nearly the same connection weights, there is little need for reclassification. Specifically, when the variation (the relative difference in magnitude) among the correlation coefficients between map layer units is smaller than a reference value, it is determined that no further classification is needed, and processing ends (S22). When the variation is large, it is determined that reclassification is needed, and processing continues. The determination can also be based on the absolute magnitude of the correlation coefficients; for example, reclassification can be repeated until the correlation coefficients reach or exceed a reference value. To judge whether the classification is converging or diverging, it would also be effective to evaluate how the absolute or relative magnitude of the correlation coefficients changes toward the lower layers and to decide on reclassification accordingly. Because the specific criteria depend on the characteristics of the face image data, they may be determined experimentally and empirically.
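The determination of step S24 may be sketched, by way of example only, as below. Measuring the variation as the difference between the largest and smallest coefficient is an assumption of this sketch; the embodiment leaves the concrete measure and the reference values to experiment.

```python
def needs_reclassification(correlations, spread_reference):
    """Relative criterion: reclassify when the coefficients vary widely.

    `correlations` holds the correlation coefficients between adjacent map
    layer units inside one category. The spread is taken as max - min, which
    is only one possible reading of "variation" (an assumption).
    """
    if len(correlations) < 2:
        return False  # a single coefficient has no spread to evaluate
    spread = max(correlations) - min(correlations)
    return spread >= spread_reference


def needs_reclassification_absolute(correlations, min_correlation):
    """Absolute criterion: reclassify while any coefficient is below the reference."""
    return any(r < min_correlation for r in correlations)


print(needs_reclassification([0.98, 0.97, 0.96], 0.1))     # False
print(needs_reclassification([0.98, 0.60, 0.95], 0.1))     # True
print(needs_reclassification_absolute([0.98, 0.60], 0.9))  # True
```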

Finally, the face image data for reclassification are extracted (S26). This is done by inputting the face image data to the SOM with its connection weights fixed at their final values and checking which map layer unit each image is mapped to. The processing from step S12 onward is then repeated for the face image data subject to reclassification. The classification results are stored in the facial expression feature space database 28.
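The extraction of step S26 can be illustrated with the following minimal sketch, which assumes that each face image is a flat list of values on the same scale as the connection weights. The function names and the toy data are hypothetical.

```python
def best_matching_unit(sample, weights):
    """Index of the map layer unit whose weights are closest to the sample."""
    def squared_distance(unit):
        return sum((x - w) ** 2 for x, w in zip(sample, unit))
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j]))


def extract_for_reclassification(samples, weights, unit_indices):
    """Samples that the fixed SOM maps to one of `unit_indices` (S26)."""
    wanted = set(unit_indices)
    return [s for s in samples if best_matching_unit(s, weights) in wanted]


# Toy data: three units with two "pixels" each (not taken from the embodiment).
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
samples = [[0.1, 0.0], [0.9, 1.0], [0.4, 0.6]]
print(extract_for_reclassification(samples, weights, [0, 1]))
# [[0.1, 0.0], [0.4, 0.6]]
```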

Next, the classification process is described in more detail with reference to FIG. 3. FIG. 3 is a schematic diagram showing an example of forming a facial expression feature space. The SOM consists of two layers, an input layer 60 and a map layer 64. The input layer 60 has as many input layer units as the face image data has pixels (n here), and the pixel value x_i (0 ≤ x_i ≤ 255) of the i-th pixel of the face image data 62 is input to the i-th (1 ≤ i ≤ n) input layer unit.

The map layer 64 has five map layer units U0, U1, ..., U4 arranged in one dimension. Each map layer unit is connected to every input layer unit and holds a connection weight for each of them. That is, the j-th (1 ≤ j ≤ 5) map layer unit holds connection weights w_ij (0 ≤ w_ij ≤ 1). The magnitude of a connection weight w_ij is related to the pixel value.

At the start of SOM learning, the connection weights w_ij of each map layer unit are initialized with random numbers. In the t-th learning step, the pixel value x_i of the i-th pixel of the face image data is input to the i-th input layer unit. The Euclidean distance d_j between x_i and w_ij is then calculated by the following equation.

Based on this calculation, the map layer unit with the smallest d_j, that is, the winner unit, is found. Next, the connection weights of the winner unit and of the map layer units in its neighborhood are updated by the following equation.

Here, α(t) is the learning rate coefficient (0 < α(t) < 1). This series of steps is repeated up to the maximum number of learning iterations.
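The two equations referred to above are not reproduced in this text. The following sketch therefore assumes the standard SOM forms, d_j = sqrt(Σ_i (x_i − w_ij)^2) for the distance and w_ij(t+1) = w_ij(t) + α(t)(x_i(t) − w_ij(t)) for the update, and further assumes that the pixel values have been scaled to the same 0 to 1 range as the connection weights. The function names and the `radius` parameter are illustrative.

```python
import math


def euclidean_distance(x, w):
    """Distance between an input vector x and one unit's weight vector w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def find_winner(x, weights):
    """Index j of the map layer unit whose distance d_j to x is smallest."""
    distances = [euclidean_distance(x, w) for w in weights]
    return distances.index(min(distances))


def som_step(x, weights, alpha, radius=1):
    """One learning step: update the winner and the units within `radius` of it."""
    winner = find_winner(x, weights)
    first = max(0, winner - radius)
    last = min(len(weights) - 1, winner + radius)
    for j in range(first, last + 1):
        w = weights[j]
        for i, xi in enumerate(x):
            w[i] += alpha * (xi - w[i])  # move the weight toward the input
    return winner


# Toy example: three units, one "pixel" each, input already scaled to [0, 1].
weights = [[0.0], [0.5], [1.0]]
winner = som_step([0.9], weights, alpha=0.5)
print(winner)  # 2
```

In this toy run the winner and its one neighbor move halfway toward the input, while the unit outside the neighborhood is left unchanged.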

In the present embodiment, the number of learning iterations was set empirically to 200,000, and the neighborhood was fixed over time to only the map layer units adjacent to the winner unit. The initial value of α(t) was 0.5; it was decreased linearly to 0.02 at 100,000 iterations and then linearly to 0 at the maximum number of iterations. The update ratio of the connection weights was 1 for the winner unit and 0.5 for the adjacent map layer units.
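The learning parameters stated above can be written out as a piecewise-linear schedule, for example as follows. The constant and function names are chosen for this sketch only; the numerical values are those given in the embodiment.

```python
MAX_ITERATIONS = 200_000   # maximum number of learning iterations
KNEE_ITERATION = 100_000   # iteration at which alpha reaches 0.02


def learning_rate(t):
    """alpha(t): 0.5 at t = 0, 0.02 at t = 100,000, 0 at t = 200,000."""
    if not 0 <= t <= MAX_ITERATIONS:
        raise ValueError("t is outside the learning period")
    if t <= KNEE_ITERATION:
        return 0.5 + (0.02 - 0.5) * t / KNEE_ITERATION
    return 0.02 * (MAX_ITERATIONS - t) / (MAX_ITERATIONS - KNEE_ITERATION)


def update_ratio(unit, winner):
    """1 for the winner, 0.5 for its adjacent units, 0 for all other units."""
    offset = abs(unit - winner)
    if offset == 0:
        return 1.0
    if offset == 1:
        return 0.5
    return 0.0


print(learning_rate(0))       # 0.5
print(learning_rate(200000))  # 0.0
print([update_ratio(j, 3) for j in range(5)])  # [0.0, 0.0, 0.5, 1.0, 0.5]
```

In a learning step, the effective step size for unit j would then be `learning_rate(t) * update_ratio(j, winner)`.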

The connection weights w_ij finally obtained for each map layer unit are converted to values from 0 to 255 to generate the visualized images 66. The correlation coefficients r0, r1, r2, and r3 of the connection weights w_ij are also calculated between each pair of adjacent units in the map layer. The figure assumes that r3 is the smallest, so the map layer is divided in two between map layer units U3 and U4. This is because the pair of map layer units with the smallest correlation coefficient is considered to have the largest difference in topological characteristics.
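A minimal sketch of the visualization and of the choice of division boundary is given below. It assumes that the conversion to 0 to 255 is a linear scaling and that the correlation coefficient is the Pearson coefficient; neither is stated explicitly in the embodiment. The toy weights are invented so that r3 comes out smallest, as in the figure.

```python
import math


def to_pixels(w):
    """Scale connection weights in [0, 1] linearly to pixel values in [0, 255]."""
    return [round(255 * min(1.0, max(0.0, wi))) for wi in w]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length weight vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


def adjacent_correlations(weights):
    """r_j between unit j and unit j + 1, for every adjacent pair."""
    return [pearson(weights[j], weights[j + 1]) for j in range(len(weights) - 1)]


def division_boundary(weights):
    """Index j such that the map layer is divided between unit j and unit j + 1."""
    r = adjacent_correlations(weights)
    return r.index(min(r))


# Toy weights for U0..U4 with four "pixels" each; U4 differs from the rest.
weights = [
    [0.1, 0.2, 0.3, 0.4],
    [0.1, 0.2, 0.3, 0.5],
    [0.2, 0.2, 0.3, 0.5],
    [0.2, 0.3, 0.3, 0.5],
    [0.9, 0.1, 0.8, 0.2],
]
print(division_boundary(weights))  # 3, i.e. between U3 and U4
print(to_pixels(weights[4]))
```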

The groups of inputs classified into the two halves of the divided map layer can each be regarded as an independent subproblem. SOM classification is therefore repeated recursively on each group. In the illustrated example, however, the map layer was divided between U3 and U4, so map layer unit U4 can be said to represent an expression whose topological characteristics are distinct from the others. Hierarchization is therefore ended for U4, which is treated as an independent category. For the face image data 72 classified into the four map layer units U0 to U3, on the other hand, reclassification is performed if the correlation coefficients r0, r1, and r2 among them are smaller than the reference value, since classification is then regarded as still insufficient.

In the illustrated example, reclassification sets the division boundary 74 between map layer units U2 and U3, and the face image data 72 are divided into two groups, face image data 76 and 78. The need for reclassification is likewise determined for face image data 76 and 78, and further classification continues.
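The recursive division described above may be sketched as follows. This is an illustrative skeleton rather than the embodiment itself: `fit` is a hypothetical callable that trains a SOM on the given samples and returns one weight list per map layer unit, `needs_split` is a hypothetical predicate on the correlation coefficients inside one side, and the nested-dictionary tree layout is an assumption. In the toy run, `fit` is replaced by a stand-in that returns fixed weights without any training.

```python
import math


def pearson(a, b):
    """Pearson correlation of two vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def winner(sample, weights):
    """Index of the unit with the smallest squared distance to the sample."""
    dists = [sum((x - w) ** 2 for x, w in zip(sample, unit)) for unit in weights]
    return dists.index(min(dists))


def build_tree(samples, fit, needs_split, depth=0, max_depth=4):
    """Recursively bisect `samples`; returns nested dicts forming a binary tree."""
    node = {"size": len(samples), "children": []}
    if depth >= max_depth or len(samples) < 2:
        return node
    weights = fit(samples)  # hypothetical trainer, one weight list per unit
    r = [pearson(weights[j], weights[j + 1]) for j in range(len(weights) - 1)]
    boundary = r.index(min(r))  # divide between `boundary` and `boundary + 1`
    sides = ([], [])
    for s in samples:
        sides[0 if winner(s, weights) <= boundary else 1].append(s)
    inner = (r[:boundary], r[boundary + 1:])  # coefficients inside each side
    for side_samples, side_r in zip(sides, inner):
        if side_r and needs_split(side_r):
            child = build_tree(side_samples, fit, needs_split, depth + 1, max_depth)
        else:  # single-unit side, or judged sufficiently classified
            child = {"size": len(side_samples), "children": []}
        node["children"].append(child)
    return node


# Toy run: the samples coincide with the fixed unit weights, one layer only.
units = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.5], [0.2, 0.2, 0.3, 0.5],
         [0.2, 0.3, 0.3, 0.5], [0.9, 0.1, 0.8, 0.2]]
tree = build_tree(units, fit=lambda s: units, needs_split=lambda r: True, max_depth=1)
print([child["size"] for child in tree["children"]])  # [4, 1]
```

The 4:1 result mirrors the first-layer division between U3 and U4 in the figure: the single-unit side becomes a final category, while the four-unit side would be classified again if a deeper `max_depth` were allowed.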

Next, the processing in the facial expression recognition processing unit 30 shown in FIG. 1 is described with reference to FIG. 4. FIG. 4 shows the result of analyzing the semantic information of the facial expression feature space stored in the facial expression feature space database 28. The figure shows the visualized images obtained by the first-layer and second-layer classifications. Each visualized image is created from the connection weights of one of the map layer units U0 to U4. In other words, when the illustrated visualized image, or an image similar to it, is input to the input layer units of the SOM, it is classified into that map layer unit.

In the first layer, map layer unit U4 forms a category consisting of a single unit and is determined to be an independent category. The visualized image of map layer unit U4 is therefore compared with that of the adjacent map layer unit U3. The comparison follows the AUs of FACS (Facial Action Coding System). FACS is known as a representative facial expression analysis model; it qualitatively defines the movements of facial parts such as the eyebrows, eyes, and mouth that correspond to expressions as AUs (Action Units).

As a result, in the first layer, the visualized image of map layer unit U4 was found to have features such as AU1 (raising the inner eyebrows), AU2 (raising the outer eyebrows), AU5 (raising the upper eyelids), AU26 (lowering the jaw and parting the lips), and AU27 (opening the mouth wide). Applying these features to the FACS classification, map layer unit U4 was determined to represent an expression of surprise.

Similarly, the visualized image of map layer unit U4, classified as an independent category in the second layer, was shown to have AU6 (raising the cheeks), AU25 (parting the lips without lowering the jaw), AU11 (deepening the nasolabial furrow), AU12 (pulling the lip corners sharply upward), and AU13 (raising the lip corners sharply and puffing the cheeks). Based on the FACS classification, it was determined to represent an expression of joy.

In the illustrated example, an operator analyzes the AUs visually. However, if a conventional technique that sets feature points and calculates differences in the facial organs is used, the FACS-based classification can be performed automatically in exactly the same way. Automatic analysis is not limited to feature-point methods and can use various general techniques, such as template-based methods. Automatic analysis may conclude, for example, that an expression is a composite in which surprise and joy are mixed in some proportion; in that case, semantic information indicating a composite expression may be assigned. Furthermore, although the comparison here was made between the map layer units at the division boundary, it would also be effective to take the whole category into account, for example by comparing the averages of the two divided groups.

FIG. 5 shows the result of forming a facial expression feature space. The face image data used here cover seven expressions (neutral, anger, disgust, fear, joy, sadness, and surprise) that the subject expressed subjectively, and were captured as moving images (24-bit color) in an ordinary indoor environment (under fluorescent lighting considered typical of everyday conditions). The captured moving images were converted to grayscale still images. For each subject, about 50 to 60 images per expression, about 350 to 420 images in total, were used as learning data. In this example, classification is stopped at the fourth layer.

First, in the first layer, an independent category representing surprise is formed at map layer unit U4. This is thought to be because the expression of surprise is distinctive compared with the other expressions. In the second layer, an independent category representing joy is formed at map layer unit U4.

The third layer is divided into a category of map layer units U0 and U1 and a category of map layer units U2 to U4. Each of these is classified in the fourth layer (called layer 4.1 and layer 4.2, respectively). In layer 4.1, the correlation coefficient between map layer units U2 and U3 is the lowest; map layer units U0 and U1 are analyzed as fear, and map layer units U2 to U4 as neutral. In layer 4.2, the correlation coefficient between map layer units U2 and U3 is the lowest; map layer unit U0 is analyzed as disgust, map layer units U1 and U2 as sadness, and map layer units U3 and U4 as anger.

Although classification ends at the fourth layer in this example, if classification to lower layers is allowed, it is desirable to evaluate the classification characteristics of the fourth layer. In layer 4.1, for example, U3 and U4 have similar classification characteristics, as can be seen from the fact that both are classified as neutral. The correlation coefficient of their connection weights will therefore be relatively large, and this category is likely to be judged not to need reclassification (depending on the reference value). For U0 to U3, on the other hand, which are classified as either fear or neutral, the correlation coefficients of the connection weights will be relatively small, and this category is likely to be judged to need reclassification.

Next, the classification results for other subjects are described with reference to FIG. 6. FIG. 6(a) shows the facial expression feature space analyzed for the subject of FIG. 5, and FIGS. 6(b) to 6(d) show those analyzed for three other subjects. For simplicity, classification is stopped at the third layer in every case.

For the subject of FIG. 6(a), all the face image data 100 are divided into category 102 and category 104 in the first layer. Category 104 consists of an independent map layer unit and is analyzed as an expression of surprise. Category 102 is divided into category 106 and category 108 in the second layer, and category 108, which consists of an independent map layer unit, is analyzed as an expression of joy. Category 106 is divided in the third layer into category 110, representing fear or neutral, and category 112, representing disgust, sadness, and anger. Thus, in FIG. 6(a), classification down to the third layer forms a facial expression feature space of four categories.
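For illustration, the hierarchy of FIG. 6(a) can be written as a binary tree whose leaves are the final categories. The dictionary layout and the `leaves` helper are assumptions of this sketch; the reference numerals and labels are those given above, and intermediate categories that carry no semantic information are left unlabeled.

```python
fig6a = {
    "id": 100, "label": None, "children": [
        {"id": 102, "label": None, "children": [
            {"id": 106, "label": None, "children": [
                {"id": 110, "label": "fear or neutral", "children": []},
                {"id": 112, "label": "disgust, sadness, anger", "children": []},
            ]},
            {"id": 108, "label": "joy", "children": []},
        ]},
        {"id": 104, "label": "surprise", "children": []},
    ],
}


def leaves(node):
    """Final categories of the tree, in left-to-right order."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


print([leaf["id"] for leaf in leaves(fig6a)])  # [110, 112, 108, 104]
print(len(leaves(fig6a)))                      # 4
```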

For the subject of FIG. 6(b), category 122 representing surprise is obtained in the first layer, and category 124 (disgust), category 126 (joy), category 128 (sadness), and category 130 (anger, fear, and neutral) are obtained in the third layer. That is, in FIG. 6(b), classification down to the third layer forms a facial expression feature space of five categories.

For the subject of FIG. 6(c), category 142 representing surprise and category 144 (neutral) are obtained in the second layer, and category 146 (anger), category 148 (fear and disgust), category 150 (joy), and category 152 (sadness) are obtained in the third layer. That is, in FIG. 6(c), classification down to the third layer forms a facial expression feature space of six categories.

For the subject of FIG. 6(d), category 162 (joy) is obtained in the second layer, and category 164 (fear), category 166 (disgust), category 168 (sadness), category 170 (anger), category 172 (surprise and neutral), and category 174 (neutral) are obtained in the third layer. That is, in FIG. 6(d), classification down to the third layer forms a facial expression feature space of seven categories.

Thus, even with the same number of layers, the number of feature spaces and their semantic information differ from subject to subject. In other words, facial expressions vary between individuals, and so does the similarity between expressions. It follows that neither the number of categories nor the kinds of categories to be used for classification can be fixed in advance. Despite this, the method of the present embodiment classifies in descending order of feature difference and can form a facial expression feature space suited to the individual. It also works well for classifying expressions in which several emotions are mixed.

In the example of FIG. 6, the classification depth was limited to the third layer, but it is also possible to classify more deeply or to determine the depth dynamically from the classification results. One example of dynamic determination is to classify the next layer until the number of categories exceeds a certain value (for example, about six to eight). In FIG. 6(a), only four categories are obtained by the third layer, so classifying one more layer would be effective. Another example, as mentioned in the description of FIG. 5, is to determine whether a category needs reclassification from the correlation coefficients of the connection weights between the map layer units in that category. In FIG. 6(a), disgust, sadness, and anger are mixed in category 112, and their mutual correlation coefficients are generally large as well. In such a case, classifying more deeply (at least for that category) would be effective.
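The first of the dynamic criteria above, based on the number of categories, amounts to a simple comparison. In the sketch below the reference value of 6 is one of the example values mentioned, and the function name is hypothetical; the category counts are those reported for FIG. 6(a) to 6(d) at the third layer.

```python
def classify_next_layer(category_count, reference=6):
    """True while the number of categories has not yet exceeded `reference`."""
    return category_count <= reference


counts = {"6(a)": 4, "6(b)": 5, "6(c)": 6, "6(d)": 7}
print({subject: classify_next_layer(n) for subject, n in counts.items()})
# {'6(a)': True, '6(b)': True, '6(c)': True, '6(d)': False}
```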

FIG. 7 is a flowchart showing the flow of processing when new face image data are classified using the facial expression feature space that has been determined. This processing is performed by the facial expression recognition processing unit 30 shown in FIG. 1.

First, the facial expression determination processing unit 32 loads the facial expression feature space data of the individual to be classified from the facial expression feature space database 28 (S40). Face image data cut out of a moving image at a suitable sampling interval are then input from the signal input processing unit 22 (S42). The face image data are first mapped onto the map layer of the first layer (S44), and the category to which they are mapped is determined (S46). If that category is not in the final layer (S48), the data are mapped again onto the map layer one layer below (S50). If the category is in the lowest layer, information indicating that the data were classified into that category is output together with the semantic information assigned to the category (S52).
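Steps S44 to S52 can be sketched as a descent through the stored hierarchy. The node layout (`weights`, `boundary`, `children`, `label`) is an assumption of this sketch, as are the toy one-pixel weights and the labels attached to the leaves.

```python
def classify(sample, node):
    """Map the sample layer by layer until a final category is reached."""
    while node["children"]:
        weights = node["weights"]
        dists = [sum((x - w) ** 2 for x, w in zip(sample, unit)) for unit in weights]
        unit = dists.index(min(dists))               # S44: map onto this layer
        side = 0 if unit <= node["boundary"] else 1  # S46: which category
        node = node["children"][side]                # S50: go one layer down
    return node["label"]                             # S52: semantic information


def leaf(label):
    return {"label": label, "children": []}


# Toy two-layer feature space with three units and one "pixel" per unit.
second_layer = {"weights": [[0.0], [0.3], [0.6]], "boundary": 0,
                "children": [leaf("neutral"), leaf("joy")], "label": None}
first_layer = {"weights": [[0.0], [0.5], [1.0]], "boundary": 1,
               "children": [second_layer, leaf("surprise")], "label": None}

print(classify([0.9], first_layer))   # surprise
print(classify([0.4], first_layer))   # joy
print(classify([0.05], first_layer))  # neutral
```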

Finally, a modification of the classification processing for face image data is described. FIG. 8 illustrates processing in which the face image data are divided into several parts.

In general, an expression appears as a change across the whole face, but the degree of change is not necessarily uniform over the face. Even for the same expression, the part in which prominent features appear may differ because of individual differences and the like. The analysis here therefore uses images of both the whole face and parts of it.

In the illustrated example, a whole-face region image 202, an upper-face region image 204, and a lower-face region image 206 are cut out of the captured head region image 200. The whole-face region image 202 covers the entire face, including the eyebrows, eyes, nose, and mouth. The upper-face region image 204 includes the eyebrows and eyes, and the lower-face region image 206 includes the nose and mouth. These images can be cut out automatically using, for example, pattern matching.

A facial expression feature space is formed with the self-organizing map for each region image. That is, learning is first performed with a plurality of whole-face region images 202 to obtain a whole-face region facial expression feature space 208, to which semantic information is assigned. Similarly, learning with the upper-face region images 204 yields an upper-face region facial expression feature space 210 and its semantic information, and learning with the lower-face region images 206 yields a lower-face region facial expression feature space 212 and its semantic information.

When new face image data are classified using these facial expression feature spaces, a whole-face region image, an upper-face region image, and a lower-face region image are created from the face image data, and a determination is made with each feature space. This yields a whole-face region expression determination result 214, an upper-face region expression determination result 216, and a lower-face region expression determination result 218.

Finally, these results are judged together to obtain the facial expression recognition result 220. The judgment can be made in various ways. One example is a majority vote that adopts the semantic information of the dominant expression among the three determination results. Another is to take the whole-face result as the basic classification and, for expressions that the whole-face region cannot subdivide well, to use the upper-face or lower-face result where one of them classifies those expressions clearly. By setting rules in this way for which of the three results is used at which stage, facial expression recognition finely adapted to the characteristics of the individual becomes possible.
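The majority-vote variant can be sketched as follows. Falling back to the whole-face result when all three regions disagree is an assumption of this sketch, since the embodiment does not specify how such a case is resolved.

```python
from collections import Counter


def integrate(whole, upper, lower):
    """Combine the three region results (214, 216, 218) by majority vote."""
    votes = Counter([whole, upper, lower])
    label, count = votes.most_common(1)[0]
    if count >= 2:
        return label
    return whole  # no majority: fall back to the whole-face result (assumption)


print(integrate("joy", "joy", "surprise"))    # joy
print(integrate("anger", "fear", "sadness"))  # anger
```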

The facial expression recognition technique described above can be applied in various fields. For example, storing the facial expression feature space makes it possible to perform personal authentication that is unaffected by facial expression (a representative vector extraction method). When computer input is performed by facial pattern recognition (for example, operating a pointing device), facial patterns suited to the individual can be used, so the technique can be expected to be especially effective in the medical and welfare fields. The technique should also be useful in fields that collect emotional information, such as product monitoring surveys and psychological tests.

Furthermore, the technique of the present embodiment can be applied to classifying various kinds of data other than face image data. Examples include general image data other than face images, and audio data.

FIG. 1 is a schematic functional block diagram showing the system configuration of the present embodiment.
FIG. 2 is a flowchart showing the processing performed when a facial expression feature space is formed.
FIG. 3 is a diagram outlining the processing performed when a facial expression feature space is formed.
FIG. 4 is a diagram explaining how semantic information is determined by analyzing expressions.
FIG. 5 is a diagram showing an example of the result of forming a facial expression feature space.
FIG. 6 is a diagram showing the facial expression feature spaces of four different subjects.
FIG. 7 is a flowchart showing the process of facial expression recognition for new face image data.
FIG. 8 is a diagram showing processing that integrates facial expression recognition for the whole face, the upper face, and the lower face.

Explanation of symbols

10 facial expression recognition system, 20 facial expression recognition device, 22 signal input processing unit, 24 facial expression learning processing unit, 26 facial expression feature space formation processing unit, 28 facial expression feature space database, 30 facial expression recognition processing unit, 32 facial expression determination processing unit, 34 recognition result output unit, 40 imaging device, 50 display terminal, 60 input layer, 62 face image data, 64 map layer, 66 visualized image, 72 face image data, 74 division boundary.

Claims

A classification device comprising:
classification processing means for performing a learning operation that sequentially maps a plurality of learning data onto a map layer, evaluating classification characteristics of the resulting self-organizing map, and classifying the map layer into two or more categories;
determination means for evaluating the classification characteristics to determine whether a given category needs reclassification; and
hierarchization means for causing the classification processing means to classify again the learning data belonging to the category when the determination means determines that reclassification is needed.

The classification device according to claim 1, wherein
the map layer includes a plurality of map layer units, each having a connection weight set for it,
in the learning operation, the connection weights are changed so that similar learning data are mapped to the same or neighboring map layer units,
the classification characteristics are characteristics of the connection weights obtained by the learning operation, and
the determination means evaluates the correlation of the connection weights between the map layer units to determine whether reclassification is necessary.
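As a minimal sketch of the quantities named above, the following Python code trains a one-dimensional self-organizing map and computes the correlation coefficient of the weights of each pair of adjacent map layer units. Following the abstract, the gap with the minimum coefficient is taken as the division boundary. The function names, the learning-rate and neighborhood schedules, and the seed are assumptions chosen for illustration, not values taken from the specification, and how the necessity of reclassification is decided from the coefficients is left open here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(va * vb)


def train_som(data, n_units=5, epochs=50, seed=0):
    """Train a 1-D SOM; returns one weight vector per map layer unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                 # decays toward 0
        alpha = 0.5 * frac                          # assumed learning rate
        sigma = max(0.5, (n_units / 2) * frac)      # assumed neighborhood
        for x in data:
            # Winner: the unit whose weights are closest to the input.
            winner = min(
                range(n_units),
                key=lambda u: sum((xi - wi) ** 2
                                  for xi, wi in zip(x, weights[u])),
            )
            # Pull the winner and its neighbors toward the input.
            for u in range(n_units):
                h = math.exp(-((u - winner) ** 2) / (2 * sigma ** 2))
                weights[u] = [wi + alpha * h * (xi - wi)
                              for xi, wi in zip(x, weights[u])]
    return weights


def split_boundary(weights):
    """Return (i, r): the boundary lies between units i and i+1, where
    r[i] is the minimum correlation between adjacent units."""
    r = [pearson(weights[i], weights[i + 1])
         for i in range(len(weights) - 1)]
    i_min = min(range(len(r)), key=r.__getitem__)
    return i_min, r
```

With five units U0 to U4 this yields four coefficients r0 to r3, matching the arrangement described in the abstract; a boundary index of 3 would separate U4 from U0 to U3.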

The classification device according to claim 1, wherein
the determination means determines whether a category in a hierarchy level needs to be reclassified based on the number of categories established by classification up to a certain hierarchy level.

The classification device according to claim 1, wherein
the classification processing means classifies the map layer into two categories.

The classification device according to claim 4, further comprising:
means for classifying data to be classified into the hierarchical classification structure of the formed binary tree;
means for assigning, to the classified data, an index representing the hierarchical classification structure of the binary tree; and
search means for searching the classified data using the index.
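One way to picture an index that represents a binary-tree classification structure is a bit string recording the path from the root to the leaf category, so that a prefix search retrieves everything under an upper-level category. The Python sketch below is an assumption-laden illustration: `Node`, `index_of`, and `search` are hypothetical names, the bit-string encoding is one possible choice rather than the encoding of the specification, and the threshold tests stand in for mapping data onto a trained self-organizing map.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Binary-tree node; a node with go_right=None is a leaf category."""
    go_right: Optional[Callable[[float], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def index_of(x: float, root: Node) -> str:
    """Descend the tree and record the path as a string of '0'/'1'."""
    bits = ""
    node = root
    while node.go_right is not None:
        if node.go_right(x):
            bits += "1"
            node = node.right
        else:
            bits += "0"
            node = node.left
    return bits


def search(indexed: List[Tuple[float, str]], prefix: str) -> List[float]:
    """Return all items whose index lies under the given category prefix."""
    return [item for item, idx in indexed if idx.startswith(prefix)]


# Toy tree (hypothetical thresholds): the root splits at 5, and its
# right child splits again at 8; the left child is already a leaf.
root = Node(
    go_right=lambda x: x >= 5,
    left=Node(),
    right=Node(go_right=lambda x: x >= 8, left=Node(), right=Node()),
)

items = [1.0, 6.0, 9.0, 2.0]
indexed = [(x, index_of(x, root)) for x in items]
print(indexed)
print(search(indexed, "1"))
```

Because every lower-level index begins with the index of its parent category, searching with a short prefix returns a coarse category and a longer prefix narrows it to a finer one.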

The classification device according to claim 1,
The classification device, wherein the learning data is sensor data generated based on a detection result of the sensor.

The classification device according to claim 6, wherein
the sensor data is human face image data captured by a camera.

The classification device according to claim 7, further comprising
calculation means for performing facial expression analysis on a visualized image generated from the connection weights of a given category and calculating semantic information that characterizes the category with reference to a preset correspondence table.

The classification device according to claim 8, further comprising
inspection means for inspecting whether the calculation means has calculated conflicting semantic information from an upper-hierarchy category and a lower-hierarchy category.

The classification device according to claim 8, further comprising:
means for causing the classification processing means, the determination means, the hierarchization means, and the calculation means to perform their respective processing on partial data forming a part of the learning data; and
means for classifying new face image data into both the classification structure based on the learning data and the classification structure based on the partial data, comparing the semantic information obtained from each, and outputting semantic information that characterizes the new face image data.

A classification program for causing a computer to function as:
classification processing means for performing a learning operation that sequentially maps a plurality of learning data onto a map layer, evaluating classification characteristics of the resulting self-organizing map, and classifying the map layer into two or more categories;
determination means for evaluating the classification characteristics to determine whether a given category needs to be reclassified; and
hierarchization means for causing the classification processing means to classify again the learning data belonging to that category when the determination means determines that reclassification is necessary.

A classification method performed by a computer, comprising:
a classification processing procedure of performing a learning operation that sequentially maps a plurality of learning data onto a map layer, evaluating classification characteristics of the resulting self-organizing map, and classifying the map layer into two or more categories;
a determination procedure of evaluating the classification characteristics to determine whether a given category needs to be reclassified; and
a hierarchization procedure of performing the classification processing procedure again on the learning data belonging to that category when the determination procedure determines that reclassification is necessary.