JP2008123399A

JP2008123399A - Feeling recognition device, electronic equipment, feeling recognizing method, control program and recording medium

Info

Publication number: JP2008123399A
Application number: JP2006308727A
Authority: JP
Inventors: Michihiro Nagaishi; 道博長石
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2006-11-15
Filing date: 2006-11-15
Publication date: 2008-05-29

Abstract

PROBLEM TO BE SOLVED: To provide a feeling recognition device, electronic equipment, a feeling recognition method, a control program, and a recording medium capable of easily recognizing a feeling from a face image without large-scale learning processes or dictionary creation.

SOLUTION: A visual induction field is determined. This is a "field" distributed around the face image whose strength depends on the distance from the image and becomes larger closer to the image. The complexity of the closed curve formed by each equipotential line is determined from the induction field. A function relating the complexity to the potential value is acquired as distribution information of the induction field, and the feeling corresponding to the face image is determined based on that function.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an emotion recognition device, an electronic device, an emotion recognition method, a control program, and a recording medium capable of recognizing an emotion from a face image.

Conventionally, emotion recognition devices are known that extract facial feature quantities from a face image and determine which emotion the feature quantities correspond to, based on dictionary data obtained in advance through sample surveys (see, for example, Patent Documents 1 to 4). Devices of this type either use facial parts such as the eyes, nose, mouth, and eyebrows as the facial features, or use numerical values such as image analysis results obtained by frequency analysis or filtering.
Patent Document 1: JP 2004-513462 A
Patent Document 2: JP 2005-512248 A
Patent Document 3: JP 2005-234686 A
Patent Document 4: JP 2005-293539 A

However, in the conventional configurations, the process of obtaining the correspondence between the extracted feature quantities and emotions, that is, learning, is laborious. This is true of pattern recognition in general, but for face images it requires the enormous task of manually labeling each face image with its expression and then determining the corresponding feature parameters using, for example, a neural network.
Emotions in particular often cannot be clearly assigned to a single category, as with a wry smile. Many face image samples must therefore be prepared so that mixtures of several emotions can also be distinguished, and creating the identification dictionary becomes an enormous task.
Furthermore, pattern recognition itself takes time to match the obtained feature quantities against the identification dictionary, and in emotion recognition the mixing of several emotions makes the matching take even longer. Many such issues remain to be addressed, and they are one of the obstacles to building a system.

The present invention has been made in view of the above circumstances, and its object is to provide an emotion recognition device, an electronic device, an emotion recognition method, a control program, and a recording medium capable of easily recognizing an emotion from a face image without large-scale learning processing or dictionary creation.

To solve the above problem, the present invention provides an emotion recognition device comprising: input means for inputting a face image; and emotion discrimination means for obtaining a visual induction field, which is a "field" distributed around the face image whose strength depends on the distance from the face image and takes a larger value closer to the face image, and for discriminating the emotion corresponding to the face image based on distribution information of the induction field. According to this invention, the visual induction field of the face image is obtained and the emotion corresponding to the face image is discriminated based on the distribution information of that field, so the emotion can be discriminated in a way close to how people see and perceive things, without large-scale learning processing or dictionary creation.

In the above configuration, the emotion discrimination means preferably obtains, from the visual induction field of the face image, the complexity of the closed curves formed by the equipotential lines, acquires curve information representing the correspondence between this complexity and the equipotential value as the distribution information of the induction field, and discriminates the emotion based on the curve information. With this configuration, because the curve information relating the complexity to the potential value is acquired as the distribution information, a numerical representation of the induction field distribution is easily obtained.

In the above configuration, the emotion discrimination means preferably includes function calculation means for calculating a function that approximates the curve information, and emotion determination means for determining the emotion by comparing the function calculated by the function calculation means with functions obtained in advance for the respective emotions. With this configuration, that comparison makes it easy to determine which emotion is present.

In this case, the function calculation means preferably calculates a sigmoid function. With this configuration, the curve is approximated by a sigmoid function, which is well suited to representing the distribution of the visual induction field of a face image, so the accuracy of emotion determination can be improved.

Further, the sigmoid function is preferably the sigmoid function defined by Equation (1), in which C is the complexity, p is the potential value, a is the range, b is the offset value, and T and p0 are parameters, and the emotion determination means preferably determines the emotion based on the parameters of the sigmoid function. With this configuration, the emotion can be easily determined based on the parameters of the sigmoid function.

In the above configuration, the emotion determination means preferably determines the emotion by comparing the sigmoid function calculated by the function calculation means with sigmoid functions obtained in advance for the respective emotions. With this configuration, that comparison makes it easy to determine which emotion is present.
In this case, the emotion determination means may identify, among the sigmoid functions obtained in advance for the respective emotions, one or more sigmoid functions similar to the one calculated by the function calculation means. If exactly one similar sigmoid function exists, the emotion corresponding to it is determined. If several exist, the state is determined to be a mixture of the emotions corresponding to them. With this configuration, states lying between emotions can also be identified.

In the above configuration, the face image is preferably a binary image. With this configuration, the amount of computation is smaller than when emotion recognition is performed on a multi-value image, and the determination can be made faster. The present invention is also widely applicable to electronic devices equipped with the emotion recognition device.

The present invention also provides an emotion recognition method comprising: obtaining a visual induction field, which is a "field" distributed around a face image whose strength depends on the distance from the face image and takes a larger value closer to the face image; and recognizing the emotion corresponding to the face image based on distribution information of the induction field. According to this invention, the visual induction field of the face image is obtained and the emotion corresponding to the face image is discriminated based on the distribution information of that field, so the emotion can be discriminated in a way close to how people see and perceive things, without large-scale learning processing or dictionary creation.

Besides being applied to the emotion recognition device, electronic device, and emotion recognition method described above, the present invention can also be implemented by making a control program for carrying out the invention downloadable over a telecommunication line, or by storing such a program on a computer-readable recording medium, such as a magnetic, optical, or semiconductor recording medium, and distributing it.

According to the emotion recognition device, electronic device, emotion recognition method, control program, and recording medium of the present invention, the visual induction field of the face image is obtained and the emotion corresponding to the face image is discriminated based on the distribution information of that field. The emotion can therefore be discriminated in a way close to how people see and perceive things, without large-scale learning processing or dictionary creation.

Embodiments of the present invention are described in detail below with reference to the drawings. This embodiment describes, as an example, a case in which the invention is applied to an emotion recognition device that recognizes an emotion from the expression in a face image. Because this emotion recognition device 10 recognizes emotions using the concept of a "visual induction field", the visual induction field is described first.
The visual induction field is a psychological concept that assumes a field similar to an electrostatic field around a figure and uses it to explain visual perception phenomena such as pattern recognition. It is described in Yoshimasa Yokose, "Psychology of Form" (Nagoya University Press, 1986), hereinafter referred to as the reference paper.
The reference paper holds that the way the visual induction field (hereinafter simply the induction field) is distributed is related to how people see and perceive things, for example the similarity of characters and the interpretation of optical illusion figures. Because the reference paper deals only with figures composed of straight lines and arcs, its approach cannot give the induction field of an arbitrary digital image. A method of calculating the induction field of a black-and-white two-level digital image (hereinafter a binary image) is therefore presented first.

Because the induction field can basically be interpreted as a Coulomb potential, the pixels forming the outline of a pattern are assumed to be point charges, and the distribution of the induction field in a digital image is calculated by accumulating the Coulomb potentials they produce.

FIG. 1 shows the pixel arrangement of a digital image. As shown in FIG. 1, suppose that an induction field is formed at an arbitrary point P by a curve f(s) composed of a sequence of n points. The curve f(s) corresponds to a line segment of a line drawing or to the outline of a filled figure. Each point p1, p2, ..., pi, ..., pn constituting the curve f(s) is assumed to be a point charge of positive charge 1. When the curve f(s) is scanned from the point P, the n points p1, p2, ..., pi, ..., pn constituting the curve are found, and the distance to each point found is denoted ri, the strength Mp of the induction field at the point P is defined as follows.
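The equation itself is missing from this text. A form consistent with the description above (unit point charges, a Coulomb-type potential, and distances ri to the n points found by scanning from P) would be the following. The 1/n normalization is an assumption, not something the surviving text states; a plain sum of 1/ri would be the literal accumulation of Coulomb potentials.

```latex
M_p = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{r_i} \tag{2}
```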

Using Equation (2), the induction field of an arbitrary digital image can be obtained. When there are several curves, the strength of the induction field at the point P is the sum of the induction fields that the individual curves produce at P. Equation (2) is subject to the constraint that the sum is taken only over the portions that light emitted from the point P would strike directly. For example, if curves f1(s), f2(s), and f3(s) are arranged relative to the point P as shown in FIG. 2, the portions that cannot be seen from P are excluded from the sum; in this case these are the portions lying in the range Z, which is hidden from P by the curve f1(s). In the example of FIG. 2, all of the curve f3(s) and part of the curve f2(s) are excluded. This constraint is referred to here as the shielding condition.
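The field calculation with the shielding condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the strength is taken as the mean of 1/r over the visible contour pixels, visibility is tested by walking a Bresenham line from P to each contour pixel, and the names `contour_pixels`, `line_pixels`, and `field_strength` are hypothetical.

```python
import math

def contour_pixels(img):
    """Foreground pixels (1) that touch the background or the image border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not img[ny][nx]:
                    out.append((x, y))
                    break
    return out

def line_pixels(x0, y0, x1, y1):
    """Bresenham line from (x0, y0) to (x1, y1), endpoints included."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pts = []
    while True:
        pts.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return pts
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def field_strength(img, px, py, shielding=True):
    """Assumed form: mean of 1/r over contour pixels visible from the
    background point (px, py). With shielding=False every contour pixel counts."""
    inv_r = []
    for qx, qy in contour_pixels(img):
        if shielding:
            between = line_pixels(px, py, qx, qy)[1:-1]
            if any(img[y][x] for x, y in between):
                continue  # hidden behind another foreground pixel
        inv_r.append(1.0 / math.hypot(qx - px, qy - py))
    return sum(inv_r) / len(inv_r) if inv_r else 0.0
```

For two parallel vertical bars, a point to the left of both sees only the nearer bar when `shielding=True`, which mirrors the way the curve f1(s) hides f3(s) in FIG. 2.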

FIG. 3(A) shows an example of the induction field for the character "A", calculated with Equation (2) above on the assumption that every pixel is a point charge of charge 1. The thin lines distributed around the character "A" in FIG. 3(A), like contour lines on a map, are equipotential lines drawn by connecting points of equal potential in the induction field. The strength of the induction field (the potential value) weakens from the center outward and eventually approaches 0 (zero).
The features of the shape and strength of the induction field distribution in FIG. 3(A), in particular the fact that the distribution near the apex of the "A" is more sharply angled than elsewhere, agree with the results of the psychological experiments in the reference paper on the induction field distribution near the corners of figures such as rectangles and triangles.

FIG. 3(B) shows an example of the induction field calculated without the shielding condition described above (that the portions lying in the range Z, invisible from an arbitrary point P, are excluded from the sum), again assuming that every pixel is a point charge of charge 1. Here the distribution of the induction field becomes rounded overall and differs from the results of the psychological experiments in the reference paper. The shielding condition is thus important in characterizing the induction field.

In this way, the visual induction field of a given figure can be obtained. Examples of techniques using such a visual induction field include Michihiro Nagaishi, "Easy-to-read Japanese proportional display using the visual induction field", Journal of the Institute of Image Media, Vol. 52, No. 12, pp. 1865-1872 (1998) (hereinafter the first paper), and Masazumi Miyoshi, Yoshifumi Shimoshio, Hiroaki Koga, and Ken Ideguchi, "Design of character layout based on sensibility using the visual induction field theory", IEICE Transactions, 82-A, 9, 1465-1473 (1999) (hereinafter the second paper). The author of the first paper is the inventor of the present invention.

In this embodiment, using such a visual induction field makes it possible to evaluate the impression given by an image, a task that has so far depended on human intuition and manual work. More specifically, evaluating the distribution of the visual induction field of a face image makes it possible to automate the recognition of emotion from facial expression.
In detail, this embodiment obtains, from the induction field of the face image to be recognized, a complexity that indicates how uneven the closed curve formed by each equipotential line is. Writing Ci for the complexity of the closed curve with equipotential value i, this complexity is given by Equation (3) below.
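Equation (3) is likewise missing from this text. One common shape measure that behaves as the surrounding description requires, growing with the perimeter Li of the equipotential surface and shrinking with its area Si, is the squared perimeter divided by the area. It is given here as an assumption rather than as the patent's confirmed expression.

```latex
C_i = \frac{L_i^{2}}{S_i} \tag{3}
```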

Here, Li is the perimeter of the equipotential surface with equipotential value i, and Si is its area. The perimeter Li can be regarded as the number of dots forming the outline of the equipotential surface, and the area Si as the number of dots lying within the equipotential surface.

According to Equation (3), the longer the perimeter Li and the smaller the area Si, the larger the complexity Ci. In other words, the more uneven the equipotential line, the larger the complexity Ci.
By plotting the complexity Ci against the equipotential value i as a characteristic curve, the "distribution of the visual induction field of the target image" can be represented graphically. (Hereinafter, a potential value not restricted to a particular equipotential value i is written p.)
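The characteristic curve can be sketched with simple dot counting. The function below is a hypothetical helper built on two assumptions: the equipotential surface for a level p is taken as all pixels whose field value is at least p, and the complexity is taken as perimeter squared over area. Following the text, the perimeter is the number of outline dots and the area is the number of dots inside.

```python
def complexity_curve(field, levels):
    """Return [(p, C)] with C = L**2 / S for each potential level p.

    field  : 2D list of induction field values
    levels : potential values p at which to take equipotential surfaces
    Levels whose surface is empty are skipped.
    """
    h, w = len(field), len(field[0])
    curve = []
    for p in levels:
        inside = [[field[y][x] >= p for x in range(w)] for y in range(h)]
        area = sum(row.count(True) for row in inside)  # S: dots in the surface
        if area == 0:
            continue
        perimeter = 0  # L: dots of the surface that touch the outside
        for y in range(h):
            for x in range(w):
                if not inside[y][x]:
                    continue
                neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                if any(
                    not (0 <= nx < w and 0 <= ny < h) or not inside[ny][nx]
                    for nx, ny in neighbours
                ):
                    perimeter += 1
        curve.append((p, perimeter ** 2 / area))
    return curve
```

The resulting list of (p, C) pairs is the kind of data that a curve fit could then take as input.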

For a face image, the characteristic curve that represents the "distribution of the visual induction field of the face image" (the distribution information) is approximately a monotonically increasing function, and this curve can be approximated by a sigmoid function.
FIG. 4 shows an example of a sigmoid function. A sigmoid function is characterized in particular by its rising portion, indicated by the symbol α in the figure, and the curve as a whole, including this rising portion, closely resembles the trend of the characteristic curve relating the complexity C to the potential value p for a face image.
For this reason, this embodiment obtains a sigmoid function as the curve information relating the complexity C to the potential value p, that is, as the "distribution of the visual induction field of the face image" (the distribution information). A numerical representation of that distribution is thereby easily obtained.

Next, an outline is given of how the "distribution of the visual induction field of the face image" is calculated for each emotion. The case of obtaining the sigmoid functions for "anger", "normal", and "joy" among the basic emotions is described as an example.
FIG. 5(A) shows an example of a face image with an angry expression, and FIG. 5(B) shows the induction field of that face image. FIG. 5(C) shows an example of a face image with a normal expression, and FIG. 5(D) shows its induction field. FIG. 5(E) shows an example of a face image with a joyful expression, and FIG. 5(F) shows its induction field. To keep the explanation simple, these face images are highly simplified examples containing only the eyes, nose, and mouth.
To calculate the sigmoid functions for "anger", "normal", and "joy", the induction fields of the face images shown in FIGS. 5(A), 5(C), and 5(E) are first calculated, as shown in FIGS. 5(B), 5(D), and 5(F). For each induction field, the complexity Ci is obtained for each equipotential value i, and the relationship between the complexity C and the potential value p is then approximated, using the least-squares method, by the sigmoid function of Equation (4) below.
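Equation (4) is missing from this text. A standard logistic form with a range a, an offset b, and shape parameters T and p0 would read as follows; the exact expression, including the sign convention in the exponent, is an assumption.

```latex
C(p) = \frac{a}{1 + \exp\!\left(-\dfrac{p - p_0}{T}\right)} + b \tag{4}
```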

Here, a is the range, p0 and T are parameters, and b is the offset value.
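A least-squares fit of this kind can be sketched without external libraries. The sketch assumes the logistic form C(p) = a / (1 + exp(-(p - p0) / T)) + b. It also swaps in a simple technique that the patent does not describe: a grid search over (p0, T), with a and b solved in closed form for each candidate, since the model is linear in a and b once p0 and T are fixed. The function names and grids are hypothetical.

```python
import math

def sigmoid(p, a, b, T, p0):
    """Assumed logistic form: range a, offset b, parameters T and p0."""
    return a / (1.0 + math.exp(-(p - p0) / T)) + b

def fit_sigmoid(ps, cs, p0_grid, T_grid):
    """Least-squares fit of (a, b, T, p0) to the samples (ps[k], cs[k]).

    For every (p0, T) on the grid, a and b follow from ordinary linear
    regression of C on the unit sigmoid; the candidate with the smallest
    sum of squared errors wins.
    """
    n = len(ps)
    mean_c = sum(cs) / n
    best = None
    for p0 in p0_grid:
        for T in T_grid:
            s = [sigmoid(p, 1.0, 0.0, T, p0) for p in ps]
            mean_s = sum(s) / n
            var_s = sum((v - mean_s) ** 2 for v in s)
            if var_s == 0.0:
                continue  # flat candidate, a is undetermined
            a = sum((v - mean_s) * (c - mean_c) for v, c in zip(s, cs)) / var_s
            b = mean_c - a * mean_s
            sse = sum((a * v + b - c) ** 2 for v, c in zip(s, cs))
            if best is None or sse < best[0]:
                best = (sse, a, b, T, p0)
    if best is None:
        raise ValueError("no usable (p0, T) candidate on the grid")
    return best[1:]  # (a, b, T, p0)
```

The grid resolution bounds the accuracy of T and p0, so a finer grid or a follow-up nonlinear refinement would be needed for precise parameters.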

The face image data used in the above calculation is binarized image data obtained from the original captured image (a multi-color, multi-level image such as a color or grayscale image) by extracting edges and binarizing each pixel to white or black.
FIGS. 6(A) and 6(B) show face images obtained by binarizing the original image with different thresholds. Specifically, in the binarization the threshold is set so that the main facial parts, such as the eyes, nose, and mouth, are not lost, and the image is preferably converted to a binary image from which everything other than the main facial parts has been removed as far as possible, as shown in FIG. 6(B). This binarized image is further subjected to noise removal and isolated-point removal, and is finally converted to a binary image in which only the eyebrows, eyes, nose, and mouth remain, as shown in FIG. 6(C). The binary image may also be one in which only the eyes, nose, and mouth are discernible. A binary image containing the extracted main facial parts is thus obtained.
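The thresholding and isolated-point removal steps can be sketched as below. This is a simplified illustration: it thresholds grayscale values directly and omits the edge extraction and general noise removal mentioned in the text, and the function names and the 8-neighbour definition of an isolated point are assumptions.

```python
def binarize(gray, threshold):
    """Map dark pixels (value < threshold) to 1 (black) and the rest to 0 (white)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def remove_isolated_points(img):
    """Clear every foreground pixel that has no foreground pixel
    among its 8 neighbours; all other pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            has_neighbour = any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (nx, ny) != (x, y)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out
```

In practice the threshold would be tuned so that the eyes, nose, and mouth survive, as the text requires.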

FIG. 7 shows an example of the sigmoid functions obtained in this way for face images of six basic emotions (normal, sadness, disgust, joy, anger, and surprise). In FIG. 7, curve h1(p) is the sigmoid function for a face image with a "normal" expression, curve h2(p) is that for "sadness", curve h3(p) is that for "disgust", curve h4(p) is that for "joy", curve h5(p) is that for "anger", and curve h6(p) is that for "surprise".

Taking the curve h1(p) for "normal" as the baseline, the curves h5(p) and h6(p) for strong emotions such as "anger" and "surprise" rise steeply, the curve h4(p) for "joy" has a slightly larger complexity C, and the curves h3(p) and h2(p) for negative emotions such as "disgust" and "sadness" have a smaller complexity C. The characteristics (shape) of the curve thus differ from emotion to emotion.
Because the sigmoid function representing the "distribution of the visual induction field of the face image" differs for each emotion, this embodiment approximates that distribution by a sigmoid function and compares it with the sigmoid functions obtained in advance for the respective emotions. The emotion can thereby be estimated easily.

For example, "anger" can be characterized by the parameter T taking a value around 0.04 and the range a being 300 or more, and whether the emotion is "anger" can be determined by checking whether these conditions are met. In practice, the sigmoid curve may vary somewhat even for the same emotion. It is therefore preferable to calculate in advance the induction fields of face images for each of several emotions, obtain and plot their complexities, and thereby determine the range of the sigmoid function for each emotion. Through this work, the range of a given emotion I is determined as shown conceptually in FIG. 8(A), and the parameter range of emotion I is determined as shown in FIG. 8(B). The parameter range of another emotion II can be determined in the same way. In this embodiment, the parameter ranges of the emotions are stored in internal memory, and the emotion is discriminated accurately by determining which parameter range the fitted parameters fall in.
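The range test can be sketched as a lookup over stored parameter ranges. Only the "anger" conditions (T near 0.04, a of at least 300) come from the text; the width of the T interval and the entire "normal" entry are made-up placeholders for illustration, and the names `RANGES` and `matching_emotions` are hypothetical. Returning every match also covers the mixed-emotion case described in the summary.

```python
# Stored parameter ranges per emotion as (low, high) bounds.
# "anger" follows the text; the interval width for T and the whole
# "normal" entry are illustrative placeholders, not values from the patent.
RANGES = {
    "anger": {"T": (0.03, 0.05), "a": (300.0, float("inf"))},
    "normal": {"T": (0.045, 0.10), "a": (150.0, 320.0)},
}

def matching_emotions(params, ranges=RANGES):
    """Return every emotion whose stored ranges contain all fitted parameters.

    One match  -> that single emotion.
    Several    -> a mixture of those emotions.
    No match   -> an empty list (emotion not determined).
    """
    hits = []
    for emotion, bounds in ranges.items():
        if all(low <= params[name] <= high for name, (low, high) in bounds.items()):
            hits.append(emotion)
    return hits
```

Real ranges would come from plotting the fitted curves of many labeled face images.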

In this configuration, as shown in FIG. 8(A), the range for each emotion is specified on the two-dimensional plane of the complexity C and the parameter p. Dividing the plane into ranges centered on the areas where samples appear most densely therefore readily yields sufficiently accurate ranges.
By contrast, pattern recognition in general requires partitioning an n-dimensional space (n = 60 or more). A very large number of samples must be collected without bias and the partition must be determined successfully by machine learning, which complicates the work and makes the learning process time-consuming. In this configuration, the work of dividing the ranges for each emotion is far simpler than in general pattern recognition, and the range for each emotion can be determined easily and quickly without any learning process.

FIG. 9 is a block diagram showing the functional configuration of the emotion recognition device 10 according to the present embodiment. The emotion recognition device 10 calculates the induction field of a digitized face image, determines the emotion based on the distribution of the induction field, and displays the result.
More specifically, the emotion recognition device 10 includes a face image input unit (input means) 11 that inputs the digitized image to be judged, a face region extraction unit (region extraction means) 12 that recognizes and extracts a face image from the image input to the face image input unit 11, an emotion discrimination unit (emotion discrimination means) 13 that discriminates the emotion from the face region extracted by the face region extraction unit 12, and a display unit (output means) 14 that displays the discrimination result of the emotion discrimination unit 13.

The face image input unit 11 receives an image having a plurality of colors and gradations (hereinafter referred to as a multi-value image) and converts it into a binary image by applying the binarization process described above, which binarizes with a threshold chosen so that the main facial parts such as the eyes, nose, and mouth are not lost. The image may be input to the face image input unit 11 by wireless or wired communication, or by reading image data recorded on a recording medium. The face image input unit 11 may also have a shooting function and directly input the image data obtained by shooting.
Note that the face image input unit 11 may instead directly receive a binary image that has already been binarized with such a threshold.
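The binarization performed by the face image input unit 11 can be pictured with a minimal Python sketch. It assumes an 8-bit grayscale image stored as nested lists in which dark pixels are the figure pixels. The function name `binarize` and the threshold of 128 are illustrative assumptions; the text only requires a threshold at which the main facial parts are not lost.

```python
def binarize(gray, threshold):
    """Return a binary image: 1 for pixels darker than the threshold, else 0.

    Assumption: gray is a list of rows of 0-255 values, and dark pixels
    (eyes, brows, mouth) are the figure pixels to keep.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Tiny illustrative image: a dark diagonal stroke on a light background.
gray = [
    [250, 240, 30],
    [245, 20, 235],
    [10, 230, 225],
]
binary = binarize(gray, threshold=128)
for row in binary:
    print(row)
```

In practice the threshold would be tuned per image so that the eyes, nose, and mouth survive while most of the remaining face area is removed.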

The face region extraction unit 12 recognizes the face region using a known face recognition technique such as a skin color method or a facial part detection method. The skin color method uses the fact that a face is skin-colored and recognizes the skin-colored portion as the face region on the basis of the color distribution. The facial part detection method detects facial parts (eyes, nose, mouth, and so on) on the basis of edges in the image and recognizes the region containing the facial parts as the face region.

The emotion discrimination unit 13 includes an induction field calculation unit (calculation means) 21, an induction field distribution evaluation unit 22, and a determination unit (determination means) 23. The induction field calculation unit 21 calculates the induction field of the face region extracted by the face region extraction unit 12.
The induction field distribution evaluation unit 22 obtains and evaluates the "distribution of the visual induction field" from the induction field calculated by the induction field calculation unit 21. Specifically, for the visual induction field, the induction field distribution evaluation unit 22 obtains the complexity Ci for each equipotential value i by Equation (3), and then calculates the sigmoid function that approximates the relationship between the complexity C and the potential value p. The induction field distribution evaluation unit 22 thus functions as function calculation means that calculates a sigmoid function (face image function) representing the "distribution of the visual induction field".

The determination unit 23 functions as emotion determination means that determines the emotion by comparing the sigmoid function calculated by the induction field distribution evaluation unit 22 with the sigmoid functions obtained in advance for the respective emotions (for example, the six basic emotions described above). The induction field calculation unit 21, the induction field distribution evaluation unit 22, and the determination unit 23 that make up the emotion discrimination unit 13 can each be configured as an arithmetic unit that performs the arithmetic processing described above. In practice, therefore, each part of the emotion discrimination unit 13 may be configured by one or more semiconductor integrated circuits that perform the arithmetic processing in hardware, or by a general-purpose computer, with a CPU, ROM, and RAM, that performs it in software. Alternatively, relatively heavy processing such as the calculation of the visual induction field may be performed in hardware, while relatively light processing such as the emotion determination is performed in software.

Next, the processing flow of the emotion recognition device 10 will be described with reference to the flowchart shown in FIG. 10. First, when the face region extraction unit 12 has extracted the face region from the input binary image, the induction field calculation unit 21 calculates the induction field of the face region (step S11). The induction field distribution evaluation unit 22 then starts the process of calculating the complexity C for each equipotential surface of the calculated induction field.
Here, with the definition of the induction field given in Equation (2) above, the field strength ranges from 0 (zero) to 1 when the minimum pixel distance is 1. The wider the range of potential values p over which the complexity C is calculated, the higher the accuracy of the subsequent sigmoid approximation; from the viewpoint of shortening the calculation time, however, it is preferable to keep the range to the minimum that is useful for emotion recognition.
Therefore, in the present embodiment, the complexity C is calculated over an appropriate range of potential values p within the interval from 0 (zero) inclusive to 1 exclusive. For example, the minimum value p1 of the potential value p is set to 0.03 and the resolution Δp is set to 0.01, which reduces the amount of calculation.
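As a rough illustration of step S11, the following Python sketch computes a field over a small binary image. Equation (2) is not reproduced in this part of the text, so the form used here, the mean of 1/r over all figure pixels, is an assumption chosen only because it yields strengths between 0 and 1 when the minimum pixel distance is 1. The shielding condition of FIG. 2 is ignored, and the name `induction_field` is hypothetical.

```python
import math


def induction_field(binary):
    """Field strength per pixel, assumed to be M = (1/n) * sum(1 / r_i).

    Assumptions: binary is a list of rows of 0/1 values, figure pixels (1)
    are the sources, the shielding condition is ignored, and figure pixels
    themselves are assigned the maximum strength 1.0.
    """
    height, width = len(binary), len(binary[0])
    sources = [(y, x) for y in range(height) for x in range(width) if binary[y][x]]
    field = [[0.0] * width for _ in range(height)]
    if not sources:
        return field  # no figure pixels, so the field is zero everywhere
    for y in range(height):
        for x in range(width):
            if binary[y][x]:
                field[y][x] = 1.0
                continue
            total = sum(1.0 / math.hypot(y - sy, x - sx) for sy, sx in sources)
            field[y][x] = total / len(sources)
    return field


# A single figure pixel in the middle of a 3x3 image.
field = induction_field([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
for row in field:
    print([round(v, 3) for v in row])
```

This brute-force loop is quadratic in the number of pixels, which is consistent with the text's remark that the field calculation is the relatively heavy part of the processing.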

Specifically, the induction field distribution evaluation unit 22 first sets the potential value p to the minimum value p1 (step S12), extracts the equipotential surface of this potential value p (= p1) (step S13), obtains the perimeter Li and the area Si of the closed surface of equipotential value i (= p1), and calculates the complexity Ci by Equation (3) (step S14).
Subsequently, if the potential value p is less than the maximum value "1" (step S15: YES), the induction field distribution evaluation unit 22 adds the resolution Δp to the potential value p (step S16) and repeats the equipotential surface extraction of step S13 and the complexity calculation of step S14. In this way the complexity C is calculated in steps of Δp over the range from the minimum value p1 to the maximum value "1". The complexity C need not be calculated all the way up to "1"; the maximum value may be changed to a value of "1" or less.
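The loop of steps S12 to S16 can be sketched in Python as follows. Equation (3) is not reproduced in this part of the text, so the complexity is assumed here to be C = L²/S, with the area S counted as the pixels whose field strength is at least p and the perimeter L counted as the exposed pixel edges of that region. Both function names and this pixel-counting scheme are illustrative assumptions.

```python
def region_complexity(field, p):
    """Complexity of the region where field >= p, assumed to be L**2 / S."""
    height, width = len(field), len(field[0])
    inside = [[field[y][x] >= p for x in range(width)] for y in range(height)]
    area = sum(row.count(True) for row in inside)
    if area == 0:
        return None  # no pixel reaches this potential value
    perimeter = 0
    for y in range(height):
        for x in range(width):
            if not inside[y][x]:
                continue
            # Count each pixel edge that faces the outside or the image border.
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < height and 0 <= nx < width) or not inside[ny][nx]:
                    perimeter += 1
    return perimeter ** 2 / area


def complexity_curve(field, p_min=0.03, dp=0.01, p_max=1.0):
    """Sweep p from p_min to p_max in steps of dp (steps S12 to S16)."""
    steps = int(round((p_max - p_min) / dp))
    curve = []
    for k in range(steps + 1):
        p = round(p_min + k * dp, 10)  # index-based stepping avoids float drift
        c = region_complexity(field, p)
        if c is not None:
            curve.append((p, c))
    return curve


# Illustrative field: strength falls off from the centre of a 3x3 grid.
field = [[0.2, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 0.2]]
curve = complexity_curve(field)
print(len(curve), curve[0], curve[-1])
```

Stepping by an integer index rather than repeatedly adding Δp keeps the potential values on the intended 0.01 grid.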

When the potential value p reaches the maximum value "1" (step S15: NO), the induction field distribution evaluation unit 22 applies the least squares method to the obtained complexities C and the potential value p corresponding to each complexity C to determine the parameters (range a, offset value b, and parameters p0 and T) that define the sigmoid function of Equation (4), and thereby obtains the approximating sigmoid function (step S17). In step S17, the variance of the error, that is, the approximation error, is obtained by calculating the sum of squares of the errors (residuals) from the theoretical values.
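Step S17 could be sketched in Python as below. Equation (4) is not reproduced in this part of the text, so the sigmoid is assumed to have the common form C = a / (1 + exp(-(p - p0) / T)) + b. The fit shown is one simple least-squares strategy, not necessarily the one used in the embodiment: p0 and T are searched on a coarse grid, and for each pair the best a and b follow in closed form because the model is linear in them. The function names and grids are hypothetical.

```python
import math


def sigmoid(p, a, b, p0, T):
    """Assumed sigmoid form: range a, offset b, midpoint p0, slope parameter T."""
    return a / (1.0 + math.exp(-(p - p0) / T)) + b


def fit_sigmoid(points, p0_grid, T_grid):
    """Least-squares fit; returns ((a, b, p0, T), sum_of_squared_residuals)."""
    n = len(points)
    c = [ci for _, ci in points]
    c_mean = sum(c) / n
    best = None
    for p0 in p0_grid:
        for T in T_grid:
            s = [1.0 / (1.0 + math.exp(-(p - p0) / T)) for p, _ in points]
            s_mean = sum(s) / n
            var = sum((si - s_mean) ** 2 for si in s)
            if var == 0.0:
                continue  # degenerate basis, a cannot be determined
            # Closed-form linear regression of C on s gives a and b.
            a = sum((si - s_mean) * (ci - c_mean) for si, ci in zip(s, c)) / var
            b = c_mean - a * s_mean
            sse = sum((a * si + b - ci) ** 2 for si, ci in zip(s, c))
            if best is None or sse < best[1]:
                best = ((a, b, p0, T), sse)
    return best


# Synthetic data generated from known parameters, for illustration only.
true_params = (20.0, 5.0, 0.5, 0.1)
points = [(0.03 + 0.01 * k, sigmoid(0.03 + 0.01 * k, *true_params)) for k in range(98)]
params, sse = fit_sigmoid(
    points,
    p0_grid=[0.1 * k for k in range(1, 10)],
    T_grid=[0.05, 0.1, 0.2],
)
print(params, sse)
```

The returned sum of squared residuals plays the role of the approximation error described in the text.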

In this case, the approximation error corresponds to information indicating whether or not the image looks like a face. If the approximation error is large, that is, if the correlation with the function is very low and the data are difficult to approximate with a sigmoid function, the target is considered either not to be a face or to be an image so degraded that its expression cannot be recognized. The specific threshold may be determined experimentally using actual samples. In such a case, the determination unit 23 warns the user by displaying on the display unit 14 that recognition is difficult.

On the other hand, when the approximation error is small, the determination unit 23 finds, among the sigmoid functions obtained in advance for each emotion, the one that resembles the approximating sigmoid function. That is, it determines which of the per-emotion parameter ranges stored in the internal memory contains the parameters of the approximating sigmoid function (range a, offset value b, parameters p0 and T, and so on), and thereby determines the emotion (step S18). It then displays the determination result on the display unit 14 as an image or a message, so that the user is notified of the emotion (expression) of the face image.
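The decision logic of step S18, together with the preceding check on the approximation error, can be sketched in Python as follows. The parameter ranges, the error threshold, and the emotion labels below are invented for illustration; the text states only that such ranges are stored in internal memory and that the threshold is determined experimentally.

```python
def classify(params, approximation_error, emotion_ranges, max_error):
    """Return an emotion label, "unrecognizable", or None if no range matches.

    params: dict of fitted sigmoid parameters, e.g. {"a": ..., "p0": ...}.
    emotion_ranges: dict mapping emotion -> {parameter name: (low, high)}.
    """
    if approximation_error > max_error:
        return "unrecognizable"  # not a face, or too degraded to judge
    for emotion, ranges in emotion_ranges.items():
        if all(low <= params[name] <= high for name, (low, high) in ranges.items()):
            return emotion
    return None


# Hypothetical parameter ranges, for illustration only.
EMOTION_RANGES = {
    "joy": {"p0": (0.40, 0.55), "T": (0.05, 0.12)},
    "anger": {"p0": (0.60, 0.75), "T": (0.05, 0.12)},
}

fitted = {"a": 20.0, "b": 5.0, "p0": 0.5, "T": 0.1}
print(classify(fitted, approximation_error=0.2, emotion_ranges=EMOTION_RANGES, max_error=1.0))
```

Because the lookup is a handful of interval comparisons, it is cheap compared with the field calculation, in line with the text's description of emotion determination as relatively light processing.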

As described above, according to the present embodiment, the visual induction field of the face image is obtained, the distribution information of this induction field is approximated by a sigmoid function, and the emotion corresponding to the face image is discriminated based on the approximating sigmoid function. The emotion (expression) of a face image can therefore be discriminated in a way that is close to how we see and feel things. Moreover, because the emotion can easily be recognized from the approximating sigmoid function, emotion recognition can be made faster.
In addition, because the approximation uses a sigmoid function, which is well suited to representing the distribution of the visual induction field of a face image, the accuracy of emotion recognition can be improved.

In this case, the emotion is discriminated by comparing the approximating sigmoid function with the sigmoid functions obtained in advance for the respective emotions; more specifically, by determining which emotion's parameter range the parameters of the sigmoid function belong to. The work of establishing the correspondence between parameters and emotions therefore reduces to an identification task on a two-dimensional plane. Consequently, the number of samples need not be enormous, and even if this work is carried out using a neural network or the like, the processing time is greatly reduced.
As a result, compared with general pattern recognition, the present embodiment has the further advantage that less preparatory work is needed before recognition, because no large-scale learning process or dictionary creation is required.

The present invention is not limited to the embodiment described above, and modifications, improvements, and the like within a range in which the object of the present invention can be achieved are included in the present invention. For example, the above embodiment describes the case where the present invention is applied to emotion recognition of a binary face image, but the invention is not limited to this and is also applicable to emotion recognition of a multi-value face image.

The case of emotion recognition from a multi-value face image is described below. Since most digital devices adopt R (red), G (green), and B (blue) as their basic colors, colors are assumed to be expressed by combinations of RGB. Each of R, G, and B is assumed to vary from 0 to 255, and a color is expressed by their combination. Incidentally, black is the combination R = G = B = 0, white is the combination R = G = B = 255, and combinations R = G = B with intermediate values are achromatic (gray). RGB can thus express not only color but also gradation.
In this case, the induction field is calculated as follows by applying the technique of Japanese Unexamined Patent Application Publication No. 2004-171115 (hereinafter referred to as the reference technical document). In FIG. 1, the charges at the points p1, p2, ..., pi, ..., pn are affected by the gradations of R, G, and B (for example, 0 to 255). Letting each charge be Qi, the strength Mp of the induction field at a point P is defined as in Equation (5).

Here, Qi(R, G, B) is a linear combination of independent functions of R, G, and B (Qi(R), Qi(G), Qi(B)). For a binary image, Qi(R = 0, G = 0, B = 0) = 1; for a multi-value image, Qi(R, G, B) is greater than 1 (Qi > 1). According to the reference technical document, Qi(R), Qi(G), and Qi(B) each follow a roughly S-shaped curve that saturates once the gradation (density) has increased to a certain value, and R (red) is known to be the most sensitive to changes in gradation (density), followed by B (blue) and then G (green).
This agrees with the fact that, for example, when traffic signs and the like display a warning, red and then blue are the colors used, while green is seldom used. The degree to which attention is drawn can be regarded as the strength or energy of the induction field, and on that basis the color dependence of Qi is consistent with the traffic sign example. The function for obtaining Qi used in Equation (5) can therefore be determined by psychological experiments or the like.

Accordingly, the induction field of a multi-value image can be calculated using Equation (5). Once this induction field is determined, the complexity C is calculated by substantially the same processing as in the embodiment above, the correspondence between the obtained complexities C and the potential values p is approximated by a sigmoid function, and the emotion is determined from the face image based on the approximation error and the parameters (range a, offset value b, and parameters p0 and T).
When emotion recognition is performed on a multi-value face image in this way, the multi-value image does not need to be converted into a binary image, so no information is lost in conversion and the determination accuracy can be improved accordingly.

However, recognizing emotion from a multi-value image increases the amount of calculation and lengthens the calculation time. The device may therefore allow the method to be selected: when determination speed has priority, the image is converted into a binary image before emotion recognition, and when determination accuracy has priority, emotion recognition is performed on the multi-value image as it is.

The above embodiment describes the use of a sigmoid function as the face image function representing the distribution of the visual induction field of a face image, but the invention is not limited to this; any other function capable of expressing the visual induction field of a face image may be used.
The above embodiment also describes determining a single emotion by finding, among the sigmoid functions obtained in advance for the respective emotions, the one sigmoid function that resembles the approximating sigmoid function. The invention is not limited to this: when several sigmoid functions resemble the approximating sigmoid function, the face may be judged to be in a state in which the corresponding emotions are mixed, for example a mixture of "anger" and "surprise". In this case, whether one or more of the sigmoid functions obtained in advance resemble the approximating sigmoid function may be determined by comparing parameters. With this configuration, states lying between emotions can also be identified.
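The mixed-emotion variation can be illustrated with a short Python sketch. It assumes that "resembles" is judged by the largest relative difference between the fitted parameters and a stored reference parameter set for each emotion; this similarity measure, the tolerance, and the reference values are all hypothetical, since the text says only that the judgment is made by comparing parameters.

```python
def similar_emotions(params, references, tolerance):
    """Return every emotion whose reference parameters lie within tolerance.

    Similarity is assumed to be the maximum relative difference over the
    reference's parameters; reference values must be non-zero.
    """
    matches = []
    for emotion, reference in references.items():
        worst = max(abs(params[k] - reference[k]) / abs(reference[k]) for k in reference)
        if worst <= tolerance:
            matches.append(emotion)
    return matches


# Hypothetical reference parameters, for illustration only.
REFERENCES = {
    "anger": {"p0": 0.50, "T": 0.10},
    "surprise": {"p0": 0.54, "T": 0.11},
    "joy": {"p0": 0.30, "T": 0.20},
}

fitted = {"p0": 0.52, "T": 0.105}
matches = similar_emotions(fitted, REFERENCES, tolerance=0.1)
if len(matches) > 1:
    print("mixed state:", " and ".join(matches))
elif matches:
    print("single emotion:", matches[0])
else:
    print("no similar emotion")
```

Returning a list rather than a single label lets the same routine cover both the single-emotion case and the mixed state between emotions.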

The present invention can also be implemented by creating a control program that describes the processing procedure for carrying out the invention as described above, and making this control program downloadable over a telecommunication line or distributing it stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, or a semiconductor recording medium.
The emotion recognition device according to the present embodiment can also be built into any electronic device, such as a camera, scanner, projector, television, or printer. For example, a camera equipped with the emotion recognition device can recognize the emotion corresponding to the face image being shot and perform image correction based on the recognized emotion, such as applying a bright color tone correction when a happy emotion is recognized. Likewise, a printer equipped with the emotion recognition device can recognize the emotion corresponding to a face image contained in the image to be printed and process the image based on the recognized emotion before printing, for example by automatically adding a cheerful frame to the image when a happy emotion is recognized.

FIG. 1 is a diagram showing the pixel arrangement of a digital image, for explaining the visual induction field.
FIG. 2 is a diagram explaining the shielding condition applied when obtaining the strength of the visual induction field.
FIG. 3A is a diagram showing the visual induction field of the character "A" obtained with the shielding condition taken into account, and FIG. 3B is a diagram showing the visual induction field obtained without taking the shielding condition into account.
FIG. 4 is a diagram showing an example of a sigmoid function.
FIG. 5A is a diagram showing an example of a face image with an angry expression, FIG. 5B shows the induction field of FIG. 5A, FIG. 5C shows an example of a face image with a neutral expression, FIG. 5D shows the induction field of FIG. 5C, FIG. 5E shows an example of a face image with a joyful expression, and FIG. 5F shows the induction field of FIG. 5E.
FIG. 6A is a diagram showing a binary image of a face in which little of the area other than the main facial parts has been removed, FIG. 6B shows a binary image of a face in which as much as possible of the area other than the main facial parts has been removed, and FIG. 6C shows a binary image in which the main facial parts have been extracted from FIG. 6B.
FIG. 7 is a diagram showing an example of the sigmoid function of the face image for each emotion.
FIG. 8A is a diagram showing the range for each emotion, and FIG. 8B is a diagram showing the parameter range for each emotion.
FIG. 9 is a block diagram showing the functional configuration of the emotion recognition device according to the present embodiment.
FIG. 10 is a flowchart showing the processing flow of the emotion recognition device.

Explanation of symbols

10: emotion recognition device, 11: face image input unit (input means), 12: face region extraction unit (region extraction means), 13: emotion discrimination unit (emotion discrimination means), 14: display unit (output means), 21: induction field calculation unit (calculation means), 22: induction field distribution evaluation unit (function calculation means), 23: determination unit (determination means).

Claims

An emotion recognition device comprising:
input means for inputting a face image; and
emotion discrimination means for obtaining a visual induction field, which is a "field" distributed around the face image whose strength depends on the distance from the face image and whose value is larger the closer it is to the face image, and for discriminating an emotion corresponding to the face image based on distribution information of the induction field.

The emotion recognition device according to claim 1,
wherein the emotion discrimination means obtains the complexity of the closed surface of each equipotential line from the visual induction field of the face image, obtains curve information, which is the correspondence between the complexity and the equipotential value, as the distribution information of the induction field, and recognizes the emotion based on the curve information.

The emotion recognition device according to claim 2,
wherein the emotion discrimination means includes:
function calculation means for calculating a function approximating the curve information; and
emotion determination means for determining an emotion by comparing the function calculated by the function calculation means with a function corresponding to each emotion obtained in advance.

The emotion recognition device according to claim 3,
wherein the function calculation means calculates a sigmoid function.

The emotion recognition device according to claim 4,
wherein, with the complexity denoted by C, the potential value by p, the range by a, the offset value by b, and the parameters by T and p0, the sigmoid function is defined by

and the emotion determination means determines the emotion based on the parameters of the sigmoid function.

The emotion recognition device according to claim 5,
wherein the emotion determination means determines the emotion by comparing the sigmoid function calculated by the function calculation means with the sigmoid function corresponding to each emotion obtained in advance.

The emotion recognition device according to claim 6,
wherein the emotion determination means finds, among the sigmoid functions corresponding to the respective emotions obtained in advance, one or more sigmoid functions similar to the sigmoid function calculated by the function calculation means; when there is one similar sigmoid function, determines that the emotion is the one corresponding to that sigmoid function; and when there are a plurality of similar sigmoid functions, determines that the plurality of emotions corresponding to those sigmoid functions are mixed.

The emotion recognition device according to any one of claims 1 to 7,
wherein the face image is a binary image.

An electronic device comprising the emotion recognition device according to claim 1.

An emotion recognition method comprising: obtaining a visual induction field, which is a "field" distributed around a face image whose strength depends on the distance from the face image and whose value is larger the closer it is to the face image; and recognizing an emotion corresponding to the face image.

A control program causing a computer to function as:
input means for inputting a face image; and
emotion discrimination means for obtaining a visual induction field, which is a "field" distributed around the face image whose strength depends on the distance from the face image and whose value is larger the closer it is to the face image, and for discriminating an emotion corresponding to the face image based on distribution information of the induction field.

A computer-readable recording medium on which is recorded a control program causing a computer to function as:
input means for inputting a face image; and
emotion discrimination means for obtaining a visual induction field, which is a "field" distributed around the face image whose strength depends on the distance from the face image and whose value is larger the closer it is to the face image, and for discriminating an emotion corresponding to the face image based on distribution information of the induction field.