JP2018045350A - Device, program and method for identifying state in specific object of predetermined object

Info

Publication number: JP2018045350A
Application number: JP2016178294A
Authority: JP
Inventors: Jiangming Wu (剣明呉); Tomomoto Yazaki (智基矢崎)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-09-13
Filing date: 2016-09-13
Publication date: 2018-03-22
Anticipated expiration: 2036-09-13
Also published as: JP6697356B2

Abstract

PROBLEM TO BE SOLVED: To provide a device that can more reliably identify a state of a predetermined object when the tendency with which that state appears differs for each object or for each type of object.
SOLUTION: The state identification device comprises: score determination means that uses an identification model to determine, from input object information, a score representing the state of the object; correct answer determination means that, when pieces of object information related to a specific object (the object whose state is to be identified) are classified into clusters associated with the respective states on the basis of the scores determined from them, determines the state corresponding to the cluster to which a piece of object information belongs as the correct answer for that piece of object information; and state determination means that inputs the score determined for one piece of object information related to the specific object into a specific identification model, which is determined from the scores and the correct answers determined for the object information of the specific object, and determines the state related to that piece of object information from the model's output.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for identifying the state of a predetermined target on the basis of information related to that target.

Various techniques have been devised for identifying the state of a predetermined target, for example a human facial expression, using information related to that target, for example a photographic image of the face.

In the field of human facial expression recognition in particular, many researchers are working to improve recognition technology, adopting models such as the three-class model (positive, negative, neutral) and Paul Ekman's seven-class model (neutral, joy, disgust, anger, surprise, sadness, fear).

As one example of such efforts, Patent Literature 1 discloses a technique that learns the feature amounts of a large volume of face image data based on the above classification models and identifies facial expressions from those feature amounts. This technique aims in particular to efficiently collect training data of natural facial expressions, rather than deliberately posed ones, and to create a classifier with high recognition accuracy.

Patent Literature 1: JP 2011-150381 A

However, with conventional techniques such as the one described in Patent Literature 1, even when the expression of the particular individual whose expression is to be identified is evaluated, the result often differs from the actual expression because of that individual's own tendencies in showing expressions. This is a significant problem.

It is well known that the expressions a person shows have particular tendencies that depend on the person's personality, ethnicity, region of residence, and so on: for example, a face that naturally looks stern, or a restrained way of showing anger. Conventional expression determination, by contrast, uses a classifier that has learned the feature amounts of a large volume of face image data, as in the technique of Patent Literature 1. Because the particular tendencies of the person being identified often deviate from the general tendencies of expression, they cause expression identification to fail.

An object of the present invention is therefore to provide a device, a program, and a method that can more reliably identify a state of a predetermined target when the tendency with which that state appears differs for each individual target or for each type of target.

According to the present invention, there is provided a state identification device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, on the basis of target information related to the predetermined target, the device comprising:
score determination means that uses an identification model determined on the basis of a large number of pieces of target information to determine, from input target information, a score representing the state of the target related to that target information;
correct answer determination means that, where a plurality of pieces of target information related to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters associated with the respective states on the basis of the scores determined from them, determines the state corresponding to the cluster to which a piece of target information related to the specific target belongs as the correct answer for that piece of target information; and
state determination means that inputs the score determined for one piece of target information related to the specific target into a specific identification model, which is determined on the basis of the scores determined for the plurality of pieces of target information related to the specific target and the correct answers determined for them, and determines, from the output, the state related to that one piece of target information for the specific target.

The state identification device according to the present invention preferably further comprises clustering means that classifies the plurality of pieces of target information into a plurality of clusters associated with the respective states, on the basis of the scores determined from the plurality of pieces of target information related to the specific target.

The state determination means of the state identification device according to the present invention also preferably determines the state related to the one piece of target information for the specific target using a specific identification model that finds, in the feature space formed by the feature amounts generated from the input scores, the identification hyperplane whose distance to the feature points is maximized.
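The maximum-margin idea can be illustrated with a minimal sketch. It assumes two-dimensional feature points and two hand-picked candidate hyperplanes; the function name, the data, and the candidates are hypothetical and do not come from the source. The geometric margin of a hyperplane w·x + b = 0 is the smallest distance y(w·x + b)/||w|| over all labeled points. A support vector machine searches for the separating hyperplane with the largest such margin, whereas this sketch only compares two candidates.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from the labeled points to the hyperplane w.x + b = 0.

    A negative result means at least one point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical feature points generated from scores, labeled +1 or -1.
points = [(0.9, 0.1), (0.8, 0.3), (0.2, 0.7), (0.1, 0.9)]
labels = [+1, +1, -1, -1]

# Two candidate separating hyperplanes (illustrative values only).
candidates = {
    "diagonal": ((1.0, -1.0), 0.0),   # x0 - x1 = 0
    "vertical": ((1.0, 0.0), -0.5),   # x0 = 0.5
}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)  # candidate with the larger margin
```

Both candidates separate the two classes, but the diagonal one keeps every point farther from the boundary, which is the property a maximum-margin model optimizes for.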

Alternatively, the state determination means preferably determines the state related to the one piece of target information for the specific target using a specific identification model that includes weighting coefficients applied to the input score and that updates those weighting coefficients so as to reduce the error between the state given by the determined correct answer and the output of the model.
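A minimal sketch of such an error-reducing weight update follows. It assumes a linear model (a weighted sum of the score vector), a squared-error loss, and a fixed learning rate; the source does not specify the update rule, and all names and values here are hypothetical.

```python
def predict(weights, score):
    """Weighted sum of the input score vector."""
    return sum(w * s for w, s in zip(weights, score))

def update(weights, score, target, lr=0.5):
    """One gradient step on the squared error (target - output)^2 / 2."""
    error = target - predict(weights, score)
    return [w + lr * error * s for w, s in zip(weights, score)]

# Hypothetical score vector (positive, neutral, negative) and a correct
# answer encoded as 1.0 for "positive".
score = [0.3, 0.5, 0.2]
target = 1.0

weights = [0.0, 0.0, 0.0]
errors = []
for _ in range(20):
    # Record the error before each update so the decrease is visible.
    errors.append(abs(target - predict(weights, score)))
    weights = update(weights, score, target)
```

With these values each step multiplies the error by 1 - lr * ||score||^2 = 0.81, so the recorded errors shrink monotonically toward zero.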

Furthermore, in one embodiment of the state identification device according to the present invention, it is also preferable that the predetermined target is a human face, the state is a facial expression, and the target information is information related to an image of a human face;
that the specific target is an individual whose facial expression is to be identified, or a predetermined attribute group to which the person whose facial expression is to be identified belongs; and
that the state determination means identifies the expression appearing on the face in the image information, on the basis of image information of the face of that individual or of a person belonging to that attribute group.

In the state identification device according to the present invention, the classification of the plurality of pieces of target information into the clusters is also preferably performed using the k-means method in the space formed by the scores.
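Clustering in the score space can be sketched as follows. This is plain k-means (Lloyd's algorithm) written with the standard library only; the three-component score vectors (positive, neutral, negative), the choice of initial centers, and the iteration count are assumptions made for illustration.

```python
import math

def kmeans(points, init_centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on a list of score vectors."""
    centers = [list(c) for c in init_centers]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Hypothetical scores (positive, neutral, negative) for six images of one user.
points = [
    (0.6, 0.3, 0.1), (0.7, 0.2, 0.1),
    (0.2, 0.7, 0.1), (0.3, 0.6, 0.1),
    (0.1, 0.3, 0.6), (0.1, 0.2, 0.7),
]
# Deterministic initialization for the example: one seed point per group.
centers, assignment = kmeans(points, [points[0], points[2], points[4]])
```

The returned `assignment` gives the cluster index of each image, and `centers` holds the cluster centers in the score space.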

Furthermore, the identification model used in the score determination means of the state identification device according to the present invention is preferably a learning model of a convolutional neural network that includes convolutional layers.

According to the present invention, there is further provided a state identification device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, on the basis of target information related to the predetermined target, the device comprising:
score determination means that uses an identification model determined on the basis of a large number of pieces of target information to determine, from input target information, a score representing the state of the target related to that target information; and
state determination means that, where a plurality of pieces of target information related to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters associated with the respective states on the basis of the scores determined from them, determines, as the state related to one piece of target information related to the specific target, the state corresponding to the cluster whose center is closest to the score determined for that one piece of target information.
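The nearest-center rule of this second device can be sketched in a few lines. The cluster centers, the state names, and the use of Euclidean distance are assumptions for illustration; the source only requires the center with the smallest distance to the score.

```python
import math

def nearest_state(score, centers):
    """Return the state whose cluster center is closest to the score vector."""
    return min(centers, key=lambda state: math.dist(score, centers[state]))

# Hypothetical cluster centers for one user, keyed by state.
# Score components are (positive, neutral, negative).
centers = {
    "positive": (0.55, 0.35, 0.10),
    "neutral": (0.25, 0.65, 0.10),
    "negative": (0.10, 0.40, 0.50),
}

# A new image whose generic score is highest for "neutral".
score = (0.42, 0.46, 0.12)
state = nearest_state(score, centers)
```

In this example the largest score component is "neutral", yet the score lies closest to this user's "positive" center, so the user-specific decision differs from a simple argmax over the generic scores.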

According to the present invention, there is also provided a state identification program that causes a computer, mounted on a device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, on the basis of target information related to the predetermined target, to function as:
score determination means that uses an identification model determined on the basis of a large number of pieces of target information to determine, from input target information, a score representing the state of the target related to that target information;
correct answer determination means that, where a plurality of pieces of target information related to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters associated with the respective states on the basis of the scores determined from them, determines the state corresponding to the cluster to which a piece of target information related to the specific target belongs as the correct answer for that piece of target information; and
state determination means that inputs the score determined for one piece of target information related to the specific target into a specific identification model, which is determined on the basis of the scores determined for the plurality of pieces of target information related to the specific target and the correct answers determined for them, and determines, from the output, the state related to that one piece of target information for the specific target.

According to the present invention, there is further provided a state identification method performed by a computer mounted on a device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, on the basis of target information related to the predetermined target, the method comprising the steps of:
determining, using an identification model determined on the basis of a large number of pieces of target information, a score representing the state of the target related to input target information from that target information;
determining, where a plurality of pieces of target information related to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters associated with the respective states on the basis of the scores determined from them, the state corresponding to the cluster to which a piece of target information related to the specific target belongs as the correct answer for that piece of target information; and
inputting the score determined for one piece of target information related to the specific target into a specific identification model, which is determined on the basis of the scores determined for the plurality of pieces of target information related to the specific target and the correct answers determined for them, and determining, from the output, the state related to that one piece of target information for the specific target.

The state identification device, program, and method of the present invention make it possible to more reliably identify a state of a predetermined target when the tendency with which that state appears differs for each individual target or for each type of target.

A functional block diagram showing the functional configuration of one embodiment of the state identification device according to the present invention.
A schematic diagram showing one embodiment of the facial expression identification model constructed and used by the facial expression identification engine.
A table showing an example of the score determination processing in the facial expression score determination unit (facial expression identification engine).
A table showing an example of the processing in the image clustering unit and the correct facial expression determination unit.
A schematic diagram showing one embodiment of learning in the classifier of the specific identification model used by the state determination unit.
A schematic diagram for explaining the decision boundary in the SVM adopted as the classifier of the specific identification model.
A functional block diagram showing the functional configuration of another embodiment of the state identification device according to the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

[Device configuration in one embodiment]
FIG. 1 is a functional block diagram showing the functional configuration of one embodiment of the state identification device according to the present invention.

As shown in FIG. 1, the smartphone 1, which serves as the state identification device of this embodiment, has a built-in camera 105 of known configuration. Using the camera 105, the smartphone 1 can, for example, photograph the user's face to generate a photographic image of the face (a personal image), identify the expression of the user's face shown in that image, and display the identification result on the touch panel display (TP/DP). Naturally, a photographic image of a face whose expression is to be identified can also be acquired from outside via a communication network and processed.

As one application example, an application 121 on the smartphone 1, such as a conversational AI app, can use the expression identification result to understand, for example, the emotion (utterance intention) of the user it is talking with, and then adjust its responses or personalize the content of the dialogue with that user.

Furthermore, in this embodiment, the smartphone 1 preferably acquires, from the image management server 2, a large volume of general images (photographic images of various human faces) used for training in the facial expression identification engine 112.

The smartphone 1, as a state identification device according to the present invention, is a device that identifies a state (for example, a facial expression) of a predetermined target (for example, a human face), the state being one whose tendency to appear differs for each individual target (for example, each person) or for each type of target (for example, ethnicity or region of residence), on the basis of target information related to the predetermined target (for example, a photographic image of a face, or information related to it). It is characterized by having:
(A) score determination means (facial expression score determination unit 112b) that uses an "identification model" determined on the basis of a large number of pieces of target information (photographic images) to determine, from input target information (a photographic image), a score representing the state (facial expression) of the target related to that target information;
(B) correct answer determination means (correct facial expression determination unit 114) that, where a plurality of pieces of target information (photographic images) related to a specific target (for example, the face of a specific user), which is the state identification target among the predetermined targets (human faces), are classified into a plurality of clusters associated with the respective states on the basis of the scores determined from them, determines the state corresponding to the cluster to which a piece of target information (photographic image) related to the specific target belongs as the correct answer for that piece of target information; and
(C) state determination means (facial expression determination unit 115) that inputs the score determined for one piece of target information (photographic image) related to the specific target (the face of the specific user) into a "specific identification model", which is determined on the basis of the scores determined for the plurality of pieces of target information (photographic images) related to the specific target and the correct answers determined for them, and determines, from the output, the state related to that one piece of target information for the specific target (the expression appearing on the specific user's face in the photographic image).
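The correct-answer step (B) can be sketched as follows, taking cluster centers and per-image cluster indices as given. The rule that associates each cluster with a state is an assumption made for this sketch, not something stated in the text above: it picks the one-to-one cluster-to-state mapping that maximizes the summed center score of the assigned states. All names and values are hypothetical.

```python
from itertools import permutations

STATES = ("positive", "neutral", "negative")

def label_clusters(centers):
    """Assign each cluster a distinct state.

    Assumed rule (not from the source): choose the one-to-one mapping that
    maximizes the sum, over clusters, of the center's score for its state.
    """
    best = max(
        permutations(range(len(STATES))),
        key=lambda perm: sum(centers[k][s] for k, s in enumerate(perm)),
    )
    return [STATES[s] for s in best]

def correct_answers(assignment, centers):
    """Correct answer per image = state of the cluster the image belongs to."""
    cluster_state = label_clusters(centers)
    return [cluster_state[k] for k in assignment]

# Hypothetical centers of three clusters found in one user's score space
# (columns: positive, neutral, negative). "Neutral" dominates everywhere,
# as it might for a user with subdued expressions.
centers = [(0.40, 0.50, 0.10), (0.15, 0.75, 0.10), (0.10, 0.55, 0.35)]
assignment = [0, 0, 1, 2, 1, 2]  # cluster index of each of six images

answers = correct_answers(assignment, centers)
```

The resulting `answers` list, paired with the original scores, is the kind of labeled data from which a user-specific model such as the one in (C) could be trained.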

In this way, rather than identifying the expression by relying solely on the score determined by the expression classifier, the smartphone 1 determines correct answers in advance for the target information (for example, photographic images) of the specific target (for example, the face of a specific user) by means of clustering. This makes it possible to use a "specific identification model" adapted to that specific target and, as a result, to identify the state of the specific target (the facial expression of the specific user) more reliably and with high accuracy.

When human facial expressions are identified as in this embodiment, the specific target to be identified can be a specific individual whose expression is to be identified (for example, the user of the smartphone 1), or a predetermined attribute group to which the person whose expression is to be identified belongs, such as the ethnic group or region of residence of that individual.

In fact, differences in expression identification results by country or ethnic group (region) and by personal attributes such as age and gender have been studied in various ways using widely adopted expression category models, such as the three-class positive/neutral/negative model and Ekman's seven-class model.

For example, Jack, R. E., Blais, C., Scheepers, C., Schyns, P. G., and Caldara, R., "Cultural confusions show that facial expressions are not universal," Current Biology, 19, 2009, pp. 1543-1548, reports experimental results indicating that East Asian subjects, compared with European subjects, tend to show expressions in which fear is confused with surprise and disgust with anger. As a cause, the paper notes that European subjects look at the eyes and the mouth to a similar degree when observing another person's expression, that is, they look at the whole face, whereas East Asian subjects fixate more on the eyes.

In addition, Yuki, M., Maddux, W. W., and Masuda, T., "Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States," Journal of Experimental Social Psychology, 43, 2007, pp. 303-311, states that when evaluating facial expressions of joy or sadness, Japanese people tend to place more weight on the eyes than on the mouth compared with Americans.

Despite the individual differences and the differences between countries, ethnic groups, and regions that these studies demonstrate, human expressions have conventionally been determined using a classifier trained on the feature amounts of a large and highly varied body of face image data. As a result, identification of a particular individual's expression, for example, has often failed. With the smartphone 1, by contrast, correct answers for the face of a specific user are determined in advance through clustering and a better-adapted classifier is then built, so that this user's expressions can be identified more reliably.

The state identification device according to the present invention, as embodied in the smartphone 1 above, is not limited to human facial expressions as the state to be identified. According to the present invention, various states can be identified more reliably, provided that the tendency with which they appear differs for each individual target or for each type of target. In other words, whereas such differing tendencies have conventionally caused large errors and mistakes in identification results, the present invention makes it possible to identify such states with higher accuracy.

Nor is the state identification device according to the present invention limited to a smartphone. For example, a tablet computer, a notebook computer, a personal computer, a set-top box, a robot, digital signage, or the like can also be adopted as the state identification device. A device (terminal) of this kind with a built-in camera can, for example, read the user's expression and respond according to the information it has read, or use that information to evaluate an action previously taken toward the user.

As also shown in the functional block diagram of FIG. 1, the smartphone 1 of this embodiment, which is a state identification device (facial expression identification device), has a communication interface unit 101, a general image database 102, a personal image database 103, a facial expression data storage unit 104, the camera 105, a touch panel display (TP/DP) 106, and a processor and memory. The processor and memory realize the state identification function (facial expression identification function) by executing a program that causes the computer of the smartphone 1 to function.

The processor and memory further have, as functional components, an image management unit 111, a facial expression identification engine 112 that includes an identification model learning unit 112a and a facial expression score determination unit 112b, an image clustering unit 113, a correct facial expression determination unit 114, a facial expression determination unit 115, and an application 121. The processing flow indicated by the arrows connecting the functional components of the smartphone 1 in FIG. 1 can also be understood as one embodiment of the facial expression identification method according to the present invention.

The communication interface unit 101 acquires a large volume of general images for training in the facial expression identification engine 112 from the image management server 2 via a communication network such as the Internet. The communication interface unit 101 can also download the facial expression identification program (app) according to the present invention and application programs that can provide services using the identification result, such as a conversational AI app.

The image management unit 111 can acquire personal images of the specific individual whose expression is to be identified (for example, the user of the smartphone 1) from the camera 105, or from an external information device via the communication interface unit 101, and store and manage them in the personal image database 103. General images acquired via the communication interface unit 101 may likewise be stored and managed in the general image database 102. Personal image data is preferably managed with a personal image label attached (for example, based on input specified by the user).

In this embodiment, the facial expression identification engine 112 includes an identification model learning unit 112a and a facial expression score determination unit 112b. Of these, the identification model learning unit 112a performs learning using the large number of acquired general images (photographic images of various human faces) to construct and determine a facial expression identification model. This facial expression identification model can be, for example, a classifier that includes a convolutional neural network (CNN), a kind of deep learning model, and can be regarded as a general classifier intended for everyone, or for a population with average or shared tendencies in facial expression.

一方、表情スコア決定部１１２ｂは、構築・決定された表情識別モデルを用いて、入力された対象情報からこの対象情報に係る対象の状態を表すスコアを決定する。 On the other hand, the facial expression score determination unit 112b uses the facial expression identification model constructed and determined to determine a score representing the state of the target related to the target information from the input target information.

図２は、表情識別エンジン１１２で構築・使用される表情識別モデルの一実施形態を示す模式図である。 FIG. 2 is a schematic diagram showing an embodiment of a facial expression identification model constructed and used by the facial expression identification engine 112.

As shown in FIG. 2, in this embodiment the facial expression identification model constructed and determined by the facial expression identification engine 112 is based on a convolutional neural network (CNN, ConvNet), which is a kind of feedforward network. This CNN contains multiple convolution layers. A convolution layer mimics the function of simple cells in the animal visual cortex: it performs a convolution process that slides a kernel (a weighting matrix filter) over the image to generate a feature map. This convolution process extracts basic features such as edges and gradients while reducing the image resolution step by step, and yields information on local correlation patterns.
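
The sliding-kernel operation described above can be sketched as follows. This is a minimal illustration and not the network of FIG. 2: the function name `conv2d_valid`, the 4x4 image, and the 2x2 kernel are all hypothetical, and, as is usual in CNN implementations, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a left-to-right increase in intensity.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The 4x4 input becomes a 3x3 feature map whose middle column responds to the edge, which illustrates both the feature extraction and the reduction in resolution mentioned in the text.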

It is also preferable that each convolution layer be paired with a pooling layer (subsampling layer), so that the convolution process and the pooling process are repeated. The pooling process mimics the function of complex cells in the animal visual cortex: it summarizes the feature map output from the convolution layer (the responses of the convolution filter within a fixed region) by a maximum value, an average value, or the like, thereby reducing the number of tunable parameters while securing local translation invariance. This absorbs differences in appearance caused by slight shifts in the image, such as face size, face orientation, head tilt, and the addition of accessories such as hats and sunglasses, so that appropriate feature values capturing the essential features can be obtained.
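
The local translation invariance obtained by pooling can be illustrated with a small sketch. It assumes non-overlapping 2x2 max pooling, which is only one of the options (maximum, average, and so on) named above; `max_pool2d` and both feature maps are hypothetical.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Two hypothetical feature maps: the same filter response, shifted one pixel to the right.
fm_a = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
fm_b = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_a = max_pool2d(fm_a)
pooled_b = max_pool2d(fm_b)
print(pooled_a, pooled_b)
```

Both inputs pool to the same 2x2 map, so a one-pixel shift inside a pooling window does not change what later layers see, and 16 values are reduced to 4.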

The identification model learning unit 112a (FIG. 1) of the facial expression identification engine 112 trains this CNN using, for example, a large-scale image data set consisting of the large number of general images accumulated in the general image database 102 (FIG. 1). Specifically, images of this large-scale image data set are input to the CNN, the response of the multilayer network formed by several of the CNN's layers, excluding the final layer, is output as a feature value, and this output is compared with the correct answer; learning proceeds by generating and updating the neuron connection weights, the network configuration parameters, and the like.

ここで、本実施形態では、入力する大規模画像データセットの画像を、ポジティブ、ニュートラル、ネガティブという表情に関する３つのカテゴリに予め分類しておき、この分類結果を正解として使用する。 Here, in the present embodiment, the images of the large-scale image data set to be input are classified in advance into three categories related to facial expressions of positive, neutral, and negative, and the classification result is used as a correct answer.

図３は、表情スコア決定部１１２ｂ（表情識別エンジン１１２）におけるスコア決定処理の一実施例を示すテーブルである。 FIG. 3 is a table showing an embodiment of score determination processing in the expression score determination unit 112b (expression identification engine 112).

In this embodiment, a score is the value output when the image to be scored is input to the classifier of the facial expression identification model described above, and consists of one value for each of the three items positive, neutral, and negative. That is, inputting one image to be scored outputs one set of these three scores. Hereinafter, this set of scores may be referred to simply as a score. In this embodiment, the three scores are normalized so that they sum to 1, which makes the degree of each item easy to compare between records.
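
One common way to obtain three scores that sum to 1 is a softmax over the raw outputs of the classifier. The text does not say which normalization this embodiment uses, so the sketch below is an assumption, and the raw output values are hypothetical.

```python
import math

CATEGORIES = ("positive", "neutral", "negative")


def softmax(raw):
    """Map raw classifier outputs to positive values that sum to 1."""
    m = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in raw]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the final layer for one face image.
raw_outputs = [0.2, 1.1, 3.0]

score = dict(zip(CATEGORIES, softmax(raw_outputs)))
print(score)
```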

FIG. 3(A) shows the scores for images of “expressions that are actually judged to be negative” for user A, user B, and so on. User A is the normal type, regarded as typical in how expressions are shown, and the negative value (0.90) is indeed the largest of user A's scores. User B, on the other hand, is a “type whose expression stays subdued even when angry”, so the neutral value (0.65) is the largest of user B's scores even though the expression is “actually negative”.

ちなみに、この表情識別モデルの識別器だけを用いた表情判定を行うとすると、上記３つのスコアのうちで最も大きい値のものに対応するカテゴリが、識別結果として出力される。例えば、図３（Ａ）のユーザＡでは、表情はネガティブであると識別されるが、ユーザＢではニュートラルであると識別されてしまう。 By the way, if facial expression determination is performed using only the discriminator of this facial expression identification model, the category corresponding to the largest value among the above three scores is output as the identification result. For example, user A in FIG. 3A is identified as having a negative facial expression, but user B is identified as being neutral.
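
The baseline decision described above, which simply picks the category with the largest score, can be sketched as follows. Only the largest values (0.90 and 0.65) come from FIG. 3(A); the remaining values are hypothetical and were chosen so that each score sums to 1.

```python
def classify_by_max_score(score):
    """Return the category whose score is largest (general model only)."""
    return max(score, key=score.get)


# Scores for images whose true expression is negative.
user_a = {"positive": 0.02, "neutral": 0.08, "negative": 0.90}  # 0.90 from FIG. 3(A)
user_b = {"positive": 0.05, "neutral": 0.65, "negative": 0.30}  # 0.65 from FIG. 3(A)

result_a = classify_by_max_score(user_a)
result_b = classify_by_max_score(user_b)
print(result_a, result_b)
```

User A is identified correctly, while user B's negative expression is reported as neutral, which is the individual-difference error that the later clustering step is meant to correct.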

Next, FIG. 3(B) shows the scores for images of “expressions that are actually judged to be neutral” for user A, user C, and so on. User A is the normal type as described above, and the neutral value (0.95) is indeed the largest of user A's scores. User C, on the other hand, is a “type whose everyday expression is stern”, so the negative value (0.50) is the largest of user C's scores even though the expression is “actually neutral”.

Further, FIG. 3(C) shows the scores for images of “expressions that are actually judged to be positive” for user A, user D, and so on. User A is the normal type as described above, and the positive value (1.00) is indeed the largest of user A's scores. User D, on the other hand, is a “type whose expression stays subdued even when laughing”, so the neutral value (0.50) is the largest of user D's scores even though the expression is “actually positive”.

As the examples for users A to D show, the score determined by the facial expression score determination unit 112b (facial expression identification engine 112) can deviate from the value it should have because of individual differences in how expressions are shown. In other words, depending on these individual differences, it is not uncommon for a facial expression to be identified incorrectly.

Returning to the functional block diagram of FIG. 1, the image clustering unit 113 classifies multiple pieces of target information (for example, photographic images) relating to a specific target (for example, the face of a specific user) into multiple clusters, each associated with a state (for example, a facial expression), based on the scores determined from those pieces of target information. This classification into clusters may be performed using the k-means method in the space spanned by the scores. The photographic images to be clustered can be, for example, the images accumulated at the point when the user of the smartphone 1 has started using the terminal and has accumulated a predetermined number of photographic images of himself or herself.

The correct facial expression determination unit 114 determines, as the correct answer for a piece of target information (photographic image) relating to the specific target (the face of the specific user), the state (facial expression) corresponding to the cluster to which that piece of target information belongs, after the target information has been classified into the multiple clusters associated with the respective states (facial expressions).

図４は、画像クラスタリング部１１３及び正解表情決定部１１４における処理の一実施例を示すテーブルである。 FIG. 4 is a table showing an example of processing in the image clustering unit 113 and the correct facial expression determination unit 114.

FIG. 4(A) shows the results of the clustering and correct facial expression determination processing for user B, the “type whose expression stays subdued even when angry” described with reference to FIG. 3(A). According to the figure, the following face image data records (groups) of user B, determined from their scores to be neutral, neutral, and positive respectively, are listed:
(a1) records: B-neutral-001, B-neutral-002, B-neutral-003, ...,
(a2) records: B-neutral-101, B-neutral-102, B-neutral-103, ..., and
(a3) records: B-positive-001, B-positive-002, B-positive-003, ...
In the table of FIG. 4(A), each of these records is stored in association with its three determined score values and with the ID (identifier) of the cluster to which the record belongs, among the clusters generated based on the scores of these records.

FIG. 4(B) shows the results of the clustering and correct facial expression determination processing for user C, the “type whose everyday expression is stern” described with reference to FIG. 3(B). According to the figure, the following face image data records (groups) of user C, determined from their scores to be negative, negative, positive, and neutral respectively, are listed:
(b1) records: C-negative-001, C-negative-002, C-negative-003, ...,
(b2) records: C-negative-101, C-negative-102, C-negative-103, ...,
(b3) records: C-positive-001, C-positive-002, ..., and
(b4) records: C-neutral-001, ...
In the table of FIG. 4(B) as well, each of these records is stored in association with its three determined score values and with the ID (identifier) of the cluster to which the record belongs, among the clusters generated based on the scores of these records.

Here, in the table of user B's records shown in FIG. 4(A) and the table of user C's records shown in FIG. 4(B), the clusters with cluster IDs 1, 2, and 3 are formed using the k-means method in the score space spanned by the scores determined for these records. Specifically, a typical procedure is as follows:
(a) A cluster is randomly assigned to each point (record) in the score space. The number of clusters assigned is the number of categories in the expression classification model adopted for facial expression identification, which is three (k = 3) in this embodiment because it adopts the three-category model.
(b) Next, the centroid of each cluster is calculated.
(c) The cluster to which each point (record) belongs is set to the cluster whose centroid is nearest to that point.
(d) If performing process (c) changes the cluster membership of none of the points, clustering ends. If any membership changes, process (c) is executed again.
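
The clustering procedure above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the image clustering unit 113: the function names and the six score vectors, written as (positive, neutral, negative), are hypothetical; the sketch recomputes the centroids of step (b) before every reassignment, as the standard k-means algorithm does; and an empty cluster simply keeps its previous centroid. The initial assignment is fixed here only to make the run reproducible, whereas step (a) assigns clusters at random.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k=3, initial_assignment=None, seed=0, max_iter=100):
    """Return (assignment, centroids) following steps (a) to (d)."""
    rng = random.Random(seed)
    # (a) Assign a cluster to every point: at random unless the caller fixes it.
    if initial_assignment is None:
        assignment = [rng.randrange(k) for _ in points]
    else:
        assignment = list(initial_assignment)
    centroids = [None] * k
    for _ in range(max_iter):
        # (b) Centroid of each non-empty cluster (an empty cluster keeps its old one).
        for cid in range(k):
            members = [p for p, a in zip(points, assignment) if a == cid]
            if members:
                centroids[cid] = tuple(sum(col) / len(members) for col in zip(*members))
        # (c) Move every point to the cluster with the nearest centroid.
        candidates = [cid for cid in range(k) if centroids[cid] is not None]
        new_assignment = [
            min(candidates, key=lambda cid: squared_distance(p, centroids[cid]))
            for p in points
        ]
        # (d) Stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids


# Hypothetical scores (positive, neutral, negative) for six images of one user.
points = [
    (0.02, 0.10, 0.88), (0.04, 0.12, 0.84),
    (0.08, 0.42, 0.50), (0.10, 0.40, 0.50),
    (0.40, 0.35, 0.25), (0.36, 0.34, 0.30),
]
assignment, centroids = k_means(points, k=3, initial_assignment=[0, 0, 1, 2, 1, 2])
print(assignment)
```

With a random start, k-means can settle in a poor local optimum or leave a cluster empty, so practical implementations often run several restarts and keep the tightest result.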

なお、上記（ア）〜（エ）の処理が終了しても、この段階ではまだ、分類されたクラスタは、表情識別の分類カテゴリ（ポジティブ、ネガティブ、ニュートラル）に対応付けられていない。これらのクラスタにカテゴリ（ポジティブ、ネガティブ、ニュートラル）をラベル付けする１つの手法として、例えば、各クラスタに属するレコードにおけるカテゴリ毎のスコアの平均値を算出し、全クラスタの中で、この平均値が最も高いクラスタに対して、この平均値に係るカテゴリをラベル付けする手法が挙げられる。 Even if the above processes (a) to (d) are completed, the classified clusters are not yet associated with the facial expression identification classification categories (positive, negative, neutral) at this stage. As one method of labeling these clusters with categories (positive, negative, neutral), for example, an average value of scores for each category in records belonging to each cluster is calculated, and this average value is calculated among all clusters. There is a technique for labeling the category related to the average value for the highest cluster.

Specifically, in FIG. 4(B), for example, the records
(b1) records: C-negative-001, C-negative-002, C-negative-003, ...
are associated with the cluster with ID = 1 (hereinafter, cluster 1). In these records (b1), the average negative score is larger than the average negative score in any of the other records (b2), (b3), and (b4), and is therefore the maximum. Cluster 1, to which the records (b1) belong, is accordingly given the negative label. Likewise, the records
(b2) records: C-negative-101, C-negative-102, C-negative-103, ...
are associated with cluster 2. In these records (b2), the average neutral score is larger than the average neutral score in any of the other records (b1), (b3), and (b4), and is therefore the maximum. Cluster 2, to which the records (b2) belong, is accordingly given the neutral label.

Further, the records
(b3) records: C-positive-001, C-positive-002, ... and
(b4) records: C-neutral-001, ...
are associated with cluster 3. In these records (b3) and (b4), the average positive score is larger than the average positive score in either of the other records (b1) and (b2), and is therefore the maximum. Cluster 3, to which the records (b3) and (b4) belong, is accordingly given the positive label.

また、図４（Ａ）に記録されたクラスタ１〜３についても、上記と同様の手法をもって、それぞれネガティブ、ニュートラル及びポジティブのラベルが付与される。 Also, the clusters 1 to 3 recorded in FIG. 4A are given negative, neutral, and positive labels, respectively, by the same method as described above.
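
The labeling rule described above, in which a category is given to the cluster whose average score for that category is highest among all clusters, can be sketched as follows. The score vectors and cluster IDs are hypothetical. The text does not say what happens if one cluster has the highest average for two categories, so the greedy tie-breaking used here (highest averages first, each cluster and each category used once) is an assumption.

```python
CATEGORIES = ("positive", "neutral", "negative")


def label_clusters(points, assignment):
    """Map each cluster ID to a category using per-cluster average scores."""
    cluster_ids = sorted(set(assignment))
    # Average score vector of each cluster.
    means = {}
    for cid in cluster_ids:
        members = [p for p, a in zip(points, assignment) if a == cid]
        means[cid] = [sum(col) / len(members) for col in zip(*members)]
    # (average, category, cluster) triples, highest average first.
    candidates = sorted(
        ((means[cid][i], cat, cid) for cid in cluster_ids for i, cat in enumerate(CATEGORIES)),
        reverse=True,
    )
    labels, used_categories = {}, set()
    for _, cat, cid in candidates:
        if cid not in labels and cat not in used_categories:
            labels[cid] = cat
            used_categories.add(cat)
    return labels


# Hypothetical scores (positive, neutral, negative) and cluster IDs for a user
# like user C, whose neutral face tends to receive a high negative score.
points = [
    (0.02, 0.10, 0.88), (0.04, 0.12, 0.84),
    (0.08, 0.42, 0.50), (0.10, 0.40, 0.50),
    (0.40, 0.35, 0.25), (0.36, 0.34, 0.30),
]
assignment = [0, 0, 2, 2, 1, 1]

labels = label_clusters(points, assignment)
print(labels)
```

Cluster 2 receives the neutral label even though negative is the largest score of every record in it, because its average neutral score is the highest of the three clusters.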

As described above, the image clustering unit 113 makes it possible, for user B, whose expression is often actually negative even though the record scores alone indicate neutral, to give the cluster containing those records the label of their true (correct) category, negative. Likewise, for user C, whose expression is often actually neutral even though the record scores alone indicate negative, it is possible to give the cluster containing those records the label of their true (correct) category, neutral.

In other words, performing the clustering process described above makes it possible to label expression categories in a way that corrects the errors in score-based determination caused by individual differences in how expressions are shown. Based on this, the correct facial expression determination unit 114 can determine, for each record (information on a photographic image of the user), the category of the label given to the cluster to which that record belongs as the “correct answer”.

なお、分類したクラスタに対するラベリング処理は、当然、上述した手法に限定されるものではない。例えば、クラスタを表現するベクトルと、各表情カテゴリを代表する代表ベクトルとのコサイン類似度に基づいてラベルを決定してもよい。または、所定カテゴリを有する点（レコード）からのユークリッド距離が最短となる中心値を有するクラスタに対し、当該所定カテゴリのラベルを付与することも可能である。 Of course, the labeling process for the classified clusters is not limited to the above-described method. For example, a label may be determined based on the cosine similarity between a vector representing a cluster and a representative vector representing each facial expression category. Alternatively, it is also possible to give a label of the predetermined category to a cluster having a center value at which the Euclidean distance from a point (record) having the predetermined category is the shortest.
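
The cosine-similarity alternative mentioned above can be sketched as follows. The text does not say how the representative vector of each expression category is obtained, so the representative vectors below are purely hypothetical, as is the cluster vector; all are written as (positive, neutral, negative).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical representative vectors of each category for one user.
REPRESENTATIVES = {
    "negative": (0.03, 0.10, 0.87),
    "neutral": (0.10, 0.40, 0.50),
    "positive": (0.40, 0.35, 0.25),
}


def label_by_cosine(cluster_vector, representatives=REPRESENTATIVES):
    """Return the category whose representative vector is most similar."""
    return max(
        representatives,
        key=lambda cat: cosine_similarity(cluster_vector, representatives[cat]),
    )


cluster_center = (0.09, 0.41, 0.50)  # hypothetical vector representing one cluster
label = label_by_cosine(cluster_center)
print(label)
```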

さらに、図３及び図４に示した実施例では、表情について３分類モデルを採用しているが、当然これに限定されるものではなく、例えば、Paul Ekman の７分類モデルや、これらのモデルよりもさらに細分化された感情分類モデルを適用してもよい。例えば、分類カテゴリとして、Paul Ekmanモデルの７つに加え、面白さ、軽蔑、満足、困惑、興奮、罪悪感、功績に基づく自負心、安心、納得感、喜び、及び恥を採用したものを使用することも可能である。いずれにしても、分類カテゴリの数だけクラスタが生成され、これらのクラスタにそれぞれ、当該分類カテゴリのラベルが付与される。 Further, in the embodiment shown in FIG. 3 and FIG. 4, a three-class model is adopted for facial expressions. However, the present invention is not limited to this. For example, Paul Ekman's seven-class model and these models are used. Alternatively, a more detailed emotion classification model may be applied. For example, in addition to the seven categories of Paul Ekman models, classification categories that use fun, contempt, satisfaction, embarrassment, excitement, guilt, pride based on achievement, security, persuasion, joy, and shame are used. It is also possible to do. In any case, as many clusters as the number of classification categories are generated, and a label of the classification category is given to each of these clusters.

Returning to the functional block diagram of FIG. 1, the facial expression determination unit 115 inputs the score determined for one piece of target information (photographic image) relating to the specific target (for example, the face of a specific user) into a “specific identification model” determined based on
(a) the scores determined for multiple pieces of target information (for example, photographic images) relating to the specific target, and
(b) the “correct answers” determined for those pieces of target information (photographic images),
and determines, from the output, the state relating to that one piece of target information for the specific target (the expression appearing on the face of the specific user in the photographic image).

The information on the state relating to the target information of the specific target (the expression appearing on the face in the specific user's photographic image), determined in this way by the facial expression determination unit 115, may be recorded in the facial expression data storage unit 104 in association with the target information (photographic image), or may be output to the application 121 and processed as facial expression determination data by a predetermined application program. Through processing by this application program, it may also be displayed on the touch panel display 106 or transmitted to the outside through the communication interface unit 101.

Here, the “specific identification model” of the state determination unit 115 may be, for example, a classifier model based on a support vector machine (SVM), that is, a model that finds, in the feature space spanned by the feature values generated from the input scores, the separating hyperplane whose distance to the feature points is maximized. Alternatively, it can be a classifier model based on another supervised machine learning method, for example a neural network.

図５は、状態決定部１１５で使用される特定識別モデルの識別器における学習の一実施形態を示す模式図である。また、図６は、特定識別モデルの識別器に採用されるＳＶＭにおける識別境界面を説明するための模式図である。 FIG. 5 is a schematic diagram showing an embodiment of learning in the classifier of the specific identification model used in the state determination unit 115. FIG. 6 is a schematic diagram for explaining an identification boundary surface in the SVM employed in the classifier of the specific identification model.

図５によれば、状態決定部１１５は、図４（Ａ）及び図４（Ｂ）に示したような、特定ユーザについての（スコアの決定された）各レコードに対し、所属するクラスタのラベルを正解として紐づけたレコードデータを、特徴量化して特定識別モデルの識別器に入力し、当該特定識別モデルの学習・更新を行っている。ここで、これらの正解付きのレコードデータは、その正解のカテゴリ別に、ネガティブログ、ニュートラルログ及びポジティブログの３種に区分されている。 According to FIG. 5, the state determination unit 115 applies the label of the cluster to which each record (score is determined) for a specific user as shown in FIGS. 4 (A) and 4 (B). The record data associated with the correct answer is converted into features and input to the discriminator of the specific identification model, and the specific identification model is learned and updated. Here, the record data with the correct answer is classified into three types of negative log, neutral log, and positive log according to the category of the correct answer.

また、この特定識別モデルの識別器は、本実施形態においてＳＶＭを採用している。ＳＶＭは、現在開発されている数多くの機械学習手法の中でも汎用性と認識性能の両方が優れているとされる手法の１つであり、未学習データに対して高い識別性能を発揮することが可能となっている。 Further, the classifier of this specific identification model adopts SVM in this embodiment. SVM is one of the methods that are considered to be excellent in both versatility and recognition performance among many machine learning methods that are currently developed, and can exhibit high discrimination performance against unlearned data. It is possible.

In a classifier that adopts this SVM, as shown in FIG. 6, when negative determination is performed, for example, the record points of the negative log are given the correct-answer label and all other record points are given the incorrect-answer label in the feature space. Next, the surface (decision boundary) whose distance from the record points is maximized is determined and is thereafter used for negative determination. The same processing is performed for neutral determination and positive determination; in the end, the variables of each field of all the logs are input, aggregation is performed, and the coefficients of the SVM discriminant function are determined.
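
The maximum-margin boundary used for negative determination is commonly written as the optimization problem below. This is the standard textbook soft-margin formulation, offered as a sketch rather than as the formulation used in this embodiment. The symbols are assumptions introduced here: $x_i$ is the feature vector of record $i$, $y_i$ is $+1$ for records of the negative log and $-1$ for all other records, $w$ and $b$ define the boundary, $\xi_i$ are slack variables, and $C$ is a penalty constant.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n, \\
\text{decision function}\quad & f(x) = w^{\top}x + b .
\end{aligned}
```

Minimizing $\lVert w\rVert$ maximizes the margin $2/\lVert w\rVert$ between the two classes. Solving the same problem with the neutral log and then the positive log as the $+1$ class gives three decision functions; one common way to combine them, assumed here, is to output the category whose $f(x)$ is largest.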

The state determination unit 115 inputs, for example, the positive, neutral, and negative scores of the photographic image of the specific user to be identified into the SVM classifier of the specific identification model constructed in this way, that is, into the SVM discriminant function above, and outputs a facial expression identification result suited to this specific user.

For example, when the three scores of a photographic image of user B, in which neutral is the largest, are input to the SVM classifier of the specific identification model trained for user B, the “type whose expression stays subdued even when angry” described in the examples of FIG. 3 and FIG. 4, the classifier can output the correct result, negative. Likewise, when the three scores of a photographic image of user C, in which negative is the largest, are input to the SVM classifier of the specific identification model trained for user C, the “type whose everyday expression is stern”, the classifier can output the correct result, neutral.

このように、状態決定部１１５での状態決定処理によれば、特定対象（例えば特定のユーザの顔）の対象情報（例えば写真画像）に対し、クラスタリング処理から決定された正解を用いて学習した、この特定対象（特定のユーザの顔）の識別に適合した特定識別モデルを利用することができる。また、その結果、この特定対象の状態（特定ユーザの顔の表情）をより高い精度で識別することが可能となるのである。 As described above, according to the state determination process in the state determination unit 115, learning is performed on target information (for example, a photographic image) of a specific target (for example, a face of a specific user) using the correct answer determined from the clustering process. A specific identification model suitable for identification of this specific target (a face of a specific user) can be used. As a result, the state of the specific target (facial expression of the specific user) can be identified with higher accuracy.

なお、特定識別モデルの識別器は、本実施形態において、特定ユーザに適合したものとなっているが、当然これに限定されるものではない。例えば、表情識別対象として、所定の属性集団、例えばある民族や、所定の居住地域の住民等を採用し、このような対象に特化した特定識別モデルの識別器を構成することもできる。なお、この場合、特定識別モデルの識別器への入力は、このような表情識別対象となる属性集団に属する人間の顔についてのスコア（レコード）となる。 In addition, although the discriminator of the specific identification model is adapted to the specific user in this embodiment, it is naturally not limited to this. For example, a predetermined attribute group, for example, a certain ethnic group, a resident in a predetermined residential area, or the like may be adopted as a facial expression identification target, and a specific identification model classifier specialized for such a target may be configured. In this case, the input to the classifier of the specific identification model is a score (record) for a human face belonging to such an attribute group as a facial expression identification target.

また、特定識別モデルの識別器に採用される機械学習手法も、上述したＳＶＭに限定されるものではない。例えば、ニューラルネットワークを採用した識別器とすることも可能である。この場合、このニューラルネットワークは、入力されたスコアに対する重み付け係数を含み、決定された正解に係る状態（表情のカテゴリ）と、当該モデルの出力との誤差を減少させるように重み付け係数を更新するタイプのものとすることができる。 Further, the machine learning method adopted for the discriminator of the specific discrimination model is not limited to the above-described SVM. For example, a discriminator employing a neural network can be used. In this case, the neural network includes a weighting factor for the input score, and updates the weighting factor so as to reduce an error between the determined state of the correct answer (expression category) and the output of the model. Can be.

さらに、状態決定部１１５での状態決定処理は、以上に述べた特定識別モデルを用いず、より簡易な実装の下で実施することも可能である。例えば、画像クラスタリング部１１３で生成された複数のクラスタの中心のうち、特定対象（特定ユーザの顔）に係る１つの対象情報（写真画像）について決定されたスコアとの距離が最も小さい中心を有するクラスタに付与されたラベルの状態（表情のカテゴリ）を、この１つの対象情報（写真画像）に係る状態（表情のカテゴリ）に決定してもよい。 Furthermore, the state determination process in the state determination unit 115 can be performed under a simpler implementation without using the specific identification model described above. For example, among the centers of a plurality of clusters generated by the image clustering unit 113, the center having the smallest distance from the score determined for one piece of target information (photo image) related to the specific target (specific user's face) The state (expression category) of the label assigned to the cluster may be determined as the state (expression category) related to this one piece of target information (photo image).

Specifically, suppose that a vector in the score space whose elements are the scores of one record is written in the form <(positive), (neutral), (negative)>. The centers of the three clusters generated by the image clustering unit 113 and labeled with the expression categories negative, neutral, and positive, respectively, are then expressed, in one example, as:
Negative cluster center: <0.02, 0.10, 0.88>,
Neutral cluster center: <0.08, 0.42, 0.50>, and
Positive cluster center: <0.37, 0.35, 0.28>.
Here, let <ng, nt, ps> be the point formed by the scores determined for the target information of the specific target whose expression is to be identified (a photographic image of the specific user's face). The label assigned to the cluster whose center, among the three centers above, has the smallest Euclidean distance to this point <ng, nt, ps> can be taken as the state (expression category) of this target information of the specific target.
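The nearest-center rule described above can be sketched as follows. This is a minimal illustration that reuses the three example centers. The function name `nearest_state` is hypothetical, and the score tuples are assumed to follow the <positive, neutral, negative> order used for the centers.

```python
import math

# Example cluster centers from the description, as <positive, neutral, negative>.
CENTERS = {
    "negative": (0.02, 0.10, 0.88),
    "neutral": (0.08, 0.42, 0.50),
    "positive": (0.37, 0.35, 0.28),
}


def nearest_state(score, centers=CENTERS):
    """Return the label of the cluster whose center is closest (Euclidean) to score."""
    return min(centers, key=lambda label: math.dist(score, centers[label]))


# The largest raw score is "negative" (0.50), but the nearest center is "neutral".
print(nearest_state((0.10, 0.40, 0.50)))  # neutral
# The largest raw score is "neutral" (0.40), but the nearest center is "positive".
print(nearest_state((0.30, 0.40, 0.30)))  # positive
```

Both sample points show how the per-user centers can overrule the category that the raw scores alone would suggest.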

[Apparatus Configuration in Other Embodiments]
FIG. 7 is a functional block diagram showing the functional configuration of another embodiment of the state identification device according to the present invention.

The smartphone 5, which is the state identification device of the embodiment shown in FIG. 7, has functional units corresponding to those of the smartphone 1 shown in FIG. 1. Specifically, it has a communication interface unit 501, a camera 505, a touch panel display 506, an image management unit 511, a facial expression identification engine 512 including a facial expression score determination unit 512b, a correct facial expression determination unit 514, a facial expression determination unit 515, and an application 521.

That is, the smartphone 5 does not have functional units corresponding to the identification model learning unit 112a and the image clustering unit 113 of the smartphone 1 shown in FIG. 1. In this embodiment, the construction (learning) of the facial expression identification model of the facial expression identification engine 512 is performed by an external facial expression identification preparation device 3, which acquires general image data from the image management server 2. The clustering process on the scored photographic image data is likewise performed by this facial expression identification preparation device 3, which acquires personal image data from the smartphone 5.

The correct facial expression determination unit 514 of the smartphone 5 receives the constructed facial expression identification model and the clustering result from the facial expression identification preparation device 3, and determines the correct answers for the personal image data it manages. The facial expression determination unit 515 then constructs a specific identification model using these correct answers and, with the constructed specific identification model, determines the expression category of the expression identification target (for example, a photographic image of the face of the user of the smartphone 5).

As a modification, the smartphone 5 may include an image clustering unit 513 corresponding to the image clustering unit 113 (FIG. 1) of the smartphone 1. In this case, the clustering process is performed on the smartphone 5, so there is no need to transmit personal image data to the facial expression identification preparation device 3.

As described above, because the smartphone 5 can omit at least the process of constructing the facial expression identification model, the amount of information processing executed within the device is greatly reduced. In other words, the smartphone 5 can realize facial expression identification with the size and processing capability of a mobile terminal.

As yet another embodiment, the smartphone 5 may include none of the facial expression identification engine 512, the image clustering unit 513, the correct facial expression determination unit 514, and the facial expression determination unit 515, and the facial expression identification preparation device 3 may instead include all of these functional units. In such an embodiment, the facial expression identification preparation device 3 functions as the state identification device according to the present invention.

Specifically, the facial expression identification preparation device 3, having received a personal image captured by the camera 505 of the smartphone 5, performs not only the score determination process using the facial expression identification model but also the clustering process, the correct answer determination process for the personal images, and the process of determining the expression category of the personal image using the specific identification model. The facial expression identification preparation device 3 then transmits information on the determined expression category (the expression identification result) to the smartphone 5, and the smartphone 5 that has received this information uses it in the application 521.

Incidentally, the terminal that receives the expression identification result output from a server such as the one described above (the facial expression identification preparation device 3) is of course not limited to a smartphone. For example, it may be a tablet computer, a notebook computer, or a PC (personal computer), and various other forms of terminal can also be adopted, such as a thin client terminal as a device suited to use in an IoT (Internet of Things) environment.

As described in detail above, according to the present invention, rather than identifying the expression by relying solely on the scores determined by the expression classifier, a correct answer is determined in advance for the target information (for example, photographic images) of a specific target (for example, a specific user's face) by using a clustering process. This makes it possible to use a specific identification model adapted to identifying this specific target (the specific user's face), and as a result the state of the specific target (the facial expression of the specific user) can be identified more reliably.
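The step of determining correct answers in advance by clustering can be sketched as follows. The k-means method named in the claims is implemented here in plain Python with a deterministic initialization chosen for reproducibility. The rule that maps each cluster to an expression category (ranking the three centers by their negative score) is an assumption made purely for illustration, and the function names and sample scores are hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm) on score vectors; returns (centers, assignments)."""
    ordered = sorted(points)
    step = max(1, len(ordered) // k)
    # Deterministic initialization: evenly spaced points of the sorted list.
    centers = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p) for p in points]


def label_clusters(centers):
    """Assumed rule: rank the three centers by their negative score (index 2)."""
    order = sorted(range(len(centers)), key=lambda c: centers[c][2])
    names = ["positive", "neutral", "negative"]  # lowest to highest negative score
    return {c: names[rank] for rank, c in enumerate(order)}


def correct_answers(points):
    """Cluster one user's scores and return the correct answer for each of them."""
    centers, assign = kmeans(points, k=3)
    names = label_clusters(centers)
    return [names[a] for a in assign]


# Hypothetical <positive, neutral, negative> scores for one specific user.
scores = [
    (0.02, 0.10, 0.88), (0.04, 0.12, 0.84),
    (0.08, 0.42, 0.50), (0.10, 0.40, 0.50),
    (0.37, 0.35, 0.28), (0.40, 0.33, 0.27),
]
print(correct_answers(scores))
```

The resulting labels would then serve as training targets for the specific identification model, or the cluster centers could be used directly for nearest-center identification.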

In particular, when identifying facial expressions, which vary between individuals and between countries, ethnic groups, regions of residence, and so on, building a model that takes these differences into account makes it possible to identify the expressions with higher accuracy.

Incidentally, based on the present invention, it is also possible to develop application programs that can provide various services by identifying the facial expression of a specific individual, such as a terminal user, more reliably and using the resulting highly accurate expression identification results. One example of such an application is a conversational AI application that uses the expression identification result to understand the emotion (utterance intention) of the terminal user it is conversing with, and that can adjust its responses or personalize the content of its dialogue with that user.

Various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention can easily be made by those skilled in the art to the various embodiments described above. The above description is merely illustrative and is not intended to be limiting in any way. The present invention is limited only by the claims and their equivalents.

1, 5 Smartphone (state identification device)
101, 501 Communication interface unit
102 General image database
103 Personal image database
104 Facial expression data storage unit
105, 505 Camera
106, 506 Touch panel display (TP/DP)
111, 511 Image management unit
112, 512 Facial expression identification engine
112a Identification model learning unit
112b, 512b Facial expression score determination unit
113, 513 Image clustering unit
114, 514 Correct facial expression determination unit
115, 515 Facial expression determination unit
121, 521 Application
2 Image management server
3 Facial expression identification preparation device

Claims

A state identification device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, based on target information related to the predetermined target, the device comprising:
score determination means for determining, from input target information, a score representing the state of the target related to that target information, using an identification model determined based on a large number of pieces of target information;
correct answer determination means for determining, when a plurality of pieces of target information related to a specific target that is a state identification target among the predetermined targets are classified into a plurality of clusters associated with the respective states based on scores determined from those pieces of target information, the state corresponding to the cluster to which a piece of target information related to the specific target belongs as the correct answer for that piece of target information; and
state determination means for inputting a score determined for one piece of target information related to the specific target into a specific identification model determined based on the scores determined for the plurality of pieces of target information related to the specific target and the correct answers determined for those pieces of target information, and for determining, from the output of that model, the state related to the one piece of target information of the specific target.

The state identification device according to claim 1, further comprising clustering means for classifying the plurality of pieces of target information into a plurality of clusters associated with the respective states, based on scores determined from the plurality of pieces of target information related to the specific target.

The state identification device according to claim 1 or 2, wherein the state determination means determines the state related to the one piece of target information using a specific identification model that obtains a separating hyperplane maximizing the distance to each feature point in the feature space formed by the features generated from the input scores.

The state identification device according to claim 1 or 2, wherein the state determination means determines the state related to the one piece of target information of the specific target using a specific identification model that includes weighting coefficients for the input scores and updates the weighting coefficients so as to reduce the error between the state of the determined correct answer and the output of the model.

The predetermined target is a human face, the state is a facial expression, and the target information is information related to an image of a human face,
the specific target is an individual whose facial expression is to be identified, or a predetermined attribute group to which a human whose facial expression is to be identified belongs, and
the state determination means identifies the facial expression appearing on the face in the image information, based on image information of facial expressions of the individual or of humans belonging to the attribute group. The state identification device according to any one of the claims.

The classification of the plurality of pieces of target information into the clusters is performed using the k-means method in the space formed by the scores. State identification device.

The state identification device according to claim 6, wherein the identification model used by the score determination means is a learned model of a convolutional neural network including convolutional layers.

A state identification device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, based on target information related to the predetermined target, the device comprising:
score determination means for determining, from input target information, a score representing the state of the target related to that target information, using an identification model determined based on a large number of pieces of target information; and
state determination means for determining, when a plurality of pieces of target information related to a specific target that is a state identification target among the predetermined targets are classified into a plurality of clusters associated with the respective states based on scores determined from those pieces of target information, the state corresponding to the cluster whose center, among the centers of the plurality of clusters, has the smallest distance to the score determined for one piece of target information related to the specific target, as the state related to that one piece of target information.

An evaluation estimation program that causes a computer to function, the computer being mounted on a device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, based on target information related to the predetermined target, the program being a state identification program that causes the computer to function as:
score determination means for determining, from input target information, a score representing the state of the target related to that target information, using an identification model determined based on a large number of pieces of target information;
correct answer determination means for determining, when a plurality of pieces of target information related to a specific target that is a state identification target among the predetermined targets are classified into a plurality of clusters associated with the respective states based on scores determined from those pieces of target information, the state corresponding to the cluster to which a piece of target information related to the specific target belongs as the correct answer for that piece of target information; and
state determination means for inputting a score determined for one piece of target information related to the specific target into a specific identification model determined based on the scores determined for the plurality of pieces of target information related to the specific target and the correct answers determined for those pieces of target information, and for determining, from the output of that model, the state related to the one piece of target information of the specific target.

A state identification method implemented in a computer mounted on a device that identifies a state of a predetermined target, the state being one whose tendency to appear differs for each individual target or for each type of target, based on target information related to the predetermined target, the method comprising:
determining, from input target information, a score representing the state of the target related to that target information, using an identification model determined based on a large number of pieces of target information;
determining, when a plurality of pieces of target information related to a specific target that is a state identification target among the predetermined targets are classified into a plurality of clusters associated with the respective states based on scores determined from those pieces of target information, the state corresponding to the cluster to which a piece of target information related to the specific target belongs as the correct answer for that piece of target information; and
inputting a score determined for one piece of target information related to the specific target into a specific identification model determined based on the scores determined for the plurality of pieces of target information related to the specific target and the correct answers determined for those pieces of target information, and determining, from the output of that model, the state related to the one piece of target information of the specific target.