JP2021033359A

JP2021033359A - Emotion estimation device, emotion estimation method, program, information presentation device, information presentation method and emotion estimation system

Info

Publication number: JP2021033359A
Application number: JP2019148936A
Authority: JP
Inventors: 伸一深澤; Shinichi Fukazawa
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2019-08-14
Filing date: 2019-08-14
Publication date: 2021-03-01
Anticipated expiration: 2039-08-14
Also published as: JP7306152B2

Abstract

To provide a technique capable of improving accuracy of emotion estimation. SOLUTION: An emotion estimation device is provided that includes: an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints; an evaluation value calculation unit that calculates an evaluation value of a shooting condition for each of the plurality of person area images; and a comprehensive emotion estimation unit that generates comprehensive estimated emotion information of the person based on the estimated emotion information generated from each of the plurality of person area images and the evaluation value. SELECTED DRAWING: Figure 1

Description

The present invention relates to an emotion estimation device, an emotion estimation method, a program, an information presentation device, an information presentation method, and an emotion estimation system.

Patent Document 1 proposes a facial expression recognition device that uses a computer to measure facial expressions from time-series images and to perform machine recognition of them.

Patent Document 1: Japanese Unexamined Patent Publication No. H3-252775

Techniques for estimating the type of facial expression and the intensity of emotional expression from a human face image captured by a single camera (imaging device), including that of Patent Document 1, are already known (hereinafter, "facial expression estimation" techniques). In recent years in particular, the advent of deep learning has improved their estimation (classification) accuracy.

On the other hand, image recognition in real environments (in-the-wild environments) is affected, unlike in an ideal controlled laboratory environment, by disturbance factors such as changes in appearance that depend on the positional relationship between the camera and the subject, lighting variation, and occlusion. These factors lower the accuracy of facial expression estimation (and, more broadly, the performance of recognition processing).

An object of the present invention is therefore to provide a technique capable of improving the accuracy of emotion estimation.

To solve the above problem, according to one aspect of the present invention, there is provided an emotion estimation device including: an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints; an evaluation value calculation unit that calculates an evaluation value of the shooting conditions of each of the plurality of person area images; and a comprehensive emotion estimation unit that generates comprehensive estimated emotion information for the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values.

The association processing unit may associate the plurality of person area images based on position information, in space, of the person appearing in each of the plurality of person area images.

The association processing unit may associate the plurality of person area images based on the shooting time of each of the plurality of person area images.
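
The association by position and shooting time described above can be illustrated as follows. This is a minimal sketch, not the method of the specification: the `PersonRegion` record, the greedy grouping strategy, and the thresholds `max_distance` and `max_time_gap` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class PersonRegion:
    """Hypothetical record for one person area image."""
    camera_id: str
    timestamp: float               # shooting time in seconds
    position: Tuple[float, float]  # estimated floor position in metres


def associate(regions: List[PersonRegion],
              max_distance: float = 0.5,
              max_time_gap: float = 0.2) -> List[List[PersonRegion]]:
    """Group regions that plausibly show the same person.

    A region joins the first group whose first member is close in space
    and time and that has no region from the same camera yet.
    The thresholds are illustrative assumptions.
    """
    groups: List[List[PersonRegion]] = []
    for region in regions:
        for group in groups:
            seed = group[0]
            same_place = dist(seed.position, region.position) <= max_distance
            same_time = abs(seed.timestamp - region.timestamp) <= max_time_gap
            new_view = all(r.camera_id != region.camera_id for r in group)
            if same_place and same_time and new_view:
                group.append(region)
                break
        else:
            # No compatible group was found, so start a new one.
            groups.append([region])
    return groups
```

Each returned group then holds at most one person area image per camera, which is the input expected by a later multi-view fusion step.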

The comprehensive emotion estimation unit may calculate a weight for each of the plurality of pieces of estimated emotion information based on the evaluation values, and generate the comprehensive estimated emotion information based on the estimated emotion information and the weights.

The comprehensive emotion estimation unit may calculate the weights based on the priority order of the shooting conditions and on the evaluation values.

The comprehensive emotion estimation unit may calculate the weights by normalizing across the evaluation values.
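
One way to read the weighting and normalization described above is a weighted average over views. The sketch below assumes that each piece of estimated emotion information is a dictionary from emotion label to score and that the evaluation values are non-negative; the function names and the uniform fallback are assumptions for illustration, not taken from the specification.

```python
from typing import Dict, List


def normalize_weights(evaluation_values: List[float]) -> List[float]:
    """Turn non-negative evaluation values into weights that sum to 1."""
    total = sum(evaluation_values)
    if total <= 0:
        # Assumed fallback: treat all views equally when no view scores above zero.
        return [1.0 / len(evaluation_values)] * len(evaluation_values)
    return [value / total for value in evaluation_values]


def fuse_emotions(estimates: List[Dict[str, float]],
                  evaluation_values: List[float]) -> Dict[str, float]:
    """Combine per-view emotion scores into one weighted score per label."""
    weights = normalize_weights(evaluation_values)
    fused: Dict[str, float] = {}
    for scores, weight in zip(estimates, weights):
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + weight * score
    return fused
```

With two views scored 3.0 and 1.0, the first view contributes three quarters of the fused result, so a well-lit frontal view dominates a poorly conditioned one without discarding it.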

The evaluation value calculation unit may set the evaluation values based on the likelihood obtained in estimating each of the plurality of pieces of estimated emotion information.
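
As a simple illustration of a likelihood-based evaluation value, the following sketch assumes that each estimate is a dictionary from emotion label to likelihood (for instance a classifier's normalized output) and takes the likelihood of the most probable label as that view's evaluation value. Both the data layout and the choice of the maximum are assumptions.

```python
from typing import Dict, List


def likelihood_evaluation_values(estimates: List[Dict[str, float]]) -> List[float]:
    """Return, for each view, the likelihood of its most probable emotion label."""
    return [max(scores.values()) for scores in estimates]
```

A view whose estimator is confident thus receives a larger evaluation value than a view whose likelihoods are spread evenly across labels.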

For each of the plurality of person area images, the evaluation value calculation unit may set the evaluation value of the shooting conditions of that image based on the angle or distance between the person and the camera that captures the image.

For each of the plurality of person area images, the evaluation value calculation unit may set the evaluation value of the shooting conditions of that image based on at least one of the degree of light irradiation on the person and the degree to which the person is occluded in the image.

For each of the plurality of person area images, the evaluation value calculation unit may set the evaluation value of the shooting conditions of that image based on at least one of the resolution and the image quality of that image.
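
The following sketch shows one possible way to turn shooting conditions into a single evaluation value. The specification does not give formulas, so everything here is an assumption: the cosine falloff for angle, the linear falloff for distance between `best_m` and `worst_m`, the use of an occlusion ratio in [0, 1], and the `priorities` coefficients that stand in for a priority order among conditions.

```python
import math
from typing import Sequence


def angle_score(angle_deg: float) -> float:
    """1.0 for a frontal view, falling to about 0.0 at 90 degrees or more."""
    clipped = min(abs(angle_deg), 90.0)
    return max(0.0, math.cos(math.radians(clipped)))


def distance_score(distance_m: float,
                   best_m: float = 2.0,
                   worst_m: float = 10.0) -> float:
    """1.0 up to best_m, then a linear falloff to 0.0 at worst_m."""
    if distance_m <= best_m:
        return 1.0
    if distance_m >= worst_m:
        return 0.0
    return (worst_m - distance_m) / (worst_m - best_m)


def evaluation_value(angle_deg: float,
                     distance_m: float,
                     occlusion_ratio: float,
                     priorities: Sequence[float] = (0.4, 0.2, 0.4)) -> float:
    """Priority-weighted mean of per-condition scores, in the range [0, 1]."""
    visible = 1.0 - min(max(occlusion_ratio, 0.0), 1.0)
    scores = (angle_score(angle_deg), distance_score(distance_m), visible)
    return sum(p * s for p, s in zip(priorities, scores)) / sum(priorities)
```

Further conditions such as illumination, resolution, or image quality could be added as extra scores with their own priority coefficients in the same way.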

According to another aspect of the present invention, there is provided an emotion estimation method including: associating a plurality of person area images in which the same person is captured from a plurality of viewpoints; calculating an evaluation value of the shooting conditions of each of the plurality of person area images; and generating comprehensive estimated emotion information for the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values.

According to another aspect of the present invention, there is provided a program for causing a computer to function as an emotion estimation device including: an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints; an evaluation value calculation unit that calculates an evaluation value of the shooting conditions of each of the plurality of person area images; and a comprehensive emotion estimation unit that generates comprehensive estimated emotion information for the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values.

According to another aspect of the present invention, there is provided an information presentation device including a control unit that, when an evaluation value of the shooting conditions has been calculated for each of a plurality of associated person area images in which the same person is captured from a plurality of viewpoints, and comprehensive estimated emotion information for the person has been generated based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values, performs control so that the comprehensive estimated emotion information is presented.

The control unit may perform control so that a person area image showing the person is presented, and so that the comprehensive estimated emotion information is superimposed at a position corresponding to the coordinates at which the person appears in that person area image.

According to another aspect of the present invention, there is provided an information presentation method including: when an evaluation value of the shooting conditions has been calculated for each of a plurality of associated person area images in which the same person is captured from a plurality of viewpoints, and comprehensive estimated emotion information for the person has been generated based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values, performing control so that the comprehensive estimated emotion information is presented.

According to another aspect of the present invention, there is provided a program for causing a computer to function as an information presentation device including a control unit that, when an evaluation value of the shooting conditions has been calculated for each of a plurality of associated person area images in which the same person is captured from a plurality of viewpoints, and comprehensive estimated emotion information for the person has been generated based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values, performs control so that the comprehensive estimated emotion information is presented.

According to another aspect of the present invention, there is provided an emotion estimation system having: an emotion estimation device including an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints, an evaluation value calculation unit that calculates an evaluation value of the shooting conditions of each of the plurality of person area images, and a comprehensive emotion estimation unit that generates comprehensive estimated emotion information for the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values; and an information presentation device including a control unit that performs control so that the comprehensive estimated emotion information is presented.

According to another aspect of the present invention, there is provided an emotion estimation method including: associating a plurality of person area images in which the same person is captured from a plurality of viewpoints; calculating an evaluation value of the shooting conditions of each of the plurality of person area images; generating comprehensive estimated emotion information for the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation values; and performing control so that the comprehensive estimated emotion information is presented.

As described above, the present invention provides a technique capable of improving the accuracy of emotion estimation.

An explanatory diagram showing an example of the schematic configuration of an information communication system according to an embodiment of the present invention.

A block diagram showing an example of the hardware configuration of the emotion estimation server, the camera, and the information presentation terminal according to the embodiment.

A block diagram showing an example of the functional configuration of the camera according to the embodiment.

A block diagram showing an example of the functional configuration of the emotion estimation server according to the embodiment.

An explanatory diagram illustrating an example of the data table of the emotion person position DB, whose entries are linked by the emotion person matching unit and stored in the storage unit.

A block diagram showing an example of the functional configuration of the information presentation terminal according to the embodiment.

An explanatory diagram illustrating an example of a display screen presented by the presentation unit of the information presentation terminal.

An explanatory diagram showing an example of the operation flow of the information communication system according to the embodiment.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numeral, and redundant description of them is omitted.

In this specification and the drawings, multiple components having substantially the same or similar functional configurations are distinguished by appending different letters to the same reference numeral. When there is no particular need to distinguish such components from one another, only the common reference numeral is used.

(0. Overview)
Patent Document 1 proposes a facial expression recognition device that uses a computer to measure facial expressions from time-series images and to perform machine recognition of them.

Techniques for estimating the type of facial expression (or behavioral gesture) and the intensity of emotional expression from a human face image (or body image) captured by a single camera (imaging device), including that of Patent Document 1, are already known (hereinafter, "facial expression estimation" techniques). In recent years in particular, the advent of deep learning has improved their estimation (classification) accuracy.

On the other hand, image recognition in real environments (in-the-wild environments) is affected, unlike in an ideal controlled laboratory environment, by disturbance factors such as changes in appearance that depend on the positional relationship between the camera and the subject, lighting variation, and occlusion. These factors lower the accuracy of facial expression estimation.

Furthermore, telework systems have been developed in recent years that constantly share video from multiple cameras installed at remote sites, creating a collaborative environment in which people feel as if they are working together even when apart (Non-Patent Document 1: Masayuki Tokumitsu and Masato Nonaka, "Development of Super Realistic Telework System", OKI Technical Review, Vol. 84(1), pp. 32-35, 2017). In a remote environment it is harder than in a shared room to grasp the state of a distant counterpart, for example their "emotion" (Non-Patent Document 2: Yoshiko Arimoto et al., "Emotional understanding under modality control in online communication", Proceedings of the 2014 Autumn Meeting of the Acoustical Society of Japan, pp. 385-386, 2014). Using the "facial expression estimation" technique described above to generate estimated facial expression information and present it to the remote counterpart is expected to alleviate this problem.

However, in a real office, the lighting variation described above and occlusion caused by indoor equipment appearing in the camera's view occur with high probability and lower the accuracy of the estimated emotion information. The present embodiment therefore attempts to solve this problem by using the multiple cameras provided in the telework system.

The present embodiment is a "facial expression estimation system using multi-viewpoint video" comprising a plurality of cameras and a server that performs facial expression estimation on face images. The server includes: an "emotion person matching unit" that associates the captured data or extracted data of an estimation target across the images from the plurality of cameras; an "imaging condition comparison unit" that compares the shooting conditions of the plurality of cameras and calculates an evaluation value for each of the plurality of pieces of estimated emotion information; and a "comprehensive emotion estimation unit" that calculates the final estimated emotion information from the evaluation values of the plurality of pieces of estimated emotion information. Because facial expression estimation can be performed redundantly with cameras at multiple viewpoints, more accurate facial expression estimation can be achieved than with conventional techniques based on a single camera image.

(1. First Embodiment)
Next, a schematic configuration of an information communication system (emotion estimation system) according to an embodiment of the present invention is described with reference to FIG. 1.

FIG. 1 is an explanatory diagram showing an example of the schematic configuration of the information communication system according to the present embodiment. Referring to FIG. 1, the information communication system includes an emotion estimation server (emotion estimation device) 100, a plurality of cameras 200, an information presentation terminal (information presentation device) 300, and a LAN 50. Part of the system (for example, the plurality of cameras 200) may be located in an office 400. In addition to that part of the system, the office 400 contains, as examples, a user 900, an obstacle 500, and lighting 600. The shooting ranges of the plurality of cameras 200 may overlap one another.

FIG. 2 is a block diagram showing an example of the hardware configuration of the emotion estimation server 100, the camera 200, and the information presentation terminal 300 according to the present embodiment (hereinafter, when the emotion estimation server 100, the camera 200, and the information presentation terminal 300 need not be distinguished, each may be referred to as "the device according to the present embodiment"). Not every device needs all of the hardware described below (for example, the emotion estimation server 100 need not be directly equipped with a sensor); each device may be provided with only those hardware modules needed to realize its functional configuration, which is described later.

Referring to FIG. 2, the device according to the present embodiment includes a bus 801, a CPU (Central Processing Unit) 803, a ROM (Read Only Memory) 805, a RAM (Random Access Memory) 807, a storage device 809, a communication interface 811, a sensor 813, an input device 815, a display device 817, and a speaker 819. The CPU 803 executes various processes in the device according to the present embodiment. The ROM 805 stores programs and data for causing the CPU 803 to execute the processes of the device. The RAM 807 temporarily stores programs and data while the CPU 803 executes those processes.

The bus 801 interconnects the CPU 803, the ROM 805, and the RAM 807. The storage device 809, the communication interface 811, the sensor 813, the input device 815, the display device 817, and the speaker 819 are also connected to the bus 801. The bus 801 includes, for example, multiple types of buses. As one example, the bus 801 includes a high-speed bus connecting the CPU 803, the ROM 805, and the RAM 807, and one or more other buses that are slower than the high-speed bus.

The storage device 809 stores data that is to be kept temporarily or permanently in the device according to the present embodiment. The storage device 809 may be, for example, a magnetic storage device such as a hard disk, or a nonvolatile memory such as an EEPROM (Electrically Erasable and Programmable Read Only Memory), flash memory, MRAM (Magnetoresistive Random Access Memory), FeRAM (Ferroelectric Random Access Memory), or PRAM (Phase change Random Access Memory).

The communication interface 811 is the communication means of the device according to the present embodiment and communicates with external devices via a network (or directly). The communication interface 811 may be an interface for wireless communication, in which case it may include, for example, a communication antenna, an RF circuit, and other circuits for communication processing. The communication interface 811 may instead be an interface for wired communication, in which case it may include, for example, a LAN terminal, a transmission circuit, and other circuits for communication processing.

The sensor 813 is, for example, a camera, a microphone, a biosensor, another sensor, or a combination of these. The camera captures an image of a subject and includes, for example, an optical system, an image sensor, and an image processing circuit. The microphone picks up ambient sound, converts the sound into an electric signal, and converts the electric signal into digital data.

The input device 815 is a touch panel, a mouse, a line-of-sight detection device, or the like. The display device 817 displays an output image (that is, a display screen) from the device according to the present embodiment and may be realized using, for example, a liquid crystal display, an organic EL (Organic Light-Emitting Diode) display, or a CRT (Cathode Ray Tube). The speaker 819 outputs sound; it converts digital data into an electric signal and converts the electric signal into sound.

Next, an example of the functional configuration of the camera 200 according to the present embodiment is described with reference to FIG. 3. The camera 200 has a function of generating measurement data of the real world. It captures images of the inside of the office 400, including the user 900 (these may be moving images, and "image" here may include video), externally measures various behaviors and physiological reactions of the user 900 (including facial expressions, gestures, and voice), and transmits the acquired sensor data to the emotion estimation server 100 described later.

FIG. 3 is a block diagram showing an example of the functional configuration of the camera 200 according to the present embodiment. Referring to FIG. 3, the camera 200 includes a communication unit 210, a measurement unit 220, and a control unit 230. Although not shown in FIG. 3, the camera 200 may further include a storage unit for storing measurement data, a display unit for showing the internal operating status to the user, and the like.

The communication unit 210 communicates with other devices. For example, the communication unit 210 is directly connected to the LAN 50 and communicates with the emotion estimation server 100. It may also communicate with other cameras 200. The communication unit 210 may be implemented by the communication interface 811.

The measurement unit 220 acquires data by measuring the real world (for example, video of the office 400 captured at a bird's-eye angle of view) and by externally measuring the behavior and physiological reactions of the user 900. The behavior and physiological reaction data are, for example, image data measured by a camera that captures movement within the office 400, facial expressions, and body posture, and voice data measured by a microphone. The data may also be higher-order physiological index data reflecting the user's autonomic nervous system activity, such as pulse data estimated from minute changes in skin color in images of the human body, eye movement data and pupil diameter data estimated from images of the eyes, and skin temperature distribution data, which can be measured if the camera has an infrared thermography function.

The estimation described above may be performed within the camera 200 by the control unit 230 described later, or raw measurement data may be transmitted from the camera 200 to the emotion estimation server 100 described later and the estimation performed within the emotion estimation server 100. The measurement unit 220 may be implemented by the sensor 813.

The control unit 230 provides the various functions of the camera 200. The control unit 230 may link the measurement data with position information data, described later, of the user 900 being measured and with time information data indicating when the measurement data was measured, and transmit the result to the emotion estimation server 100 via the communication unit 210. The camera 200 may perform not only measurement but also analysis processing, including preprocessing, feature extraction, and estimation; in that case the control unit 230 may perform the associated computation. The control unit 230 may be implemented by the CPU 803, the ROM 805, and the RAM 807.
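
The linking of measurement data with position and time information could, for example, be represented as a single record that the camera serializes before transmission. The sketch below is illustrative only: the `MeasurementRecord` fields, the JSON encoding, and the `frame_ref` identifier are assumptions, and the specification does not define a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class MeasurementRecord:
    """Hypothetical record linking one measurement to where and when it was taken."""
    camera_id: str
    captured_at: float                         # measurement time, seconds since epoch
    user_position: Tuple[float, float, float]  # estimated position of the user in metres
    frame_ref: str                             # assumed identifier of the captured frame


def to_message(record: MeasurementRecord) -> str:
    """Serialize a record to JSON text for sending to the server."""
    return json.dumps(asdict(record))


def from_message(message: str) -> MeasurementRecord:
    """Rebuild a record from JSON text received by the server."""
    data = json.loads(message)
    data["user_position"] = tuple(data["user_position"])
    return MeasurementRecord(**data)
```

Keeping the camera identifier, time, and position in one record lets the server associate images from different cameras without consulting the cameras again.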

カメラ２００は、撮像範囲内に含まれるユーザー９００の位置を推定する機能を有していてもよい。たとえば、カメラ２００にレーザレンジファインダの機能も搭載されており、撮像範囲の３次元計測機能を有していてもよい。また、カメラ２００が汎用的な単眼カメラであっても、撮像対象人物の３次元実空間における存在位置を推定する方法は既存に複数あり、公知の方法である（たとえば、非特許文献３：大澤達哉ほか、映像モニタリングのための人物追跡技術、ＮＴＴ技術ジャーナル、１９（８）、ｐｐ．１７−２０、２００７）。 The camera 200 may have a function of estimating the position of the user 900 within the imaging range. For example, the camera 200 may also be equipped with a laser range finder and thereby have a three-dimensional measurement function for the imaging range. Even if the camera 200 is a general-purpose monocular camera, several known methods exist for estimating the position of an imaged person in three-dimensional real space (for example, Non-Patent Document 3: Tatsuya Osawa et al., "Person tracking technology for video monitoring," NTT Technology Journal, 19(8), pp. 17-20, 2007).

本発明の実施形態では、カメラ２００は、たとえばオフィス内の固定設置利用であってもよく、その場合、カメラの内部または外部パラメータの情報（カメラの３次元空間内位置、姿勢、撮像方向、画角、撮像範囲等の情報を含む）に係るデータは既知として、前記カメラパラメータのデータをカメラ２００や感情推定サーバ１００が予め記憶部に有しており、前記データを撮像対象人物の位置推定に利用してもよい（すなわち、前記データ及びカメラ２００からの取得データに基づいて撮像対象人物の位置が推定されてもよい）。 In the embodiment of the present invention, the camera 200 may be fixedly installed, for example in an office. In that case, the data on the camera's internal or external parameters (including the camera's position in three-dimensional space, attitude, imaging direction, angle of view, imaging range, and the like) may be treated as known: the camera 200 or the emotion estimation server 100 may hold the camera parameter data in its storage unit in advance and use it to estimate the position of the imaged person (that is, the position of the imaged person may be estimated based on that data and the data acquired from the camera 200).

さらに、カメラ２００は、たとえば自動車に設置された車載カメラであってもよい。この場合も、カメラ２００は自動車周囲環境の撮像データや前記自動車の位置（ＧＰＳ（Global Positioning System）、デッドレコニング、高精度地図、ＳＬＡＭ（Simultaneous Localization And Mapping）等で算出されてもよい）や姿勢の情報をリアルタイムに取得し、前記カメラの外部パラメータの情報を生成してもよい。また、複数のカメラ２００は複数の自動車にそれぞれ設置されたカメラであって、前記複数のカメラ２００は複数の自動車間の車々間通信によってお互いの位置関係情報を生成してもよい。 Further, the camera 200 may be, for example, an in-vehicle camera installed in an automobile. In this case as well, the camera 200 may acquire, in real time, imaging data of the environment around the automobile and information on the position of the automobile (which may be calculated by GPS (Global Positioning System), dead reckoning, a high-precision map, SLAM (Simultaneous Localization And Mapping), or the like) and its attitude, and generate the external parameter information of the camera. The plurality of cameras 200 may also be cameras installed in respective automobiles, and may generate mutual positional relationship information through inter-vehicle communication among those automobiles.

次に、図４を参照して、本実施形態に係る「感情推定サーバ１００」の機能構成の一例を説明する。図４は、本実施形態に係る感情推定サーバ１００の機能構成の一例を示すブロック図である。図４を参照すると、感情推定サーバ１００は、通信部１１０、記憶部１２０及び制御部１３０を備える。 Next, an example of the functional configuration of the “emotion estimation server 100” according to the present embodiment will be described with reference to FIG. FIG. 4 is a block diagram showing an example of the functional configuration of the emotion estimation server 100 according to the present embodiment. Referring to FIG. 4, the emotion estimation server 100 includes a communication unit 110, a storage unit 120, and a control unit 130.

通信部１１０は、他の装置と通信する。たとえば、通信部１１０は、ＬＡＮ５０に直接的に接続され、カメラ２００や情報提示端末３００と通信する。なお、通信部１１０は、通信インタフェース８１１により実装され得る。 The communication unit 110 communicates with another device. For example, the communication unit 110 is directly connected to the LAN 50 and communicates with the camera 200 and the information presenting terminal 300. The communication unit 110 may be implemented by the communication interface 811.

記憶部１２０は、感情推定サーバ１００の動作のためのプログラム及びデータを記憶する。記憶部１２０は、感情推定辞書ＤＢ１２１及び感情人物位置ＤＢ１２２を含む。 The storage unit 120 stores programs and data for the operation of the emotion estimation server 100. The storage unit 120 includes an emotion estimation dictionary DB 121 and an emotion person position DB 122.
前記データには、センサデータ（本実施形態では、たとえばユーザー９００を含む画像のデータ）からユーザーの感情（感情には表情やしぐさの種別や強度も含まれ得る）を推定（識別）処理するための学習済の感情推定モデル（感情認識辞書）のデータが含まれる。前記感情推定モデルは、予め取得されたセンサデータ（たとえば多数の人物の顔表情を含む画像）と、前記センサデータ取得時の撮像対象人物の感情の正解情報のデータとを紐づけて学習処理し生成される。前記感情の正解情報は、学習処理フェーズにおいて前記人物から質問紙法等により計測されても構わない。また、感情推定モデルはユーザー９００の各個人毎、所定期間毎、ユーザー９００の行動種別毎等でデータを分類および分割しそれぞれ学習処理させることで生成され、条件に応じた複数の感情推定モデルが存在しても構わない。 The data includes a trained emotion estimation model (emotion recognition dictionary) for estimating (identifying) the user's emotion (which may include the type and intensity of facial expressions and gestures) from sensor data (in the present embodiment, for example, image data including the user 900). The emotion estimation model is generated by a learning process that associates sensor data acquired in advance (for example, images including the facial expressions of many persons) with correct-answer data on the emotion of the imaged person at the time the sensor data was acquired. The correct-answer emotion information may be measured from the person by a questionnaire method or the like during the learning phase. The emotion estimation model may also be generated by classifying and dividing the data by individual user 900, by predetermined period, by behavior type of the user 900, and so on, and training on each subset, so that a plurality of emotion estimation models corresponding to different conditions may exist.

なお、センサデータから人物の個人感情を推定する方法は公知（たとえば特開２０１２−５９１０７号公報）であるため、本稿ではこれ以上の説明は省略する。前記感情推定モデルは感情推定辞書ＤＢ１２１に記憶される。後述する感情人物位置ＤＢ１２２には、後述するユーザー９００の推定感情情報と前記ユーザー９００のオフィス４００内の位置情報が対応付けて記憶される。なお、記憶部１２０は、記憶装置８０９により実装され得る。 Since a method for estimating a person's personal emotion from sensor data is known (for example, Japanese Patent Application Laid-Open No. 2012-59107), further description thereof will be omitted in this paper. The emotion estimation model is stored in the emotion estimation dictionary DB 121. In the emotional person position DB 122 described later, the estimated emotional information of the user 900 described later and the position information in the office 400 of the user 900 are stored in association with each other. The storage unit 120 may be mounted by the storage device 809.

制御部１３０は、感情推定サーバ１００の様々な機能を提供する。制御部１３０は、顔検出部１３１、感情推定部１３３、感情人物照合部１３５、撮影条件比較部１３７及び総合感情推定部１３９を含む。なお、制御部１３０は、ＣＰＵ８０３、ＲＯＭ８０５及びＲＡＭ８０７により実装され得る。 The control unit 130 provides various functions of the emotion estimation server 100. The control unit 130 includes a face detection unit 131, an emotion estimation unit 133, an emotion person matching unit 135, a shooting condition comparison unit 137, and a comprehensive emotion estimation unit 139. The control unit 130 may be mounted by the CPU 803, the ROM 805, and the RAM 807.

顔検出部１３１は、各カメラ２００の撮像画像から、顔検出技術によりユーザー９００の顔画像の領域を特定し、切り出して（抽出して）記憶部１２０に記憶する。顔画像は、前記撮像画像のユーザー９００の顔が写る領域であり、人物の身体が写る領域の画像（人物領域画像）の一例に相当する。この時、後述する感情人物照合部１３５が、顔画像とそのユーザー９００の位置情報を対応付けて記憶部１２０の感情人物位置ＤＢ１２２に記憶させても構わない。なお、顔検出技術は公知の方法が既存に複数あるため説明を省略する（たとえば、非特許文献４：山下隆義ほか、顔の検出・表情の認識技術、映像情報メディア学会誌、６２（５）、ｐｐ．７０８−７１３、２００８）。 The face detection unit 131 identifies a region of the face image of the user 900 from the captured image of each camera 200 by the face detection technique, cuts out (extracts) the area, and stores it in the storage unit 120. The face image is a region in which the face of the user 900 of the captured image is captured, and corresponds to an example of an image (person region image) in a region in which the body of a person is captured. At this time, the emotional person collation unit 135, which will be described later, may associate the face image with the position information of the user 900 and store it in the emotional person position DB 122 of the storage unit 120. Since there are a plurality of known methods for face detection technology, the description thereof will be omitted (for example, Non-Patent Document 4: Takayoshi Yamashita et al., Face detection / facial expression recognition technology, Journal of the Institute of Image Information and Television Engineers, 62 (5)). , Pp.708-713, 2008).

感情推定部１３３は、ユーザー９００からカメラ２００及び通信部１１０を介して取得した行動の画像データや生体反応の計測データ（センサデータ）に基づいて、ユーザー９００毎の個人感情の推定モデルデータおよびそれにより推定（識別）された推定感情情報を生成する。また、感情推定部１３３は、前記生成した推定モデルデータと推定感情情報を記憶部１２０に記憶させる機能を有する。また、前記推定感情情報の生成処理はカメラ２００で行われてもよく、感情推定サーバ１００はカメラ２００から前記画像データではなく推定感情情報を受信しても構わない。 The emotion estimation unit 133 is based on behavior image data and biological reaction measurement data (sensor data) acquired from the user 900 via the camera 200 and the communication unit 110, and the estimation model data of individual emotions for each user 900 and the estimated model data thereof. Generates estimated emotional information estimated (identified) by. Further, the emotion estimation unit 133 has a function of storing the generated estimated model data and the estimated emotion information in the storage unit 120. Further, the process of generating the estimated emotion information may be performed by the camera 200, and the emotion estimation server 100 may receive the estimated emotion information from the camera 200 instead of the image data.

ここで、個人感情とその推定方法について説明を補足する。個人感情は、一例として「人が心的過程の中で行うさまざまな情報処理のうちで、人、物、出来事、環境についてする評価的な反応」（Ｏｒｔｏｎｙｅｔａｌ．，１９８８；大平，２０１０）と定義される。感情の具体的な種類としては、心理学者ＰａｕｌＥｋｍａｎによる表情に対応する基本感情ベースの離散型モデル上での幸福、驚き、恐れ、怒り、嫌悪、悲しみや、心理学者ＪａｍｅｓＡ．Ｒｕｓｓｅｌｌによる快度及び覚醒度の感情次元ベースの連続型モデルにおける喜怒哀楽の象限などが知られている。他の連続型モデルとしては、Ｗａｔｓｏｎによるポジティブまたはネガティブ感情、Ｗｕｎｄｔによる３軸モデル（快度、興奮度、緊張度）、Ｐｌｕｔｃｈｉｋによる４軸のモデルなどもある。その他、応用的・複合的な感情としては、困惑度、関心度、メンタルストレス、集中度、疲労感、多忙度、創造性、リラックス／緊張度、モチベーション、共感度、信頼度などが挙げられる。さらに、業務活動において集団の雰囲気として体感されるイキイキ感なども高次な感情の一種といえる。本発明における感情の定義の有効範囲は、前述の基本感情よりも広く、ユーザーのあらゆる内部「状態」やユーザーの周囲環境や文脈等の影響も加味した「状況」も含むものである。一例として、ポジティブ感情やその度合いは、快度そのものや、快度と覚醒度を合わせたもの、基本感情における幸福の強度の大きさ、もしくは恐れ、怒り、嫌悪、悲しみ等の強度の小ささ等を指標としてあらわされてもよい。 Here, the explanation of personal emotions and their estimation is supplemented. As one example, personal emotion is defined as "the evaluative reactions to people, things, events, and the environment among the various kinds of information processing that a person performs in mental processes" (Ortony et al., 1988; Ohira, 2010). Known concrete types of emotion include happiness, surprise, fear, anger, disgust, and sadness in the discrete model of basic emotions corresponding to facial expressions by the psychologist Paul Ekman, and the quadrants of joy, anger, sorrow, and pleasure in the continuous, dimension-based model of pleasantness and arousal by the psychologist James A. Russell. Other continuous models include positive and negative affect by Watson, the three-axis model by Wundt (pleasantness, excitement, tension), and the four-axis model by Plutchik. Applied or compound emotions include confusion, interest, mental stress, concentration, fatigue, busyness, creativity, relaxation/tension, motivation, empathy, and trust. The sense of liveliness experienced as a group atmosphere in business activities can also be regarded as a kind of higher-order emotion. The scope of the definition of emotion in the present invention is wider than the basic emotions described above, and includes any internal "state" of the user as well as a "situation" that takes into account the influence of the user's surrounding environment and context. As an example, positive emotion and its degree may be expressed using, as an index, pleasantness itself, a combination of pleasantness and arousal, the magnitude of the intensity of happiness among the basic emotions, or the smallness of the intensity of fear, anger, disgust, sadness, and the like.

ある人物がどのような感情とどの程度にあるかは、たとえば質問紙法を用いることで、前記人物の文字、文章、記号による言語的報告によって求めることができる。前記質問紙としては“ＡｆｆｅｃｔＧｒｉｄ”や“ＳＡＭｓｃａｌｅ”などがよく知られている。しかしながら、質問紙を用いた計測方法では回答作業が必要になるため、業務など何か別の作業を行っている日常生活においては計測それ自体が本来の目的作業に支障を及ぼしてしまう可能性がある。 What emotion a person has, and to what degree, can be determined from the person's verbal report in letters, sentences, or symbols, for example by using a questionnaire method. Well-known questionnaires include the "Affect Grid" and the "SAM scale". However, a questionnaire-based measurement requires the person to answer, so in daily life, where the person is engaged in some other task such as work, the measurement itself may interfere with the task the person is actually trying to do.

そこで、本情報通信システムにおいて、感情推定部１３３は、前述のカメラ２００や情報提示端末３００により計測される行動や生体反応のデータに基づいて（質問紙法等で求めた）感情を機械的に推定処理する。前記推定処理を行うためには、予め学習処理によって生成された感情推定モデルのデータが必要となる。感情推定モデルは、たとえば、ある時点・状況における前記行動や生体反応のデータと前記質問紙の回答データからなる訓練データとを対応づけたデータの群から生成される。たとえば、オフィスに埋め込まれた無数のカメラやマイクロフォン、ウェアラブル活動量計から計測されたユーザーの顔表情、音声、心拍活動、皮膚電気活動等の行動・生体データと、前記ユーザーの主観的感情を質問紙回答した正解データとが対応づけられて訓練データとされる。前記行動・生体データは、センサからの計測値が変換された学習処理用の特徴量データであってもよい。 Therefore, in this information communication system, the emotion estimation unit 133 mechanically estimates the emotion (as would otherwise be obtained by a questionnaire method or the like) based on the behavior and biological reaction data measured by the camera 200 and the information presentation terminal 300 described above. Performing this estimation requires emotion estimation model data generated in advance by a learning process. The emotion estimation model is generated, for example, from a group of training data in which the behavior and biological reaction data at a certain time and situation are associated with the answer data of the questionnaire. For example, behavioral and biological data such as the user's facial expression, voice, heartbeat activity, and electrodermal activity, measured by the many cameras and microphones embedded in the office and by wearable activity meters, are associated with correct-answer data in which the user reported his or her subjective emotion on a questionnaire, and the result is used as training data. The behavioral and biological data may be feature data for learning, obtained by converting the values measured by the sensors.

特徴量データは、顔の代表的特徴点の位置や各２点間を結ぶ直線の距離や成す角度であってもよい。あるいは、特徴量データは、音声の基本周波数、パワー、平均発話速度、一次ケプストラム係数の最高値と標準偏差であってもよい。あるいは、特徴量データは、心拍数や拍動間隔の平均値や標準偏差、心拍変動性であってもよい。あるいは、特徴量データは、皮膚コンダクタンス水準の平均値や標準偏差や増減低下率などであってもよい。これらの特徴量データはどのように使用されてもよく、ある時点における絶対値として使用されてもよいし、２時点間の相対的な変化率として使用されてもよい。 The feature amount data may be the position of a representative feature point of the face, the distance of a straight line connecting each of the two points, or the angle formed. Alternatively, the feature data may be the fundamental frequency, power, average speech speed, maximum value and standard deviation of the first-order cepstrum coefficient of speech. Alternatively, the feature amount data may be the average value of the heart rate or the beat interval, the standard deviation, or the heart rate variability. Alternatively, the feature amount data may be an average value of the skin conductance level, a standard deviation, an increase / decrease rate, or the like. These feature data may be used in any way, as an absolute value at a certain time point, or as a relative rate of change between two time points.
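The geometric feature data described above can be illustrated with a minimal sketch. It assumes that facial feature points are already available as 2D coordinates; the function names `pairwise_distances`, `angle_at`, and `relative_change` are hypothetical and are not taken from the specification.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Length of the straight line connecting each pair of feature points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def angle_at(vertex, a, b):
    """Angle in degrees formed at `vertex` by the lines to points `a` and `b`.

    Assumes `a` and `b` are both distinct from `vertex`.
    """
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def relative_change(before, after):
    """Relative rate of change of a feature value between two time points."""
    return (after - before) / before
```

Either the absolute values or the relative change between two time points could then be passed to the learning process, as the paragraph above allows.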

前記訓練データを用いた感情推定モデルの生成には、学習の手法として、たとえば既知のＳＶＭ（ＳｕｐｐｏｒｔＶｅｃｔｏｒＭａｃｈｉｎｅ）や深層学習（ＤｅｅｐＬｅａｒｎｉｎｇ）法が用いられてもよいし、単純に回帰分析法が利用されてもよい。また、学習モデルはユーザー個人毎に生成されてもよいし、複数のユーザーの訓練データを用いて人間に共通的なモデルが生成されてもよい。感情推定部１３３は、得られた感情推定モデルのデータを用いることで、ある人物の行動・生体データから個人感情を推定できるようになる。 For the generation of the emotion estimation model using the training data, for example, a known SVM (Support Vector Machine) or deep learning method may be used as a learning method, or a regression analysis method is simply used. It may be used. Further, the learning model may be generated for each individual user, or a model common to humans may be generated using training data of a plurality of users. The emotion estimation unit 133 can estimate individual emotions from the behavior / biological data of a certain person by using the obtained emotion estimation model data.

感情推定サーバ１００（たとえば、感情推定部１３３）は、上述の個人感情推定処理のための訓練データや感情の推定モデル自体を生成する機能を有していてもよい。さらに、訓練データのための前述の特徴量データの生成は、感情推定サーバ１００ではなくカメラ２００や情報提示端末３００の方で行い、カメラ２００や情報提示端末３００が、前記特徴量データを感情推定サーバ１００へ送信するようにしてもよい。本実施形態では特に、前述の特許文献１のように、人物（ユーザー９００）の顔画像を入力としその表情（Ｅｋｍａｎの６基本表情等）の識別結果の推定感情情報を出力とするような感情推定方法を主に想定している。 The emotion estimation server 100 (for example, the emotion estimation unit 133) may have a function of generating training data for the above-mentioned personal emotion estimation process or the emotion estimation model itself. Further, the above-mentioned feature amount data for the training data is generated not by the emotion estimation server 100 but by the camera 200 or the information presenting terminal 300, and the camera 200 or the information presenting terminal 300 estimates the feature amount data. It may be sent to the server 100. In this embodiment, in particular, as in Patent Document 1 described above, an emotion that inputs a facial image of a person (user 900) and outputs estimated emotion information of the identification result of the facial expression (6 basic facial expressions of Ekman, etc.). The estimation method is mainly assumed.

感情人物照合部１３５は、複数のカメラ２００から取得されたユーザー９００の複数視点からの顔画像同士を対応付ける処理を行う対応付け処理部として機能する。この時、あるユーザー９００個人を基準とした対応付けを行いたいため、たとえばオフィス４００内に２名のユーザー９００Ａとユーザー９００Ｂがいた場合には、前記ユーザー９００Ａとユーザー９００Ｂそれぞれの顔画像（抽出画像）を互いに対応付けないことが必要である（ユーザー９００Ａとユーザー９００Ｂの個人感情は互いに異なるため）。感情人物照合部１３５は、対応付けられた顔画像（抽出画像）同士の関係性の情報を記憶部１２０に記憶させてもよい。なお、顔画像同士の対応付けには、顔画像同士が直接的に対応付けられる場合だけではなく、複数の顔画像それぞれから得られる推定感情情報同士が直接対応付けられる場合も含められ得る。 The emotional person matching unit 135 functions as a matching processing unit that performs processing for associating face images of users 900 acquired from a plurality of cameras 200 from a plurality of viewpoints. At this time, since it is desired to associate with a certain user 900 individual as a reference, for example, when there are two users 900A and 900B in the office 400, the face images (extracted images) of the user 900A and the user 900B are respectively. ) Are not associated with each other (because the personal feelings of the user 900A and the user 900B are different from each other). The emotional person collation unit 135 may store information on the relationship between the associated face images (extracted images) in the storage unit 120. The association between the face images may include not only the case where the face images are directly associated with each other but also the case where the estimated emotion information obtained from each of the plurality of face images is directly associated with each other.

オフィス４００にユーザーが９００Ａと９００Ｂの２名おり、両名を撮像範囲内に捉えるカメラが２００Ａと２００Ｂの２台あった場合、ユーザー９００Ａの顔画像はカメラ２００Ａと２００Ｂそれぞれで撮られた２視点分ある。このとき、たとえばカメラ２００Ａとカメラ２００Ｂとによって撮像されたユーザー９００Ａの顔画像をそれぞれ、顔画像９００Ａ−２００Ａ、顔画像９００Ａ−２００Ｂとする。同様に、ユーザー９００Ｂの顔画像としても、顔画像９００Ｂ−２００Ａ、顔画像９００Ｂ−２００Ｂの２視点分が得られる。この時、顔画像９００Ａ−２００Ａと顔画像９００Ａ−２００Ｂを対応付け、顔画像９００Ｂ−２００Ａと顔画像９００Ｂ−２００Ｂを対応付けるのが正しい処理となる。それ以外の撮像・推定対象人物が異なる組み合わせ、たとえば顔画像９００Ａ−２００Ａと顔画像９００Ｂ−２００Ｂを対応付ける処理は、本実施形態においては誤りであり、これを避ける必要がある。 If there are two users, 900A and 900B, in the office 400 and two cameras, 200A and 200B, that each capture both users within their imaging range, there are two viewpoints' worth of face images of the user 900A, one taken by each of the cameras 200A and 200B. Here, the face images of the user 900A captured by the camera 200A and the camera 200B are referred to as face image 900A-200A and face image 900A-200B, respectively. Similarly, two viewpoints' worth of face images of the user 900B are obtained: face image 900B-200A and face image 900B-200B. The correct process is then to associate face image 900A-200A with face image 900A-200B, and face image 900B-200A with face image 900B-200B. Any other combination in which the imaged and estimated persons differ, for example associating face image 900A-200A with face image 900B-200B, is an error in the present embodiment and must be avoided.

前記顔画像の正しい対応付けを行うため、感情人物照合部１３５はユーザー９００の位置情報を利用してもよい。すなわち、感情人物照合部１３５は、複数の顔画像それぞれに写るユーザーの空間における位置情報に基づいて、複数の顔画像を対応付けてもよい。たとえば、感情人物照合部１３５は、複数の顔画像それぞれに写るユーザーの位置同士が所定の範囲内に収まる場合に複数の顔画像を対応付けてもよい。前述のように、オフィス４００内に設置された各カメラ２００は、撮像範囲内に含まれるユーザー９００の位置を推定する機能を有していてもよい。物理空間内のある３次元位置に複数の人物が重なって存在することはできないため、ある３次元位置に存在するユーザー９００は一意に定まる。感情人物照合部１３５は、ユーザー９００毎に顔画像と位置情報とを対応付けて感情人物位置ＤＢ１２２に記憶させてもよい。なお、前記位置情報は３次元以外、たとえば水平面等上の２次元位置の情報でも構わない。 To associate the face images correctly, the emotional person matching unit 135 may use the position information of the user 900. That is, the emotional person matching unit 135 may associate a plurality of face images with each other based on the spatial position information of the user captured in each face image. For example, the emotional person matching unit 135 may associate a plurality of face images when the positions of the users captured in those face images fall within a predetermined range of each other. As described above, each camera 200 installed in the office 400 may have a function of estimating the position of the user 900 within its imaging range. Since a plurality of persons cannot overlap at one three-dimensional position in physical space, the user 900 present at a given three-dimensional position is uniquely determined. The emotional person matching unit 135 may store the face image and the position information in association with each other in the emotional person position DB 122 for each user 900. The position information need not be three-dimensional and may be, for example, a two-dimensional position on a horizontal plane or the like.

たとえば、前述の顔画像９００Ａ−２００Ａと顔画像９００Ａ−２００Ｂが位置情報Ａ（例：Ｘ＝０、Ｙ＝０、Ｚ＝０）に対応付けられており、顔画像９００Ｂ−２００Ａと顔画像９００Ｂ−２００Ｂが位置情報Ｂ（例：Ｘ＝２０００、Ｙ＝３０００、Ｚ＝０）に対応付けられていれば、感情人物照合部１３５は、それぞれ等しい位置情報に対応付けられた顔画像同士の対応付けを行ってもよい。すなわち、感情人物照合部１３５は、同一の位置情報Ａに対応付けられている顔画像９００Ａ−２００Ａと顔画像９００Ａ−２００Ｂとを対応付け、同一の位置情報Ｂに対応付けられている顔画像９００Ｂ−２００Ａと顔画像９００Ｂ−２００Ｂとを対応付けてもよい。 For example, the above-mentioned face image 900A-200A and face image 900A-200B are associated with position information A (example: X = 0, Y = 0, Z = 0), and the face image 900B-200A and face image 900B are associated with each other. If −200B is associated with position information B (eg, X = 2000, Y = 3000, Z = 0), the emotional person matching unit 135 corresponds to the face images associated with the same position information. You may attach it. That is, the emotional person collation unit 135 associates the face image 900A-200A and the face image 900A-200B associated with the same position information A, and the face image 900B associated with the same position information B. -200A and the face image 900B-200B may be associated with each other.
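The position-based association above can be sketched as a simple grouping step. This is an illustrative sketch only: the `FaceObservation` type, the `associate_by_position` function, the greedy comparison against the first member of each group, and the `max_distance` value of 500 (in the same unspecified units as the X, Y, Z example) are all assumptions, not details from the specification.

```python
import math
from dataclasses import dataclass


@dataclass
class FaceObservation:
    label: str        # e.g. "900A-200A" (person and camera, for illustration)
    camera_id: str
    position: tuple   # estimated (x, y, z) of the person in the room


def associate_by_position(observations, max_distance=500.0):
    """Group observations whose estimated positions fall within max_distance.

    Each observation joins the first existing group whose first member is
    close enough; otherwise it starts a new group.
    """
    groups = []
    for obs in observations:
        for group in groups:
            if math.dist(obs.position, group[0].position) <= max_distance:
                group.append(obs)
                break
        else:
            groups.append([obs])
    return groups
```

With estimated positions near (0, 0, 0) for user 900A and near (2000, 3000, 0) for user 900B, the two views of each user end up in the same group and the users stay separate, even when the per-camera position estimates differ slightly.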

なお、本実施形態では位置情報に基づく複数視点の画像や推定感情情報の対応付けを主に想定し説明したが、その他、感情人物照合部１３５は、公知の複数カメラ間人物対応付け技術（ＰｅｒｓｏｎＲｅ−ｉｄｅｎｔｉｆｉｃａｔｉｏｎ）を用い、たとえば各人物の属性情報（人物の年齢、性別、服装など）や見た目のアピアランス情報（肌の色、服の色などといった人物の外観に関する情報）を利用した対応付け処理を行っても構わない。また、感情人物照合部１３５は、公知の顔認識技術を用い、個人同定情報を利用した対応付け処理を行っても構わない。 The present embodiment has mainly been described on the assumption that images from a plurality of viewpoints, or the estimated emotion information, are associated based on position information. Alternatively, the emotional person matching unit 135 may use a known technique for matching persons across multiple cameras (person re-identification) and perform the association using, for example, attribute information of each person (age, gender, clothing, and the like) or appearance information (information on the person's external appearance such as skin color and clothing color). The emotional person matching unit 135 may also use a known face recognition technique and perform the association using personal identification information.

ここで、図５を参照して、前述した感情人物位置ＤＢ１２２について説明する。図５は、後述する感情人物照合部１３５によって紐づけ処理され記憶部１２０に記憶される感情人物位置ＤＢ１２２のデータテーブルの一例を説明するための説明図である。図５のデータテーブルには、データＩＤ、撮像カメラＩＤ、（ユーザー９００の）人物位置、（ユーザー９００の）人物ＩＤ、（ユーザー９００の）推定感情情報、その他の情報（たとえば、タイムスタンプ、評価値など）のデータが記憶されている。 Here, the above-mentioned emotional person position DB 122 will be described with reference to FIG. FIG. 5 is an explanatory diagram for explaining an example of a data table of the emotional person position DB 122 which is associated with the emotional person matching unit 135 and stored in the storage unit 120, which will be described later. The data table of FIG. 5 includes a data ID, an imaging camera ID, a person position (for user 900), a person ID (for user 900), estimated emotional information (for user 900), and other information (eg, time stamp, evaluation). Data such as value) is stored.

データＩＤは、各データを一意に識別するための識別情報である。撮像カメラＩＤは、本実施形態に係る情報通信システムに含まれる複数のカメラ２００の各機体を一意に識別するための識別情報であり、どの撮影条件のカメラ２００から取得したセンサデータであるかの情報を得るために利用され得る。人物位置は、前記撮像カメラＩＤのカメラ２００から撮像されたユーザー９００のオフィス４００内の前記物理空間内のある３次元位置の情報を含む。人物ＩＤは、前記ユーザー９００を一意に識別するための識別情報を含み、特にオフィス４００内に複数のユーザー９００が存在した場合に必要な情報である。推定感情情報は、前述の感情推定部１３３により推定された前記ユーザー９００の推定感情情報である。 The data ID is identification information for uniquely identifying each data item. The imaging camera ID is identification information for uniquely identifying each of the plurality of cameras 200 included in the information communication system according to the present embodiment, and can be used to determine under which shooting conditions the camera 200 that supplied the sensor data was operating. The person position includes information on the three-dimensional position, in the physical space within the office 400, of the user 900 imaged by the camera 200 having that imaging camera ID. The person ID includes identification information for uniquely identifying the user 900, and is needed in particular when a plurality of users 900 are present in the office 400. The estimated emotion information is the estimated emotion information of the user 900 estimated by the emotion estimation unit 133 described above.
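One row of the data table of FIG. 5 could be modeled as the record type below. This is a sketch under assumptions: the field names, the representation of the estimated emotion information as a mapping from emotion label to score, and the sample values are illustrative and are not specified in the description.

```python
from dataclasses import dataclass


@dataclass
class EmotionPersonRecord:
    data_id: str              # uniquely identifies this row
    camera_id: str            # which camera 200 produced the sensor data
    person_position: tuple    # (x, y, z) of the person in the office
    person_id: str            # uniquely identifies the user 900
    estimated_emotion: dict   # assumed format: emotion label -> score
    timestamp: float          # acquisition (shooting) time of the sensor data
    evaluation_value: float   # evaluation value of the shooting conditions


# Two rows from the same camera at the same time, for two different users.
row_1 = EmotionPersonRecord("0001", "C01", (0.0, 0.0, 0.0), "900A",
                            {"happiness": 0.7}, 1000.0, 0.8)
row_2 = EmotionPersonRecord("0002", "C01", (2000.0, 3000.0, 0.0), "900B",
                            {"sadness": 0.4}, 1000.0, 0.6)
```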

その他の情報は、たとえば、後述する撮影条件比較部１３７による撮影条件の評価値や、前記カメラ２００から取得したセンサデータの取得時刻（撮影時刻）を示すタイムスタンプデータを含む。上記では、ユーザー９００の位置情報を利用して顔画像同士の対応付けを行う例について説明したが、感情人物照合部１３５は、ユーザー９００の位置情報に加えて、あるいは、ユーザー９００の位置情報の代わりにタイムスタンプデータを利用してもよい。これによって、同一の人物が写る顔画像同士が正しく対応付けられる可能性が高まる。すなわち、感情人物照合部１３５は、複数の顔画像それぞれのタイムスタンプデータに基づいて、複数の顔画像を対応付けてもよい。たとえば、感情人物照合部１３５は、複数の顔画像それぞれのタイムスタンプデータ同士が所定の範囲内に収まる場合に複数の顔画像を対応付けてもよい。 Other information includes, for example, an evaluation value of shooting conditions by the shooting condition comparison unit 137, which will be described later, and time stamp data indicating the acquisition time (shooting time) of the sensor data acquired from the camera 200. In the above, an example of associating face images with each other by using the position information of the user 900 has been described, but the emotional person matching unit 135 may add the position information of the user 900 or the position information of the user 900. Time stamp data may be used instead. This increases the possibility that face images showing the same person are correctly associated with each other. That is, the emotional person collation unit 135 may associate a plurality of face images based on the time stamp data of each of the plurality of face images. For example, the emotional person collation unit 135 may associate a plurality of face images when the time stamp data of each of the plurality of face images falls within a predetermined range.
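The combined use of position information and time stamp data can be sketched as a single predicate. The function name `can_associate` and both threshold values are assumptions chosen for illustration; the description only states that each quantity should fall within a predetermined range.

```python
import math


def can_associate(pos_a, time_a, pos_b, time_b,
                  max_distance=500.0, max_skew=0.1):
    """Return True when two face images may show the same person.

    The two observations must be close in space (estimated positions within
    max_distance) and close in time (time stamps within max_skew seconds).
    """
    close_in_space = math.dist(pos_a, pos_b) <= max_distance
    close_in_time = abs(time_a - time_b) <= max_skew
    return close_in_space and close_in_time
```

Requiring both conditions prevents a face image from being matched with an image of a different person who stood at the same place at another time.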

なお、タイムスタンプデータは、それぞれの顔画像を撮影するカメラ２００によって付与されてよいが、複数のカメラ間において同じタイミングに付与されるタイムスタンプデータにずれが生じないよう、複数のカメラ間で同期をとる仕組みが設けられるのが望ましい。たとえば、複数のカメラそれぞれと通信可能なタイム管理サーバが存在する場合、タイム管理サーバによって複数のカメラそれぞれに対して同一時刻が通知されることによって同期がとられてもよい。タイム管理サーバの機能は、感情推定サーバ１００が有してもよいし、感情推定サーバ１００とは別のサーバが有してもよい。 The time stamp data may be assigned by the camera 200 that captures each face image, but it is desirable to provide a mechanism for synchronizing the plurality of cameras so that time stamps assigned at the same moment do not differ between cameras. For example, when there is a time management server capable of communicating with each of the cameras, synchronization may be achieved by having the time management server notify each camera of the same time. The function of the time management server may be provided by the emotion estimation server 100 or by a server other than the emotion estimation server 100.

図５では、たとえば、データＩＤが「０００１」のデータと、データＩＤが「０００２」のデータとは、撮像カメラＩＤが「Ｃ０１」のカメラ２００から同一時刻Ｔ１（同一タイムスタンプデータ）に生成されたデータで、しかし異なる２名のユーザー９００についてのデータであってもよい。 In FIG. 5, for example, the data having the data ID “0001” and the data having the data ID “0002” are generated from the camera 200 having the imaging camera ID “C01” at the same time T1 (same time stamp data). Data, but may be data for two different users 900.

撮影条件比較部１３７は、感情人物照合部１３５によって対応付けられたユーザー９００の複数視点からの複数の顔画像に対して、それぞれの撮影条件の評価値を算出する評価値算出部として機能する。前記撮影条件の評価値としては、感情推定処理における外乱要因、たとえば人物の撮像方向や姿勢による見えの変化、照明変動、オクルージョン等の影響が小さく、それらによる推定精度の低下が小さい条件ほど高い（好ましい）値が付けられるものとする。 The shooting condition comparison unit 137 functions as an evaluation value calculation unit that calculates an evaluation value of the shooting conditions for each of the plurality of face images of the user 900, taken from a plurality of viewpoints, that have been associated by the emotional person matching unit 135. A higher (more favorable) evaluation value is assigned to shooting conditions under which disturbance factors in the emotion estimation process, such as changes in appearance due to the imaging direction or posture of the person, lighting variation, and occlusion, have less influence and therefore cause less degradation of estimation accuracy.

前記外乱要因と評価値設定の例として、顔方向の要因では、通常正面顔に近い撮影条件ほど顔の正規化処理と歪みの影響が少なくて済み、高い精度での表情推定処理が実現できる。したがって、撮影条件比較部１３７は、複数の顔画像それぞれに対応して、顔画像に写るユーザー９００と顔画像を撮像するカメラ２００との角度に基づいて、顔画像の撮影条件の評価値を設定してもよい。より具体的に、撮影条件比較部１３７は、カメラ２００の撮影光軸と対象のユーザー９００の顔の真正面の軸の成す角度が小さいほど、撮影条件に対して高い評価値を付けてよい。 As an example of the disturbance factor and the evaluation value setting, as for the face direction factor, the more the shooting condition is closer to the front face, the less the influence of the face normalization process and the distortion is required, and the facial expression estimation process can be realized with high accuracy. Therefore, the shooting condition comparison unit 137 sets the evaluation value of the shooting condition of the face image based on the angle between the user 900 captured in the face image and the camera 200 that captures the face image corresponding to each of the plurality of face images. You may. More specifically, the shooting condition comparison unit 137 may give a higher evaluation value to the shooting condition as the angle formed by the shooting optical axis of the camera 200 and the axis directly in front of the face of the target user 900 is smaller.
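The angle-based evaluation can be sketched as follows. The vector convention (the camera's optical axis points away from the camera, the face's frontal axis points away from the face), the linear mapping from angle to score, and the names `facing_angle_deg` and `angle_score` are assumptions made for illustration; the description only requires that a smaller angle yield a higher evaluation value.

```python
import math


def facing_angle_deg(optical_axis, face_normal):
    """Angle in degrees between the camera's line of sight and the face's frontal axis.

    A face looking straight into the camera has a frontal axis opposite to
    the optical axis, so the optical axis is reversed before comparing.
    Both arguments are non-zero 3D direction vectors.
    """
    dot = -sum(o * f for o, f in zip(optical_axis, face_normal))
    norm = math.hypot(*optical_axis) * math.hypot(*face_normal)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def angle_score(angle_deg, max_angle_deg=90.0):
    """Map an angle to [0, 1]: 1.0 for a frontal face, 0.0 at or beyond max_angle_deg."""
    return max(0.0, 1.0 - angle_deg / max_angle_deg)
```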

また、照明変動の要因では、顔の正面に対して一様に照明があたり顔領域内の照明による陰影差が小さいほど高い精度での表情推定処理が実現できる。したがって、撮影条件比較部１３７は、複数の顔画像それぞれに対応して、人物に対する光照射度合いに基づいて、顔画像の撮影条件の評価値を設定してもよい。より具体的に、撮影条件比較部１３７は、顔画像の解析によって得られた明度分布から顔領域内の陰影差を算出し、陰影差が小さいほど、撮影条件に対して高い評価値を付けてよい。 As for lighting variation, facial expression estimation can be performed with higher accuracy when the front of the face is lit more uniformly and the difference in shading caused by lighting within the face region is smaller. Therefore, the shooting condition comparison unit 137 may set the evaluation value of the shooting conditions for each of the plurality of face images based on the degree to which the person is illuminated. More specifically, the shooting condition comparison unit 137 may calculate the shading difference within the face region from the brightness distribution obtained by analyzing the face image, and may assign a higher evaluation value to the shooting conditions as the shading difference becomes smaller.
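One possible way to turn a brightness distribution into a shading difference is sketched below. Defining the shading difference as the gap between the mean brightness of the left and right halves of the face region is an assumption made for this example, as are the 0 to 255 brightness range and the function names; the description does not fix a particular formula.

```python
def shading_difference(gray):
    """Absolute difference in mean brightness between left and right halves.

    `gray` is a 2D list of brightness values (0..255) for the face region
    and must be at least two pixels wide. For odd widths the middle column
    is ignored.
    """
    width = len(gray[0])
    half = width // 2
    left = [v for row in gray for v in row[:half]]
    right = [v for row in gray for v in row[width - half:]]
    return abs(sum(left) / len(left) - sum(right) / len(right))


def lighting_score(gray):
    """Higher when the face is lit more evenly: 1.0 means no shading difference."""
    return 1.0 - shading_difference(gray) / 255.0
```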

さらに、オクルージョンの要因では、顔画像上の遮蔽される領域面積（遮蔽面積）が小さいほど高い精度での表情推定処理が実現できる。したがって、撮影条件比較部１３７は、複数の顔画像それぞれに対応して、人物の遮蔽度合いに基づいて、顔画像の撮影条件の評価値を設定してもよい。より具体的に、撮影条件比較部１３７は、顔画像の解析によって得られた遮蔽面積が小さいほど、撮影条件に対して高い評価値を付けてよい。なお、遮蔽面積は、顔画像から抽出されたもののその抽出処理の尤度が所定値よりも低かった顔の特徴点、または、顔画像から抽出されなかった顔の特徴点に関する情報（たとえば、特徴点の数、特徴点の位置、特徴点の分布など）に基づいて算出されてよい。 Further, as an occlusion factor, the smaller the shielded area (shielded area) on the face image, the more accurate the facial expression estimation process can be realized. Therefore, the shooting condition comparison unit 137 may set the evaluation value of the shooting condition of the face image based on the degree of shielding of the person corresponding to each of the plurality of face images. More specifically, the photographing condition comparison unit 137 may give a higher evaluation value to the photographing condition as the shielding area obtained by the analysis of the face image is smaller. The shielded area is information about facial feature points extracted from the face image but the likelihood of the extraction process is lower than a predetermined value, or facial feature points not extracted from the face image (for example, features). It may be calculated based on the number of points, the position of the feature points, the distribution of the feature points, etc.).
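The occlusion estimate based on facial feature points can be sketched by counting the feature points that were either not extracted or extracted with low likelihood. Using the fraction of unreliable points as a stand-in for the shielded area, the threshold of 0.5, and the function names are assumptions for illustration; the description also allows the positions and distribution of such points to be used.

```python
def occlusion_ratio(landmark_likelihoods, expected_count, min_likelihood=0.5):
    """Fraction of expected feature points that are missing or unreliable.

    `landmark_likelihoods` holds the extraction likelihood of each feature
    point that was detected; points that were not detected at all are simply
    absent, so they count as occluded via `expected_count`.
    """
    reliable = sum(1 for p in landmark_likelihoods if p >= min_likelihood)
    return 1.0 - reliable / expected_count


def occlusion_score(landmark_likelihoods, expected_count, min_likelihood=0.5):
    """Higher when less of the face appears to be shielded."""
    return 1.0 - occlusion_ratio(landmark_likelihoods, expected_count,
                                 min_likelihood)
```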

Regarding the amount of image information, the closer the camera 200 is to the user 900, the more pixels the face image (face region) contains, and the more accurately facial expression estimation can be performed. The shooting condition comparison unit 137 may therefore set, for each of the plurality of face images, the evaluation value of the shooting condition based on the distance between the user 900 appearing in the face image and the camera 200 that captured it. More specifically, the shooting condition comparison unit 137 may assign a higher evaluation value to the shooting condition as the distance between the camera 200 and the target user 900 becomes smaller.

Also regarding the amount of image information, the greater the number of imaging pixels of the camera 200, the more pixels the face image (face region) contains, and the more accurately facial expression estimation can be performed. The shooting condition comparison unit 137 may therefore set, for each of the plurality of face images, the evaluation value of the shooting condition based on the resolution of the face image. More specifically, the shooting condition comparison unit 137 may assign a higher evaluation value to the shooting condition as the resolution of the face image becomes higher.
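Both camera distance and sensor resolution ultimately show up as the number of pixels in the face region, so one simple way to score them is by that pixel count. The following is a minimal sketch; the function name and the saturation point `reference_px` are hypothetical choices, not values given in the source.

```python
def pixel_count_evaluation(face_width_px, face_height_px, reference_px=128 * 128):
    """Return a score in (0, 1]; more face-region pixels give a higher score.

    reference_px is an assumed pixel count beyond which extra resolution is
    treated as giving no further benefit.
    """
    pixels = face_width_px * face_height_px
    return min(1.0, pixels / reference_px)


print(pixel_count_evaluation(64, 64))    # distant or low-resolution face
print(pixel_count_evaluation(256, 256))  # close or high-resolution face
```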

Regarding image quality, the lower the (lossy) compression ratio of the image data and the higher the image quality, the less image noise there is, and the more accurately facial expression estimation can be performed. The shooting condition comparison unit 137 may therefore set, for each of the plurality of face images, the evaluation value of the shooting condition based on the image quality of the face image (for example, the strength of image noise introduced by compression). More specifically, the shooting condition comparison unit 137 may assign a higher evaluation value to the shooting condition as the image quality of the face image becomes higher.

The above descriptions of evaluation values for shooting conditions are examples. More generally, a higher evaluation value may be assigned to any shooting condition that more strongly suppresses the influence of disturbance factors and thereby raises the accuracy of facial expression estimation. (For a discussion of disturbance factors in facial expression estimation, see, for example, Non-Patent Document 5: Wang, M. & Deng, W., Deep face recognition: A survey, https://arxiv.org/abs/1804.06655.)

Alternatively, instead of relying on how small the influence of the disturbance factors is, the shooting condition comparison unit 137 may determine the evaluation value of the shooting condition more directly, according to the "likelihood" obtained when the emotion estimation unit 133 performs emotion estimation on the face image of the user 900. That is, the shooting condition comparison unit 137 may set the evaluation value based on the likelihood with which each of the plurality of pieces of estimated emotion information was estimated. More specifically, the shooting condition comparison unit 137 may assign a higher evaluation value to the shooting condition as the likelihood of the estimated emotion information becomes higher. In general, the greater the influence of disturbance factors, the lower the likelihood. Here, the likelihood is, for example, information indicating how plausible the calculated estimated emotion information is, and may be a probability that expresses this plausibility as a number between 0 and 1.
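To illustrate the likelihood-based variant, the sketch below assumes that the emotion estimator outputs a probability for each emotion class and takes the probability of the winning class as the evaluation value. That output format and the function name are assumptions for the example; the source only says the likelihood may be a probability between 0 and 1.

```python
def likelihood_evaluation(class_probs):
    """Return (estimated label, likelihood used as the evaluation value).

    class_probs: dict mapping an emotion label to its estimated probability.
    """
    label = max(class_probs, key=class_probs.get)
    return label, class_probs[label]


# Hypothetical estimator outputs for a frontal and an oblique view.
front_view = {"pleasant": 0.85, "unpleasant": 0.15}
oblique_view = {"pleasant": 0.55, "unpleasant": 0.45}
print(likelihood_evaluation(front_view))
print(likelihood_evaluation(oblique_view))
```

With this approach the oblique view receives a lower evaluation value simply because the estimator was less certain, without any explicit measurement of angle, lighting, or occlusion.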

In the example of FIG. 1, the face image from the camera 200B, which captures the user 900 from an oblique direction, receives a lower shooting-condition evaluation value than the face image from the camera 200A, which is positioned almost directly in front of the face. The camera 200C captures the user 900 from an angle even further from the front of the face than the camera 200B, is farther from the user 900, and in addition has the obstacle 500 between it and the user 900, which causes occlusion. The face image from the camera 200C therefore receives a lower evaluation value than that from the camera 200B. Overall, the evaluation values of the shooting conditions are expected to rank as camera 200A > camera 200B > camera 200C. The illumination 600 likewise affects the evaluation value (the evaluation value is higher when the face is lit uniformly so that no shading difference arises).

The comprehensive emotion estimation unit 139 calculates comprehensive estimated emotion information for a given user 900 based on (a) the estimated emotion information that the emotion estimation unit 133 estimated from each of the multiple-viewpoint face images of that user 900 associated by the emotional person matching unit 135, and (b) the evaluation values calculated by the shooting condition comparison unit 137.

Emotion estimation for pleasant-unpleasant emotion is described as an example. When the face detection unit 131 extracts the face images 900A-200A, 900A-200B, and 900A-200C of the user 900 from the images captured by the cameras 200A, 200B, and 200C in the office 400, the emotional person matching unit 135 associates these face images with one another. Suppose that the shooting condition comparison unit 137 assigns evaluation values of 3 times, 2 times, and 1 times (a higher value being more favorable) to the shooting conditions of the face images 900A-200A, 900A-200B, and 900A-200C, respectively. Suppose also that the emotion estimation unit 133 estimates the emotion for the face image 900A-200A as pleasant, that for 900A-200B as unpleasant, and that for 900A-200C as pleasant (to keep the explanation simple, the intensity of the emotion is not considered in this example).

In this case:
The estimated emotion information is "pleasant" for 900A-200A (evaluation value "3 times") and 900A-200C (evaluation value "1 times"), so the total evaluation value for "pleasant" is calculated as 3 times + 1 times = 4 times.
The estimated emotion information is "unpleasant" for 900A-200B (evaluation value "2 times"), so the total evaluation value for "unpleasant" is 2 times.

Accordingly, the ratio of the total evaluation value for "pleasant" to that for "unpleasant" is 4:2. Because the total for "pleasant" is higher, the comprehensive emotion estimation unit 139 determines the comprehensive estimated emotion information of the user 900 to be "pleasant". In this way, the comprehensive emotion estimation unit 139 may sum, for each value of the estimated emotion information, the shooting-condition evaluation values of the face images 900A-200A, 900A-200B, and 900A-200C, select as the representative value the estimated emotion information whose total evaluation value is the largest, and use it as the comprehensive estimated emotion information of the user 900.
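The worked example above amounts to a weighted vote, which can be sketched as follows. The function name and the (label, evaluation value) tuple format are chosen for illustration only; the numbers are the ones from the example.

```python
from collections import defaultdict


def aggregate_by_vote(estimates):
    """Sum evaluation values per emotion label and pick the largest total.

    estimates: list of (estimated emotion label, evaluation value) pairs,
    one pair per associated face image.
    """
    totals = defaultdict(float)
    for label, evaluation in estimates:
        totals[label] += evaluation
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Face images 900A-200A, 900A-200B, 900A-200C from the example.
estimates = [("pleasant", 3), ("unpleasant", 2), ("pleasant", 1)]
label, totals = aggregate_by_vote(estimates)
print(label, totals)  # pleasant {'pleasant': 4.0, 'unpleasant': 2.0}
```

Note that an unweighted majority vote would give the same answer here; the weighting matters when a single well-captured image disagrees with several poorly captured ones.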

The description above mainly assumes that the estimated emotion information with the largest total evaluation value is used as the comprehensive estimated emotion information of the user 900. When the intensity of the emotion is taken into account, however, the comprehensive emotion estimation unit 139 may multiply the value of each piece of estimated emotion information by its corresponding evaluation value, sum the products over the plurality of pieces of estimated emotion information, and use the resulting sum as the comprehensive estimated emotion information of the user 900. Applying this calculation to the example above, the comprehensive estimated emotion information of the user 900 is expressed by the following equation (1).

Comprehensive estimated emotion information = 3 × (estimated emotion information of 900A-200A) + 2 × (estimated emotion information of 900A-200B) + 1 × (estimated emotion information of 900A-200C) ... (1)
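Equation (1) can be illustrated with numeric emotion values. The sketch below assumes, purely for illustration, that each piece of estimated emotion information is a signed valence (positive for pleasant, negative for unpleasant); the source does not specify the numeric scale, and the valence values used here are made up.

```python
def weighted_sum(values, evaluations):
    """Equation (1): sum of (evaluation value x estimated emotion value)."""
    if len(values) != len(evaluations):
        raise ValueError("values and evaluations must have the same length")
    return sum(e * v for v, e in zip(values, evaluations))


# Hypothetical valences for 900A-200A, 900A-200B, 900A-200C.
valences = [0.5, -0.25, 0.75]
evaluations = [3, 2, 1]
total = weighted_sum(valences, evaluations)
print(total)  # 3*0.5 + 2*(-0.25) + 1*0.75 = 1.75
```

Because the evaluation values are not normalized, the result is not on the same scale as the individual valences; only its sign and relative size are directly interpretable.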

Furthermore, the comprehensive emotion estimation unit 139 need not use the evaluation values calculated by the shooting condition comparison unit 137 as they are. For example, the comprehensive emotion estimation unit 139 may calculate a weight for each of the estimated emotion information of 900A-200A, 900A-200B, and 900A-200C based on the shooting-condition evaluation values of the face images 900A-200A, 900A-200B, and 900A-200C. The comprehensive emotion estimation unit 139 may then calculate the comprehensive estimated emotion information of the user 900 based on these weights and the estimated emotion information.

As one example, the comprehensive emotion estimation unit 139 may calculate the weights by normalizing the evaluation values. More specifically, the comprehensive emotion estimation unit 139 may calculate the weights by scaling the evaluation values calculated by the shooting condition comparison unit 137 so that their sum over the face images 900A-200A, 900A-200B, and 900A-200C equals 1. The comprehensive emotion estimation unit 139 may then multiply the value of each piece of estimated emotion information by its corresponding weight. For example, the evaluation values of 3 times, 2 times, and 1 times in equation (1) become 3/6, 2/6, and 1/6, and equation (1) is replaced by equation (2) below.

Comprehensive estimated emotion information = (3/6) × (estimated emotion information of 900A-200A) + (2/6) × (estimated emotion information of 900A-200B) + (1/6) × (estimated emotion information of 900A-200C) ... (2)
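The normalization behind equation (2) can be sketched with exact fractions. The function name is hypothetical, the evaluation values are assumed to be integers so that `Fraction` can be used directly, and the valences are the same made-up illustrative values as before.

```python
from fractions import Fraction


def normalize(evaluations):
    """Scale integer evaluation values so that the weights sum to 1."""
    total = sum(evaluations)
    if total <= 0:
        raise ValueError("evaluation values must have a positive sum")
    return [Fraction(e, total) for e in evaluations]


weights = normalize([3, 2, 1])  # 3/6, 2/6, 1/6 in lowest terms
# Hypothetical valences for 900A-200A, 900A-200B, 900A-200C.
valences = [Fraction(1, 2), Fraction(-1, 4), Fraction(3, 4)]
overall = sum(w * v for w, v in zip(weights, valences))
print(weights, overall)
```

Because the weights sum to 1, the result is a weighted average and stays within the range of the individual estimates, which makes it easier to compare across users observed by different numbers of cameras.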

The examples above mainly describe the case where the comprehensive estimated emotion information of the user 900 is calculated with one type of shooting condition in mind. It is also possible, however, to calculate the comprehensive estimated emotion information of the user 900 by taking multiple types of shooting conditions into account. Even in that case, the comprehensive estimated emotion information for a single type of shooting condition may be calculated in the same way as in the examples above. Different types of shooting conditions may be treated as equivalent when the weights are calculated, or the shooting conditions may be given priorities and the weights calculated based on those priorities.

That is, the comprehensive emotion estimation unit 139 calculates the weights based on the priorities of the shooting conditions and on the evaluation values. For example, even when the evaluation values for different shooting conditions are equal, the comprehensive emotion estimation unit 139 may give a larger weight to the evaluation value of the shooting condition with the higher priority. The priorities of the shooting conditions may be set manually in advance, or may be set automatically by the system based on information such as standardized regression coefficients or contribution ratios from multiple regression analysis, or the distribution of weights learned by a neural network and the results of feature selection.
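One possible way to combine several shooting conditions with priorities is shown below. This is a sketch under stated assumptions: the source does not give a formula, so the linear combination, the priority coefficients, the condition names `face_angle` and `distance`, and the function name are all hypothetical.

```python
def priority_weights(per_image_evals, priority):
    """Return one normalized weight per face image.

    per_image_evals: list of dicts, one per face image, mapping a shooting
    condition name to its evaluation value.
    priority: dict mapping a shooting condition name to a priority
    coefficient (larger means higher priority).
    """
    scores = [
        sum(priority[cond] * value for cond, value in evals.items())
        for evals in per_image_evals
    ]
    total = sum(scores)
    return [s / total for s in scores]


priority = {"face_angle": 2.0, "distance": 1.0}
images = [
    {"face_angle": 0.5, "distance": 1.0},  # close but oblique
    {"face_angle": 1.0, "distance": 0.5},  # frontal but far
]
weights = priority_weights(images, priority)
print(weights)
```

The two images have the same pair of evaluation values, only swapped between conditions; the frontal image ends up with the larger weight because face angle has the higher priority in this example.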

Furthermore, the comprehensive emotion estimation unit 139 may calculate the comprehensive estimated emotion information by applying ensemble learning over a plurality of pattern recognizers or neural networks, each corresponding to the input data from one of the plurality of cameras 200, or by concatenating (concat) their feature values.

Next, an example of the functional configuration of the "information presentation terminal 300" according to the present embodiment is described. In response to input from the user 910 (FIG. 1), the information presentation terminal 300 can issue a communication request to another user 900, and can acquire the comprehensive estimated emotion information of that other user 900 and present it to the user 910.

As one example, the information presentation terminal 300 may be a general-purpose smartphone or tablet terminal. Although FIG. 1 shows one information presentation terminal 300 for one user 910, the terminal may be a shared terminal used by a plurality of users 910. As yet another example, the information presentation terminal 300 may be a terminal that presents, to support staff, emotion estimation information about a customer based on measurement data transmitted from a housing device with a display unit, such as a video teller machine (VTM, an automated teller machine with a video communication function), an automatic ticket vending machine at a station, or a visual call center system.

FIG. 6 is a block diagram showing an example of the functional configuration of the information presentation terminal 300 according to the present embodiment. Referring to FIG. 6, the information presentation terminal 300 includes a communication unit 310, a storage unit 320, a control unit 330, an input unit 340, and a presentation unit 350.

The communication unit 310 communicates with other devices. For example, the communication unit 310 is connected directly to the LAN 50 and communicates with the emotion estimation server 100. The communication unit 310 may be implemented by the communication interface 811.

The storage unit 320 stores programs and data for the operation of the information presentation terminal 300. The storage unit 320 may be implemented by the storage device 809.

The control unit 330 provides the various functions of the information presentation terminal 300. The control unit 330 may be implemented by the CPU 803, the ROM 805, and the RAM 807.

The input unit 340 accepts input from the user 910 and provides the input result to the control unit 330. The input from the user 910 is, for example, the designation of another user 900 as the party to whom a communication request is addressed, and is made by, for example, selecting the identification information of that other user 900. The input unit 340 may be implemented by the input device 815.

The presentation unit 350 presents information that the user can perceive, under the control of the control unit 330. The embodiment of the present invention mainly assumes that the presentation unit 350 displays a screen that the user perceives visually. In that case, the presentation unit 350 may be implemented by the display device 823. When the presentation unit 350 presents information perceived through the user's hearing, however, it may be implemented by a speaker. Likewise, when the presentation unit 350 presents information perceived through the user's sense of touch or smell, it may be implemented by a tactile or olfactory presentation device.

For example, the presentation unit 350 presents the comprehensive estimated emotion information of the other user 900 whom the user 910 designated through the input unit 340. The presentation unit 350 may display the comprehensive estimated emotion information of the other user 900 in a region near the region where that user's figure appears in a bird's-eye-view video of the site, which is one example of communication media. The presentation unit 350 may also present the estimation accuracy of the comprehensive estimated emotion information together with the comprehensive estimated emotion information itself. This estimation accuracy may be calculated from the estimation accuracies of the individual pieces of estimated emotion information, by the same kind of method as that described above for calculating the comprehensive estimated emotion information from the individual pieces of estimated emotion information.

The comprehensive estimated emotion information and its estimation accuracy may also be accumulated and stored in, for example, the storage unit 120 of the emotion estimation server 100 or the storage unit 320 of the information presentation terminal 300. In that case, the control unit 330 may process the history of the comprehensive estimated emotion information and of its estimation accuracy, based on the accumulated data, into, for example, a time-series graph, and display it on the presentation unit 350.

FIG. 7 is an explanatory diagram showing an example of a display screen presented by the presentation unit 350 of the information presentation terminal 300. The display screen shows, for example, a video captured from a bird's-eye view by the camera 200 as the communication media, and the figure of the user 900 appears in the screen region near the center of the video. The control unit 330 further performs control so that the comprehensive estimated emotion information is superimposed at a position corresponding to the coordinates at which the user 900 appears in the video (person region image). More specifically, the comprehensive estimated emotion information associated with the user 900, the estimation accuracy of that information, and their history are displayed at a position near the figure of the user 900.

Because the figure of the user 900 is displayed close to the user's comprehensive estimated emotion information, its estimation accuracy, and their history, a user looking at the presentation unit 350 of the information presentation terminal 300 can more easily associate the communication media with the estimated information. This nearby-display function is particularly effective when, for example, a single item of communication media contains information about a plurality of users. One feature of this communication system is that it handles the communication media together with the comprehensive estimated emotion information of the user 900 and its estimation accuracy as data, and that these pieces of data are effective because they relate to one another.

The nearby position is not particularly limited. For example, it may be any position within a predetermined distance of the position of the figure of the user 900. In the example shown in FIG. 7, a display area containing the comprehensive estimated emotion information of the user 900, its estimation accuracy, their history, and the identification information of the user 900 is displayed in the shape of a speech balloon. This makes the relationship between each piece of information and the user easier to grasp. The shape of the display area is not, however, limited to a speech balloon.
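As a rough illustration of placing the display area near the person's figure, the sketch below positions a balloon next to a bounding box and keeps it inside the frame. The function name, the (x, y, width, height) box format, the preference for the right-hand side, and the 10-pixel gap are all assumptions; the source only requires a position within a predetermined distance.

```python
def balloon_position(person_box, balloon_size, frame_size, gap=10):
    """Return the top-left (x, y) of a balloon placed near a person.

    person_box: (x, y, width, height) of the person region in the frame.
    balloon_size: (width, height) of the balloon.
    frame_size: (width, height) of the video frame.
    """
    x, y, w, h = person_box
    bw, bh = balloon_size
    fw, fh = frame_size

    bx = x + w + gap              # prefer the right side of the person
    if bx + bw > fw:              # no room on the right: use the left side
        bx = x - gap - bw
    bx = max(0, min(bx, fw - bw))  # clamp horizontally to the frame
    by = max(0, min(y, fh - bh))   # align with the top of the person, clamped
    return bx, by


print(balloon_position((100, 50, 80, 160), (120, 60), (640, 480)))  # (190, 50)
print(balloon_position((560, 50, 60, 160), (120, 60), (640, 480)))  # (430, 50)
```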

The description above mainly assumes that the communication media is data transmitted in real time. As a modification, however, the communication media need not be data transmitted in real time, and may be, for example, past media data that has been video-recorded or audio-recorded.

As described above, the communication system according to the embodiment of the present invention is also envisaged as a new kind of telephone system, and one of its functions may be to let a user look at the past state of collaborating members at a remote location. In that case, for example, the user 900 may be able to acquire, via the information presentation terminal 300 and from the storage unit 120 of the emotion estimation server 100, past recorded video of a person together with the past comprehensive estimated emotion information and its estimation accuracy associated with that recorded video.

For example, the user 910 may be able to acquire, via the information presentation terminal 300, the recorded video of the user 900 from two hours before the present, in association with the comprehensive estimated emotion information and its estimation accuracy at that time. In such a case, the user 900 may already be absent from the communication system at the real-time point two hours later. When past media data is acquired, however, a plurality of users need not be present in the system at the same time, and only one user may be using the communication system.

Next, an example of the information processing operation according to the present embodiment is described with reference to FIG. 8. FIG. 8 is an explanatory diagram showing an example of the operation flow of the information communication system according to the present embodiment. As shown in FIG. 8, in step S1101, the face detection unit 131 of the emotion estimation server 100 identifies the region of the face image of the user 900 in the image captured by the camera 200, cuts it out (extracts it), and stores it in the storage unit 120. In step S1103, the face detection unit 131 of the emotion estimation server 100 determines whether a face has been detected in the images captured by a plurality of the cameras 200.

If a face has been detected in the images captured by a plurality of the cameras 200 in step S1103 (S1103: YES), then in step S1105 the emotional person matching unit 135 of the emotion estimation server 100 associates the multiple-viewpoint face images of the user 900 acquired from those cameras 200. In step S1107, the shooting condition comparison unit 137 of the emotion estimation server 100 calculates an evaluation value of the shooting condition for each of the multiple-viewpoint face images of the user 900 associated in S1105.

In step S1109, the comprehensive emotion estimation unit 139 of the emotion estimation server 100 calculates the comprehensive estimated emotion information of a given user 900 based on the estimated emotion information obtained from the multiple-viewpoint face images of that user 900 associated by the emotional person matching unit 135 and on the evaluation values calculated by the shooting condition comparison unit 137. At this point, the evaluation values may be summed for each value of the estimated emotion information, and the estimated emotion information with the largest total evaluation value may be selected as the comprehensive estimated emotion information of the user 900. Alternatively, weights may be calculated from the evaluation values, and the comprehensive estimated emotion information of the user 900 may be calculated based on the weights and the estimated emotion information.

If, in step S1103, a face has been detected in the image captured by only a single camera 200 rather than in the images captured by a plurality of the cameras 200 (S1103: NO), then in step S1111 the comprehensive emotion estimation unit 139 calculates the estimated emotion information of the given user 900 from the single-viewpoint face image of that user 900 and the emotion information estimated from it. In doing so, the comprehensive emotion estimation unit 139 may use the value estimated by the emotion estimation unit 133 as it is, or may modify that value based on the evaluation value of the shooting condition obtained by the shooting condition comparison unit 137 (for example, it may change, increase, or decrease the value estimated by the emotion estimation unit 133 according to the magnitude of the disturbance factors).
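The branch structure of the flow in FIG. 8 can be summarized in a short sketch. It assumes that face detection, matching, per-image emotion estimation, and evaluation have already produced one (camera ID, label, evaluation value) tuple per view of the same user; that input format and the function name are hypothetical, and the multi-camera branch uses the weighted-vote variant only.

```python
def estimate_for_user(observations):
    """Return the comprehensive estimated emotion label for one user.

    observations: list of (camera_id, estimated label, evaluation value)
    tuples for face images already associated with the same user.
    """
    if not observations:
        return None  # no face detected by any camera

    if len(observations) == 1:
        # S1103: NO -> S1111: single viewpoint, use the estimate as it is.
        return observations[0][1]

    # S1103: YES -> S1105 to S1109: sum evaluation values per label
    # and select the label with the largest total.
    totals = {}
    for _camera_id, label, evaluation in observations:
        totals[label] = totals.get(label, 0) + evaluation
    return max(totals, key=totals.get)


multi = [("200A", "pleasant", 3), ("200B", "unpleasant", 2), ("200C", "pleasant", 1)]
single = [("200B", "unpleasant", 2)]
print(estimate_for_user(multi), estimate_for_user(single))
```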

As described above, based on captured images of the user 900 obtained from a plurality of cameras in the office 400, the emotion estimation server 100 can obtain estimated emotion information with higher estimation accuracy than in the single-camera case, because the influence of various disturbance factors is suppressed.

(2. Summary)
As described above, according to the embodiment of the present invention, performing comprehensive emotion estimation processing based on multi-viewpoint images obtained from a plurality of cameras achieves more accurate emotion estimation than conventional emotion estimation processing using a single camera.

Although a preferred embodiment of the present invention has been described in detail above with reference to the accompanying drawings, the present invention is not limited to this example. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs could conceive of various changes or modifications within the scope of the technical idea described in the claims, and it is understood that these also naturally fall within the technical scope of the present invention.

100 Emotion estimation server
110 Communication unit
120 Storage unit
121 Emotion estimation dictionary DB
122 Emotion person position DB
130 Control unit
131 Face detection unit
133 Emotion estimation unit
135 Emotion person matching unit
137 Shooting condition comparison unit
139 Comprehensive emotion estimation unit
200 Camera
210 Communication unit
220 Measurement unit
230 Control unit
300 Information presentation terminal
310 Communication unit
320 Storage unit
330 Control unit
340 Input unit
350 Presentation unit
500 Obstacle
600 Lighting

Claims

An emotion estimation device comprising:
an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints;
an evaluation value calculation unit that calculates an evaluation value of a shooting condition for each of the plurality of person area images; and
a comprehensive emotion estimation unit that generates comprehensive estimated emotion information of the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value.

The emotion estimation device according to claim 1, wherein the association processing unit associates the plurality of person area images with one another based on position information, in space, of the person appearing in each of the plurality of person area images.

The emotion estimation device according to claim 1 or 2, wherein the association processing unit associates the plurality of person area images with one another based on the shooting times of the plurality of person area images.

The emotion estimation device according to any one of claims 1 to 3, wherein the comprehensive emotion estimation unit calculates a weight for each of the plurality of pieces of estimated emotion information based on the evaluation value, and generates the comprehensive estimated emotion information based on the estimated emotion information and the weight.

The emotion estimation device according to claim 4, wherein the comprehensive emotion estimation unit calculates the weight based on a priority of the shooting condition and on the evaluation value.

The emotion estimation device according to claim 4, wherein the comprehensive emotion estimation unit calculates the weight based on normalization among the evaluation values.

The emotion estimation device according to any one of claims 1 to 6, wherein the evaluation value calculation unit sets the evaluation value based on the likelihood obtained in estimating each of the plurality of pieces of estimated emotion information.

The emotion estimation device according to any one of claims 1 to 7, wherein the evaluation value calculation unit sets, for each of the plurality of person area images, the evaluation value of the shooting condition of that person area image based on the angle or distance between the person and the camera that captures the person area image.

The emotion estimation device according to any one of claims 1 to 8, wherein the evaluation value calculation unit sets, for each of the plurality of person area images, the evaluation value of the shooting condition of that person area image based on at least one of the degree to which the person is illuminated and the degree to which the person is occluded in the image.

The emotion estimation device according to any one of claims 1 to 9, wherein the evaluation value calculation unit sets, for each of the plurality of person area images, the evaluation value of the shooting condition of that person area image based on at least one of the resolution and the image quality of the person area image.

An emotion estimation method comprising:
associating a plurality of person area images in which the same person is captured from a plurality of viewpoints;
calculating an evaluation value of a shooting condition for each of the plurality of person area images; and
generating comprehensive estimated emotion information of the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value.

A program for causing a computer to function as an emotion estimation device comprising:
an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints;
an evaluation value calculation unit that calculates an evaluation value of a shooting condition for each of the plurality of person area images; and
a comprehensive emotion estimation unit that generates comprehensive estimated emotion information of the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value.

An information presentation device comprising a control unit that, when an evaluation value of a shooting condition is calculated for each of a plurality of associated person area images in which the same person is captured from a plurality of viewpoints, and comprehensive estimated emotion information of the person is generated based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value, performs control so that the comprehensive estimated emotion information is presented.

The information presentation device according to claim 13, wherein the control unit performs control so that the person area image in which the person appears is presented and the comprehensive estimated emotion information is superimposed at a position corresponding to the coordinates at which the person appears in the person area image.

An information presentation method comprising, when an evaluation value of a shooting condition is calculated for each of a plurality of associated person area images in which the same person is captured from a plurality of viewpoints, and comprehensive estimated emotion information of the person is generated based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value, performing control so that the comprehensive estimated emotion information is presented.

A program for causing a computer to function as an information presentation device comprising a control unit that, when an evaluation value of a shooting condition is calculated for each of a plurality of associated person area images in which the same person is captured from a plurality of viewpoints, and comprehensive estimated emotion information of the person is generated based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value, performs control so that the comprehensive estimated emotion information is presented.

An emotion estimation system comprising:
an emotion estimation device including
an association processing unit that associates a plurality of person area images in which the same person is captured from a plurality of viewpoints,
an evaluation value calculation unit that calculates an evaluation value of a shooting condition for each of the plurality of person area images, and
a comprehensive emotion estimation unit that generates comprehensive estimated emotion information of the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value; and
an information presentation device including a control unit that performs control so that the comprehensive estimated emotion information is presented.

An emotion estimation method comprising:
associating a plurality of person area images in which the same person is captured from a plurality of viewpoints;
calculating an evaluation value of a shooting condition for each of the plurality of person area images;
generating comprehensive estimated emotion information of the person based on the estimated emotion information generated from each of the plurality of person area images and on the evaluation value; and
performing control so that the comprehensive estimated emotion information is presented.