JP2015075907A

JP2015075907A - Emotion information display control device, method of the same, and program

Info

Publication number: JP2015075907A
Application number: JP2013211506A
Authority: JP
Inventors: Shiro Kumano (熊野 史朗); Kazuhiro Otsuka (大塚 和弘); Junji Yamato (大和 淳司)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-10-09
Filing date: 2013-10-09
Publication date: 2015-04-20
Anticipated expiration: 2033-10-09
Also published as: JP6023684B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for controlling the display of emotion information so that the emotion information is easy to grasp.
SOLUTION: An emotion information display control device includes: an emotion information acquisition unit that, taking as emotion information the degree of emotion between the two persons forming each pair of two persons among a plurality of persons, obtains the emotion information between two of three or more persons from a video in which the three or more persons are captured; and a control unit that performs control so that at least two of (1) the emotion information between a first person, who is one of the three or more persons, and the others, (2) the emotion information other than the emotion information between the first person and the others, and (3) all of the emotion information are switched and displayed on a display device.

Description

The present invention relates to a technique for displaying the degree of emotion between a plurality of interlocutors.

As a technique for automatically estimating empathy / antipathy between two interlocutors, it has been proposed to estimate how the empathy / antipathy of two persons in dialogue is interpreted by a group of external observers (see Patent Document 1 and Non-Patent Document 1). Non-Patent Document 1 regards the fact that interpretation differs from one external observer to another as inherent to communication, and sets the problem of estimating the spread of interpretations within the group of external observers, that is, the vote rate of each of the three states empathy / antipathy / neither. Non-Patent Document 1 then displays the degrees of emotion ("empathy", "antipathy", "neither") between a plurality of interlocutors on a single display device.

Patent Document 1: JP 2012-185727 A

Non-Patent Document 1: Shiro Kumano, Kazuhiro Otsuka, Dan Mikami, Junji Yamato, "An estimation model of empathy / antipathy based on facial expression and gaze for multi-party dialogue, and its evaluation" (in Japanese), IEICE Technical Report, Human Communication Science (HCS), 111(214), pp. 33-38, 2011.

However, because the conventional technique basically displays all of the degrees of emotion between the plurality of interlocutors on a single display device regardless of the usage situation, the degrees of emotion between the interlocutors are hard to grasp. When the two persons of interest are not looking at each other (a mutual gaze aversion state), the degree of emotion between them is not displayed. Because the mutual gaze aversion state changes over time, it is not easy, for example, to follow over time the degree of emotion between one particular person and each of the others.

An object of the present invention is to provide a technique for controlling the display of emotion information so that the emotion information is easy to grasp.

To solve the above problem, according to a first aspect of the present invention, an emotion information display control device includes: an emotion information acquisition unit that, taking as emotion information the degree of emotion between the two persons forming each pair of two persons among a plurality of persons, obtains the emotion information between two of three or more persons from a video in which the three or more persons are captured; and a control unit that performs control so that at least two of the following are switched and displayed on a display device: (1) of the emotion information, the emotion information between a first person, who is one of the three or more persons, and the others; (2) of the emotion information, the emotion information other than that between the first person and the others; and (3) all of the emotion information.
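The three display targets (1) to (3) listed above amount to three filters over the set of pairwise emotion information. The following is a minimal Python sketch of that selection, not the patent's implementation. The names `DisplayMode`, `EmotionInfo`, and `select_emotion_info`, and the representation of a pair as a `frozenset` of person IDs, are assumptions made for illustration only.

```python
from enum import Enum
from typing import Dict, FrozenSet


class DisplayMode(Enum):
    FIRST_PERSON = 1     # (1) pairs that include the first person
    EXCLUDING_FIRST = 2  # (2) pairs that do not include the first person
    ALL = 3              # (3) every pair


# Hypothetical representation: each pair of person IDs maps to its degrees of
# "empathy", "antipathy" and "neither".
EmotionInfo = Dict[FrozenSet[int], Dict[str, float]]


def select_emotion_info(info: EmotionInfo, first_person: int,
                        mode: DisplayMode) -> EmotionInfo:
    """Return the subset of pairwise emotion information to display in `mode`."""
    if mode is DisplayMode.ALL:
        return dict(info)
    keep_first = mode is DisplayMode.FIRST_PERSON
    # Keep a pair when its membership of the first person matches the mode.
    return {pair: degrees for pair, degrees in info.items()
            if (first_person in pair) == keep_first}
```

Switching the display then corresponds to calling `select_emotion_info` with a different `mode` and redrawing only the returned pairs.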

To solve the above problem, according to another aspect of the present invention, an emotion information display control device includes: an emotion information acquisition unit that, taking as emotion information the degree of emotion between the two persons forming each pair of two persons among a plurality of persons, obtains the emotion information between two of three or more persons from a video in which the three or more persons are captured; and a control unit that performs control so that, of the emotion information, the emotion information between a first person, who is one of the three or more persons, and the others is displayed on a first display device, which is one of a plurality of display devices, and the emotion information other than that between the first person and the others is displayed on a second display device, which is one of the plurality of display devices other than the first display device.

To solve the above problem, according to another aspect of the present invention, an emotion information display control method includes: an emotion information acquisition step of, taking as emotion information the degree of emotion between the two persons forming each pair of two persons among a plurality of persons, obtaining the emotion information between two of three or more persons from a video in which the three or more persons are captured; and a control step of performing control so that at least two of the following are switched and displayed on a display device: (1) of the emotion information, the emotion information between a first person, who is one of the three or more persons, and the others; (2) of the emotion information, the emotion information other than that between the first person and the others; and (3) all of the emotion information.

To solve the above problem, according to another aspect of the present invention, an emotion information display control method includes: an emotion information acquisition step of, taking as emotion information the degree of emotion between the two persons forming each pair of two persons among a plurality of persons, obtaining the emotion information between two of three or more persons from a video in which the three or more persons are captured; and a control step of performing control so that, of the emotion information, the emotion information between a first person, who is one of the three or more persons, and the others is displayed on a first display device, which is one of a plurality of display devices, and the emotion information other than that between the first person and the others is displayed on a second display device, which is one of the plurality of display devices other than the first display device.

According to the present invention, the display of emotion information can be controlled so that the emotion information is easy to grasp.

A diagram showing an arrangement example of the emotion information display control device according to the first embodiment.
A diagram illustrating the functional configuration of the emotion information display control device according to the first embodiment.
A diagram illustrating the processing flow of the emotion information display control device according to the first embodiment.
A diagram illustrating the functional configuration of the emotion information acquisition unit.
A diagram illustrating the functional configuration of the parameter learning unit.
A diagram illustrating the processing flow of the learning phase.
A diagram illustrating the processing flow of the estimation phase.
A diagram explaining the time difference function.
A diagram explaining the time difference between an interlocutor's behavior and the empathy interpretation.
A diagram explaining the change timing function.
A diagram explaining the effective range of the change timing function.
A diagram explaining the effective range of the change timing function.
A diagram showing an example of the video displayed on the display device in the first embodiment when the emotion information between the first person 2-i (2-1 in the figure) and the others 2-j (2-2, 2-3, or 2-4 in the figure) is displayed.
A diagram showing an example of the video displayed on the display device in the first embodiment when the emotion information other than that between the first person 2-i (2-1 in the figure) and the others 2-j (2-2, 2-3, or 2-4 in the figure) is displayed.
A diagram showing an example of the video displayed on the display device in the first embodiment when all of the emotion information is displayed.
A diagram showing an example of the video displayed on the display device when control is performed so that each posterior probability distribution is displayed as a bar graph or the like.
A diagram showing an example of the video displayed on the display device in the first embodiment when the first person is controlled so as not to be displayed and the emotion information between the first person 2-i (not shown in the figure) and the others 2-j (2-2, 2-3, or 2-4 in the figure) is displayed.
A diagram showing an example of the video displayed on the display device in the first embodiment when the first person is controlled so as not to be displayed and the emotion information other than that between the first person 2-i (not shown in the figure) and the others 2-j (2-2, 2-3, or 2-4 in the figure) is displayed.
A diagram showing an example of the video displayed on the display device in the first embodiment when the first person is controlled so as not to be displayed and all of the emotion information is displayed.
A diagram showing an arrangement example of the emotion information display control device according to the second embodiment.
A diagram illustrating the processing flow of the emotion information display control device according to the second embodiment.
A diagram illustrating the processing flow of the emotion information display control device according to the third embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and duplicate description is omitted. In the following description, symbols such as "~" used in the text should properly be written directly above the immediately preceding character, but because of the limitations of text notation they are written immediately after that character. In equations, these symbols are written in their proper positions. Unless otherwise noted, processing described for an individual element of a vector or matrix applies to all elements of that vector or matrix.

<Emotion information display control device 100 according to the first embodiment>
FIG. 1 shows an arrangement example of the emotion information display control device 100 according to the first embodiment. The emotion information display control device 100 takes as input a video in which N or more persons are captured, and outputs to the display device 3 a video to which emotion information has been added. N is an integer of 3 or more. The emotion information represents, for each pair of two persons among a plurality of persons, the degree of emotion between the two persons forming that pair. The emotions considered are those shared mutually between the two persons, and this embodiment uses three kinds: "empathy", "antipathy", and "neither". Here, empathy is the state in which the emotional states of the two persons are similar, and antipathy is the state in which they differ. The degrees of these three kinds of emotion constitute the emotion information. This emotion information can also be said to indicate the empathy state between the interlocutors.

FIG. 2 shows an example of the functional configuration of the emotion information display control device 100, and FIG. 3 shows its processing flow. The emotion information display control device 100 includes an emotion information acquisition unit 110 and a control unit 120.

<Emotion information acquisition unit 110>
The emotion information acquisition unit 110 receives a video in which N persons are captured, obtains from this video the emotion information between two of the N persons (s1), and outputs the obtained emotion information to the control unit 120. The video of the N persons may be a video obtained by providing one camera per interlocutor and multiplexing the videos captured by the cameras (see FIG. 1), or a video in which all of the interlocutors are captured by a single omnidirectional camera, for example one using a fisheye lens. For example, as in FIG. 1, each person 2-n may be captured by a camera 1-n (where n = 1, 2, ..., N) and the N videos multiplexed to form the input, or a single video in which several persons appear may be used as the input.

The control unit 120 receives the video in which the N persons are captured and the emotion information, outputs to the display device 3 a video to which the emotion information has been added, and controls the display of the display device 3 (s2). Examples of the processing of each unit are described below.

<Key points of the emotion information acquisition unit 110>
The emotion information acquisition unit 110 estimates the state of the dialogue and obtains from the video the emotion information between two persons (the degrees of the three kinds of emotion "empathy", "antipathy", and "neither"). The most important point of the dialogue state estimation technique of the emotion information acquisition unit 110 is that it probabilistically models how an external observer watching two interlocutors interprets the state of their dialogue, based on (a) the time difference between the giver's behavioral expression and the receiver's reaction to it, and (b) the consistency, which indicates whether the two behaviors match. Underlying this is the hypothesis that, when interpreting the dialogue state between two persons, external observers consciously or unconsciously draw on the kind of knowledge, established in psychology, of how people react to another person's approach. With this model, given the time series of the two interlocutors' behaviors, it is possible to estimate the vote rates of how a group of external observers would interpret the empathy state at each time.

For example, consider a scene in which one interlocutor smiles and the other returns a smile in response. If the responding smile comes quickly, the reaction tends to look spontaneous to external observers, and the two appear to be in a positive relationship such as empathy. If the responding smile is slightly delayed, the reaction tends to look contrived, and the two appear to be in a negative relationship such as antipathy. The external observers' interpretation of empathy is also affected by whether the behaviors match, as when a smile is answered with a smile, or do not match, as when a smile is answered with a wry smile. The emotion information acquisition unit probabilistically models this relationship between the time difference and the consistency of the two interlocutors' behaviors.

The other key point of the emotion information acquisition unit is that, for a variety of behavior channels, it models the relationship between the instantaneous combination of the two interlocutors' behaviors and the external observers' empathy interpretation. A behavior channel is a type of interlocutor behavior. Patent Document 1, for example, models only the interlocutors' facial expression and gaze as behavior channels, whereas this emotion information acquisition unit can model any other behavior channel, such as head gestures or the presence or absence of speech. As a result, the empathy interpretation can be estimated more accurately even in scenes where, for example, the receiver nods or tilts the head in response to the giver's smile and external observers read empathy or antipathy into that.

<Configuration of the emotion information acquisition unit 110>
A configuration example of the emotion information acquisition unit 110 of this embodiment is described with reference to FIG. 4. The emotion information acquisition unit 110 includes an input unit 10, a behavior recognition unit 20, an empathy interpretation assigning unit 30, a parameter learning unit 40, a posterior probability estimation unit 50, an output unit 60, a learning video storage unit 70, an estimation video storage unit 72, and a model parameter storage unit 74. The learning video storage unit 70 and the estimation video storage unit 72 can be implemented with, for example, a main storage device such as RAM (Random Access Memory), or an auxiliary storage device composed of a hard disk, an optical disc, or a semiconductor memory element such as flash memory. The model parameter storage unit 74 may be implemented in the same way as the learning video storage unit 70, or with middleware such as a relational database or a key-value store.

A configuration example of the parameter learning unit 40 of this embodiment is described with reference to FIG. 5. The parameter learning unit 40 includes a prior distribution learning unit 42, a timing model learning unit 44, and a static model learning unit 46.

<Learning phase>
An operation example of the emotion information acquisition unit 110 in the learning phase is described with reference to FIG. 6.

A learning video is input to the input unit 10 (step S11). The learning video is a video of a situation in which a plurality of persons are in dialogue, and at least the head of each interlocutor must be captured. The learning video may be a video obtained by providing one camera per interlocutor and multiplexing the videos captured by the cameras, or a video in which all of the interlocutors are captured by a single omnidirectional camera, for example one using a fisheye lens. The input learning video is stored in the learning video storage unit 70.

The behavior recognition unit 20 takes as input the learning video stored in the learning video storage unit 70, detects the facial expression, gaze, head gestures, presence or absence of speech, and so on of each interlocutor captured in the learning video, and outputs the resulting time series of the interlocutors' behaviors (step S21). In this embodiment, four behavior channels are recognized: facial expression, gaze, head gesture, and speech presence. A behavior channel is a form of behavior.

Facial expression is the main channel for expressing emotion. In this embodiment, six states are recognized for facial expression: neutral / smile / laughter / wry smile / thinking / other.

Gaze indicates at least one of whom the person is trying to convey emotion to and that the person is observing another's behavior. In this embodiment, the gaze recognition targets are: looking at one of the other persons (and which person that is) / looking at no one. The number of states therefore equals the number of interlocutors. Here, the interlocutors are all of the participants in the dialogue, including the person whose gaze is being measured. Facial expression and gaze may be recognized by the methods described in Patent Document 1 or Non-Patent Document 1.

Head gestures are often produced as an expression of attitude toward another person's opinion. In this embodiment, the head gesture recognition targets are the four states none / nod / shake / tilt, as well as combinations of these. Any known method can be used to recognize head gestures. For example, the method described in Yasushi Ejiri and Tetsunori Kobayashi, "Recognition of Head Gestures During Dialogue", IEICE Technical Report, PRMU2002-61, pp. 31-36, Jul. 2002 (Reference 1) may be used.

Speech presence is a major indicator of the conversational role, speaker or listener. In this embodiment, two states are recognized for speech presence: speaking / silent. Speech presence may be recognized by detecting the audio power in the video and judging that the person is speaking when it exceeds a predetermined threshold. Alternatively, it may be detected from the movement of the interlocutor's mouth in the video.

The behaviors may all be recognized by a single device, or a separate device may be used for each behavior. For facial expression recognition, for example, Japanese Patent No. 4942197 (Reference 2) may be used as an example of a behavior recognition device. The behavior recognition unit 20 may also output the result of manual labeling, in the same way as the empathy interpretation assigning unit 30.
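The audio-power rule for speech presence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent does not specify how power is computed or over what window, so the mean-square power per fixed-length frame, the function names `frame_power` and `detect_speech`, and the parameters `frame_len` and `threshold` are all hypothetical choices.

```python
from typing import List, Sequence


def frame_power(samples: Sequence[float]) -> float:
    """Mean-square power of one frame of audio samples (an assumed definition)."""
    return sum(s * s for s in samples) / len(samples)


def detect_speech(samples: Sequence[float], frame_len: int,
                  threshold: float) -> List[bool]:
    """Label each non-overlapping frame True (speaking) or False (silent).

    A frame counts as speech when its power exceeds the predetermined
    threshold. Trailing samples that do not fill a whole frame are ignored.
    """
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        labels.append(frame_power(frame) > threshold)
    return labels
```

In practice the threshold would have to be chosen for the recording conditions, and one audio stream per interlocutor would be needed to attribute speech to a specific person.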

For facial expressions and head gestures, an "intensity" may also be estimated and output. The intensity of a facial expression can be obtained from the probability that the expression is the target expression. The intensity of a head gesture can be obtained as the ratio of the amplitude of the observed motion to the maximum amplitude (for a nod, the maximum nodding angle).
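As a small worked example of the head gesture intensity described above, the ratio can be computed as follows. The function name `gesture_intensity` and the clamping of the result to the range 0 to 1 are assumptions added for illustration; the text only specifies the ratio itself.

```python
def gesture_intensity(observed_amplitude: float, max_amplitude: float) -> float:
    """Ratio of the observed motion amplitude to the maximum amplitude.

    For a nod, both values would be nodding angles. Clamping to [0, 1] is an
    assumption, not something stated in the source.
    """
    if max_amplitude <= 0:
        raise ValueError("max_amplitude must be positive")
    ratio = observed_amplitude / max_amplitude
    return min(max(ratio, 0.0), 1.0)
```

For instance, a nod of 15 degrees against a maximum nodding angle of 30 degrees gives an intensity of 0.5.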

The empathy interpretation assigning unit 30 outputs a learning empathy interpretation time series in which a plurality of external observers have labeled the empathy interpretation based on the learning video stored in the learning video storage unit 70 (step S30). The learning empathy interpretation time series is obtained by presenting the learning video to a plurality of external observers and having them manually label the empathy interpretation between the two interlocutors at each time. In this embodiment, three states are considered as the dialogue state between two persons: empathy / antipathy / neither. The dialogue state between two persons is closely related to conformity pressure (the feeling that one must go along when many others share the same opinion that differs from one's own) and is a basic element of consensus building and of forming human relationships. These states as interpreted by external observers are collectively called the empathy interpretation. That is, the dialogue state interpretation in this embodiment is the empathy interpretation.
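The labels collected from several external observers can be turned into the per-time vote rates that the model is meant to estimate. The following Python sketch shows one straightforward way to do this. The data layout (one equal-length label sequence per observer), the English state names, and the function name `vote_rates` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence

STATES = ("empathy", "antipathy", "neither")


def vote_rates(labels_per_observer: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Return, for each time step, the fraction of observers choosing each state.

    labels_per_observer[k][t] is the label that observer k gave at time t.
    All observers are assumed to have labeled the same time steps.
    """
    n_observers = len(labels_per_observer)
    n_steps = len(labels_per_observer[0])
    rates = []
    for t in range(n_steps):
        counts = Counter(observer[t] for observer in labels_per_observer)
        # Counter returns 0 for states nobody chose at this time step.
        rates.append({state: counts[state] / n_observers for state in STATES})
    return rates
```

With four observers labeling a time step as empathy, empathy, antipathy, and neither, the vote rates at that step are 0.5, 0.25, and 0.25.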

The learning behavior time series output by the behavior recognition unit 20 and the learning empathy interpretation time series output by the empathy interpretation assigning unit 30 are input to the parameter learning unit 40. The parameter learning unit 40 learns model parameters that relate the external observers' empathy interpretation to the interlocutors' behaviors. The model parameters include a prior distribution of the empathy interpretation between interlocutors, a timing model representing the likelihood of the empathy interpretation based on the time difference between the interlocutors' behaviors and the consistency of those behaviors, and a static model representing the likelihood of the empathy interpretation based on the co-occurrence of the interlocutors' behaviors.

The prior distribution learning unit 42 of the parameter learning unit 40 learns the prior distribution using the learning empathy interpretation time series (step S42). The timing model learning unit 44 of the parameter learning unit 40 learns the timing model using the learning behavior time series and the learning empathy interpretation time series (step S44). The static model learning unit 46 of the parameter learning unit 40 learns the static model using the learning behavior time series and the learning empathy interpretation time series (step S46). The obtained model parameters are stored in the model parameter storage unit 74.

<< Overview of the model >>
The model of this embodiment is described in detail below. This embodiment assumes that the empathy interpretation given by external observers is independent for each pair of interlocutors. The following therefore considers the case of only two interlocutors. When there are three or more interlocutors, learning and estimation are performed separately for each pair, attending only to that pair.

In this embodiment, the posterior probability distribution P(e_t|B) of the external observers' empathy interpretation e at each time t, given the time series B of the interlocutors' behavior, is modeled and estimated with a naive Bayes model. A naive Bayes model assumes that the probabilistic dependence between the dependent variable (here, the empathy interpretation) and each explanatory variable (here, each interlocutor's behavior) is independent across explanatory variables. Despite its simplicity, the naive Bayes model has been confirmed to give high estimation performance in many fields. Using it in this emotion information acquisition unit has two advantages. First, because it does not model every co-occurrence across behavior channels (for example, the state in which facial expression, gaze, head gesture, and utterance all occur at the same time), overfitting is easier to avoid. This is particularly effective when the number of learning samples is small relative to the variable space. Second, behavior channels can easily be added to or removed from the observations.

In the naive Bayes model of this embodiment, the posterior probability distribution P(e_t|B) is defined as in Equation (1).

Here, P(dt_t^b|c_t^b,e_t) is the timing model. It represents the likelihood that the external observers' empathy interpretation is e when, around time t, the two persons' behaviors in behavior channel b have time difference dt_t^b and coincidence c_t^b. The coincidence c is a binary state indicating whether the two persons' behaviors match, and it is determined by whether the behaviors of the two interlocutors fall in the same category. P(b_t|e_t) is the static model, which models how behavior channel b co-occurs between the two interlocutors at the instant of time t. These two models are described in turn below. P(e_t) is the prior distribution of the empathy interpretation e and represents the probability with which each empathy interpretation e is generated when behavior is not considered.
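The way these three terms combine can be illustrated with a minimal sketch. It assumes that Equation (1), which is not reproduced in this text, multiplies the prior by the timing-model and static-model likelihoods of every behavior channel and then normalizes over the three states. The function name `posterior` and the dictionary layout are hypothetical and chosen only for illustration.

```python
STATES = ("empathy", "neither", "antipathy")


def posterior(prior, timing, static):
    """Naive Bayes posterior over empathy interpretations at one time t.

    prior  : dict state -> P(e)
    timing : dict channel -> dict state -> timing-model likelihood
    static : dict channel -> dict state -> static-model likelihood
    Assumption: all factors are simply multiplied and then normalized.
    """
    scores = {}
    for e in STATES:
        score = prior[e]
        for channel in timing:
            score *= timing[channel][e]
        for channel in static:
            score *= static[channel][e]
        scores[e] = score
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}


uniform = {e: 1.0 / 3.0 for e in STATES}
timing = {"head_gesture": {"empathy": 0.3, "neither": 0.2, "antipathy": 0.5}}
static = {"head_gesture": dict(uniform)}
p = posterior(uniform, timing, static)
```

Because each channel contributes its own factors, adding or removing a channel only adds or removes entries in `timing` and `static`, which mirrors the second advantage of the naive Bayes model noted above.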

<< Timing model >>
The timing model for behavior channel b in this embodiment is defined as in Equation (2).

As is clear from Equation (2), this timing model consists of a time difference function P(d~t_t^b|c_t^b,e_t), which represents the likelihood of the empathy interpretation e when the time difference between the two interlocutors' behaviors is dt and their coincidence is c, and a change timing function π_t, which represents at what point around that interaction the empathy interpretation e changes. d~t_t^b is the bin number used when the time series of the external observers' empathy interpretations is converted into a histogram. The bin size is, for example, 200 milliseconds.

In this embodiment, a timing model between the two persons is built within each behavior channel, but a model across behavior channels may also be built. For example, the relationship among the time difference dt between a facial expression and a head gesture, the coincidence c, and the empathy interpretation e can be modeled. In that case, however, determining the coincidence c requires newly introducing a set of categories, such as positive / neutral / negative, with which coincidence can be judged even between different behavior channels. These categories may be recognized when the behavior channels are detected from the video. Alternatively, each behavior channel may first be recognized with its own set of categories and the resulting labels later reclassified as positive / neutral / negative, for example treating a smile as positive.

<< Time difference function >>
The time difference function P(d~t_t^b|c_t^b,e_t) represents the likelihood of each type of empathy interpretation e, given the coincidence c, which indicates whether the two interlocutors' behaviors match in behavior channel b, and the time difference dt. This embodiment uses the bin number d~t_t^b obtained when the time series of the external observers' empathy interpretations is converted into a histogram. The bin size is, for example, 200 milliseconds.

FIG. 8 shows an example of the time difference function of this embodiment. The time difference function P(d~t_t^b|c_t^b,e_t) determines the likelihood of the empathy interpretation e from the coincidence c of the interlocutors' behaviors and the time-difference bin number d~t_t^b. FIG. 8(A) is an example of the time difference function when the interlocutors' behaviors match, and FIG. 8(B) is an example when they do not match. For example, when the behaviors match and the time difference from the giver's behavior to the receiver's reaction is 500 milliseconds, the likelihood that the empathy interpretation e is "empathy" is about 0.3, the likelihood of "neither" is about 0.2, and the likelihood of "antipathy" is about 0.5. The time difference function is obtained by tallying the time series of empathy interpretations labeled by the external observers in units of time-difference bins, and normalizing so that, for each category of the empathy interpretation e, the likelihoods over all time-difference bins sum to 1.
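The tally-and-normalize procedure described above can be sketched as follows. This is only an illustration: the sample format `(dt_ms, coincident, label)` and the function name `learn_time_difference` are hypothetical, and normalizing separately for each combination of coincidence c and category e is an assumption based on the conditional form P(d~t|c,e).

```python
from collections import defaultdict

BIN_MS = 200  # example bin size given in the text


def learn_time_difference(samples, bin_ms=BIN_MS):
    """Estimate the time difference function from labeled interactions.

    samples: iterable of (dt_ms, coincident, label), where dt_ms is the
    giver-to-receiver time difference, coincident is True when the two
    behaviors fall in the same category, and label is the observer's
    empathy interpretation.
    Returns table[(coincident, label)][bin_number] = likelihood, with the
    likelihoods over all bins summing to 1 for each (coincident, label).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for dt_ms, coincident, label in samples:
        counts[(coincident, label)][int(dt_ms // bin_ms)] += 1
    table = {}
    for key, bins in counts.items():
        total = sum(bins.values())
        table[key] = {b: n / total for b, n in bins.items()}
    return table


samples = [
    (100, True, "empathy"),
    (150, True, "empathy"),
    (500, True, "empathy"),
    (500, True, "antipathy"),
]
table = learn_time_difference(samples)
```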

<< Change timing function >>
The change timing function π represents at what point the empathy interpretation e changes. Put differently, the change timing function π determines over what range, and with what strength, the time difference function contributes to the estimation of the empathy interpretation e in Equation (1).

In this embodiment, the change timing function is modeled as in Equation (3).

Here, t_a is the time at which the giver starts to express the behavior in the interaction of interest. The time t' is the relative time within the interaction, with the start of the giver's behavior at t'=0 and the start of the receiver's reaction at t'=1, and is calculated as t'=(t-t_a)/dt.

π=0 means that the timing model P(dt_t^b|c_t^b,e_t) does not contribute at all to the posterior probability distribution P(e_t|B) of Equation (1). π=1 means that the timing model P(dt_t^b|c_t^b,e_t) contributes fully to the posterior probability distribution P(e_t|B).

The condition dt>L means that the receiver's reaction comes too late after the giver's behavior. In this embodiment the threshold L is, for example, 2 seconds. This value is based on the research finding that a listener's facial expression in response to a lexically important phrase of the speaker occurs roughly 500 to 2,500 milliseconds later, together with the assumption that the timing falls approximately within this range for every behavior channel. For details of that finding, see G. R. Jonsdottir, J. Gratch, E. Fast, and K. R. Thorisson, "Fluid semantic back-channel feedback in dialogue: Challenges & progress", International Conference Intelligent Virtual Agents (IVA), pp. 154-160, 2007 (Reference 3).

The condition t-t_a>W means that a long time has passed at time t since the giver's most recent preceding expression. This models the following: once the two interlocutors express behaviors toward each other and an interaction takes place, the external observers' empathy interpretation is influenced by its timing for a certain period, but the influence disappears if no further interaction occurs for a while. The threshold W may be any positive value, and it may be set to infinity without problem when interactions occur continually between the two persons of interest, as in a two-person dialogue. However, when interactions are not necessarily frequent, for example between two listeners in a large group in which mainly one person is speaking, W can be too long. In this embodiment W is set empirically to 4 seconds, based on the experimental result that estimation accuracy was highest when W was set to around 4 seconds.

FIG. 9 shows an example of the empathy interpretation, the giver's behavior, and the receiver's reaction. The fill patterns in FIG. 9 indicate different categories of behavior or of empathy interpretation. The values of α and β are set to, for example, α=0.2 and β=0.8. These values were chosen so that the change timing function π of Equation (3) best approximates the cumulative probability.

FIG. 10 shows an example of the change timing function π. The points plotted on the graph represent the cumulative probability of where, in relative time t', the label changed, for empathy interpretation labels that nine external observers actually gave to dialogue data from four groups of four women each (16 people in total). The figure shows that the change timing function approximates these points well. α and β need not take these particular values, provided that α+β=1, 0≦α≦1, and 0≦β≦1. As a simple setting, α=0 and β=1 may be used.
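The behavior of π described in this section can be sketched in code. Equation (3) is not reproduced in this text, so the sketch below assumes a piecewise form: 0 when dt>L or t-t_a>W, a linear ramp α+βt' for 0≦t'≦1, and 1 afterwards. The linear ramp is an assumption that is merely consistent with the constraint α+β=1 and with the simple setting α=0, β=1; the function name `change_timing` and the treatment of times before t_a are likewise hypothetical.

```python
def change_timing(t, t_a, dt, alpha=0.2, beta=0.8, L=2.0, W=4.0):
    """Assumed form of the change timing function (all times in seconds).

    t   : current time
    t_a : start time of the giver's behavior
    dt  : time difference between giver's behavior and receiver's reaction
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Reaction too late, or too long since the giver's behavior: no contribution.
    if dt > L or t - t_a > W:
        return 0.0
    t_rel = (t - t_a) / dt  # relative time t'
    if t_rel < 0:
        return 0.0  # assumption: no contribution before the giver acts
    if t_rel <= 1:
        return alpha + beta * t_rel  # assumed linear ramp, reaching 1 at t' = 1
    return 1.0


halfway = change_timing(t=0.5, t_a=0.0, dt=1.0)
after_reaction = change_timing(t=1.5, t_a=0.0, dt=1.0)
too_late = change_timing(t=1.0, t_a=0.0, dt=2.5)
expired = change_timing(t=5.0, t_a=0.0, dt=1.0)
```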

FIGS. 11 and 12 schematically show examples of the effective range of the change timing function. Black fill indicates that no behavior is detected, and white fill and diagonal hatching indicate categories of behavior. For the empathy interpretation, vertical hatching indicates empathy and horizontal hatching indicates antipathy. FIG. 11(A) shows the effective range when the interlocutors' behaviors match. Because the giver's behavior and the receiver's reaction match, "empathy" continues only for the duration of the threshold W. FIG. 11(B) shows the effective range when the interlocutors' behaviors do not match. Because the giver's behavior and the receiver's reaction do not match, "antipathy" continues only for the duration of the threshold W. FIG. 11(C) shows a situation in which the receiver's reaction comes too late after the giver's behavior, that is, dt>L, so the change timing function is outside its effective range. In this case the "neither" state continues throughout. FIG. 12 shows the effective range when the two interlocutors express behaviors alternately. The basic idea is the same as in FIGS. 11(A) to 11(C).

<< Static model >>
The static model P(b_t|e_t) models the likelihood with which the empathy interpretation e is generated when particular behaviors co-occur between the two interlocutors in behavior channel b at time t.

For facial expression and gaze, a modeling method has been proposed in Patent Document 1 and Non-Patent Document 1, and their descriptions may be followed: a model of the gaze state between the two interlocutors is combined with a model of the co-occurrence of facial expression states for each gaze state. The gaze state between two persons may take, for example, three states: mutual gaze, one-sided gaze, and mutual aversion.

The static model for head gestures is expressed as P(g|e), where g represents the combined state of the two persons' head gestures. If the number of head gesture states considered is N_g, the number of combined states between two persons is N_g×N_g. Any type and number of categories may be used, but if there are too many, overfitting becomes likely when the number of learning samples is small. In that case, the initially prepared categories may be further grouped by clustering. One such method is Sequential Backward Selection (SBS). For example, for head gesture categories, estimation is performed using head gestures only, that is, with the posterior probability P(e|B):=P(e)P(g'|e), and the two categories whose merging gives the highest estimation accuracy are chosen from among all categories and merged into one. Repeating this until just before the estimation accuracy deteriorates reduces the number of categories one at a time. Here, g' is the combined state of the two persons' head gestures after grouping. For the presence or absence of utterance, the co-occurrence between the two persons is modeled in the same way as for head gestures.

<< Model learning method >>
In this embodiment, every model is described in terms of discrete states. In the learning phase it is therefore sufficient to count how many times each discrete state appears in the learning samples and finally to normalize the counts into probabilities.
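As an illustration of this count-and-normalize learning, the following minimal sketch estimates a static model P(g|e) for head gestures from labeled samples. The sample format and the name `learn_static_model` are hypothetical, and no smoothing of unseen states is applied, which a practical system might want to add.

```python
from collections import Counter, defaultdict


def learn_static_model(samples):
    """Estimate P(g|e) by frequency counting.

    samples: iterable of (g, e), where g is the combined head-gesture state
    of the two persons (for example a tuple of two category names) and e is
    the empathy interpretation label given by an external observer.
    Returns model[e][g] = relative frequency of g among samples labeled e.
    """
    counts = defaultdict(Counter)
    for g, e in samples:
        counts[e][g] += 1
    model = {}
    for e, counter in counts.items():
        total = sum(counter.values())
        model[e] = {g: n / total for g, n in counter.items()}
    return model


samples = [(("nod", "nod"), "empathy")] * 3 + [(("nod", "none"), "empathy")]
model = learn_static_model(samples)
```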

The policy for preparing the models is as follows. If the group of interlocutors captured in the learning video used to learn the model parameters is the same as the group captured in the estimation video whose dialogue state is to be estimated, the parameters may be learned independently for each pair of interlocutors, and estimation for a given pair uses the parameters learned from that pair's data. If the two groups differ, a single model is learned without distinguishing between pairs, and that single model is used for estimation on whichever pair is of interest.

<Estimation phase>
An example of the operation of the emotion information acquisition unit 110 in the estimation phase is described with reference to FIG. 7.

The estimation video is input to the input unit 10 (step S12). The estimation video captures a situation in which a plurality of persons are in dialogue, and at least the interlocutors' heads must be captured. The estimation video is an unseen video different from the learning video. It is captured in the same way as the learning video in the learning phase described above. The input estimation video is stored in the estimation video storage unit 72.

The action recognition unit 20 takes as input the estimation video stored in the estimation video storage unit 72, detects facial expression, gaze, head gesture, presence or absence of utterance, and so on as the behavior of each interlocutor captured in the estimation video, and outputs the resulting time series B of the interlocutors' behavior (step S22). The behavior recognition method is the same as in the learning phase described above, so its description is omitted here.

The estimation behavior time series B output by the action recognition unit 20 is input to the posterior probability estimation unit 50. Using the model parameters stored in the model parameter storage unit 74, the posterior probability estimation unit 50 estimates, from the estimation behavior time series B, the posterior probability distribution P(e_t|B) of the empathy interpretation between interlocutors at time t (step S50). The posterior probability estimation unit 50 takes as input the time series B of the interlocutors' behavior generated from the estimation video and the model parameters learned by the parameter learning unit 40, which comprise the parameters of the prior distribution, the timing model, and the static model, and calculates the posterior probability distribution P(e_t|B) of the empathy interpretation e at time t according to Equation (1) above.

The output unit 60 outputs the posterior probability distribution P(e_t|B) of the empathy interpretation e between interlocutors (step S60). When the estimation result needs to be output as a single type rather than as a probability distribution, the type of empathy interpretation with the highest posterior probability, that is, e~_t=argmax_{e_t} P(e_t|B), may additionally be output as the dialogue state value e~_t.
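Selecting the single most probable type is a plain argmax over the three states. A minimal sketch, in which the dictionary layout and the name `most_likely_state` are hypothetical:

```python
def most_likely_state(posterior):
    """Return the empathy interpretation with the highest posterior probability."""
    return max(posterior, key=posterior.get)


state = most_likely_state({"empathy": 0.3, "neither": 0.2, "antipathy": 0.5})
```

If two states tie, `max` returns whichever comes first in the dictionary; the text does not specify a tie-breaking rule.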

In other words, the emotion information acquisition unit 110 obtains, as the emotion information, either the posterior probability distribution P(e_t|B) of the empathy interpretation e or the type of empathy interpretation with the highest posterior probability, e~_t=argmax_{e_t} P(e_t|B). In this embodiment, the type of empathy interpretation with the highest posterior probability, e~_t=argmax_{e_t} P(e_t|B), is obtained and output as the emotion information. When the emotion information acquisition unit 110 instead obtains and outputs the posterior probability distribution P(e_t|B) as the emotion information, control may be performed so that each posterior probability distribution P(e_t|B) is displayed as a bar graph or the like, as shown in FIG. 16. Details are given in the modification described later.

<Control unit 120>
The control unit 120 receives video in which N persons are captured, together with the emotion information, and controls the display device 3 so as to switch among and display (1) of the emotion information, the emotion information σ_{i,j} between a first person 2-i, who is one of the N persons (i is any of 1, 2, ..., N), and another person 2-j (j = 1, 2, ..., N, where i≠j); (2) the emotion information σ_{i",j"} other than σ_{i,j} (i" = 1, 2, ..., N, where i"≠i; j" = 1, 2, ..., N, where i"≠j" and i≠j"); and (3) all the emotion information σ_{i',j'} (i' = 1, 2, ..., N; j' = 1, 2, ..., N, where i'≠j') (s2).

For example, with N=4 and i=1, the control unit 120 receives four videos, each capturing one of the four persons, and the emotion information σ_{n,n'} (where n = 1, 2, 3, 4; n' = 1, 2, 3, 4; n≠n'; and σ_{n,n'}=σ_{n',n}), and composites the four videos so that they can be displayed on the display device 3.

Based on switching information, the control unit 120 controls the display device 3 to display one of the emotion information σ_{i,j}, σ_{i",j"}, and σ_{i',j'} of (1) to (3) above. The switching information specifies which of the emotion information of (1) to (3) is to be displayed. Possible switching information includes (A) information selected by a viewer watching the display device 3, a user of the emotion information display control device 100, or the like (hereinafter also called "selection information"); (B) time information; and (C) all the emotion information σ_{i',j'}.

The selection information of (A) is set so as to correspond to (1) to (3) above. For example, a display unit (not shown) of the emotion information display control device 100 shows
"Please select the emotion information to display from the following:
1. Empathy information between the first person and others
2. Empathy information other than 1.
3. All empathy information"
and control is performed so that the emotion information corresponding to what the user enters by operating an input unit (not shown; a mouse, keyboard, or the like) is displayed on the display device 3. A process for selecting the first person may be added at this point. For example, the display unit (not shown) of the emotion information display control device 100 shows "Please select the first person", and the first person is identified from what the user enters by operating the input unit (not shown). The first person may be selected either before or after the emotion information to display is selected.

When the time information of (B) is used as the switching information, control is performed so that the emotion information σ_{i,j}, σ_{i",j"}, and σ_{i',j'} of (1) to (3) is displayed on the display device 3 in turn each time a predetermined time (for example, 30 seconds) elapses. A different duration may be set for each of (1) to (3). For example, the emotion information σ_{i,j} and σ_{i",j"} of (1) and (2) may each be displayed for 10 seconds, and the emotion information σ_{i',j'} of (3) for 5 seconds before switching. The first person may also be switched each time a predetermined time elapses. For example, (1) to (3) are first displayed for person 2-1, then (1) to (3) are displayed in the same way for each of the other persons in turn, and once all persons have been covered, the display returns to person 2-1.
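Time-based switching of this kind can be sketched as a lookup of the elapsed time within a repeating cycle. The sketch uses the example durations from the text (10, 10, and 5 seconds for (1), (2), and (3)); the function name `mode_at` and the choice to measure time from the start of display are assumptions.

```python
def mode_at(elapsed_s, durations=(10.0, 10.0, 5.0)):
    """Return which of (1), (2), (3) to display after elapsed_s seconds.

    durations[k] is how long mode k+1 stays on screen; the sequence repeats.
    """
    cycle = sum(durations)
    offset = elapsed_s % cycle
    for mode, duration in enumerate(durations, start=1):
        if offset < duration:
            return mode
        offset -= duration
    return len(durations)  # only reachable through floating-point rounding


schedule = [mode_at(s) for s in (0, 9.9, 10, 22, 25)]
```

Rotating the first person could be handled the same way, by dividing the elapsed time by the full cycle length to obtain the index of the person currently treated as the first person.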

When all the emotion information σ_{i',j'} of (C) is used as the switching information, control is performed, for example, so that whichever of the emotion information σ_{i,j}, σ_{i",j"}, and σ_{i',j'} has a high proportion of empathy or antipathy is displayed on the display device 3. Alternatively, control is performed so that whichever of σ_{i,j}, σ_{i",j"}, and σ_{i',j'} has a high proportion of emotion information that has changed greatly is displayed on the display device 3.

A combination of the information of (A) to (C) (selection information, time information, and emotion information) may be used as the switching information. For example, priorities are assigned to the selection information, the time information, and the emotion information, and control is basically performed so that one of the emotion information σ_{i,j}, σ_{i",j"}, and σ_{i',j'} of (1) to (3) above is displayed on the display device 3 based on the switching information with the highest priority. For example, the priorities are set as selection information > emotion information > time information. When selection information has been received, one of σ_{i,j}, σ_{i",j"}, and σ_{i',j'} is displayed based on the selection information. When no selection information has been received, or when a considerable time has passed since it was received without any operation (long enough to judge that the user who entered the selection no longer intends it), the emotion information with a high proportion of empathy among σ_{i,j}, σ_{i",j"}, and σ_{i',j'} is displayed. When no selection information has been received, or a considerable time has passed without any operation, and none of σ_{i,j}, σ_{i",j"}, and σ_{i',j'} has a high proportion of empathy, one of them is displayed based on the time information.
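The example priority order (selection information > emotion information > time information) can be sketched as a simple fallback chain. All names here are hypothetical, and the 60-second idle limit is an illustrative stand-in for the "considerable time" that the text leaves unspecified.

```python
from typing import Optional


def choose_mode(
    selected: Optional[int],
    idle_s: float,
    empathy_mode: Optional[int],
    timed_mode: int,
    idle_limit_s: float = 60.0,  # hypothetical value; the text gives no number
) -> int:
    """Pick which of (1), (2), (3) to display.

    selected     : mode chosen by the user, or None if nothing was selected
    idle_s       : seconds since the user's last operation
    empathy_mode : mode whose emotion information has a high proportion of
                   empathy, or None if no such mode exists
    timed_mode   : mode indicated by the time information
    """
    if selected is not None and idle_s < idle_limit_s:
        return selected      # highest priority: a still-valid user selection
    if empathy_mode is not None:
        return empathy_mode  # next: emotion information
    return timed_mode        # fallback: time information


fresh = choose_mode(selected=2, idle_s=5.0, empathy_mode=1, timed_mode=3)
stale = choose_mode(selected=2, idle_s=120.0, empathy_mode=1, timed_mode=3)
fallback = choose_mode(selected=None, idle_s=0.0, empathy_mode=None, timed_mode=3)
```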

Once it has been decided which of the emotion information σ_{i,j}, σ_{i",j"}, and σ_{i',j'} is to be displayed on the display device 3, the control unit 120 performs the following processing according to the emotion information to be displayed.

（１）の感情情報σ_i,jを表示装置３に表示するように制御する場合、制御部１２０は、合成した映像に第一の人物２−１と他者２−２、２−３、２−４との間の感情情報σ_1,2，σ_1,3，σ_1,4を付加して、表示装置３に出力する。 When controlling the emotion information σ _{i, j} of (1) to be displayed on the display device 3, the control unit 120 adds the first person 2-1 and the other persons 2-2, 2-3 to the synthesized video. Emotion information σ _1,2 , σ _1,3 , σ _1,4 between ₂ and ₄ is added and output to the display device 3.

（２）の感情情報σ_i",j"を表示装置３に表示するように制御する場合、制御部１２０は、合成した映像に第一の人物２−１と他者２−２、２−３、２−４との間の感情情報σ_1,2，σ_1,3，σ_1,4以外の感情情報σ_2,3，σ_2,4，σ_3,4を付加して、表示装置３に出力する。 When controlling the emotion information σ _{i ", j"} of (2) to be displayed on the display device 3, the control unit 120 adds the first person 2-1 and others 2-2, 2- to the synthesized video. emotion information sigma _{1, 2} between 3,2-4, σ _1,3, σ _1,4 other emotion information σ _2,3, σ _2,4, by adding a sigma _{3, 4,} the display device 3 is output.

（３）の感情情報σ_i',j'を表示装置３に表示するように制御する場合、合成した映像に全ての感情情報を付加して、表示装置３に出力する。 When controlling the emotion information σ _{i ′, j ′} of (3) to be displayed on the display device 3, all emotion information is added to the synthesized video and output to the display device 3.
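The three sets of pairs used in cases (1) to (3) can be enumerated as in the following sketch. The function name `partition_pairs` and the use of integer person indices 1..N are assumptions made for illustration; the sketch relies on the symmetry σ_{n,n'} = σ_{n',n}, so each pair is listed once.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[int, int]


def partition_pairs(n_persons: int, first: int) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Return the pair sets for cases (1), (2), and (3).

    Persons are indexed 1..n_persons; `first` is the index of the first person.
    """
    all_pairs = list(combinations(range(1, n_persons + 1), 2))   # case (3)
    with_first = [p for p in all_pairs if first in p]            # case (1)
    without_first = [p for p in all_pairs if first not in p]     # case (2)
    return with_first, without_first, all_pairs
```

For N = 4 and first person 1, case (1) yields the pairs (1,2), (1,3), (1,4) and case (2) yields (2,3), (2,4), (3,4), matching the example in the text.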

FIG. 13, FIG. 14, and FIG. 15 show examples of the video displayed on the display device 3 in cases (1), (2), and (3), respectively. In these figures, a solid line connecting two persons indicates that the emotion information σ_{n,n'} is empathy, and a broken line indicates that it is antipathy. (In actual use, any display method may be used as long as the user can perceive the differences in emotion information, for example through the color or blinking of the connecting lines, or, without using connecting lines at all, through various visual differences such as the position or size of the persons.) Because the emotion information indicates the degree of emotion shared mutually between two persons, σ_{n,n'} = σ_{n',n}. In other words, between two persons, the degree of emotion from one to the other is the same as the degree of emotion from the other to the one.

As the emotion information, the value at each time point may be used, or a representative value, such as the mode of the values over the interval from ΔT before each time point up to that time point, may be used. The emotion information may also be compared with a threshold, and control may be performed so that it is displayed only when it is equal to or greater than the threshold. When the emotion information acquisition unit 110 obtains the posterior probability distribution P(e_t|B), the average of the values over the interval from ΔT before each time point up to that time point may be used as the emotion information.
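The representative values mentioned above can be computed along the following lines. This is a minimal sketch under assumptions: the window is passed in as a plain list of the samples that fall within ΔT, labels are strings, and posteriors are dictionaries keyed by label. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def windowed_mode(labels: List[str]) -> str:
    """Mode of the emotion labels observed within the last ΔT."""
    return Counter(labels).most_common(1)[0][0]


def windowed_mean(posteriors: List[Dict[str, float]]) -> Dict[str, float]:
    """Per-label average of the posteriors P(e_t|B) within the last ΔT."""
    n = len(posteriors)
    return {label: sum(p[label] for p in posteriors) / n for label in posteriors[0]}


def should_display(value: float, threshold: float) -> bool:
    """Display the emotion information only at or above the threshold."""
    return value >= threshold
```

For example, averaging the two posteriors {empathy: 0.5, antipathy: 0.25, neither: 0.25} and {empathy: 0.25, antipathy: 0.5, neither: 0.25} gives 0.375 for both empathy and antipathy and 0.25 for neither.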

<Effect>
With this configuration, emotion information becomes easier to grasp. For example, displaying the emotion information σ_{i,j} between the first person and the others makes the first person's emotion information easier to grasp, which is useful when assessing the first person's communication skills and the like. When the first person views the display device, he or she can easily grasp the emotional state between himself or herself and the others. If all of the others feel antipathy toward the first person while all of the emotion information other than that between the first person and the others is empathy, the first person can immediately notice that he or she alone holds a different opinion and respond, for example by changing the topic, thereby keeping the conversation harmonious. Likewise, displaying the emotion information σ_{i",j"} other than the emotion information between the first person and the others makes the emotion information of the persons other than the first person easier to grasp, which is useful when examining the state of the dialogue among those persons and their communication skills. Switching the display between the emotion information σ_{i,j} and σ_{i",j"} makes it easy to grasp the emotion information among the persons other than the first person while also keeping track of the first person's emotion information. For example, by setting the moderator or facilitator of a discussion as the first person and switching the display between σ_{i,j} and σ_{i",j"}, one can check from σ_{i,j} whether the moderation is going well while using σ_{i",j"} to grasp the state of the discussion among the participants.

In addition, by switching the display between all of the emotion information σ_{i',j'} and the emotion information σ_{i,j} or σ_{i",j"}, the communication skills, roles, and so on of the first person and of the persons other than the first person within the conversation as a whole can be grasped easily.

<Modifications>
In this embodiment, empathy is defined as a state in which the emotional states of two persons are similar, and antipathy as a state in which they differ. Alternatively, what an external observer watching two persons perceives as the two "empathizing" may be defined as empathy, and what the observer perceives as "antipathy" may be defined as antipathy. In other words, the definitions of empathy and antipathy held internally by each observer are followed. This reflects the position that the definitions of empathy and antipathy, and the ability to read them, vary from observer to observer, that this very variation is the essence of communication, and that the collected interpretations of multiple observers serve as the objective definition of empathy and antipathy. The phenomenon of "empathy/antipathy" to be modeled here therefore convolves two uncertainties: the uncertainty of the empathy/antipathy between the interlocutors in the dialogue, and the uncertainty in how external observers define and interpret empathy/antipathy.

In this embodiment, a video in which N persons are captured is the input. Alternatively, a video in which N' (> N) persons are captured may be the input, with control performed so that only the emotion information relating to a subset (N of the N' persons) is displayed, where 3 ≤ N < N'. More specifically, one of the N persons is the first person, and the (N-1) persons other than the first person among the N persons are the others. For example, a video in which six persons are captured may be input, and control may be performed so that the emotion information relating to four of them is displayed. One of those four is set as the first person 2-i, and the remaining three are set as the others 2-j. When receiving a video in which more than N persons are captured, the control unit 120 receives the video and the emotion information, selects N persons from among them, and further selects the first person from among the N persons. The N persons and the first person may be designated by the user or may be persons designated in advance. The other processing is as described for the control unit 120. The selection of the N persons and the first person may take place either before or after the processing of the emotion information acquisition unit 110. The emotion information acquisition unit 110 may acquire the pairwise emotion information for all of the interlocutors in the captured video (more than N persons), or only for the subset (N persons) whose emotion information is to be displayed.
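Restricting the display to N of the N' captured persons amounts to filtering the pairwise emotion information, as in the sketch below. The dictionary layout (keys are index pairs, values are emotion labels), the function name `select_subset`, and the validation checks are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]


def select_subset(emotion: Dict[Pair, str],
                  selected: Iterable[int],
                  first: int) -> Dict[Pair, str]:
    """Keep only the emotion information between two selected persons."""
    chosen = set(selected)
    if len(chosen) < 3:
        raise ValueError("N must be at least 3")
    if first not in chosen:
        raise ValueError("the first person must be one of the N selected persons")
    return {pair: value for pair, value in emotion.items() if set(pair) <= chosen}
```

With six captured persons and four selected, the 15 pairwise values reduce to 6, one for each pair among the four selected persons.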

The processing performed by the emotion information acquisition unit 110 is not limited to the method described above. Any other method (for example, that of Patent Document 1 or Non-Patent Document 1) may be used as long as it obtains the emotion information (for each pair composed of two persons among a plurality of persons, the degree of emotion between the two persons constituting that pair). The emotions are also not limited to "empathy", "antipathy", and "neither", and may be other emotions.

The emotion information acquisition unit 110 obtains the emotion information for all combinations of interlocutors, but it need not do so for every combination. For example, the control unit may perform control so that part of the emotion information is not displayed, in which case only the emotion information that needs to be displayed has to be obtained. Such a configuration reduces the amount of computation required to acquire the emotion information.

The control unit 120 switches among the three kinds of emotion information σ_{i,j}, σ_{i",j"}, σ_{i',j'} for display on the display device 3, but it may instead switch among at least two of the three kinds of emotion information σ_{i,j}, σ_{i",j"}, σ_{i',j'} for display on the display device 3.

When the video in which the N persons are captured consists of a single piece of video data, the face region of each person may be cropped from that video data and the crops composited so that they can be displayed on the display device 3.

When the emotion information acquisition unit 110 obtains and outputs the posterior probability distribution P(e_t|B) of the empathy interpretation e as the emotion information, the control unit 120 may be configured to display the type of empathy interpretation with the highest posterior probability, ẽ_t = argmax_{e_t} P(e_t|B), using a solid line, a broken line, or the like, and also to display each posterior probability distribution P(e_t|B) as a bar graph or the like (see FIG. 16). In other words, the control unit 120 may be configured to display the degrees of all types of emotion obtained by the emotion information acquisition unit 110 as the emotion information. In the figure, the bars with right-descending diagonal hatching, vertical hatching, and horizontal hatching represent the posterior probabilities of "empathy", "antipathy", and "neither", respectively. Control may also be performed so that only the bar graph or the like is displayed.
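Selecting the most probable interpretation and preparing bar-graph data can be sketched as follows. The solid and broken line styles for empathy and antipathy follow the description of FIGS. 13 to 15; everything else (the function names, the text-based bars, the bar width, and returning no line style for "neither") is an assumption made for illustration.

```python
from typing import Dict, Optional, Tuple

# Line styles from the description; "neither" has no stated style, so it maps to None here.
LINE_STYLE = {"empathy": "solid", "antipathy": "dashed"}


def most_likely(posterior: Dict[str, float]) -> str:
    """Return the label maximizing P(e_t|B)."""
    return max(posterior, key=posterior.get)


def describe_pair(posterior: Dict[str, float],
                  width: int = 20) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Return the top label, its line style, and one text bar per label."""
    label = most_likely(posterior)
    bars = {k: "#" * round(p * width) for k, p in posterior.items()}
    return label, LINE_STYLE.get(label), bars
```

A posterior of {empathy: 0.5, antipathy: 0.25, neither: 0.25} yields the label "empathy", the style "solid", and bars of length 10, 5, and 5.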

Control may be performed so that the first person 2-i is not displayed on the display device 3. FIGS. 17 to 19 show examples of the video displayed on the display device 3 when control is performed so that the first person 2-i is not displayed. FIGS. 17 to 19 correspond to FIGS. 13 to 15, respectively. In particular, when the first person himself or herself views the display device, the emotional state between himself or herself and the others can be grasped more intuitively and easily. Using a head-mounted display as the display device makes the grasp even more intuitive.

<Second embodiment>
The description focuses on the differences from the first embodiment.

FIG. 20 shows an arrangement example of the emotion information display control device 100 according to the first embodiment. The emotion information display control device 100 receives as input a video in which N or more persons are captured, and outputs video with emotion information added to M display devices 3-m (m = 1, 2, ..., M). N is an integer of 3 or more, and M is an integer of 2 or more. FIG. 21 shows the processing flow of the emotion information display control device 100.

In the second embodiment, the processing performed by the control unit 120 differs from that of the first embodiment.

<Control unit 120>
The control unit 120 receives the video in which the N persons are captured and the emotion information, and performs control so that, of the emotion information, the emotion information σ_{i,j} between the first person 2-i, who is one of the N persons, and the others 2-j is displayed on the display device 3-p, which is one of the M display devices 3-m (s2-1).

Furthermore, the control unit 120 performs control so that, of the emotion information, the emotion information σ_{i",j"} other than the emotion information σ_{i,j} (i" = 1, 2, ..., N, where i" ≠ i; j" = 1, 2, ..., N, where i" ≠ j" and i ≠ j") is displayed on the display device 3-p' (p' is any of 1, 2, ..., M, where p ≠ p'), which is one of the M display devices 3-m (m = 1, 2, ..., M) (s2-2).

For example, with N = 4, M = 2, and i = 1, the control unit 120 receives four videos, each capturing one of the four persons, together with the emotion information σ_{n,n'} (where n = 1, 2, 3, 4; n' = 1, 2, 3, 4; n ≠ n'; and σ_{n,n'} = σ_{n',n}), and composites the four videos so that they can be displayed on a single display device 3-1 or 3-2. It adds the emotion information σ_{1,2}, σ_{1,3}, σ_{1,4} between the first person 2-1 and the others 2-2, 2-3, 2-4 to the composited video and outputs it to the display device 3-1. Furthermore, the control unit 120 adds the emotion information σ_{2,3}, σ_{2,4}, σ_{3,4}, that is, the emotion information other than σ_{1,2}, σ_{1,3}, σ_{1,4}, to the composited video and outputs it to the display device 3-2. In this case, examples of the video displayed on the display devices 3-p and 3-p' are the same as FIG. 13 and FIG. 14, respectively. When control is performed so that the first person 2-i is not displayed on the display devices 3-p and 3-p', examples of the video displayed on the display devices 3-p and 3-p' are the same as FIG. 17 and FIG. 18, respectively.
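The assignment of emotion information to the two display devices in steps s2-1 and s2-2 can be sketched as below. The function name `assign_to_displays` and the representation of displays by their indices p and p' are assumptions; video compositing and actual output to the display devices are outside the scope of the sketch.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def assign_to_displays(n_persons: int, first: int,
                       p: int, p_prime: int) -> Dict[int, List[Pair]]:
    """Map display index -> pairs whose emotion information it shows.

    Display p shows the pairs that include the first person (s2-1);
    display p' shows all remaining pairs (s2-2).
    """
    if p == p_prime:
        raise ValueError("p and p' must be different display devices")
    shown: Dict[int, List[Pair]] = {p: [], p_prime: []}
    for pair in combinations(range(1, n_persons + 1), 2):
        shown[p if first in pair else p_prime].append(pair)
    return shown
```

For N = 4, i = 1, p = 1, and p' = 2, display 3-1 receives the pairs (1,2), (1,3), (1,4) and display 3-2 receives (2,3), (2,4), (3,4), as in the example above.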

<Effect>
With this configuration, emotion information becomes easier to grasp. For example, displaying the emotion information σ_{i,j} between the first person and the others makes the first person's emotion information easier to grasp, which is useful when assessing the first person's communication skills and the like. When the first person views the display device, he or she can easily grasp the emotional state between himself or herself and the others. If all of the others feel antipathy toward the first person while all of the emotion information other than that between the first person and the others is empathy, the first person can immediately notice that he or she alone holds a different opinion and respond, for example by changing the topic, thereby keeping the conversation harmonious. Likewise, displaying the emotion information σ_{i",j"} other than the emotion information between the first person and the others makes the emotion information of the persons other than the first person easier to grasp, which is useful when examining the state of the dialogue among those persons and their communication skills. Displaying the emotion information σ_{i,j} and σ_{i",j"} simultaneously on the display devices 3-p and 3-p', respectively, makes it easy to grasp the emotion information among the persons other than the first person while also keeping track of the first person's emotion information. For example, by setting the moderator or facilitator of a discussion as the first person and displaying σ_{i,j} and σ_{i",j"} simultaneously on the display devices 3-p and 3-p', respectively, one can check from σ_{i,j} whether the moderation is going well while using σ_{i",j"} to grasp the state of the discussion among the participants.

<Third embodiment>
The description focuses on the differences from the second embodiment.

In the third embodiment, M is an integer of 3 or more. FIG. 21 shows the processing flow of the emotion information display control device 100. The processing performed by the control unit 120 differs from that of the second embodiment.

Furthermore, the control unit 120 performs control so that all of the emotion information σ_{i',j'} (i' = 1, 2, ..., N; j' = 1, 2, ..., N, where i' ≠ j') is displayed on the display device 3-p" (p" is any of 1, 2, ..., M, where p ≠ p" and p' ≠ p"), which is one of the M display devices 3-m other than the display devices 3-p and 3-p' (s2-3).

For example, with N = 4, M = 3, and i = 1, the control unit 120 receives four videos, each capturing one of the four persons, together with the emotion information σ_{n,n'} (where n = 1, 2, 3, 4; n' = 1, 2, 3, 4; n ≠ n'; and σ_{n,n'} = σ_{n',n}), and composites the four videos so that they can be displayed on a single display device 3-1, 3-2, or 3-3. It adds the emotion information σ_{1,2}, σ_{1,3}, σ_{1,4} between the first person 2-1 and the others 2-2, 2-3, 2-4 to the composited video and outputs it to the display device 3-1. Furthermore, the control unit 120 adds the emotion information σ_{2,3}, σ_{2,4}, σ_{3,4}, that is, the emotion information other than σ_{1,2}, σ_{1,3}, σ_{1,4}, to the composited video and outputs it to the display device 3-2. Furthermore, the control unit 120 adds all of the emotion information to the composited video and outputs it to the display device 3-3.

In this case, examples of the video displayed on the display devices 3-p, 3-p', and 3-p" are the same as FIG. 13, FIG. 14, and FIG. 15, respectively. When control is performed so that the first person 2-i is not displayed on the display devices 3-p, 3-p', and 3-p", examples of the video displayed on the display devices 3-p, 3-p', and 3-p" are the same as FIG. 17, FIG. 18, and FIG. 19, respectively.

<Effect>
With this configuration, the same effects as in the second embodiment can be obtained. In addition, displaying all of the emotion information σ_{i',j'} on the display device 3-p" makes it easy to grasp the communication skills, roles, and so on of the first person and of the persons other than the first person within the conversation as a whole.

<Other modifications>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the device executing them or as otherwise required. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of each device described in the above embodiments and modifications may be realized by a computer. In that case, the processing content of the functions that each device should have is described by a program. Executing this program on a computer realizes the various processing functions of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer temporarily in its own storage unit. When executing the processing, the computer reads the program stored in its own storage unit and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time the program is transferred from the server computer to the computer, the computer may execute processing according to the received program sequentially. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and the retrieval of results. The program here includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

Although each device is configured by executing a predetermined program on a computer, at least part of the processing content may instead be realized in hardware.

Claims

An emotion information acquisition unit that, taking as emotion information the degree of emotion between the two persons constituting each pair composed of two persons among a plurality of persons, obtains the emotion information between two persons among three or more persons from a video in which the three or more persons are captured; and
A control unit that performs control so as to switch among at least two of (1) the emotion information, of the emotion information, between a first person who is one of the three or more persons and the others, (2) the emotion information, of the emotion information, other than the emotion information between the first person and the others, and (3) all of the emotion information, and to display the result on a display device.
Emotion information display control device.

An emotion information acquisition unit that, taking as emotion information the degree of emotion between the two persons constituting each pair composed of two persons among a plurality of persons, obtains the emotion information between two persons among three or more persons from a video in which the three or more persons are captured; and
A control unit that performs control so that, of the emotion information, the emotion information between a first person who is one of the three or more persons and the others is displayed on a first display device that is one of a plurality of display devices, and the emotion information other than the emotion information between the first person and the others is displayed on a second display device that is one of the plurality of display devices other than the first display device.
Emotion information display control device.

The emotion information display control device according to claim 1 or 2,
Wherein 3 ≤ N < N', a video in which N' persons are captured is input, one of N persons who are a subset of the N' persons is the first person, and the (N-1) persons other than the first person among the N persons are the others.
Emotion information display control device.

The emotion information display control device according to any one of claims 1 to 3,
The emotion information includes a plurality of types of emotions, and the emotion information acquisition unit obtains all types of emotions as the emotion information,
The control unit controls to display the degree of all kinds of emotions obtained by the emotion information acquisition unit as the emotion information;
Emotion information display control device.

An emotion information acquisition step of, taking as emotion information the degree of emotion between the two persons constituting each pair composed of two persons among a plurality of persons, obtaining the emotion information between two persons among three or more persons from a video in which the three or more persons are captured; and
A control step of performing control so as to switch among at least two of (1) the emotion information, of the emotion information, between a first person who is one of the three or more persons and the others, (2) the emotion information, of the emotion information, other than the emotion information between the first person and the others, and (3) all of the emotion information, and to display the result on a display device.
Emotion information display control method.

An emotion information acquisition step of, taking as emotion information the degree of emotion between the two persons constituting each pair composed of two persons among a plurality of persons, obtaining the emotion information between two persons among three or more persons from a video in which the three or more persons are captured; and
A control step of performing control so that, of the emotion information, the emotion information between a first person who is one of the three or more persons and the others is displayed on a first display device that is one of a plurality of display devices, and the emotion information other than the emotion information between the first person and the others is displayed on a second display device that is one of the plurality of display devices other than the first display device.
Emotion information display control method.

A program for causing a computer to function as the emotion information display control device according to claim 1.