JP5921507B2

JP5921507B2 - Dialog tendency scoring device, method and program

Info

Publication number: JP5921507B2
Application number: JP2013199559A
Authority: JP
Inventors: Shiro Kumano; Kazuhiro Otsuka; Junji Yamato
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-09-26
Filing date: 2013-09-26
Publication date: 2016-05-24
Anticipated expiration: 2033-09-26
Also published as: JP2015064828A

Description

The present invention relates to a technique for calculating indices that represent tendencies between interlocutors.

Non-Patent Document 1 describes a technique that estimates an empathy interpretation, that is, the dialog state between two parties (empathy, antipathy, or neither), and displays the estimated empathy interpretation.

Shiro Kumano, Kazuhiro Otsuka, Dan Mikami, Junji Yamato, "Estimation model of empathy/antipathy based on facial expression and gaze for multi-party dialog and its evaluation," IEICE Technical Report, HCS 111(214), pp. 33-38, 2011 (in Japanese).

However, Non-Patent Document 1 does not calculate any index of the tendencies between interlocutors other than the empathy interpretation itself.

An object of the present invention is to provide a dialog tendency scoring device, method, and program that use the empathy interpretation to calculate indices, other than the empathy interpretation itself, that represent tendencies between interlocutors.

A dialog tendency scoring device according to one aspect of the present invention includes a dialog state estimation device that obtains an index P(e_t^(i,j) = e) representing the probability that the empathy state between person i and person j at time t is e. Let C₁, C₂, C₃', C₄', X₂, and X₃ be predetermined constants, a₂ and a₃ be predetermined real constants, w_e (e = 1, 2, ..., N_e) and w_e' (e = 1, 2, ..., N_e) be predetermined constants, N be the total number of persons, T be the predetermined time length of the empathy state data considered when calculating each index, and N_e be the total number of possible empathy states. Let the individual bias for a person i be M_i defined by the following equation,

the individual clarity for the person i be A_i defined by the following equation,

the individual person selectivity for the person i be S_i defined by the following equation,

and the individual activity for the person i be E_i defined by the following equation.

The device further includes an individual index calculation unit that uses the obtained index P(e_t^(i,j) = e) to calculate, for a person, each of two indices among the individual bias, individual clarity, individual person selectivity, and individual activity; an individual classification unit that classifies the person into a predetermined type based on the two calculated indices; and a display unit that displays the classification result of the individual classification unit in a table whose axes are the two indices for the person.

A dialog tendency scoring device according to one aspect of the present invention includes a dialog state estimation device that obtains the index P(e_t^(i,j) = e) representing the probability that the empathy state between person i and person j at time t is e, and, with the individual bias M_i, individual clarity A_i, individual person selectivity S_i, and individual activity E_i defined in the same way, an individual index calculation unit that uses the obtained index P(e_t^(i,j) = e) to calculate, for a person, each of two indices among the individual bias, individual clarity, individual person selectivity, and individual activity. Let C₅, C₆, C₇, C₈, and X₇ be predetermined constants. Let the group bias for a group be M defined by the following equation,

the group clarity for the group be A defined by the following equation,

the group person selectivity for the group be S defined by the following equation,

and the group activity for the group be E defined by the following equation.

The device further includes a group index calculation unit that, based on the two indices calculated by the individual index calculation unit, calculates for a group each of two indices among the group bias, group clarity, group person selectivity, and group activity; a group classification unit that classifies the group into a predetermined type based on the two indices calculated for the group; and a display unit that displays the classification result of the group classification unit in a table whose axes are the two indices for the group.

Indices that represent tendencies between interlocutors, other than the empathy interpretation itself, can be calculated.

FIG. 1 is a block diagram showing an example of the dialog tendency scoring device of the first embodiment.
FIG. 2 is a block diagram showing an example of the dialog tendency scoring device of the second embodiment.
FIG. 3 is a block diagram showing a modification of the dialog tendency scoring device of the second embodiment.
FIG. 4 is a block diagram showing an example of the dialog tendency scoring device of the third embodiment.
FIG. 5 is a block diagram showing an example of the dialog tendency scoring device of the fourth embodiment.
FIG. 6 is a flowchart showing an example of the dialog tendency scoring method of the first embodiment.
FIG. 7 is a flowchart showing an example of the dialog tendency scoring method of the second embodiment.
FIG. 8 is a flowchart showing an example of the dialog tendency scoring method of the third embodiment.
FIG. 9 is a flowchart showing an example of the dialog tendency scoring method of the fourth embodiment.
FIG. 10 is a diagram for explaining an example of classification of persons based on individual indices.
FIG. 11 is a diagram for explaining an example of classification based on group indices.
FIG. 12 is an example of a radar chart for displaying individual indices.
FIG. 13 is an example of a radar chart for displaying group indices.
FIG. 14 is a block diagram showing an example of the dialog state estimation device.
FIG. 15 is a block diagram showing an example of the parameter learning unit.
FIG. 16 is a diagram showing the processing flow of the learning phase.
FIG. 17 is a diagram showing the processing flow of the estimation phase.
FIG. 18 is a diagram for explaining the time difference function.
FIG. 19 is a diagram for explaining the time difference between an interlocutor's behavior and the empathy interpretation.
FIG. 20 is a diagram for explaining the change timing function.
FIG. 21 is a diagram for explaining the effective range of the change timing function.
FIG. 22 is a diagram for explaining the effective range of the change timing function.

Embodiments of the dialog tendency scoring device and method are described below with reference to the drawings.

[First embodiment]
As shown in FIG. 1, the dialog tendency scoring device of the first embodiment includes, for example, an individual index calculation unit 701, an individual classification unit 702, and a display unit 703.
The dialog tendency scoring method of the first embodiment is realized by the dialog tendency scoring device performing, for example, the processing of each step in FIG. 6.

<Individual index calculation unit>
The individual index calculation unit 701 calculates, for a person, each of at least two indices among individual bias, individual clarity, individual person selectivity, and individual activity (step S1). The calculated indices are provided to the individual classification unit 702.

The individual index calculation unit 701 calculates the individual indices based on the empathy interpretation obtained by the dialog state estimation device 1 described later, specifically the probability P(e_t^(i,j) = e) described later.

The individual bias, individual clarity, individual person selectivity, and individual activity are described below. In the following description, N is the total number of persons and is a predetermined positive integer. In other words, N is the number of dialog partners of the target person i plus one. i and j are indices that identify persons.

T is the time length of the empathy state data considered when calculating each index and is a predetermined positive integer. In the examples below, each index is calculated from the empathy state data from time t = 1 to time t = T.
N_e is the total number of possible empathy states. For example, suppose the empathy state can take three values: empathy (e = 1), neutral (e = 2), and antipathy (e = 3). In that case, N_e = 3.
P(e_t^(i,j) = e) is an index representing the probability that the empathy state between person i and person j at time t is e.

<<Individual bias>>
The individual bias for a person is an index of which of at least two of the states (empathy, antipathy, or neither) the person tends to fall into. Suppose, for example, that the individual bias indicates whether the person tends toward empathy or toward antipathy. In that case, the individual bias M_i for person i is defined, for example, as follows.

C₁ is a predetermined constant. For example, C₁ is a real number satisfying 0 < C₁ ≦ 1, such as C₁ = 1/((N-1)·T).
w_e is a predetermined constant. For example, w₁ = 1, w₂ = 0, and w₃ = -1.
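The equation for M_i is not reproduced in this text. As a minimal sketch, assume it is the weighted sum M_i = C₁ Σ_{j≠i} Σ_{t=1..T} Σ_e w_e P(e_t^(i,j) = e), which fits the example constants C₁ = 1/((N-1)·T) and w = (1, 0, -1) given above. The function name, the dictionary layout of P, and the zero-based person and state indices are illustrative assumptions, not part of the source.

```python
from typing import Dict, List, Sequence, Tuple

# P[(i, j)][t][e]: probability that the empathy state between persons i and j
# at time step t is e (index 0 = empathy, 1 = neutral, 2 = antipathy).
PairProbs = Dict[Tuple[int, int], List[List[float]]]


def individual_bias(P: PairProbs, i: int, n_persons: int,
                    w: Sequence[float] = (1.0, 0.0, -1.0)) -> float:
    """Assumed form: M_i = C1 * sum over j != i, t, e of w_e * P(e_t^(i,j) = e)."""
    partners = [j for j in range(n_persons) if j != i]
    T = len(P[(i, partners[0])])          # number of time steps
    c1 = 1.0 / ((n_persons - 1) * T)      # example value of C1 from the text
    total = 0.0
    for j in partners:
        for probs_t in P[(i, j)]:
            total += sum(w_e * p for w_e, p in zip(w, probs_t))
    return c1 * total


# Hypothetical data: person 0 partly empathizes with person 1
# and is in antipathy with person 2 at both time steps.
P_example: PairProbs = {
    (0, 1): [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    (0, 2): [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
}
M_0 = individual_bias(P_example, i=0, n_persons=3)
```

Under this assumed normalization M_i lies in [-1, 1]: values near 1 would indicate a tendency toward empathy and values near -1 a tendency toward antipathy.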

When the empathy interpretation is given not as the probability P(e_t^(i,j) = e) but as an intensity, a numerical value from -1 (antipathy) to 1 (empathy), the individual index calculation unit 701 may convert the intensity into probabilities and calculate the individual bias from the converted probabilities. The intensity can be converted into probabilities as follows, for example.

When intensity ≧ 0, set P(e_t^(i,j) = 1) = intensity, P(e_t^(i,j) = 2) = 1 - intensity, and P(e_t^(i,j) = 3) = 0. When intensity < 0, set P(e_t^(i,j) = 1) = 0, P(e_t^(i,j) = 2) = 1 - intensity, and P(e_t^(i,j) = 3) = intensity.
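The conversion above can be sketched as a small function. Read literally, the branch for a negative intensity would assign a negative value to the antipathy state, so this sketch assumes that the magnitude of the intensity is intended there; that reading, the function name, and the tuple order are assumptions rather than something the source states.

```python
from typing import Tuple


def intensity_to_probs(intensity: float) -> Tuple[float, float, float]:
    """Map an intensity in [-1, 1] to (P(empathy), P(neutral), P(antipathy))."""
    if not -1.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [-1, 1]")
    if intensity >= 0:
        return (intensity, 1.0 - intensity, 0.0)
    # Assumption: use |intensity| so the three values stay a valid distribution.
    magnitude = -intensity
    return (0.0, 1.0 - magnitude, magnitude)


print(intensity_to_probs(0.25))   # (0.25, 0.75, 0.0)
print(intensity_to_probs(-0.5))   # (0.0, 0.5, 0.5)
```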

<<Individual clarity>>
The individual clarity for a person is an index of how readily that person's empathy interpretation concentrates. In other words, it indicates how strongly the person tends to concentrate on a single interpretation such as empathy or antipathy. The individual clarity A_i for person i is defined, for example, as follows.

C₂ is a predetermined constant. For example, C₂ is a real number satisfying 0 < C₂ ≦ 1. a₂ is a predetermined real constant.

To make a larger A_i indicate that the empathy state of person i concentrates more readily, C₂, a₂, and X₂ are defined, for example, as follows.

Alternatively, C₂, a₂, and X₂ may be defined, for example, as follows.

maxVar is the maximum variance of a data set that has N_e elements summing to 1, and takes, for example, the following value.
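The value of maxVar given in the source is not reproduced in this text. As a sketch, assuming the population variance (dividing by N_e) and non-negative elements, the variance is largest when one element is 1 and the rest are 0, which gives (N_e - 1)/N_e². The helper name below is illustrative.

```python
from statistics import pvariance


def max_variance(n_e: int) -> float:
    """Largest population variance of n_e non-negative values that sum to 1."""
    if n_e < 1:
        raise ValueError("n_e must be positive")
    return (n_e - 1) / n_e ** 2


# A one-hot distribution over three empathy states reaches the maximum (2/9),
# while a uniform distribution has zero variance.
one_hot = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]
assert abs(pvariance(one_hot) - max_variance(3)) < 1e-12
assert pvariance(uniform) < max_variance(3)
```

If the source instead intends the sample variance (dividing by N_e - 1), the maximum would be 1/N_e.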

<<Individual person selectivity>>
The individual person selectivity for a person is an index of how strongly the persons with whom that person tends to feel empathy or antipathy are concentrated. In other words, it indicates whether the person tends to feel empathy or antipathy toward particular persons. N ≧ 3 is assumed when calculating the individual person selectivity. The individual person selectivity S_i for person i is defined, for example, as follows.

Here, E^(i,j) is defined, for example, as follows.

w_e is defined as in the <<Individual bias>> section; that is, for example, w₁ = 1, w₂ = 0, and w₃ = -1.
C₃' is a predetermined constant. For example, C₃' is a real number satisfying 0 < C₃' ≦ 1. a₃ is a predetermined real constant.

To make a larger S_i indicate that the persons with whom person i tends to feel empathy or antipathy are more concentrated, C₃', a₃, and X are defined, for example, as follows.

Alternatively, C₃', a₃, and X may be defined, for example, as follows.

<<Individual activity>>
The individual activity for a person is an index of how readily the person enters a state of empathy or antipathy. The individual activity E_i for person i is defined, for example, as follows.

C₄' is a predetermined constant. For example, C₄' is a real number satisfying 0 < C₄' ≦ 1; C₄' = 1/(N-1) may be used.

E'^(i,j) is the frequency with which person i and person j were in a state of empathy or antipathy, and is defined, for example, as follows.

w_e' is a predetermined constant. For example, w₁' = 1, w₂' = 0, and w₃' = 1.
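The equations for E_i and E'^(i,j) are not reproduced in this text. A minimal sketch, assuming E'^(i,j) is the time average of Σ_e w_e' P(e_t^(i,j) = e) and E_i = C₄' Σ_{j≠i} E'^(i,j) with the example value C₄' = 1/(N-1): with w' = (1, 0, 1), both empathy and antipathy count as activity and the neutral state does not. The 1/T normalization, the function names, and the data layout are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# P[(i, j)][t][e]: probability of empathy state e (0 = empathy, 1 = neutral,
# 2 = antipathy) between persons i and j at time step t.
PairProbs = Dict[Tuple[int, int], List[List[float]]]


def pair_activity(probs: List[List[float]],
                  w_prime: Sequence[float] = (1.0, 0.0, 1.0)) -> float:
    """Assumed E'^(i,j): time-averaged probability of empathy or antipathy."""
    T = len(probs)
    return sum(sum(w * p for w, p in zip(w_prime, probs_t)) for probs_t in probs) / T


def individual_activity(P: PairProbs, i: int, n_persons: int) -> float:
    """Assumed form: E_i = C4' * sum over j != i of E'^(i,j), with C4' = 1/(N-1)."""
    c4 = 1.0 / (n_persons - 1)
    return c4 * sum(pair_activity(P[(i, j)]) for j in range(n_persons) if j != i)


# Hypothetical data for person 0 in a three-person dialog.
P_example: PairProbs = {
    (0, 1): [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # active in one of two steps
    (0, 2): [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # active in both steps
}
E_0 = individual_activity(P_example, i=0, n_persons=3)
```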

<Individual classification unit>
The individual classification unit 702 classifies a person into a predetermined type based on the at least two indices calculated for that person by the individual index calculation unit 701 (step S2). The classification result is provided to the display unit 703.

For example, suppose each index is divided into a plurality of ranges. The person is classified into a predetermined type according to the ranges that contain the indices calculated by the individual index calculation unit 701. An example of such range-based classification is described below with reference to FIG. 10.

Suppose, for example, that the individual bias is divided into three ranges: "tends toward empathy," "balanced," and "tends toward antipathy." Let M_c1 and M_c2 be predetermined real numbers satisfying M_c1 < M_c2. Then the individual bias of person i falls in the "tends toward empathy" range if M_c2 ≦ M_i, in the "balanced" range if M_c1 ≦ M_i < M_c2, and in the "tends toward antipathy" range if M_i < M_c1.

Suppose also that the individual person selectivity is divided into two ranges, "high" and "low." Let S_c1 be a predetermined real number. Then the individual person selectivity of person i falls in the "high" range if S_c1 < S_i and in the "low" range if S_i ≦ S_c1.

In this case, as illustrated in FIG. 10, person i is classified as follows: (1) when the individual bias is "tends toward empathy" and the individual person selectivity is "low," as the "people-pleaser" type; (2) when the individual bias is "balanced" and the individual person selectivity is "low," as the "balanced / attentive" type; (3) when the individual bias is "tends toward antipathy" and the individual person selectivity is "low," as the "critic" type; and (4) when the individual person selectivity is "high," as the "selective" type.
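The range-based classification of FIG. 10 can be sketched as a chain of threshold comparisons. The function name and the threshold values are hypothetical, and the type labels are English renderings of the Japanese type names.

```python
def classify_person(m_i: float, s_i: float,
                    m_c1: float, m_c2: float, s_c1: float) -> str:
    """Map (individual bias, individual person selectivity) to a FIG. 10 type."""
    if not m_c1 < m_c2:
        raise ValueError("thresholds must satisfy M_c1 < M_c2")
    if s_i > s_c1:                    # selectivity "high": bias is not consulted
        return "selective"
    if m_i >= m_c2:                   # tends toward empathy
        return "people-pleaser"
    if m_i >= m_c1:                   # balanced
        return "balanced / attentive"
    return "critic"                   # tends toward antipathy


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS = dict(m_c1=-0.2, m_c2=0.2, s_c1=0.5)
print(classify_person(0.6, 0.1, **THRESHOLDS))   # people-pleaser
```

Checking the selectivity first mirrors case (4), where a "high" selectivity determines the type regardless of the bias range.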

The number of ranges and the number of types are arbitrary. In the example of FIG. 10 the person is classified using two indices, the individual bias and the individual person selectivity, but the classification may instead use two other indices, or three or more indices.

<Display unit>
The display unit 703 displays the classification result of the individual classification unit 702 (step S3). The display unit 703 is a display device such as a CRT or a liquid crystal display. The display unit 703 displays, for example, the table shown in FIG. 10.

Calculating and displaying indices based on the empathy interpretation in this way makes it possible to grasp the communication tendencies and characteristics of each person.

[Second embodiment]
As shown in FIG. 2, the dialog tendency scoring device of the second embodiment includes, for example, an individual index calculation unit 701, a display unit 703, a group index calculation unit 704, and a group classification unit 705.

The dialog tendency scoring method of the second embodiment is realized by the dialog tendency scoring device performing, for example, the processing of each step in FIG. 7.

The description below focuses on the differences from the first embodiment. Parts that are the same as in the first embodiment are given the same reference numerals, and their description is not repeated.

<Individual index calculation unit>
As in the first embodiment, the individual index calculation unit 701 calculates each of at least two indices among individual bias, individual clarity, individual person selectivity, and individual activity, here for each person included in the group (step S1). The calculated indices are provided to the group index calculation unit 704.

<Group index calculation unit>
Based on the indices calculated by the individual index calculation unit 701, the group index calculation unit 704 calculates, for a group, each of at least two indices among group bias, group clarity, group person selectivity, and group activity (step S4). The calculated indices are provided to the group classification unit 705.

The group bias, group clarity, group person selectivity, and group activity are described below.

<<Group bias>>
The group bias for a group is an index of which of at least two of the states (empathy, antipathy, or neither) the persons in the group tend to fall into. Suppose, for example, that the group bias indicates whether the persons in the group tend toward empathy or toward antipathy. In that case, when the group consists of N persons, the group bias is defined, for example, as follows. N is a predetermined integer of 2 or more.

C₅ is a predetermined constant. For example, C₅ is a real number satisfying 0 < C₅ ≦ 1, such as C₅ = 1/N.

<<Group clarity>>
The group clarity for a group is an index of how readily the empathy interpretations of the persons in the group concentrate. In other words, it indicates how strongly the persons in the group tend to concentrate on a single interpretation such as empathy or antipathy. When the group consists of N persons, the group clarity is defined, for example, as follows.

C₆ is a predetermined constant. For example, C₆ is a real number satisfying 0 < C₆ ≦ 1, such as C₆ = 1/N.

<<Group person selectivity>>
The group person selectivity for a group is an index of how strongly the persons with whom the members of the group tend to feel empathy or antipathy are concentrated. When the group consists of N persons, the group person selectivity S is defined, for example, as follows.

C₇ is a predetermined constant. For example, C₇ is a real number satisfying 0 < C₇ ≦ 1. To make a larger S indicate that the persons with whom the members of the group tend to feel empathy or antipathy are more concentrated, C₇ and X₇ are defined, for example, as follows.

Alternatively, C₇ and X₇ may be defined, for example, as follows.

<<Group activity>>
The group activity for a group is an index of how readily the persons in the group enter a state of empathy or antipathy. When the group consists of N persons, the group activity is defined, for example, as follows.

C₈ is a predetermined constant. For example, C₈ is a real number satisfying 0 < C₈ ≦ 1, such as C₈ = 1/N.
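The equations for the group indices are not reproduced in this text. The example constants C₅ = C₆ = C₈ = 1/N suggest that the group bias, group clarity, and group activity may each be a scaled sum of the corresponding individual index over the N members, that is, a mean when the constant is 1/N. The sketch below assumes that form; the function name is illustrative, and the group person selectivity is left out because its constants C₇ and X₇ are defined differently.

```python
from typing import Optional, Sequence


def group_index(individual_values: Sequence[float],
                c: Optional[float] = None) -> float:
    """Assumed form: c * sum of the members' individual index values (default c = 1/N)."""
    n = len(individual_values)
    if n < 2:
        raise ValueError("a group needs at least two persons")
    if c is None:
        c = 1.0 / n
    return c * sum(individual_values)


# Hypothetical individual biases M_i for a three-person group.
M = group_index([0.5, -0.25, 0.5])
```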

<Group classification unit>
The group classification unit 705 classifies the group into a predetermined type based on the at least two indices calculated for the group (step S5). The classification result is provided to the display unit 703.

For example, suppose each index is divided into a plurality of ranges. The group is classified into a predetermined type according to the ranges that contain the indices calculated by the group index calculation unit 704. An example of such range-based classification is described below with reference to FIG. 11.

Suppose, for example, that the group bias is divided into three ranges: "tends toward empathy," "balanced," and "tends toward antipathy." Let M_c1' and M_c2' be predetermined real numbers satisfying M_c1' < M_c2'. Then the group bias falls in the "tends toward empathy" range if M_c2' ≦ M, in the "balanced" range if M_c1' ≦ M < M_c2', and in the "tends toward antipathy" range if M < M_c1'.

Suppose also that the group person selectivity is divided into two ranges, "high" and "low." Let S_c1' be a predetermined real number. Then the group person selectivity falls in the "high" range if S_c1' < S and in the "low" range if S ≦ S_c1'.

In this case, as illustrated in FIG. 11, the group is classified as follows: (1) when the group bias is "tends toward empathy" and the group person selectivity is "low," as the "cooperative dialog" type; (2) when the group bias is "balanced" and the group person selectivity is "low," as the "all-participant dialog" type; (3) when the group bias is "tends toward antipathy" and the group person selectivity is "low," as the "competitive dialog" type; and (4) when the group person selectivity is "high," as the "leader-driven dialog" type.
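The FIG. 11 classification can be sketched as a lookup keyed by the bias range, with the selectivity checked first. The names, the threshold values, and the English type labels are illustrative assumptions.

```python
def bias_range(m: float, m_c1: float, m_c2: float) -> str:
    """Return the range of the group bias M given thresholds M_c1' < M_c2'."""
    if not m_c1 < m_c2:
        raise ValueError("thresholds must satisfy M_c1' < M_c2'")
    if m >= m_c2:
        return "empathy"
    if m >= m_c1:
        return "balanced"
    return "antipathy"


# Types for groups whose group person selectivity is "low".
LOW_SELECTIVITY_TYPES = {
    "empathy": "cooperative dialog",
    "balanced": "all-participant dialog",
    "antipathy": "competitive dialog",
}


def classify_group(m: float, s: float,
                   m_c1: float, m_c2: float, s_c1: float) -> str:
    """Map (group bias M, group person selectivity S) to a FIG. 11 type."""
    if s > s_c1:                       # selectivity "high"
        return "leader-driven dialog"
    return LOW_SELECTIVITY_TYPES[bias_range(m, m_c1, m_c2)]


# Hypothetical thresholds, chosen only for illustration.
print(classify_group(-0.4, 0.2, m_c1=-0.2, m_c2=0.2, s_c1=0.5))   # competitive dialog
```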

The number of ranges and the number of types are arbitrary. In the example of FIG. 11 the group is classified using two indices, the group bias and the group person selectivity, but the classification may instead use two other indices, or three or more indices.

<Display unit>
The display unit 703 displays the classification result of the group classification unit 705 (step S3). The display unit 703 displays, for example, the table shown in FIG. 11.

The dialog tendency scoring device of the second embodiment may further include an individual classification unit 702, as shown in FIG. 3.

In this case, as in the first embodiment, the individual classification unit 702 classifies a person into a predetermined type based on the at least two indices calculated for that person by the individual index calculation unit 701 (step S2). The classification result is provided to the display unit 703.

As a result, the display unit 703 displays not only the classification result of the group classification unit 705 but also the classification result of the individual classification unit 702.

Calculating and displaying indices based on the empathy interpretation in this way makes it possible to grasp the communication tendencies and characteristics of each person, of the group as a whole, or of both.

［第三実施形態］
第三実施形態の対話傾向得点化装置は、図４に示すように、個人指標計算部７０１及び表示部７０３を例えば備えている。 [Third embodiment]
The dialogue tendency scoring device of the third embodiment includes, for example, a personal index calculation unit 701 and a display unit 703 as shown in FIG.

第三実施形態の対話傾向得点化方法は、対話傾向得点化装置が図８の各ステップの処理を例えば行うことにより実現される。 The dialog tendency scoring method of the third embodiment is realized by the dialog tendency scoring device performing, for example, the processing of each step in FIG.

＜個人指標計算部＞
個人指標計算部７０１は、第一実施形態と同様に、個人バイアス、個人明確度、個人人物選択度及び個人活発度の中の少なくとも２個の指標のそれぞれをある人物について計算する（ステップＳ１）。計算された指標は、表示部７０３に提供される。 <Individual index calculator>
As in the first embodiment, the personal index calculation unit 701 calculates each of at least two indexes among a personal bias, individual clarity, individual person selection, and personal activity for a certain person (step S1). . The calculated index is provided to the display unit 703.

<Display unit>
The display unit 703 displays the at least two indices calculated for the person by the personal index calculation unit 701 as a radar chart (step S3). For example, the display unit 703 displays the indices as a radar chart as illustrated in FIG. 12.
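As a rough illustration of the radar chart display, the following sketch maps a set of index values to polygon vertices, one axis per index. It assumes the indices have already been scaled to the range [0, 1], which the text does not specify; the function name `radar_vertices` and the sample scores are hypothetical.

```python
import math


def radar_vertices(indices):
    """Map {index name: value} to (name, x, y) vertices of a radar polygon.

    Assumption: values are already scaled to [0, 1].
    """
    names = list(indices)
    n = len(names)
    vertices = []
    for k, name in enumerate(names):
        # Spread the axes evenly around the circle, first axis pointing up.
        angle = math.pi / 2 - 2 * math.pi * k / n
        r = indices[name]
        vertices.append((name, r * math.cos(angle), r * math.sin(angle)))
    return vertices


# Hypothetical scores for one person (not taken from the patent).
scores = {"bias": 0.6, "clarity": 0.8, "person_selectivity": 0.3, "activity": 0.5}
for name, x, y in radar_vertices(scores):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The vertices can then be passed to any plotting library to draw the closed polygon.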

[Fourth embodiment]
The dialog tendency scoring device of the fourth embodiment includes, for example, a personal index calculation unit 701, a group index calculation unit 704, and a display unit 703, as shown in FIG. 5.

The dialog tendency scoring method of the fourth embodiment is realized, for example, by the dialog tendency scoring device performing the processing of each step in FIG. 9.

The following description focuses on the differences from the second embodiment. Parts that are the same as in the second embodiment are given the same reference numerals, and redundant description is omitted.

<Personal index calculation unit>
As in the second embodiment, the personal index calculation unit 701 calculates, for each person in the group, each of at least two indices among personal bias, personal clarity, personal person selectivity, and personal activity (step S1). The calculated indices are provided to the group index calculation unit 704.

<Group index calculation unit>
As in the second embodiment, the group index calculation unit 704 calculates, for a given group, each of at least two indices among group bias, group clarity, group person selectivity, and group activity, based on the indices calculated by the personal index calculation unit 701 (step S4). The calculated indices are provided to the display unit 703.

<Display unit>
The display unit 703 displays the at least two indices calculated for the group by the group index calculation unit 704 as a radar chart. For example, the display unit 703 displays the indices as a radar chart as illustrated in FIG. 13.

Note that, as indicated by the broken line in FIG. 5, the indices calculated by the personal index calculation unit 701 may also be provided to the display unit 703.

In this case, the display unit 703 displays as radar charts not only the indices calculated by the group index calculation unit 704 but also those calculated by the personal index calculation unit 701.

In this way, by calculating and displaying indices based on the empathy interpretation, the communication tendencies and characteristics of each person and/or of the group as a whole can be grasped.

[Modifications]
The processes described above need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed.

When each process of the dialog tendency scoring device is implemented by a computer, the processing content of the functions the device should have is described by a program. Executing this program on a computer realizes each process on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

Each processing means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be realized in hardware.

Needless to say, other changes may be made as appropriate without departing from the spirit of the invention.

[Dialog state estimation device]
An example of a dialog state estimation device that obtains the empathy interpretation is described below. The empathy interpretation is obtained, for example, by the dialog state estimation device 1 shown in FIG. 14.

<Configuration>
A configuration example of the dialog state estimation device 1 of this embodiment is described with reference to FIG. 14. The dialog state estimation device 1 includes an input unit 10, an action recognition unit 20, an empathy interpretation assigning unit 30, a parameter learning unit 40, a learning video storage unit 70, and a model parameter storage unit 74. The learning video storage unit 70 can be configured, for example, as a main storage device such as RAM (Random Access Memory), or as an auxiliary storage device composed of a hard disk, an optical disc, or a semiconductor memory element such as flash memory. The model parameter storage unit 74 may be configured in the same way as the learning video storage unit 70, or by middleware such as a relational database or a key-value store.

A configuration example of the parameter learning unit 40 of this embodiment is described with reference to FIG. 15. The parameter learning unit 40 includes a prior distribution learning unit 42, a timing model learning unit 44, and a static model learning unit 46.

<Learning phase>
An example of the operation of the dialog state estimation device 1 in the learning phase is described with reference to FIG. 11.

A learning video is input to the input unit 10 (step S11). The learning video records a situation in which multiple people are conversing, and at least the head of each interlocutor must be captured. The learning video may be a multiplexed video shot with multiple cameras, one per interlocutor, or a video in which a single omnidirectional camera, for example one with a fisheye lens, captures all interlocutors. The input learning video is stored in the learning video storage unit 70.

The action recognition unit 20 takes as input the learning video stored in the learning video storage unit 70, detects the actions of each interlocutor captured in the video, such as facial expression, gaze, head gesture, and presence or absence of speech, and outputs the resulting time series of interlocutor actions (step S21). In this embodiment, four action channels are recognized: facial expression, gaze, head gesture, and presence or absence of speech. An action channel is a form of action. Facial expression is a primary channel for expressing emotion. In this embodiment, six states are recognized for facial expression: neutral / smile / laughter / wry smile / thinking / other. Gaze indicates at least one of whom a person is trying to convey emotion to and whose actions the person is observing. In this embodiment, the gaze recognition targets are "looking at one particular other person" (together with who that person is) and "looking at no one". The number of states therefore equals the number of interlocutors. Here, the interlocutors are all participants in the dialog, including the person whose gaze is being measured. Facial expression and gaze may be recognized by the method described in JP 2012-185727 A (Reference 1) or in Non-Patent Document 1.

Head gestures are often made to express an attitude toward another person's opinion. In this embodiment, four states are recognized for head gestures: none / nod / head shake / tilt / combinations of these. Any known method may be used to recognize head gestures, for example the method described in Ejiri Yasushi, Kobayashi Tetsunori, "Recognition of Head Gestures During Dialogue", IEICE Technical Report, PRMU2002-61, pp. 31-36, Jul. 2002 (Reference 2). The presence or absence of speech is a primary indicator of the conversational role of speaker or listener. In this embodiment, two states are recognized: speaking / silent. Speech may be detected by measuring the audio power in the video and judging that a person is speaking when it exceeds a predetermined threshold. Alternatively, speech may be detected from the movement of the interlocutor's mouth in the video. All actions may be recognized by a single device, or a separate device may be used for each action. For facial expression recognition, for example, the device of Japanese Patent No. 4942197 (Reference 3) may be used as the action recognition device.
The action recognition unit 20 may instead output the result of manual labeling, in the same way as the empathy interpretation assigning unit 30.
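The power-threshold test for speech can be sketched as follows. This is only a minimal illustration: the frame length, the threshold value, and the function name `detect_speech` are assumptions, since the text says only that the threshold is predetermined.

```python
def detect_speech(samples, frame_len, power_threshold):
    """Return one boolean per frame: True when mean power exceeds the threshold.

    samples: sequence of audio amplitudes for one interlocutor.
    frame_len and power_threshold are illustrative parameters (assumptions).
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len  # mean squared amplitude
        flags.append(power > power_threshold)
    return flags


# Toy signal: one quiet frame followed by one loud frame.
audio = [0.0, 0.01, -0.01, 0.0, 0.5, -0.6, 0.55, -0.5]
print(detect_speech(audio, frame_len=4, power_threshold=0.01))
```

Each True frame corresponds to the "speaking" state and each False frame to the "silent" state of the speech channel.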

For facial expressions and head gestures, an "intensity" may also be estimated and output. The intensity of a facial expression can be obtained as the probability that the expression is the target expression. The intensity of a head gesture can be obtained as the ratio of the amplitude of the observed motion to the maximum amplitude (for a nod, the maximum nodding angle).
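The head gesture intensity described above is a simple ratio. The sketch below illustrates it; the clamp to 1.0 and the error on a non-positive maximum are added assumptions, and the angles in the example are invented for illustration.

```python
def gesture_intensity(observed_amplitude, max_amplitude):
    """Ratio of the observed motion amplitude to the maximum amplitude.

    Assumption: the result is clamped to 1.0 in case an observation
    exceeds the nominal maximum.
    """
    if max_amplitude <= 0:
        raise ValueError("max_amplitude must be positive")
    return min(abs(observed_amplitude) / max_amplitude, 1.0)


# Hypothetical nod of 12 degrees against a maximum nodding angle of 30 degrees.
print(gesture_intensity(12.0, 30.0))
```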

The empathy interpretation assigning unit 30 outputs a learning empathy interpretation time series in which multiple external observers have labeled the empathy interpretation based on the learning video stored in the learning video storage unit 70 (step S30). The learning empathy interpretation time series is obtained by presenting the learning video to multiple external observers, who manually label the empathy interpretation between each pair of interlocutors at each time. In this embodiment, three dialog states between two people are considered: empathy / antipathy / neither. The dialog state between two people is deeply related to conformity pressure (the feeling that one must go along when many others share an opinion different from one's own) and is a basic element in forming consensus and building human relationships. These states, as interpreted by an external observer, are collectively called the empathy interpretation. That is, the dialog state interpretation in this embodiment is the empathy interpretation.

The learning action time series output by the action recognition unit 20 and the learning empathy interpretation time series output by the empathy interpretation assigning unit 30 are input to the parameter learning unit 40. The parameter learning unit 40 learns model parameters that relate the external observers' empathy interpretation to the interlocutors' actions. The model parameters comprise a prior distribution of the empathy interpretation between interlocutors; a timing model, which represents the likelihood of the empathy interpretation based on the time difference between the interlocutors' actions and the coincidence of those actions; and a static model, which represents the likelihood of the empathy interpretation based on the co-occurrence of the interlocutors' actions.

The prior distribution learning unit 42 of the parameter learning unit 40 learns the prior distribution using the learning empathy interpretation time series (step S42). The timing model learning unit 44 of the parameter learning unit 40 learns the timing model using the learning action time series and the learning empathy interpretation time series (step S44). The static model learning unit 46 of the parameter learning unit 40 learns the static model using the learning action time series and the learning empathy interpretation time series (step S46). The resulting model parameters are stored in the model parameter storage unit 74.

<<Overview of the model>>
The model of this embodiment is described in detail below. This embodiment assumes that the empathy interpretation given by external observers is independent for each pair of interlocutors. The following therefore assumes a dialog with only two interlocutors. When there are three or more interlocutors, learning can be performed by considering each pair of interlocutors separately.

In this embodiment, the posterior probability distribution P(e_t | B) of the external observers' empathy interpretation e at each time t, given the time series B of the interlocutors' actions, is modeled with a naive Bayes model and then estimated. A naive Bayes model assumes that the probabilistic dependence between the dependent variable (here, the empathy interpretation) and each explanatory variable (here, each interlocutor's actions) is independent across the explanatory variables. Despite its simplicity, the naive Bayes model has been confirmed to show high estimation performance in many fields. Using it in this invention has two advantages. First, because it does not model every co-occurrence across action channels (for example, the state in which facial expression, gaze, head gesture, and speech all occur simultaneously), overfitting is easier to avoid. This is particularly useful when training samples are few relative to the variable space. Second, action channels can easily be added or removed as observation information.

In the naive Bayes model of this embodiment, the posterior probability distribution P(e_t | B) is defined as in Equation (1).

Here, P(dt_t^b | c_t^b, e_t) is the timing model. It represents the likelihood that the external observers' empathy interpretation is e when, around time t, the two parties' actions in action channel b have time difference dt_t^b and coincidence c_t^b. The coincidence c is a binary state indicating whether the two parties' actions match, determined by whether their actions fall in the same category. P(b_t | e_t) is the static model. It models how the actions in channel b co-occur between the two parties at the instant of time t. These two models are described in turn below. P(e_t) is the prior distribution of the empathy interpretation e and represents the probability with which each empathy interpretation e is generated when actions are not considered.
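Equation (1) itself is not reproduced in this text. Assuming it takes the usual naive Bayes product form, in which the posterior is proportional to the prior P(e_t) multiplied by the timing model and static model likelihoods of every action channel b, the estimation step could be sketched as below. The function name `posterior`, the dictionary layout, and all numeric values are hypothetical.

```python
def posterior(prior, timing_likelihoods, static_likelihoods):
    """Naive Bayes posterior over empathy interpretations (assumed product form).

    prior: {e: P(e)}
    timing_likelihoods, static_likelihoods: lists with one {e: likelihood}
    dict per action channel.
    """
    scores = {}
    for e, p in prior.items():
        score = p
        for channel in timing_likelihoods:
            score *= channel[e]
        for channel in static_likelihoods:
            score *= channel[e]
        scores[e] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all interpretations have zero likelihood")
    # Normalize so the posterior sums to 1 over the interpretations.
    return {e: s / total for e, s in scores.items()}


# Illustrative values for a single action channel (not from the patent).
prior = {"empathy": 0.5, "neither": 0.3, "antipathy": 0.2}
timing = [{"empathy": 0.3, "neither": 0.2, "antipathy": 0.5}]
static = [{"empathy": 0.6, "neither": 0.3, "antipathy": 0.1}]
result = posterior(prior, timing, static)
print(max(result, key=result.get), result)
```

Because each channel contributes an independent factor, adding or removing a channel only adds or removes one entry in each list, which matches the second advantage of the naive Bayes model noted earlier.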

<<Timing model>>
The timing model for action channel b in this embodiment is defined as in Equation (2).

As Equation (2) shows, the timing model consists of a time difference function P(d~t_t^b | c_t^b, e_t), which represents the likelihood of the empathy interpretation e when the time difference between the two parties' actions is dt and their coincidence is c, and a change timing function π_t, which represents at what timing the empathy interpretation e changes around that interaction. d~t_t^b is the bin number obtained when the time series of the external observers' empathy interpretation is turned into a histogram. The bin size is, for example, 200 milliseconds.

In this embodiment, a timing model between the two parties is built within each action channel, but a model across action channels may also be built. For example, the relationship between the empathy interpretation e and the time difference dt and coincidence c between a facial expression and a head gesture can be modeled. In that case, however, determining the coincidence c requires introducing a new set of categories, such as positive / neutral / negative, under which coincidence can be judged even between different action channels. These categories may be recognized when the action channels are detected from the video, or the actions may first be recognized with channel-specific categories and their labels later reclassified as positive / neutral / negative, for example treating a smile as positive.

<<Time difference function>>
The time difference function P(d~t_t^b | c_t^b, e_t) represents the likelihood of each type of empathy interpretation e given the coincidence c, which indicates whether the two parties' actions match in action channel b, and the time difference dt. This embodiment uses the bin number d~t_t^b obtained when the time series of the external observers' empathy interpretation is turned into a histogram. The bin size is, for example, 200 milliseconds.

FIG. 18 shows an example of the time difference function of this embodiment. The time difference function P(d~t_t^b | c_t^b, e_t) determines the likelihood of the empathy interpretation e from the coincidence c of the interlocutors' actions and the bin number d~t_t^b of the time difference. FIG. 18(A) is an example of the time difference function when the interlocutors' actions match, and FIG. 18(B) is an example when they do not match. For example, when the interlocutors' actions match and the time difference from the giver's action to the receiver's reaction is 500 milliseconds, the likelihood that the empathy interpretation e is "empathy" is about 0.3, the likelihood of "neither" is about 0.2, and the likelihood of "antipathy" is about 0.5. The time difference function is obtained by tallying the time series of empathy interpretations labeled by the external observers per time-difference bin and normalizing so that, for each category of empathy interpretation e, the likelihoods over all time-difference bins sum to 1.
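The tally-and-normalize procedure for the time difference function can be sketched as follows. The sample layout `(dt_ms, coincident, label)` and the function name are hypothetical, and normalizing separately for each combination of coincidence and label is an assumption suggested by the separate panels of FIG. 18.

```python
from collections import defaultdict

BIN_SIZE_MS = 200  # example bin size given in the text


def learn_time_difference_function(samples):
    """Build a histogram-based time difference function.

    samples: iterable of (dt_ms, coincident, label) tuples, where dt_ms is
    the giver-to-receiver time difference, coincident is a bool, and label
    is the observers' empathy interpretation.
    Returns {(coincident, label): {bin_number: likelihood}}; for each key the
    likelihoods sum to 1 over all bins.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for dt_ms, coincident, label in samples:
        bin_number = int(dt_ms // BIN_SIZE_MS)
        counts[(coincident, label)][bin_number] += 1
    table = {}
    for key, bins in counts.items():
        total = sum(bins.values())
        table[key] = {b: n / total for b, n in bins.items()}
    return table


# Toy training samples (invented for illustration).
samples = [
    (100, True, "empathy"),
    (150, True, "empathy"),
    (500, True, "empathy"),
    (450, True, "antipathy"),
]
print(learn_time_difference_function(samples))
```

At estimation time, an observed time difference is converted to its bin number in the same way and the stored likelihood is looked up.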

<<Change timing function>>
The change timing function π represents at what timing the empathy interpretation e changes. Viewed another way, the change timing function π determines over what range, and how strongly, the time difference function contributes to the estimation of the empathy interpretation e in Equation (1).

In this embodiment, the change timing function is modeled as in Equation (3).

Here, t_a is the time at which the giver's action begins in the interaction of interest. The time t' is the relative time within the interaction, with the start of the giver's action at t' = 0 and the start of the receiver's reaction at t' = 1. It is calculated as t' = (t - t_a) / dt.

π = 0 means that the timing model P(dt_t^b | c_t^b, e_t) makes no contribution to the posterior probability distribution P(e_t | B) of Equation (1). π = 1 means that the timing model P(dt_t^b | c_t^b, e_t) contributes fully to the posterior probability distribution P(e_t | B).

The condition dt > L means that the receiver's reaction comes too late after the giver's action. In this embodiment, the threshold L is, for example, 2 seconds. This value draws on the research finding that a listener's facial expression in response to a lexically important phrase from the speaker appears within roughly 500 to 2,500 milliseconds, and on the assumption that every action channel falls roughly within this range. For details of this finding, see G. R. Jonsdottir, J. Gratch, E. Fast, and K. R. Thorisson, "Fluid semantic back-channel feedback in dialogue: Challenges & progress", International Conference Intelligent Virtual Agents (IVA), pp. 154-160, 2007 (Reference 5).

The condition t - t_a > W means that, at time t, a long time has passed since the giver's most recent preceding expression. This models the idea that once two interlocutors have acted toward each other and an interaction has occurred, the external observers' empathy interpretation is influenced by its timing for a while, but the influence disappears if no further interaction follows for some time. The threshold W may be any positive value. In a two-party dialog, where interaction between the two occurs constantly, it may even be infinite. However, when interaction is not necessarily frequent, as between two listeners in a large group where mainly one person is speaking, W may be too long. In this embodiment, the threshold W is empirically set to 4 seconds, based on the experimental result that estimation accuracy was highest with W set to around 4 seconds.

FIG. 19 shows an example of the empathy interpretation, the giver's action, and the receiver's reaction. The fill patterns in FIG. 19 represent different categories of action or empathy interpretation. The values of α and β are set, for example, to α = 0.2 and β = 0.8. These values were chosen so that the change timing function π of Equation (3) best approximates the cumulative probability.

FIG. 20 shows an example of the change timing function π. The points plotted on the graph are the cumulative probability of where within the relative time t' the label changed, for empathy interpretation labels given by a total of nine external observers to dialog data from four dialog groups of four women each (16 people in total). The change timing function approximates these points well. However, α and β need not take these values, provided that α + β = 1, 0 ≤ α ≤ 1, and 0 ≤ β ≤ 1. A simple setting such as α = 0, β = 1 is also acceptable.
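Equation (3) is not reproduced in this text, so the piecewise-linear form below is an assumption chosen to agree with the stated properties: π is 0 when dt > L or t - t_a > W, the relative time is t' = (t - t_a) / dt, and α + β = 1 so that π reaches 1 at t' = 1. Returning 0 before the giver's action starts is a further assumption, and the function name `change_timing` is hypothetical.

```python
def change_timing(t, t_a, dt, alpha=0.2, beta=0.8, L=2.0, W=4.0):
    """Sketch of a change timing function pi (assumed piecewise-linear form).

    t:   current time in seconds
    t_a: start time of the giver's action
    dt:  time difference between the giver's action and the receiver's reaction
    L, W: thresholds in seconds (2 s and 4 s are the example values in the text)
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if dt > L or (t - t_a) > W:
        return 0.0  # reaction too late, or the interaction is too old
    t_rel = (t - t_a) / dt  # relative time t'
    if t_rel < 0:
        return 0.0  # assumption: no contribution before the interaction starts
    if t_rel > 1:
        return 1.0  # after the receiver's reaction starts, full contribution
    return alpha + beta * t_rel


for t in (0.0, 0.5, 1.0, 3.0, 5.0):
    print(t, change_timing(t, t_a=0.0, dt=1.0))
```

With α = 0 and β = 1, the simple setting mentioned above, the same code gives a ramp from 0 to 1 over the interval between the action and the reaction.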

FIGS. 21 and 22 schematically show examples of the effective range of the change timing function. Black fill indicates that no action is detected, while white fill and diagonal hatching indicate categories of action. For the empathy interpretation, vertical hatching indicates empathy and horizontal hatching indicates antipathy. FIG. 21(A) shows the effective range when the interlocutors' actions match. Because the giver's action and the receiver's reaction match, "empathy" continues for the duration of the threshold W. FIG. 21(B) shows the effective range when the interlocutors' actions do not match. Because the giver's action and the receiver's reaction do not match, "antipathy" continues for the duration of the threshold W. FIG. 21(C) shows a situation in which the receiver's reaction comes too late after the giver's action, that is, dt > L, so the change timing function is outside its effective range. In this case the "neither" state continues throughout. FIG. 22 shows the effective range when the two interlocutors act alternately. The basic idea is the same as in FIGS. 21(A) to 21(C).

<<Static model>>
The static model P(b_t | e_t) models the likelihood with which the empathy interpretation e is generated when particular actions co-occur between the two parties in action channel b at time t.

For facial expression and gaze, a modeling method is proposed in Patent Document 1 and Non-Patent Document 1 and may be followed: a model of the gaze state between the two parties is combined with a model of the co-occurrence of facial expression states for each gaze state. The gaze state between the two parties may take, for example, three states: mutual gaze / one-sided gaze / mutual aversion.

The static model for head gestures is expressed as P(g|e), where g denotes the combined state of the head gestures of the two interlocutors. If the number of head gesture states considered is N_g, the number of combined states between the two is N_g × N_g. Any kinds and number of categories may be used, but too many categories tend to cause overfitting when the number of training samples is small. In that case, the initially prepared categories may be further grouped by clustering. One such method is Sequential Backward Selection (SBS). For example, for head gesture categories, estimation is performed using head gestures only, that is, with the posterior probability taken as P(e|B) := P(e)P(g'|e), and the two categories whose merging yields the highest estimation accuracy among all categories are selected and merged into one. Repeating this until just before the estimation accuracy deteriorates reduces the number of categories one at a time. Here, g' is the combined state of the head gestures of the two interlocutors after grouping. Speech presence/absence is likewise modeled as co-occurrence between the two interlocutors, in the same way as head gestures.
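The greedy category merging described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the function name `merge_categories` is hypothetical, and `accuracy` is a caller-supplied callback standing in for "estimation accuracy using P(e|B) := P(e)P(g'|e)", which the sketch does not compute itself.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Grouping = List[FrozenSet[str]]


def merge_categories(categories: Sequence[str],
                     accuracy: Callable[[Grouping], float]) -> Grouping:
    """Greedily merge categories two at a time while accuracy does not drop."""
    groups: Grouping = [frozenset([c]) for c in categories]
    best_acc = accuracy(groups)
    while len(groups) > 1:
        candidates = []
        # Try every pair of current groups and score the merged grouping.
        for a, b in combinations(range(len(groups)), 2):
            merged = [g for k, g in enumerate(groups) if k not in (a, b)]
            merged.append(groups[a] | groups[b])
            candidates.append((accuracy(merged), merged))
        cand_acc, cand_groups = max(candidates, key=lambda item: item[0])
        if cand_acc < best_acc:
            # The next merge would worsen accuracy: stop just before it.
            break
        best_acc, groups = cand_acc, cand_groups
    return groups
```

Each iteration removes exactly one category, so the loop runs at most N_g - 1 times. Treating a tie in accuracy as "not worse" and continuing to merge is a design choice of this sketch.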

<< Model learning method >>
In this embodiment, every model is described in terms of discrete states, so in the learning phase it suffices to count how many times each discrete state appears in the training samples and finally normalize the counts into probabilities.
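The count-and-normalize learning described above can be sketched for a static model such as P(g|e). This is a minimal sketch: the function name `learn_static_model`, the sample format (pairs of an empathy label and a combined behavior state), and the label strings used in the usage note are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def learn_static_model(
    samples: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate P(g|e) from (e, g) training samples by relative frequency."""
    counts = defaultdict(Counter)
    for e, g in samples:
        counts[e][g] += 1          # frequency of each discrete state per label
    model = {}
    for e, counter in counts.items():
        total = sum(counter.values())
        # Normalize the frequencies so that they sum to 1 for each label e.
        model[e] = {g: n / total for g, n in counter.items()}
    return model
```

With three samples `("empathy", ("nod", "nod"))` and one sample `("empathy", ("nod", "none"))`, the learned value of `model["empathy"][("nod", "nod")]` is 0.75. States never observed for a label simply have no entry; how to handle them is not addressed by this sketch.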

As a policy for preparing the models: if the group of interlocutors captured in the learning video used to learn the model parameters is the same as the group of interlocutors captured in the estimation video whose dialog state is to be estimated, the parameters may be learned independently for each pair of interlocutors, and the estimation for a given pair uses the parameters learned from that pair's data. If, on the other hand, the group captured in the learning video differs from the group captured in the estimation video, a single model is learned without distinguishing between pairs, and that single model is used to estimate for the pair of interest.

<Estimation phase>
An operation example of the dialog state estimation device 1 in the estimation phase is described with reference to FIG. 17.

An estimation video is input to the input unit 10 (step S12). The estimation video captures a situation in which multiple persons converse, and at least the heads of the interlocutors must be captured. The estimation video is an unseen video different from the learning video. It is captured in the same way as the learning video in the learning phase described above. The input estimation video is stored in the estimation video storage unit 72.

The behavior recognition unit 20 takes as input the estimation video stored in the estimation video storage unit 72, detects facial expression, gaze, head gestures, speech presence/absence, and the like as the behaviors of each interlocutor captured in the estimation video, and outputs the resulting time series B of interlocutor behaviors (step S22). The behavior recognition method is the same as in the learning phase described above, so its description is omitted here.

The estimation behavior time series B output by the behavior recognition unit 20 is input to the posterior probability estimation unit 50. Using the model parameters stored in the model parameter storage unit 74, the posterior probability estimation unit 50 estimates, from the estimation behavior time series B, the posterior probability distribution P(e_t|B) of the empathy interpretation between interlocutors at time t (step S50). Specifically, it takes as input the time series B of interlocutor behaviors generated from the estimation video and the model parameters learned by the parameter learning unit 40, which include the parameters of the prior distribution, the timing model, and the static model, and calculates the posterior probability distribution P(e_t|B) of the empathy interpretation e at time t according to equation (1) above.

The output unit 60 outputs the posterior probability distribution P(e_t|B) of the empathy interpretation e between interlocutors (step S60). When the estimation result must be output as a single type rather than as a probability distribution, the type of empathy interpretation with the highest posterior probability, that is, ẽ_t = argmax_{e_t} P(e_t|B), may be output together with the distribution as the dialog state value ẽ_t.

In addition to the type of empathy interpretation with the highest posterior probability, the strength of that empathy interpretation may also be output. For example, the strength can be obtained as strength = (probability of empathy) - (probability of antipathy) when the type is empathy, and as strength = (probability of antipathy) - (probability of empathy) when the type is antipathy. In this case, the strength is expressed as a value between -1 and 1.
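The single-type output and the strength described above can be sketched together. This is a minimal sketch: the function name `map_label_and_strength`, the dictionary representation of P(e_t|B), and the label strings "empathy", "antipathy", and "neither" are assumptions made for illustration. The text gives strength formulas only for empathy and antipathy, so the sketch returns `None` for any other type.

```python
from typing import Dict, Optional, Tuple


def map_label_and_strength(
    posterior: Dict[str, float]
) -> Tuple[str, Optional[float]]:
    """Return the most probable empathy interpretation and its strength."""
    # Dialog state value: the type with the highest posterior probability.
    label = max(posterior, key=posterior.get)
    p_emp = posterior.get("empathy", 0.0)
    p_ant = posterior.get("antipathy", 0.0)
    if label == "empathy":
        strength = p_emp - p_ant
    elif label == "antipathy":
        strength = p_ant - p_emp
    else:
        strength = None  # no formula is given for other types
    return label, strength
```

For the posterior `{"empathy": 0.5, "neither": 0.25, "antipathy": 0.25}`, the function returns `("empathy", 0.25)`.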

701 Individual index calculation unit
702 Individual classification unit
703 Display unit
704 Group index calculation unit
705 Group classification unit

Claims

A dialog state estimation device that obtains an index P(e_t^{(i,j)} = e) representing the probability that the empathy state between person i and person j at time t is e,
Where C₁, C₂, C₃', C₄', X₂, and X₃ are predetermined constants, a₂ and a₃ are predetermined real constants, w_e (e = 1, 2, ..., N_e) and w_e' (e = 1, 2, ..., N_e) are predetermined constants, N is the total number of persons, T is a predetermined time length of the empathy-state data considered for calculating each index, and N_e is the total number of possible empathy states,
The individual bias for a person i is M_i defined by the following formula,

The individual activity level for a person i is E_i defined by the following formula,

An individual index calculation unit that uses the obtained index P(e_t^{(i,j)} = e) to calculate, for a person, two indices among the individual bias, the individual clarity, the individual person selectivity, and the individual activity level,
An individual classification unit that classifies the person into a predetermined type based on the two calculated indices for the person,
A display unit that displays the classification result of the individual classification unit in a table having the two indices for the person as its axes,
A dialog tendency scoring device including the above.

The individual activity level for a person i is E_i defined by the following formula,

An individual index calculation unit that uses the obtained index P(e_t^{(i,j)} = e) to calculate, for a person, two indices among the individual bias, the individual clarity, the individual person selectivity, and the individual activity level,
Where C₅, C₆, C₇, C₈, and X₇ are predetermined constants,
The group bias for a group is M defined by the following formula,

The group clarity for a group is A defined by the following formula,

A group index calculation unit that, based on the two indices calculated by the individual index calculation unit, calculates, for a group, two indices among the group bias, the group clarity, the group person selectivity, and the group activity level,
A group classification unit that classifies the group into a predetermined type based on the two calculated indices for the group,
A display unit that displays the classification result of the group classification unit in a table having the two indices for the group as its axes,
A dialog tendency scoring device including the above.

The dialog tendency scoring device of claim 2,
further including an individual classification unit that classifies the person into a predetermined type based on the two indices for the person calculated by the individual index calculation unit,
wherein the display unit further displays the classification result of the individual classification unit in a table having the two indices for the person as its axes.

A dialog state estimation step in which a dialog state estimation device obtains an index P(e_t^{(i,j)} = e) representing the probability that the empathy state between person i and person j at time t is e,
Where C₁, C₂, C₃', C₄', X₂, and X₃ are predetermined constants, a₂ and a₃ are predetermined real constants, w_e (e = 1, 2, ..., N_e) and w_e' (e = 1, 2, ..., N_e) are predetermined constants, N is the total number of persons, T is a predetermined time length of the empathy-state data considered for calculating each index, and N_e is the total number of possible empathy states,
The individual bias for a person i is M_i defined by the following formula,

The individual activity level for a person i is E_i defined by the following formula,

An individual index calculation step in which an individual index calculation unit uses the obtained index P(e_t^{(i,j)} = e) to calculate, for a person, two indices among the individual bias, the individual clarity, the individual person selectivity, and the individual activity level,
An individual classification step in which an individual classification unit classifies the person into a predetermined type based on the two calculated indices for the person,
A display step in which a display unit displays the classification result of the individual classification unit in a table having the two indices for the person as its axes,
A dialog tendency scoring method including the above.

The individual activity level for a person i is E_i defined by the following formula,

An individual index calculation step in which an individual index calculation unit uses the obtained index P(e_t^{(i,j)} = e) to calculate, for a person, two indices among the individual bias, the individual clarity, the individual person selectivity, and the individual activity level,
Where C₅, C₆, C₇, C₈, and X₇ are predetermined constants,
The group bias for a group is M defined by the following formula,

The group clarity for a group is A defined by the following formula,

A group index calculation step in which a group index calculation unit, based on the two indices calculated in the individual index calculation step, calculates, for a group, two indices among the group bias, the group clarity, the group person selectivity, and the group activity level,
A group classification step in which a group classification unit classifies the group into a predetermined type based on the two calculated indices for the group,
A display step in which a display unit displays the classification result of the group classification unit in a table having the two indices for the group as its axes,
A dialog tendency scoring method including the above.

The dialog tendency scoring method of claim 5,
further including an individual classification step in which an individual classification unit classifies the person into a predetermined type based on the two indices for the person calculated in the individual index calculation step,
wherein the display unit further displays the classification result of the individual classification step in a table having the two indices for the person as its axes.

A program for causing a computer to function as each unit of the dialog tendency scoring device according to claim 1.